Student, School of Information Technology and Engineering, Kazakh-British Technical University, Kazakhstan, Almaty
HIERARCHICAL TWO-STAGE SYSTEM FOR ANIMAL BREED RECOGNITION
ABSTRACT
This study proposes a two-stage hierarchical system that uses deep learning models to classify animal breeds. In the first stage, an image is classified by animal type (cat, dog, or horse); in the second stage, a breed classifier selected according to that type predicts the breed. Both stages use YOLOv8 classification models. Experiments on datasets of several animal breeds compiled from publicly available sources yielded an accuracy of 99.38% for animal-type classification and up to 95.95% for breed classification, depending on the animal type, with lower accuracy for visually similar breeds. The findings indicate that hierarchical classification reduces task complexity and improves performance when recognizing breeds of multiple animal species.
ANNOTATION
This article presents a hierarchical animal and breed recognition system based on deep learning models. The proposed method divides the classification task into two stages, animal classification and breed classification: the first stage determines the animal type (cat, dog, or horse), and the second predicts the breed within the selected category. The system is implemented with YOLOv8 classification models and evaluated on publicly available datasets containing various animal breeds. The animal classification model achieves an accuracy of 99.38%, while the breed classification models achieve up to 95.95% depending on the category, with lower performance observed for visually similar classes. The results show that the hierarchical approach reduces classification complexity and improves performance in multi-species animal recognition tasks.
Keywords: hierarchical classification, animal recognition, breed classification, YOLOv8, deep learning, computer vision.
Introduction
Animal identification is a key task in ecological monitoring and computer vision. Camera traps produce large image collections that require annotation. Annotating these collections manually is time-consuming, requires specialist expertise, and can introduce human bias and inconsistency [12]. Early methods recognized animals using hand-crafted features such as color and shape [4]. Deep learning removes this step by learning features directly from raw data, improving classification performance, but it requires large datasets and significant computational resources. Automated re-identification of humans can reach approximately 99% accuracy, whereas performance on animal tasks is lower because of pose variation, lighting conditions, and limited labeled data [12].
Fine-grained classification adds further complexity, because breeds of the same animal differ only in subtle visual details: models must distinguish similar features while intra-class variation remains high in complex scenes [11]. Object detection methods, by contrast, focus on localizing objects in images. Fast R-CNN [5] uses region-based processing, whereas SSD [6] and YOLO-based models such as YOLO [1] and YOLO9000 [2] perform detection in a single stage. These methods achieve high speed and accuracy, and recent architectures such as YOLOv7 [10] improve performance further, but they are designed for localization rather than fine-grained classification.
Recent works apply deep learning to camera trap data, enabling automatic species identification from large datasets [7, 8]. However, these approaches still face challenges in fine-grained recognition.
This work proposes a hierarchical classification system based on YOLOv8 models [3], where the first stage predicts the animal category and the second stage applies a specialized breed classifier. Unlike existing approaches, this study applies YOLOv8 in classification mode within a hierarchical pipeline for fine-grained animal breed recognition. The novelty lies in the use of lightweight YOLOv8n-cls models for multi-stage classification under limited data conditions, where each stage is optimized for a reduced set of classes.
Materials and Methods
Dataset
The dataset used in this study covers three animal classes (cat, dog, horse) and was assembled from publicly available Kaggle datasets for animal and breed classification [13–15]. The training set contains 1,254 cat, 397 dog, and 294 horse images, and the validation set contains 313 cat, 100 dog, and 74 horse images. Each breed-level subset follows the same structure, with three breeds per animal type: Bengal, British Shorthair, and Russian Blue for cats [14]; Beagle, Doberman, and Rottweiler for dogs [15]; and Akhal-Teke, Arabian, and Friesian for horses [13]. The cat breeds contain approximately 400 images per class, whereas the dog and horse breeds are more limited, with around 100–150 images per class.
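The reported counts imply an approximately 80/20 train/validation split for each animal type. The short Python check below recomputes the training share and the average number of training images per breed from those counts; the per-breed figure assumes that the three breed subsets of each animal together contain exactly the animal-level training images, which the text implies but does not state.

```python
# Image counts reported in the Dataset section: (training, validation).
COUNTS = {
    "cat": (1254, 313),
    "dog": (397, 100),
    "horse": (294, 74),
}
BREEDS_PER_ANIMAL = 3  # three breeds per animal type


def train_fraction(train: int, val: int) -> float:
    """Share of an animal type's images that fall in the training split."""
    return train / (train + val)


def mean_train_images_per_breed(train: int, breeds: int = BREEDS_PER_ANIMAL) -> float:
    """Average training images per breed, ASSUMING the breed subsets together
    contain exactly the animal-level training images."""
    return train / breeds


if __name__ == "__main__":
    for animal, (train, val) in COUNTS.items():
        print(f"{animal}: train share {train_fraction(train, val):.3f}, "
              f"~{mean_train_images_per_breed(train):.0f} training images per breed")
```

Run as a script, it reports a training share of about 0.80 for every animal type and roughly 418, 132, and 98 training images per breed for cats, dogs, and horses, consistent with the approximate per-class figures given above.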
Data Preprocessing
All images were resized to 224×224 pixels before training the YOLOv8 classification model, and pixel values were normalized automatically during training.
Data augmentation used the built-in methods of the Ultralytics framework [3], namely horizontal flipping and small geometric transformations, and was applied only to the training set.
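The Ultralytics framework performs these steps internally, so the following NumPy sketch only illustrates the idea: resize to 224×224, scale 8-bit pixel values to [0, 1], and apply a random horizontal flip only when preparing training images. The nearest-neighbour resize, the [0, 1] scaling, and the flip probability of 0.5 are assumptions for illustration and are not necessarily what Ultralytics does; the small geometric transformations are omitted.

```python
import numpy as np

IMG_SIZE = 224  # target resolution stated in the text


def resize_nearest(img: np.ndarray, size: int = IMG_SIZE) -> np.ndarray:
    """Nearest-neighbour resize of an H x W x C image to size x size x C."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows[:, None], cols[None, :]]


def preprocess(img: np.ndarray, train: bool, rng: np.random.Generator,
               flip_prob: float = 0.5) -> np.ndarray:
    """Resize, scale uint8 pixels to [0, 1], and randomly flip (training only).
    flip_prob = 0.5 is an assumed value, not one reported in the paper."""
    out = resize_nearest(img).astype(np.float32) / 255.0
    if train and rng.random() < flip_prob:
        out = out[:, ::-1, :]  # horizontal flip: reverse the width axis
    return out
```

Keeping the flip behind the `train` flag reflects the rule stated above that augmentation touches only the training set, while validation images go through the deterministic resize and scaling alone.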
Model Architecture and Inference Pipeline
Figure 1 illustrates the two-stage hierarchical classification framework. Each input image passes through a sequential pipeline in which the output of the first stage determines which second-stage classifier is run. The first-stage classifier assigns the image to one of three animal types: cat, dog, or horse. This coarse classification separates distinct animal types and reduces the overall complexity of the task. The predicted animal type then acts as a decision variable that selects the processing path for the rest of the classification.
Figure 1. Two-stage hierarchical classification pipeline
After the first stage, rather than considering every breed of every animal, the system narrows the candidate set to the breeds of the predicted animal type and selects the corresponding breed classifier. Breeds belonging to other animal types are thus excluded before breed recognition begins.
In the second stage, the selected classifier determines the specific breed. Because each branch is trained on its own dataset, it can specialize in the features that distinguish the breeds of one animal type, such as texture, shape, and coat patterns. Both stages operate on the original input image, without any additional stage-specific transformations, and the second stage outputs the final breed label.
In this way, the complex classification task is split into two simpler stages: the animal type is identified first, and breed classification is then performed within that subset. This prevents unrelated categories from competing with each other, which helps improve overall accuracy.
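The routing logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `classify_hierarchical` and `Prediction` are hypothetical names, and each classifier is assumed to be any callable that returns a `(label, confidence)` pair, which in practice would wrap one of the trained YOLOv8n-cls models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical classifier type: image -> (label, confidence).
Classifier = Callable[[object], Tuple[str, float]]


@dataclass
class Prediction:
    animal: str
    animal_conf: float
    breed: str
    breed_conf: float


def classify_hierarchical(image: object,
                          animal_clf: Classifier,
                          breed_clfs: Dict[str, Classifier]) -> Prediction:
    """Stage 1 predicts the animal type; that label selects the stage-2 branch."""
    animal, animal_conf = animal_clf(image)
    # Only the breed classifier of the predicted type is used; an unknown
    # stage-1 label raises KeyError instead of silently picking a branch.
    breed_clf = breed_clfs[animal]
    # Stage 2 receives the original image, not an output of stage 1.
    breed, breed_conf = breed_clf(image)
    return Prediction(animal, animal_conf, breed, breed_conf)
```

Keeping the breed classifiers in a dictionary keyed by animal type mirrors the decision-variable role of the first stage: supporting a new animal type means adding one entry (and retraining the first stage to predict it), without touching the existing branches.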
Models
All models in this study use the YOLOv8 architecture [3] in classification mode, specifically the lightweight YOLOv8n-cls variant, and were implemented with the PyTorch framework [9].
All models were trained on RGB images resized to 224×224 pixels with the same configuration, ensuring consistent input across experiments.
Four independent models were trained: one for animal classification and three for breed classification. Each model was trained separately on its corresponding dataset, with no parameter sharing. Using separate models lets each classifier learn a smaller number of categories and focus on the discriminative details needed for fine-grained recognition under limited data.
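A possible way to train the four independent models with the Ultralytics Python API is sketched below. The dataset folder names, the epoch count, and the choice to start from the pretrained `yolov8n-cls.pt` checkpoint are assumptions (the text does not specify them); only `imgsz=224` and the YOLOv8n-cls variant come from the paper. Each job creates a fresh model, so no parameters are shared between tasks.

```python
from pathlib import Path

# Hypothetical dataset root. Each task folder is assumed to follow the
# Ultralytics classification layout: <task>/train/<class>/*.jpg and
# <task>/val/<class>/*.jpg.
DATA_ROOT = Path("datasets")
TASKS = ["animals", "cat_breeds", "dog_breeds", "horse_breeds"]

# Shared configuration for all four models. imgsz comes from the text;
# epochs is an assumed placeholder, not a value reported in the paper.
COMMON_ARGS = {"imgsz": 224, "epochs": 50}


def build_jobs(root: Path = DATA_ROOT) -> list:
    """One independent training job per task, all with the same settings."""
    return [{"name": task, "data": str(root / task), **COMMON_ARGS} for task in TASKS]


def train_all(jobs: list) -> None:
    """Train each job from its own copy of the YOLOv8n-cls weights."""
    from ultralytics import YOLO  # imported lazily so build_jobs works without it

    for job in jobs:
        model = YOLO("yolov8n-cls.pt")  # fresh model per task: no parameter sharing
        model.train(**job)
```

Calling `train_all(build_jobs())` would launch the four runs one after another; because every run starts from the same checkpoint and configuration, differences in the resulting accuracies reflect the datasets rather than the training setup.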
Results and Discussion
The results of the proposed system are presented in Table 1. The animal classification model achieved an accuracy of 99.38% with a validation loss of 0.01198, indicating reliable separation of the three animal categories.
Table 1. Model performance results

| Stage | Task | Accuracy (%) | Loss | Inference (ms) |
|---|---|---|---|---|
| Stage 1 | Animal classification | 99.38 | 0.01198 | 8 |
| Stage 2 | Cat breeds | 83.71 | 0.39092 | 6 |
| Stage 2 | Dog breeds | 94.00 | 0.19561 | 5 |
| Stage 2 | Horse breeds | 95.95 | 0.07973 | 5 |
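Because the second stage runs only one breed classifier per image, the end-to-end latency of the pipeline is the stage-1 time plus the time of the selected stage-2 model. The arithmetic below assumes that the Table 1 times are per image, that the stages run sequentially, and that pre- and post-processing are excluded.

```python
# Inference times from Table 1, in milliseconds (assumed to be per image).
STAGE1_MS = 8
STAGE2_MS = {"cat": 6, "dog": 5, "horse": 5}


def end_to_end_ms(animal: str) -> int:
    """Stage 1 always runs; exactly one stage-2 classifier runs after it."""
    return STAGE1_MS + STAGE2_MS[animal]


if __name__ == "__main__":
    for animal in STAGE2_MS:
        print(animal, end_to_end_ms(animal), "ms")
```

Under these assumptions, a cat image takes about 14 ms and a dog or horse image about 13 ms.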
The confusion matrix for animal classification is shown in Figure 2. Most samples are classified correctly, with diagonal values close to 1.00. A small confusion rate (0.02) is observed between the dog and horse classes, indicating high but not perfect separability.
Figure 2. Normalized confusion matrix for animal classification
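A normalized confusion matrix such as the one in Figure 2 divides each count by the number of samples of the true class, so each true-class row sums to 1 and an off-diagonal entry of 0.02 means that 2% of that class was assigned to another class. The NumPy sketch below computes such a matrix from label lists; the row-per-true-class layout is a choice made here, and plotting tools may display the transpose.

```python
import numpy as np

LABELS = ["cat", "dog", "horse"]


def normalized_confusion(y_true, y_pred, labels=LABELS) -> np.ndarray:
    """Rows = true class, columns = predicted class; each non-empty row sums to 1."""
    index = {name: i for i, name in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    # Leave rows of classes absent from y_true at zero instead of dividing by 0.
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)
```

Normalizing per true class makes the matrix independent of class size, which matters here because the validation set contains about three times as many cat images as dog images.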
By comparison, a single-stage baseline that classifies all breed classes at once with the same YOLOv8n-cls configuration achieved an accuracy of 88.89%, which is 10.49 percentage points below the 99.38% achieved by the first (animal-level) stage of the hierarchical approach. This gap indicates greater inter-class confusion in the single-stage setting, especially for visually similar breeds.
Performance also differs across the breed classifiers: the dog and horse classifiers reach 94.00% and 95.95%, respectively, while the cat classifier reaches 83.71%. These differences suggest that fine-grained classification of visually similar classes is more difficult.
Training dynamics are shown in Figure 3. The training loss decreases sharply during the early epochs and stabilizes below 0.05, and the validation loss follows a similar trend. Accuracy approaches 99.5% and remains stable, indicating that the model converges without signs of overfitting.
Figure 3. Training and validation metrics of the animal classification model
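Curves such as those in Figure 3 are typically drawn from the per-epoch log that Ultralytics writes as `results.csv` in the run directory. The sketch below parses such a log with the standard library and reports the best top-1 accuracy and how far the final validation loss sits above its minimum, one simple check for late overfitting. The column names `metrics/accuracy_top1` and `val/loss` are assumed from recent Ultralytics classification runs and may differ between versions.

```python
import csv
import io
from typing import Dict, List


def load_metrics(text: str) -> Dict[str, List[float]]:
    """Parse a results.csv-style log into {column: [value per epoch]}.
    Column names are stripped because some versions pad them with spaces."""
    columns: Dict[str, List[float]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        for key, value in row.items():
            columns.setdefault(key.strip(), []).append(float(value))
    return columns


def summarize(metrics: Dict[str, List[float]],
              acc_col: str = "metrics/accuracy_top1",
              val_loss_col: str = "val/loss") -> Dict[str, float]:
    """Best top-1 accuracy, final validation loss, and its rise above the minimum.
    A rise near zero means validation loss did not drift upward late in training."""
    val_loss = metrics[val_loss_col]
    return {
        "best_acc": max(metrics[acc_col]),
        "final_val_loss": val_loss[-1],
        "val_loss_rise": val_loss[-1] - min(val_loss),
    }
```

For a real run, the file contents can be passed in with `Path(".../results.csv").read_text()`; the path depends on where the run was saved.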
The lower accuracy for cat breeds is likely caused by the similarity between British Shorthair and Russian Blue, which share similar color and texture features. The selected dog and horse breeds, in contrast, have more distinct visual characteristics, which makes them easier to classify.
Example inference results are presented in Table 2 and Figure 4. For all tested examples, the first stage predicts the correct animal type with high confidence (≈0.99) and selects the corresponding breed classifier, which then returns the correct breed.
Table 2. Example inference results

| Image | Stage 1 (Animal) | Stage 2 (Breed) | Confidence |
|---|---|---|---|
| Bengal-Cat.jpg | Cat | Bengal | 0.99 |
| Doberman-Dog.jpg | Dog | Doberman | 0.99 |
| Arabian-Horse.jpg | Horse | Arabian | 0.99 |
Figure 4. Example predictions of the proposed system: (a–b) cat breeds, (c–d) dog breeds, and (e–f) horse breeds
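With the Ultralytics API, calling a loaded classification model on an image returns a list of `Results` objects whose `probs` attribute exposes the top-1 class index (`top1`) and its confidence (`top1conf`), while `names` maps indices to class names. The helpers below, hypothetical names used only for illustration, turn such a result into the `(class name, confidence)` pairs shown in Table 2.

```python
from typing import Any, Callable, Tuple


def top1(result: Any) -> Tuple[str, float]:
    """(class name, confidence) from an Ultralytics classification result.
    Relies on result.names (index -> name) and result.probs.top1 / top1conf."""
    idx = int(result.probs.top1)
    return result.names[idx], float(result.probs.top1conf)


def as_classifier(model: Any) -> Callable[[Any], Tuple[str, float]]:
    """Wrap a loaded model so that calling it on one image path returns
    (label, confidence) for that image, i.e. the first Results object."""
    return lambda image: top1(model(image)[0])
```

For example, `as_classifier(YOLO("runs/classify/animals/weights/best.pt"))("Bengal-Cat.jpg")` would return the predicted animal type and its confidence; the checkpoint path assumes the default Ultralytics run layout and a run named `animals`. A wrapper of this form can serve as the `(label, confidence)` classifier for each stage of the routing step described in the Model Architecture section.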
Conclusion
This paper proposed a hierarchical system for animal breed recognition based on two stages of YOLOv8 classification. The method decomposes the problem into animal-level and breed-level tasks and trains separate models for each (one animal classifier and three breed classifiers), enabling recognition at both coarse and fine-grained levels. The models achieved 99.38% accuracy for animal classification and up to 95.95% for breed classification, depending on the category; the lower breed accuracy for cats (83.71%) is attributed to the high similarity between classes such as British Shorthair and Russian Blue.
The confusion matrix further confirms high separability between animal categories, with only minor errors between visually similar classes. Finally, the hierarchical approach produced more accurate and stable predictions than the single-stage baseline (88.89%), suggesting that reducing the number of candidate classes at each stage improves classification performance. The system could be extended to larger datasets with additional animal categories by adding a breed classifier for each new category.
References:
1. Redmon J., et al. You Only Look Once: Unified, Real-Time Object Detection // Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR). – 2016. – P. 779–788.
2. Redmon J., Farhadi A. YOLO9000: Better, Faster, Stronger // Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR). – 2017. – P. 7263–7271.
3. Ultralytics. YOLOv8 Documentation and Source Code // GitHub repository. – Available: https://github.com/ultralytics/ultralytics (accessed: 15.03.2026).
4. LeCun Y., Bengio Y., Hinton G. Deep Learning // Nature. – 2015. – Vol. 521. – P. 436–444.
5. Girshick R. Fast R-CNN // Proc. IEEE Int. Conf. Comput. Vis. (ICCV). – 2015. – P. 1440–1448.
6. Liu W., et al. SSD: Single Shot MultiBox Detector // Eur. Conf. Comput. Vis. (ECCV). – 2016. – P. 21–37.
7. Norouzzadeh M. S., et al. Automatically identifying wild animals in camera-trap images // Proc. Natl. Acad. Sci. – 2018. – Vol. 115. – P. E5716–E5725.
8. Tabak M. A., et al. Machine learning to classify animal species in camera trap images // Methods Ecol. Evol. – 2019. – Vol. 10. – P. 585–590.
9. Paszke A., et al. PyTorch: High-Performance Deep Learning Library // Adv. Neural Inf. Process. Syst. (NeurIPS). – 2019. – Vol. 32.
10. Wang C.-Y., et al. YOLOv7: Trainable Bag-of-Freebies for Real-Time Object Detection // arXiv preprint arXiv:2207.02696. – 2022.
11. Zhang Y., et al. Hyper-class Augmented Deep Learning for Fine-Grained Classification // 2020.
12. Schneider S., et al. Deep Learning Object Detection for Camera Trap Data: A Review // 2020.
13. Horse Breeds Dataset // Kaggle. – Available: https://www.kaggle.com/datasets/olgabelitskaya/horse-breeds (accessed: 15.03.2026).
14. Cat Breeds Dataset // Kaggle. – Available: https://www.kaggle.com/datasets/doctrinek/catbreedsrefined-7k (accessed: 15.03.2026).
15. Stanford Dogs Dataset // Kaggle. – Available: https://www.kaggle.com/datasets/jessicali9530/stanford-dogs-dataset (accessed: 15.03.2026).