POSE-INVARIANT FACE RECOGNITION WITH TEMPLATE ADOPTION

Ashkeyev A., Abdiakhmetova Z.
DOI: 10.32743/UniTech.2025.134.5.20135

 

ABSTRACT

Face recognition systems have become increasingly important in applications ranging from security to personal identification. Traditional systems based on one-to-one face verification struggle with variations in pose, which reduces accuracy in real-world scenarios. This research proposes a pose-invariant face recognition system that addresses this limitation. The core focus of this work is a recognition system that remains robust and consistent despite changes in facial pose, particularly within crowded frames. The proposed methodology leverages template adoption to enhance the resilience of the recognition system: by adopting templates that encapsulate diverse pose variations, the system accommodates a broader range of pose scenarios. Ultimately, this research presents a practical solution for pose-invariant face recognition in crowded environments. Experimental evaluation on different datasets demonstrates the significant impact of dataset choice on the accuracy of the system.


 

Keywords: Face Recognition, Computer Vision, Pose-invariant, Template Adoption.


 

Introduction

Face recognition systems are widely used in applications such as security, access control, and identity verification. They work by comparing a person's facial template with a database of known faces to find a match.

This research seeks to enable recognition systems to process the facial features of multiple people simultaneously and to enhance the accuracy of pose-invariant face recognition (PIFR). Resolving these issues would let the technology reach its full potential, particularly in real-world applications that rely on passive biometrics, and would improve the precision and reliability of face recognition in diverse and dynamic environments, thereby strengthening security and identification processes.

Most current approaches are based on one-to-one face verification. Such systems are constrained and impractical in many settings where face recognition is needed. In crowded places such as city streets, airports, or shopping malls, it is crucial to work with many face features at once. The International Air Transport Association (IATA) states that about 75 percent of passengers now favor biometrics [2] over traditional passports and boarding passes. This preference reflects a growing trend: travelers are increasingly willing to share their personal information when it leads to a more seamless travel experience.

Pose-invariant face recognition is also significant in criminal investigations, because it can help law enforcement agencies identify suspects who try to avoid detection by changing their facial pose or angle. Criminals often attempt to hide their identity by altering their appearance, including their facial pose, and traditional face recognition algorithms may fail to identify individuals whose faces appear at a different pose or angle than the reference images in the database. A notable event took place at Washington Dulles International Airport in August 2018, where facial recognition technology was used in this role for the first time [7].

The motivation for this research stems from the urgent need to adapt face recognition technology to real-world, dynamic environments, where conventional methods may falter. Being able to recognize many faces in different poses at the same time is more than a technical improvement; it is an important step toward making places safer in our closely connected world.

Face Detection

Face detection is the initial step in a face recognition system. At this stage, the system locates faces and determines whether a face is present in the provided input. Once a face is detected, the system can proceed with further analysis, such as feature extraction and recognition. This step identifies the position and boundaries of the face within the image, effectively isolating it from the background for subsequent processing stages. Several well-known methods exist for face detection, including Haar cascade classifiers [14], Histogram of Oriented Gradients (HOG) [15], convolutional neural networks (CNNs) [16], RetinaNet [18], and Multi-Task Convolutional Neural Networks (MTCNN) [19]. Each of these approaches has unique features and strengths that have contributed to advances in the field of face detection. Unlike most prior work, which targets frontal face recognition, Pose-Invariant Face Recognition via Facial Landmark Based Ensemble Learning [3] addresses pose-invariant recognition directly. PIFR methods can be grouped into four categories: pose-robust feature extraction, multi-view subspace learning [24], face synthesis, and hybrid approaches [20]. In [10, 12, 13, 24] researchers compare these categories and analyze the performance of each.

Using the pyramidal hierarchy [22] built into a DCNN [21], instead of constructing an explicit image pyramid for face detection at different scales, reduces processing time.

Face Verification and Recognition

Metric-learning approach. A face recognition system using deep metric learning with FaceMaskNet-21 was proposed for masked face recognition [1]. Most metric learning methods are based on point-to-point distance metrics, which cannot effectively exploit the reconstruction residual information (RRI) of training samples when learning the distance metric. The authors of [9] address the challenge of recognizing faces with occlusions, such as masks, by using information reconstruction techniques.

Template-based approach. Traditionally, most face recognition systems focused on one-to-one face verification, whereas the IJB-A dataset unifies the evaluation of one-to-one verification and one-to-many identification. The authors of [23] applied template adoption, a form of transfer learning applied to the set of media in a template, combining deep convolutional network features with template-specific linear SVMs.

Feature learning approach. Masked face recognition differs from traditional face recognition not only in data preparation and face detection: the verification process must also establish that a masked and an unmasked face belong to the same person. In Masked Face Recognition with Latent Part Detection [4] this is handled with a two-branch CNN, where a global branch performs discriminative global feature learning and a partial branch performs latent part detection and discriminative partial feature learning.

Despite significant progress in PIFR technology, there is still considerable room for improvement, and the effectiveness of current approaches must be assessed through testing on real-world databases. PIFR algorithms ought to function without any manual annotation of facial landmarks and should account for the full range of pose variations that can occur in an image, encompassing yaw, pitch, and combinations of both. They must also be able to recognize an individual's non-frontal face using only a single gallery image per person, which is the most difficult, yet most frequent, scenario encountered in real-world applications.

Earlier facial recognition techniques that relied on deep networks used a classification layer trained on a predetermined set of known identities and then leveraged an intermediate bottleneck layer as a representation for recognition beyond the original training set [11]. However, this approach is indirect and inefficient, since it relies on the bottleneck representation generalizing well to new faces, and because it employs a bottleneck layer, the representation size required per face is typically quite large. Template-based approaches, for their part, often have difficulty handling large datasets.

DCNN systems are typically trained on a specific dataset and are effective on similar test sets, but networks trained on one domain may not perform optimally on others. At present, training convolutional neural networks (CNNs) is a time-consuming process that can take days, so more efficient CNN architectures and implementations that can be trained more quickly are needed.

The sections above described different approaches and technologies for face recognition systems, along with their drawbacks. The pose-invariant face recognition system we developed is important because such technology has become popular and is closer to real-life usage of face recognition. Combining a PIFR system with the template adoption method yields a technology that can work in crowded places with people facing the camera from different angles.

Materials and methods

A face recognition system uses facial features to identify or verify an individual. The process consists of several steps: face detection, face alignment, feature extraction, and finally face matching or identification.

Face Detection

The initial step of every face recognition system is face detection, which involves locating a face in an image or video frame. In general, an algorithm analyzes the image to find regions that are likely to contain faces. The detectMultiScale function of OpenCV's Haar cascade classifier employs a scaled-window approach to detect objects: as expressed in (1), the image is rescaled multiple times to find faces of various sizes.

$$I_k(x, y) = I\bigl(s^{k} x,\; s^{k} y\bigr), \qquad k = 0, 1, 2, \ldots \qquad (1)$$

where $I$ is the input image, $s > 1$ is the scale factor, and $I_k$ is the downscaled image at pyramid level $k$.

In each region, the Haar features are calculated (2), which are then used to identify whether a face is present or not.

$$f_{\text{Haar}} = \sum_{(x,\,y) \in R_{\text{black}}} I(x, y) \;-\; \sum_{(x,\,y) \in R_{\text{white}}} I(x, y) \qquad (2)$$

where $R_{\text{black}}$ and $R_{\text{white}}$ are the dark and light rectangles of the Haar-like pattern.

The output of this step is a bounding box that surrounds each detected face. Before detection, an OpenCV method that enhances image contrast by equalizing its histogram is applied. This research focuses on detecting multiple faces in crowded frames using the Haar cascade classifier, specifically the haarcascade_frontalface_default.xml model, which compares the features computed in the image with those of the object being detected. The cascade classifier quickly determines whether a region contains a face: each stage applies a series of feature tests, and a region that passes one stage moves on to the next.
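A minimal sketch of this detection step is given below, assuming the OpenCV Python bindings: histogram equalization uses cv2.equalizeHist, the cascade file is loaded from the copy bundled with opencv-python, and the parameter values (scaleFactor, minNeighbors, minSize) and file paths are illustrative rather than the exact settings used in this work.

```python
import cv2

# Load the bundled frontal-face Haar cascade shipped with opencv-python.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame_bgr):
    """Return (x, y, w, h) bounding boxes for faces found in the frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)              # contrast enhancement before detection
    # scaleFactor plays the role of s in (1): each pass rescales the image by this factor.
    return cascade.detectMultiScale(gray, scaleFactor=1.1,
                                    minNeighbors=5, minSize=(30, 30))

if __name__ == "__main__":
    image = cv2.imread("crowd.jpg")            # illustrative input path
    for (x, y, w, h) in detect_faces(image):
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("crowd_detected.jpg", image)
```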

Face Recognition

In the final stage, the system compares the facial features extracted from the input frame with a database of known faces. For identifying the person, a template adoption method is used: a pre-existing face template is modified to match the facial patterns of a new subject, producing a new template that can be used for further recognition or identification tasks. The cv2.matchTemplate function in OpenCV slides a template image over an input image and returns a correlation map; with cv2.TM_CCOEFF_NORMED it applies the normalized cross-correlation (NCC) formula (3).

$$\text{NCC}(f, g) = \frac{\sum_{x, y}\bigl(f(x, y) - \bar{f}\bigr)\bigl(g(x, y) - \bar{g}\bigr)}{\sqrt{\sum_{x, y}\bigl(f(x, y) - \bar{f}\bigr)^{2}}\;\sqrt{\sum_{x, y}\bigl(g(x, y) - \bar{g}\bigr)^{2}}} \qquad (3)$$

Here $f(x, y)$ and $g(x, y)$ are the pixel intensities at location $(x, y)$ of images $f$ and $g$, and $\bar{f}$ and $\bar{g}$ are the mean pixel intensities of $f$ and $g$, respectively.

The system matches the resulting template against the templates stored in the database. If the face appears at an angle or pose different from those in the database, the system uses template adoption to adjust the pre-existing template to match the new pose or angle.
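The following is a minimal sketch of one NCC comparison with cv2.matchTemplate, assuming grayscale inputs and a template no larger than the face region; the acceptance threshold of 0.6 is an illustrative value, not a parameter reported in this work.

```python
import cv2

def ncc_score(face_region_gray, template_gray):
    """Best normalized cross-correlation score of a template against a face region."""
    result = cv2.matchTemplate(face_region_gray, template_gray,
                               cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    return max_val, max_loc

# Example: accept a match only above an (illustrative) threshold.
# score, _ = ncc_score(face_crop, stored_template)
# is_match = score >= 0.6
```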

Dataset. Initially, the Labeled Faces in the Wild (LFW) dataset [6] was used. However, this dataset does not work well for template-based adoption methods: template adoption for pose-invariant face recognition typically requires a dataset with images of the same person captured in multiple poses, including frontal, profile, and semi-profile views, under different lighting conditions, occlusions, and facial expressions. For this reason, we created our own database for experiments with the developed face recognition system. The dataset contains named folders of well-known people, such as actors and singers, with photos collected from the internet that show them from different angles and poses. The main difference from the LFW dataset is that it contains fewer individuals but a larger number of photos per person.

Template matching. Before recognition, the pre-existing dataset described in the previous section is loaded. The system creates grayscale face templates for the images in a specified directory and stores them in a dictionary for further processing and recognition. Face identification is performed by multi-view face recognition in combination with the template adoption method. Faces can look different due to viewing angle, lighting, and other factors; multi-view face recognition addresses this by recognizing a person's face from images taken at different angles.
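A minimal sketch of this loading step is shown below. It assumes the dataset layout described earlier (one named folder per person) and a fixed template size of 100x100 pixels; both the helper name and the size are illustrative assumptions.

```python
import os
import cv2

def load_templates(dataset_dir, size=(100, 100)):
    """Build {person_name: [grayscale templates]} from a directory of named folders."""
    templates = {}
    for person in sorted(os.listdir(dataset_dir)):
        person_dir = os.path.join(dataset_dir, person)
        if not os.path.isdir(person_dir):
            continue
        for filename in os.listdir(person_dir):
            image = cv2.imread(os.path.join(person_dir, filename),
                               cv2.IMREAD_GRAYSCALE)
            if image is None:
                continue                      # skip non-image files
            templates.setdefault(person, []).append(cv2.resize(image, size))
    return templates
```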

One approach to multi-view face recognition is to use multiple templates or views of a person's face, acquired by capturing images from different angles or by using pre-existing templates. However, using many templates is computationally expensive and requires storing and matching a large number of them. To address this issue, a template adoption method can be used: a single template is selected for each person and adapted to different poses, which reduces the cost of matching and storage while still allowing accurate recognition across multiple views. The system applies template matching at multiple scales for each template to find the best match for the current face in the reference image: it first resizes the face template to different scales, then applies template matching to the face region, and selects the template with the highest correlation coefficient as the best match. Finally, it draws a bounding box around the matched face in the reference image and labels it with the name of the matched person. All these steps are illustrated in the flowchart in Figure 1, and a simplified sketch of the matching loop is given below.
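The multi-scale matching loop can be sketched as follows, reusing the hypothetical ncc_score and load_templates helpers from the earlier sketches; the set of scale factors is an illustrative choice, not the exact configuration used in this work.

```python
import cv2

SCALES = (0.8, 0.9, 1.0, 1.1, 1.2)   # illustrative scale factors

def identify(face_region_gray, templates):
    """Return (best_name, best_score) over all persons, templates, and scales."""
    best_name, best_score = None, -1.0
    for name, person_templates in templates.items():
        for template in person_templates:
            for scale in SCALES:
                resized = cv2.resize(template, None, fx=scale, fy=scale)
                h, w = resized.shape[:2]
                if h > face_region_gray.shape[0] or w > face_region_gray.shape[1]:
                    continue                  # the template must fit inside the region
                score, _ = ncc_score(face_region_gray, resized)
                if score > best_score:
                    best_name, best_score = name, score
    return best_name, best_score
```

In the full system, the best-scoring name is then drawn next to the bounding box of the matched face in the reference image.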

 

Figure 1. PIFR system with template adoption flowchart

 

Results and discussion

The LFW dataset [6] is used as a baseline for comparing the results of the template matching system. It consists of images collected from the internet, containing a wide variety of faces in various poses, lighting conditions, and backgrounds, and it is widely used to evaluate and compare the performance of face recognition algorithms. The dataset covers 5749 individuals and about 13,000 images in total. Its named folders make it easy to build labeled boxes around faces and compare them with the ground-truth labels in a .csv file. The manually created dataset for this research covers only 15 individuals but 290 images, and each identity includes photos taken from different angles, poses, and lighting. Such a dataset increases the accuracy of the face recognition system. Table 1 shows the comparison of results on the LFW dataset and our dataset.

 

a) One person frame

b) Two person frame

Figure 2. Input images for face verification

 

Table 1.

Results of PIFR system with template adoption

No.  Faces  Figure       Dataset  TP  FN  FP  TPR   Precision  Recall  F1    Accuracy
1    1      Figure 2(a)  custom   1   0   0   1.00  1.00       1.00    1.00  1.00
2    1      Figure 2(a)  LFW      1   0   0   1.00  1.00       1.00    1.00  1.00
3    2      Figure 2(b)  custom   1   1   2   0.50  0.33       0.50    0.40  0.25
4    2      Figure 2(b)  LFW      0   2   2   0.00  0.00       0.00    N/A   0.00
5    6      Figure 3     custom   6   0   1   1.00  0.86       1.00    0.92  0.86
6    6      Figure 3     LFW      1   5   2   0.17  0.33       0.17    0.22  0.13

 

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (4)$$

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (5)$$

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (6)$$

This PIFR system with template matching is aimed at crowded frames; for this reason, it is most interesting to analyze the result for Figure 3.

 

Figure 3. Six people frame

 

In the case of Figure 3, the system recognized all 6 people in the frame using the custom dataset. True Positives (TP) equal 6, against 1 recognized by the same code using the LFW dataset. True Negatives (TN) are 0 in all examples because no non-face objects were annotated. False Negatives (FN) equal 0 for the system using its own dataset in the six-person frame, since it identifies all six persons, whereas the system using the LFW dataset has FN = 5, meaning 5 people out of 6 were not correctly identified. The LFW-based system also incorrectly identified 2 regions as persons, so FP = 2. All these data are clearly shown in Figure 4, which was generated by an auxiliary program.
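As a check of the Figure 3 row of Table 1 for the custom dataset (TP = 6, FP = 1, FN = 0, TN = 0), substituting into equations (4)-(6) and the accuracy definition gives:

$$\text{Precision} = \frac{6}{6 + 1} \approx 0.86, \quad \text{Recall} = \frac{6}{6 + 0} = 1.00, \quad F_1 = \frac{2 \cdot 0.86 \cdot 1.00}{0.86 + 1.00} \approx 0.92, \quad \text{Accuracy} = \frac{6 + 0}{6 + 0 + 1 + 0} \approx 0.86.$$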

 

Figure 4. Six people frame compared with annotation

 

Figure 5 shows that for the template matching approach it is better to use a custom dataset that contains photos of specific people under different lighting and poses. Such a dataset increases recognition accuracy more than datasets that contain a large number of individuals with only a few photos of each person.

 

Figure 5. Number of people recognized using different datasets

 

In addition to identifying the face, the system also reads the annotations from the .csv file and compares the recognized person with the ground-truth labels. If the name and the coordinates of the face box match the annotation, a green square is drawn around the face together with the name of the recognized person; if the names do not match, a red square with a red name marker is drawn. This visual output simplifies the calculation of further indicators such as precision, recall, F1 score, and accuracy. Precision, recall, and F1 score are calculated by equations (4), (5), and (6), respectively, and accuracy as (TP + TN) / (TP + TN + FP + FN). The accuracy of the PIFR system with template adoption using the custom database on the frame with 6 persons was 86%. Systems with this level of accuracy can be used where the security requirements are not critical but it is important to work with crowded frames.
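A minimal sketch of this evaluation step is shown below; the function names, the exact-match rule for boxes, and the color values are illustrative assumptions rather than the code used to produce Table 1 and Figure 4.

```python
import cv2

def metrics(tp, fp, fn, tn=0):
    """Precision, recall, F1 (equations (4)-(6)) and accuracy from the confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + tn + fp + fn) if tp + tn + fp + fn else 0.0
    return precision, recall, f1, accuracy

def draw_result(image, box, predicted_name, true_name):
    """Green box for a correct identification, red box otherwise."""
    x, y, w, h = box
    color = (0, 255, 0) if predicted_name == true_name else (0, 0, 255)
    cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
    cv2.putText(image, predicted_name, (x, y - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)

# Figure 3, custom dataset: TP = 6, FP = 1, FN = 0 -> approximately (0.86, 1.00, 0.92, 0.86)
print(metrics(6, 1, 0))
```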

As mentioned above, incorporating pose invariance into face recognition systems with template matching improves matching accuracy, enhances system robustness, expands application possibilities, and reduces the need for strict pose control during image capture. It enables more reliable and effective face recognition in real-world environments with pose variation.

In this paper, a pose-invariant face recognition system with a template adoption method was proposed. The contribution of the research is the development of a multi-person recognition system that can identify people in the frame in different poses, which makes the system better adapted to real-life environments.

It is acceptable to use our PIFR system in non-critical access control environments or in systems that work with crowded frames, such as outdoor video surveillance where the primary purpose is to track the movements of individuals or analyze crowd behavior. Our system is also well suited for attendance tracking, which is typically needed at universities, schools, and colleges: such institutions have a small set of individuals and can capture photos of students from different angles, and the system can then be used in classrooms to track attendance.

Pose-invariant face recognition systems with high accuracy can be used where security requirements are correspondingly high, such as surveillance and security, access control, and biometric identification. Pose-invariant face recognition based on the template adoption method holds great promise in various applications, especially in scenarios where recognizing individuals under varying poses is crucial. Challenges related to privacy, accuracy, ethics, and security must be carefully managed for successful implementation.

Conclusion

The proposed PIFR system in this research shows significant potential for improving face recognition accuracy, particularly in environments with varying poses. This technology offers promising applications in non-critical access control, outdoor video surveillance, and attendance tracking, making it highly suitable for universities, schools, and other educational institutions.

Based on the knowledge gained during this research, accuracy can be increased in the following ways: by using a more robust face detection algorithm or cascade classifier that accurately detects faces under variations in pose, lighting, or scale, and by improving the quality of the dataset, which is one of the key factors affecting the recognition process.

A worthwhile direction for future investigation is to compare other template matching and local feature techniques, such as Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), or Local Binary Patterns (LBP), to detect key facial features and improve matching accuracy. This research demonstrated, on the example of our dataset versus LFW, that LFW contains many person folders but most of them hold only one or two photos, which is not enough for the template matching technique. For this reason, the Left-Front-Right (LFR) dataset for pose-invariant face recognition in the wild [5] could help to improve accuracy. In future research it will also be interesting to investigate other PIFR methods, such as convolutional neural networks for face recognition.

 

References:

  1. Golwalkar, R., & Mehendale, N. Masked-face recognition using deep metric learning and FaceMaskNet-21. // Applied Intelligence. – 2022. – № 52. – P. 13268–13279.
  2. International Air Transport Association. Global Passenger Survey 2023. // Retrieved from the International Air Transport Association website. – 2023.
  3. Shinfeng, D. L., & Paulo, E. L. Pose-Invariant Face Recognition via Facial Landmark Based Ensemble Learning. // IEEE Access. – 2023. – № 11. – P. 44221–44233.
  4. Ding, F., Peng, P., Huang, Y., Geng, M., & Tian, Y. Masked Face Recognition with Latent Part Detection. // Proceedings of the 28th ACM International Conference on Multimedia, MM '20. – 2020. – P. 2281–2289.
  5. Elharrouss, O., Almaadeed, N., & Al-Maadeed, S. LFR face dataset: Left-Front-Right dataset for pose-invariant face recognition in the wild. // 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT). – 2020. – P. 124–130.
  6. Erik, L. A., & Jurie, F. Labeled Faces in the Wild Dataset. // Retrieved from the University of Massachusetts website. – 2018.
  7. Jake, P. Facial Recognition Success Stories Showcase Positive Use Cases of the Technology. // Retrieved from the Security Industry Association website. – 2020.
  8. Hana, B. F., Souhir, S., & Chokri, S. An Efficient Face Recognition Method Using CNN. // 2021 International Conference of Women in Data Science at Taif University (WiDSTaif). – 2021. – P. 1–5.
  9. He, H., Liang, J., Hou, Z., Liu, H., & Zhou, X. Occlusion recovery face recognition based on information reconstruction. // Machine Vision and Applications. – 2023. – № 34. – Article 74.
  10. Eker, O., & Murat, B. A Comparative Analysis of the Face Recognition Methods in Video Surveillance Scenarios. – 2022.
  11. Amrani, M. Deep convolutional multi-informative metric correlation analysis with bottleneck attention module for face recognition in the wild. // Multimedia Tools and Applications. – 2024. – № 83. – P. 1–29.
  12. Karthikeyan, S. Hybrid Framework for a Robust Face Recognition System Using EVB_CNN. // Journal of Cases on Information Technology. – 2021. – № 23. – P. 43–57.
  13. Rahul, D., & Vishnu, N. B. 3DFaceFill: An Analysis-By-Synthesis Approach to Face Completion. // 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). – 2022. – P. 1224–1233.
  14. Javed Mehedi Shamrat, F. M., Majumder, A., Antu, P. R., Barmon, S. K., Nowrin, I., & Ranjan, R. Human Face Recognition Applying Haar Cascade Classifier. // In: Ranganathan, G., Bestak, R., Palanisamy, R., Rocha, Á. (eds) Pervasive Computing and Social Networking. Lecture Notes in Networks and Systems. – 2022. – № 317. – P. 143–157.
  15. Sulayman, A., Mondher, F., Taha, D. H., & Javad, R. Face Recognition System using Histograms of Oriented Gradients and Convolutional Neural Network based on Particle Swarm Optimization. // 2021 International Conference on Electrical, Communication, and Computer Engineering (ICECCE). – 2021. – P. 1–5.
  16. Muhamad, I., Rosilah, H., Mohammad, H. K., & Meng, C. L. The Process of Using Face Detection Through Convolutional Neural Network. // 2022 International Conference on Business Analytics for Technology and Security (ICBATS). – 2022. – P. 1–5.
  17. Moh, W., Ahmad, A., Ardacandra, S., & Wahyono, W. Human Face Detection and Tracking Using RetinaFace Network for Surveillance Systems. // IECON 2021 – 47th Annual Conference of the IEEE Industrial Electronics Society. – 2021. – P. 1–5.
  18. Jiang, C., Ma, H., & Li, L. IRNet: An Improved RetinaNet Model for Face Detection. // 2022 7th International Conference on Image, Vision and Computing (ICIVC). – 2022. – P. 129–134.
  19. Zhang, N., Luo, J., & Gao, W. Research on Face Detection Technology Based on MTCNN. // 2020 International Conference on Computer Network, Electronic and Automation (ICCNEA). – 2020. – P. 154–158.
  20. Li, Y., Wang, Z., & Hou, J. Pose-robust Feature Learning for Pose-Invariant Face Recognition. // 2020 IEEE 5th International Conference on Signal and Image Processing (ICSIP). – 2020. – P. 6–10.
  21. Majumder, S., & Tripathi, R. Deep Convolutional Neural Network-Based Recognition of Profile Rotated Face Images. // 2022 8th International Conference on Advanced Computing and Communication Systems (ICACCS). – 2022. – № 1. – P. 673–678.
  22. Jin, M., & Li, H. Feature-Aligned Feature Pyramid Network and Center-Assisted Anchor Matching for Small Face Detection. // 2023 4th International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE). – 2023. – P. 198–204.
  23. Sanap, D. S., Narwade, M. S., Goge, M. S., & Pandit, M. K. Face Recognition Based Attendance System Using Histogram of Oriented Gradients and Linear Support Vector Machine. // European Journal of Theoretical and Applied Sciences. – 2023. – № 1. – P. 904–915.
  24. Zhao, F., Li, J., Zhang, L., Li, Z., & Na, S. Multi-view face recognition using deep neural networks. // Future Generation Computer Systems. – 2020. – № 111. – P. 375–380.
Information about the authors

Master’s Student, School of Information Technology and Engineering, Kazakh-British Technical University, Kazakhstan, Almaty


PhD, Associate Professor of Al-Farabi Kazakh National University, Kazakhstan, Almaty

