REAL-TIME OBJECT DETECTION AND TRACKING FOR MOBILE ROBOT USING YOLOV8 AND STRONG SORT

Le B.C., Nguyen D.D.
DOI: 10.32743/UniTech.2023.116.11.16223

 

ABSTRACT

This research paper presents an approach that addresses the challenge of devising a proficient object detection and tracking system for a robotic agent to track individuals by amalgamating the YOLOv8 algorithm with the Strong SORT algorithm. The YOLOv8 algorithm, renowned for its object detection capabilities, is employed for the identification of objects within the robot's environment, providing fast results and requiring a small amount of computation. Subsequently, the Strong SORT algorithm is utilized to assign distinctive identifiers to detected objects, thereby enabling the robot to track them effectively.

By merging YOLOv8 and Strong SORT, the resultant algorithm empowers the robot to discern and pursue objects even when they are partially concealed or briefly absent from the visual field for a few consecutive frames. The proposed algorithm is evaluated on a versatile two-wheeled differential mobile robot platform. The experimental results demonstrate the algorithm's ability to help the robot closely follow individuals. Notably, the algorithm ensures seamless tracking without inducing abrupt movements and sustains consistent object tracking under rapid object motion. This research thus underscores the algorithm's potential for enhancing person-following capabilities in real-world robotic applications.


 

Keywords: Object detection and tracking, YOLOv8, Strong SORT, mobile robot.


 

1. Introduction

Within the domains of robotics and artificial intelligence, object detection and tracking technologies hold significant prominence. They play a pivotal role in enabling robots to perceive and interact with both people and their surrounding environment. Such robots can then be applied in various fields: in the military (transport robots supporting soldiers on the battlefield), in medicine (transport robots delivering goods, pharmaceuticals, or medical equipment to patients or medical staff), in tourism (humanoid mobile robots guiding visitors around attractions and museums), and so on.

Object detection enables a robot to automatically classify and process information from images, supporting practical applications such as face detection, medical data analysis, driver assistance, and self-driving. Object tracking is a higher-level problem than object detection because its input is not a single image but a sequence of images. An object tracker must ensure that the identifier (ID) of each object stays fixed across frames, and it must re-detect an object after it has been occluded or has disappeared for several frames.

Object detection methods include image processing and computer vision techniques, using algorithms such as neural networks and deep learning to enhance object classification and positioning. However, object detection still faces many challenges, including system accuracy, algorithmic complexity, and security and privacy concerns. In addition, limitations in sensor technology and hardware devices also affect object detection.

Many universities, organizations, and individuals worldwide have researched and built object detection and tracking programs for robots [1, 4, 5, 6, 8], but combining detection and tracking algorithms under limited computational resources often makes the results fall short of expectations. Experimental findings show that such systems react slowly to changes in the environment and cannot keep up in applications that must operate in real time.

In light of a comprehensive assessment of the global research landscape and of work conducted in Vietnam, the authors advocate the integration of the YOLOv8 and Strong SORT algorithms. This integration is intended to yield an object detection and tracking algorithm tailored to real-time mobile robot operation, specifically aimed at dynamic human tracking.

The remainder of the paper is organized as follows. Section 2 presents the system hardware description. Section 3 is devoted to real-time object detection and tracking for mobile robots. Experimental results are presented in Section 4. Finally, conclusions and proposals for future work wrap up the paper.

2. System hardware description

Within this research work, the evaluation of the proposed algorithms is executed utilizing an operational model of an active 2-wheel differential mobile robot, as depicted in Figure 1.

 

Figure 1. Model of an active 2-wheel differential mobile robot

 

The propulsion mechanism of the robot is driven by two Brushless Direct Current (BLDC) motors, each equipped with a Hall sensor. The motors are controlled by BLDC Speed ASTA 4820 controllers, which are connected to a Mitsubishi FX3u-64MT/DS programmable logic controller through an FX2N-DA DAC converter module. All of the above components are powered at 24 VDC.

The block diagram of the proposed hardware control system is shown in Figure 2. The image processing and object tracking algorithms run on a laptop. The output of the image processing program is the position deviations in the X and Y directions. The control voltages for the two motors are generated by the programmable logic controller (PLC). The laptop communicates with the PLC over RS-485 using the Modbus RTU protocol. Consequently, the laptop can send the two wheel-speed values to the PLC, and the DAC module adjusts the voltage for the drivers to control the motors.

 

Figure 2. Hardware system connection diagram on robot

 

The control system of the robot can be divided into two layers:

  • The lower-layer control system includes the PLC and the drivers responsible for controlling the left and right motors;
  • The upper-layer control system includes a laptop connected to the camera; it runs the image processing algorithms, computes the two wheel-speed values, and transmits them to the PLC over the RS-485 standard.

Within the scope of this paper, the examination of motor control through Programmable Logic Controllers (PLCs) has been omitted, as it has been extensively investigated and documented in existing literature. Instead, the focal point of this study lies exclusively in the deployment and realization of object detection and tracking algorithms.

3. Real-time object detection and tracking for mobile robots

The need to direct the robot arises from practice: in tasks such as transportation and assistance to individuals with disabilities, the robot must follow a person. To solve this problem, a camera collects images, an object is detected, and the robot is then controlled to follow that object. The robot is considered to follow the object well when the center of the image approximately coincides with the center of the tracked object.

The flowchart of the object detection and tracking algorithm is shown in Figure 3.

 

Figure 3. Flowchart of object detection and tracking algorithm

 

When the robot moves, images are captured by the camera. The gathered image data is then fed to the “Image processing” module. In this module, the image is processed according to the steps of the YOLOv8 pipeline: dividing the image into a grid, extracting features, computing the probability of object presence, determining candidate bounding boxes, and applying non-maximum suppression to improve accuracy; the output is the set of detected object bounding boxes. The YOLOv8 model architecture is shown in Figure 4.

 

Figure 4. Structure of the YOLOv8 algorithm (image source: https://blog.roboflow.com/content/images/2023/01/image-16.png)

 

YOLOv8 uses a convolutional neural network that can be divided into two main parts: the Backbone and the Head [2, 9]. The Backbone's architecture consists of 53 convolutional layers and uses partial connections between stages to improve the flow of information between the different layers. The Head of YOLOv8 consists of multiple convolutional layers followed by a series of fully connected layers. These layers are responsible for predicting bounding boxes, confidence scores, and class probabilities for the objects detected in an image.

YOLOv8 is released in several model variants that differ in processing speed and accuracy. Published test results for these variants are given in Table 1:

Table 1.

Test results of the models

Model      Size (pixels)   mAP val 50-95   Speed CPU ONNX (ms)   Speed A100 TensorRT (ms)   Params (M)   FLOPs (B)
YOLOv8n    640             37.3            80.4                  0.99                       3.2          8.7
YOLOv8s    640             44.9            128.4                 1.20                       11.2         28.6
YOLOv8m    640             50.2            234.7                 1.83                       25.9         78.9
YOLOv8l    640             52.9            375.2                 2.39                       43.7         165.2
YOLOv8x    640             53.9            479.1                 3.53                       68.2         257.8

 

Hence, it becomes evident that YOLOv8n represents a model that, albeit not attaining the same level of precision as certain alternative models, excels in processing speed, a crucial attribute for scenarios demanding real-time computational capabilities [3].
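To illustrate how this detection stage might be set up in practice, the following minimal sketch uses the ultralytics Python package to run YOLOv8n on camera frames and keep only “person” detections. The model file name, camera index, and display loop are illustrative assumptions, not the authors' exact implementation.

import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # smallest, fastest YOLOv8 variant (see Table 1)
cap = cv2.VideoCapture(0)    # camera index is an assumption

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Class 0 is "person" in the COCO label set used by the pretrained weights
    results = model(frame, classes=[0], verbose=False)
    for box in results[0].boxes:
        xmin, ymin, xmax, ymax = box.xyxy[0].tolist()
        cv2.rectangle(frame, (int(xmin), int(ymin)),
                      (int(xmax), int(ymax)), (0, 255, 0), 2)
    cv2.imshow("YOLOv8n person detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()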

 

Figure 5. Performance comparison of object tracking algorithms

 

After detecting objects with YOLOv8, we combine it with an object tracking algorithm to assign an ID to each object. In this work, the Strong SORT algorithm is used because it performs relatively well compared to other object tracking algorithms and can track many objects simultaneously with high accuracy (Figure 5).

Strong SORT is a multi-object tracking algorithm, built on SORT (Simple Online and Realtime Tracking), that uses a convolutional neural network (CNN) to extract appearance features of objects and track them in video. The CNN is trained with supervised learning on correctly labeled datasets, so the model classifies and tracks objects in video with high accuracy. The algorithm uses a Kalman filter to predict the positions of objects in the next frames and combines these predictions with the features extracted by the CNN to determine the exact locations of objects [7].

One advantage of Strong SORT is that it can track many objects at the same time with high accuracy; it also processes video at high speed with good reliability. A drawback of Strong SORT is that it requires a high-quality, fully labeled training dataset to achieve maximum accuracy. In addition, its computational complexity is higher than that of some other tracking algorithms.
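To make the association step concrete, the sketch below shows one conceptual version of the track-to-detection matching: a cost matrix blends a motion cost (distance from the Kalman-predicted box centers) with an appearance cost (cosine distance between CNN feature vectors), and the Hungarian algorithm picks the globally cheapest assignment. This is a simplified illustration of the idea, not the Strong SORT reference implementation [7]; the weighting factor alpha is an assumption.

import numpy as np
from scipy.optimize import linear_sum_assignment

def appearance_cost(track_feats, det_feats):
    """Cosine distance between L2-normalized CNN feature vectors."""
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    return 1.0 - t @ d.T

def associate(pred_centers, det_centers, track_feats, det_feats, alpha=0.5):
    """Match Kalman-predicted tracks to new detections."""
    # Motion cost: distance between predicted and detected box centers
    motion = np.linalg.norm(
        pred_centers[:, None, :] - det_centers[None, :, :], axis=2)
    motion = motion / (motion.max() + 1e-9)           # scale to [0, 1]
    cost = alpha * motion + (1 - alpha) * appearance_cost(track_feats, det_feats)
    track_idx, det_idx = linear_sum_assignment(cost)  # Hungarian algorithm
    return list(zip(track_idx, det_idx))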

After detecting the object and assigning an ID to that object, we will let the robot follow the object with ID = 1 and determine the center of the object to be tracked as follows:

if id == 1:  # follow only the object whose tracker ID is 1
    # Bounding box corners returned by the tracker
    xmin = bbox[0]
    ymin = bbox[1]
    xmax = bbox[2]
    ymax = bbox[3]
    # Center of the tracked bounding box
    xc = (xmin + xmax) / 2
    yc = (ymin + ymax) / 2
    # Mark the object center on the frame and print its coordinates
    cv2.circle(im0, (int(xc), int(yc)), radius=5, color=(0, 0, 255), thickness=-1)
    center_human = "(" + str(xc) + " - " + str(yc) + ")"
    cv2.putText(im0, center_human, (int(xc) - 10, int(yc) - 20),
                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), thickness=2)
    # Tracking errors: x0c is the horizontal center of the frame,
    # and Y = 475 is the reference row for the desired following distance
    x_error = x0c - xc
    y_error = 475 - ymax

Here, the X-axis deviation is the difference between the center of the tracked object and the center of the frame (which coincides with the robot's central axis); it is used to compute the robot's heading deflection relative to the target. The Y-axis deviation is the distance between the bottom edge of the object's bounding box and a reference line at pixel row Y = 475; it indicates the robot's relative distance to the object and is used to compute the robot's velocity. This reference value can be adjusted so that the robot maintains a suitable distance from the object.

Upon obtaining the error values along the X and Y axes, the subsequent step involves computing the motor transmission speeds within the PID_Base_Control() function. Subsequently, this calculated value is communicated to the Programmable Logic Controller (PLC) using the RS-485 communication protocol, following the Modbus RTU methodology.
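A hedged sketch of this step is given below. Only the function name PID_Base_Control comes from the text; the proportional-only control law, the gain values, the serial-port settings, the slave address, and the register layout are illustrative assumptions (pymodbus is used here as one possible Modbus RTU client, and sign conventions depend on the camera and motor wiring).

from pymodbus.client import ModbusSerialClient

# Serial parameters and register addresses are assumptions
client = ModbusSerialClient(port="COM3", baudrate=9600, parity="E")
client.connect()

KP_X = 0.4   # heading gain (illustrative value)
KP_Y = 0.6   # distance gain (illustrative value)

def PID_Base_Control(x_error, y_error):
    """Map image-plane errors to left/right wheel speeds (P-term sketch only)."""
    v = KP_Y * y_error            # forward speed from the distance error
    w = KP_X * x_error            # turn rate from the heading error
    v_left = int(v - w)
    v_right = int(v + w)
    # Send both setpoints to the PLC as 16-bit holding registers 0 and 1
    client.write_registers(address=0,
                           values=[v_left & 0xFFFF, v_right & 0xFFFF],
                           slave=1)
    return v_left, v_right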

4. Experimental results

The image processing program is implemented in the Python 3.8 programming language, using the YOLOv8 module together with the Strong SORT algorithm. It runs on a laptop, yielding the results depicted in Figure 6.

 


Figure 6. Result of detecting and assigning an ID to an object

 

From the observations illustrated in Figure 6, it can be seen that the YOLOv8 algorithm effectively determines the centroid of the designated object (represented by the green coordinates), while the Strong SORT algorithm concurrently assigns an identification label (ID = 1) to the targeted object for robot tracking. In scenarios involving multiple objects within the frame, the program produces outcomes akin to those depicted in Figure 7.

As shown in Figure 7, when two objects appear in a frame, the program detects and assigns each object a different ID (ID = 1 and ID = 2).

 


Figure 7. Result of detecting and assigning IDs to multiple objects

 

Experiments with the object-tracking robot give the results shown in Figure 8.

 


Figure 8. Robot follows people

 

In experiments where the object is obscured from the camera's view for a short time (< 4 s), the proposed algorithm gives positive results: the robot re-detects the object with the correct ID and continues following it. This capability is essential for real-world object tracking.

The results of the X-axis deviation when the robot follows the object are shown in Figure 9.

 

Figure 9. X-axis tracking process

 

The Y-axis deviation results when the robot follows the object are shown in Figure 10.

 

Figure 10. Y-axis tracking process

 

From the results in Figure 9 and Figure 10, it can be seen that the robot has correctly performed the task of following the object. Tracking results are good when the subject moves at a speed below 1 m/s. Along the Y axis the robot maintains a relatively good distance, but along the X axis the heading control is weaker: there is still oscillation (shaking) around the main axis direction.

The above phenomenon is caused by the limited computing resources of the personal computer, suboptimal tuning of the motor-control PID parameters, and the limited transmission speed between the laptop and the PLC. The authors will address these shortcomings in future studies.

5. Conclusion

By synergistically amalgamating the YOLOv8 object detection algorithm with the Strong SORT object tracking algorithm, an object detection and tracking framework emerges, tailored to enable mobile robots to track individuals, thereby fulfilling the real-world demands of transport robots. This integration yields favorable outcomes, mitigating the inherent limitations of these algorithms when employed in isolation. Consequently, the robot can pursue the target entity even amid extraneous influences, such as temporary disappearance or occlusion of the object. The resultant trajectory remains smooth and devoid of perturbations.

The research results of this article can be used to develop battlefield robots that support soldiers, as well as medical robots that assist disabled people and help medical staff transport medicines and necessities.

 

References:

  1. Bhawana Tyagi, Swati Nigam, Rajiv Singh. Human Detection and Tracking Based on YOLOv3 and DeepSORT // Communication and Intelligent Systems. July 2023. DOI: 10.1007/978-981-99-2100-3_11.
  2. Brief summary of YOLOv8 model structure [Electronic resource]. URL: https://github.com/ultralytics/ultralytics/issues/189 (accessed: 18.06.2023).
  3. Evolution of YOLO Object Detection Model From V5 to V8 [Electronic resource]. URL: https://www.labellerr.com/blog/evolution-of-yolo-object-detection-model-from-v5-to-v8/ (accessed: 10.06.2023).
  4. Ge Yang, Siping Chen. Visual detection and tracking algorithms for human motion // Multimedia Tools and Applications. May 2023. DOI: 10.1007/s11042-023-15231-1.
  5. Peng Cheng, Zinan Xiong, Yajie Bao, Ping Zhuang, Yunqi Zhang, Erik Blasch, Genshe Chen. A Deep Learning-Enhanced Multi-Modal Sensing Platform for Robust Human Object Detection and Tracking in Challenging Environments // Electronics. August 2023. Vol. 12, Issue 16.
  6. Rusakova L., Shapova N. The method of the real-time human detection and tracking // Artificial Intelligence. May 2023. Vol. 28, Issue 1. P. 66-73. DOI: 10.15407/jai2023.01.066.
  7. Strong SORT [Electronic resource]. URL: https://github.com/dyhBUPT/StrongSORT (accessed: 10.07.2023).
  8. Tahira Irshad, Muhammad Asif, Arfa Hassan, Umair Bin Ahmad, Toqeer Mahmood, Rehan Ashraf, Nadeem Faisal. A Deep Learning based Human Detection and Tracking for Security Surveillance Systems // Applied and Computational Engineering. March 2023. Vol. 2, Issue 1. P. 569-577. DOI: 10.54254/2755-2721/2/20220606.
  9. What is YOLOv8? The Ultimate Guide [Electronic resource]. URL: https://blog.roboflow.com/whats-new-in-yolov8/ (accessed: 13.06.2023).
Information about the authors

PhD, lecturer, Le Quy Don Technical University, Vietnam, Hanoi

Student, Le Quy Don Technical University, Vietnam, Hanoi
