HYBRID AI-BASED CONTROL OF VTOL VEHICLES IN URBAN ENVIRONMENTS

Cite as:
Yerkiman L., Kuatbayeva A.A. HYBRID AI-BASED CONTROL OF VTOL VEHICLES IN URBAN ENVIRONMENTS // Universum: Technical Sciences: electronic scientific journal. 2026. 2(143). URL: https://7universum.com/ru/tech/archive/item/22012 (accessed: 07.03.2026).
DOI: 10.32743/UniTech.2026.143.2.22012

 

ABSTRACT

As urban air mobility becomes a near-future reality, flying cars, especially vertical take-off and landing (VTOL) vehicles, face the challenge of navigating complex environments safely and efficiently. In this study, we propose a hybrid control system that brings together classical control methods and modern artificial intelligence to manage the full flight envelope of a VTOL flying car. At its core, the system uses a nested-loop LQR controller for stability, model predictive control for planning smooth and efficient trajectories, and a deep reinforcement learning agent to adapt to changing conditions. We trained and tested the system in the VTOLSim simulation environment with physics-based control tuning. Results show that our method improves trajectory accuracy by 20% and reduces response time by 15% compared to traditional PID controllers. By combining the strengths of physics-based control and learning, this research moves us a step closer to safe, autonomous flying cars that can adapt in real time and operate reliably in busy city skies.


 

Keywords: VTOL, autonomous flight control, hybrid control systems, deep reinforcement learning, model predictive control.


 

Introduction

Flying cars are quickly moving from science fiction toward reality as urban air mobility (UAM) solutions mature. They could ease traffic congestion and change how people travel. The more pressing task, however, is to develop flying cars with reliable and efficient control systems based on artificial intelligence. This research covers key aspects of navigation, stability, obstacle avoidance, energy optimization, and advanced machine learning and control algorithms for flying cars.

Urban Air Mobility is a new urban air transport system designed to transport people and cargo using modern technologies. In particular, we are talking about electric or hybrid vertical take-off and landing (VTOL) aircraft. Moreover, these aircraft can be either piloted or remotely controlled. Market potential analysis in recent research underscored that Urban Air Mobility could see 19 million daily trips by 2050 [1]. Advances in UAM and VTOL technologies have benefited significantly from artificial intelligence-based control systems. After all, AI helps improve autonomous flights, navigation, and safety. In recent years, reinforcement learning and deep learning have become key approaches in this area [2].

Lee et al. developed a reinforcement learning model for vision-based autonomous landing of UAVs: a neural network trained in simulation generated roll and pitch commands and improved landing accuracy [3]. Another study combined RL with curriculum learning, enabling a VTOL to pass real-world tests with a success rate of 40.54%; in other words, curriculum learning improves both the training process and reliability [4]. In addition, deep learning-based computer vision models such as YOLO have become key to real-time object detection and obstacle avoidance, greatly improving the accuracy and tracking performance of VTOLs in complex urban environments [5].

Meanwhile, security and environmental adaptability remain the main challenges for VTOL systems. Another case study (Wei, 2024) introduces a decentralized Trust AI-based framework for Urban Air Mobility networks, using machine learning models. They aimed to detect GPS spoofing and signal jamming with high accuracy [6]. Another research study discusses the design of a new electric VTOL vehicle to improve flight efficiency and save space during take-off and landing. The authors use reinforcement learning to optimize flight paths in real time. As a result, the vehicle has been shown to reduce take-off and landing area by 30%, but gaps include high system complexity, battery limitations, and sensitivity to weather conditions [7].

In general, reinforcement learning, deep learning, and optimization-based models dominate the field, showing the greatest potential to balance computational efficiency and adaptability [4], [6], [8].

While AI-based VTOL control models have shown promise, several critical gaps remain unaddressed. One significant problem is explainability: deep learning and reinforcement learning models frequently generate opaque outputs, making it difficult to understand why certain flight decisions are made. Existing optimization methods such as PSO and ACO have improved efficiency but remain computationally expensive, making them unsuitable for real-time control on low-power devices [2]. Finally, AI flight control and navigation systems have shown strong performance in controlled simulation tests, but real-world environmental conditions remain barriers to deployment [9], [10].

This research is a product-based study with an element of proof. The goal is to create a working AI model that combines deep reinforcement learning (DRL) with model predictive control (MPC) and control-theoretic methods. The AI model is designed to handle the various phases of VTOL flight, including take-off, hovering, forward flight, and landing. The model will be optimized using model compression techniques, such as pruning and quantization, which in turn will reduce computational complexity.

Materials and methods

A. System Overview

Traditional PID controllers often fail to meet the demands of high-agility vehicles in dynamic environments. This study addresses these limitations by proposing a hybrid control architecture that integrates classical, predictive, and learning-based methods for autonomous VTOL control.

The development pipeline used in this study is illustrated in Figure 1. It begins with system modeling and data preprocessing, followed by controller implementation, simulation, and performance evaluation. The physics-based simulation is conducted using VTOLsim, which allows for trajectory tracking, system stability analysis, and DRL policy training. Reinforcement learning experiments are performed using the Stable-Baselines3 PPO implementation in Python, where a DRL agent learns to regulate thrust outputs based on a 12-dimensional state vector.

 

Figure 1. Workflow of hybrid control system development

 

B. Mathematical Modeling and System Dynamics

The system is governed by Newtonian dynamics and includes six state variables representing the translational and rotational motion of the vehicle. The mathematical model forms the basis for both classical control design (LQR) and reinforcement learning environments. The state vector is defined as:

$\mathbf{x} = [\,x,\ y,\ \dot{x},\ \dot{y},\ \theta,\ \dot{\theta}\,]^{T}$                                                 (1)

where x and y are the horizontal and vertical positions, $\dot{x}$ and $\dot{y}$ the translational velocities, $\theta$ the pitch angle (rotation about the center of mass), and $\dot{\theta}$ the angular velocity. The control input is a vector of two independent rotor thrusts:

$\mathbf{u} = [\,F_1,\ F_2\,]^{T}$                                                              (2)

where F1 and F2 are thrusts generated by the left and right motors, respectively. The continuous-time nonlinear dynamics of the VTOL system are described by:

$m\,\ddot{x} = -(F_1 + F_2)\sin\theta - c\,\dot{x}$                                           (3)

$m\,\ddot{y} = (F_1 + F_2)\cos\theta - mg - c\,\dot{y}$                                      (4)

$J\,\ddot{\theta} = r\,(F_1 - F_2)$                                                      (5)

where m is the total mass of the vehicle, J the moment of inertia about the pitch axis, g the gravitational acceleration, c the damping coefficient, and r the distance from the center of mass to each rotor. This model accounts for linear aerodynamic drag and symmetrical rotor placement. To facilitate controller design, the system is linearized around the hover equilibrium point:

$\theta_e = 0, \quad \dot{x}_e = \dot{y}_e = \dot{\theta}_e = 0, \quad F_{1,e} = F_{2,e} = \frac{mg}{2}$                                 (6)

The linearized state-space model becomes:

$\delta\dot{\mathbf{x}} = A\,\delta\mathbf{x} + B\,\delta\mathbf{u}$                              (7)

where the system matrices A and B are:

$A = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & -c/m & 0 & -g & 0 \\ 0 & 0 & 0 & -c/m & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 1/m & 1/m \\ 0 & 0 \\ r/J & -r/J \end{bmatrix}$                           (8)

This linear model is used both for LQR design and to guide the reward shaping and dynamics structure of the reinforcement learning environment. Model simulation and control implementation use the physical parameters listed in Table 1. These values correspond to a typical light multirotor UAV configuration and align with prior implementations in UAV simulation toolkits [5], [7].

Table 1.

Physical parameters of the VTOL system

Parameter             Symbol   Value
Mass                  m        4.0 kg
Moment of inertia     J        0.0475 kg·m²
Rotor arm length      r        0.25 m
Gravity               g        9.81 m/s²
Damping coefficient   c        0.05 N·s/m
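To make the model concrete, the dynamics of Section B can be rolled out with a simple forward-Euler integrator using the Table 1 parameters. This is an illustrative sketch, not the VTOLSim implementation; the exact placement of the drag terms and the integration step are assumptions consistent with the text:

```python
import numpy as np

# Physical parameters from Table 1
m, J, r, g, c = 4.0, 0.0475, 0.25, 9.81, 0.05

def vtol_derivatives(state, F1, F2):
    """Planar VTOL dynamics as described in Section B: translational
    motion driven by total thrust, pitch by differential thrust,
    with linear aerodynamic drag on the translational velocities."""
    x, y, xd, yd, th, thd = state
    F = F1 + F2
    xdd = (-F * np.sin(th) - c * xd) / m
    ydd = (F * np.cos(th) - m * g - c * yd) / m
    thdd = r * (F1 - F2) / J
    return np.array([xd, yd, xdd, ydd, thd, thdd])

def simulate(state0, controller, dt=0.01, steps=500):
    """Forward-Euler rollout; `controller` maps state -> (F1, F2)."""
    state = np.asarray(state0, dtype=float)
    traj = [state.copy()]
    for _ in range(steps):
        F1, F2 = controller(state)
        state = state + dt * vtol_derivatives(state, F1, F2)
        traj.append(state.copy())
    return np.array(traj)

# Sanity check: exact hover thrust mg/2 per rotor keeps the vehicle still.
traj = simulate(np.zeros(6), lambda s: (m * g / 2, m * g / 2))
print(np.abs(traj[-1]).max())  # 0.0
```

Such a rollout is the minimal building block on which the later controllers are evaluated.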

 

C. Hybrid Controller Architecture

As mentioned above, the model strategically combines classical control, predictive optimization, and deep reinforcement learning, with each component responsible for a distinct control layer. This structure is shown in Figure 2, where each component communicates via defined interfaces, enabling modular replacement or tuning of individual control blocks.

 

Figure 2. Modular hybrid control architecture integrating DRL, MPC, LQR

 

The hybrid architecture is modular and hierarchical, consisting of three coordinated layers:

1.  Inner-loop stabilization is implemented using a full-state-feedback LQR controller; this layer governs immediate pitch and altitude stabilization. It reacts quickly to disturbances and ensures local dynamic stability around hover.

2.  Mid-level predictive control is the Model Predictive Controller (MPC) layer. MPC operates over short time horizons to compute optimal thrust commands for smooth forward motion or vertical ascent. It respects actuator constraints and optimizes energy efficiency.

3. Outer-loop policy and planning is handled by a PPO-based DRL agent trained in simulation. It is responsible for high-level navigation decisions, such as trajectory selection, obstacle avoidance, or goal switching. The DRL policy outputs either direct thrust commands or intermediate goals (e.g., position targets), which are tracked by the LQR/MPC layer.
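The data flow through these three layers can be sketched as a single control tick. The functions below are placeholders (the real system substitutes the trained PPO policy, the constrained MPC solver, and the computed LQR gain); the function names and the waypoint-stepping logic are illustrative assumptions:

```python
import numpy as np

def drl_policy(state):
    """Outer loop (placeholder): high-level goal selection.
    The trained PPO agent would output this; fixed here."""
    return np.array([10.0, 5.0])          # e.g. fly toward (x, y) = (10, 5)

def mpc_reference(state, goal, max_step=0.5):
    """Mid level (placeholder): step a waypoint toward the goal,
    standing in for the constrained MPC trajectory optimizer."""
    return state[:2] + np.clip(goal - state[:2], -max_step, max_step)

def lqr_thrusts(state, waypoint, K, u_hover):
    """Inner loop: full-state feedback about the hover trim point."""
    err = state.copy()
    err[:2] -= waypoint                   # track the MPC waypoint
    return u_hover - K @ err

# One control tick, outer to inner:
K = np.zeros((2, 6))                      # replace with the computed LQR gain
u_hover = np.array([19.62, 19.62])        # mg/2 per rotor for m = 4 kg
state = np.zeros(6)
goal = drl_policy(state)
waypoint = mpc_reference(state, goal)
u = lqr_thrusts(state, waypoint, K, u_hover)
```

With a zero gain the inner loop simply holds the hover trim; in the full system each placeholder is swapped for the corresponding layer without changing the interfaces.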

D. LQR Controller Design

To stabilize the VTOL system around its hover equilibrium, a Linear Quadratic Regulator (LQR) is employed. This controller provides optimal state feedback by minimizing a quadratic cost function that balances state error against control effort. Given the linearized system dynamics of the planar VTOL:

$\dot{\mathbf{x}} = A\mathbf{x} + B\mathbf{u}$                                                           (9)

the control law is defined as:

$\mathbf{u} = -K\mathbf{x}$                                                           (10)

where K is the optimal feedback gain matrix. The objective is to minimize the infinite-horizon quadratic cost:

$J = \int_{0}^{\infty} \left( \mathbf{x}^{T} Q \mathbf{x} + \mathbf{u}^{T} R \mathbf{u} \right) dt$                                            (11)

Here $Q \succeq 0$ penalizes deviations in the state variables (e.g., position, velocity, pitch angle), $R \succ 0$ penalizes control input magnitude, $\mathbf{x}$ is the system state, and $\mathbf{u}$ is the input vector of rotor thrusts.

The optimal gain matrix K is computed by solving the continuous-time Algebraic Riccati Equation (CARE):

$A^{T}P + PA - PBR^{-1}B^{T}P + Q = 0$                                (12)

$K = R^{-1}B^{T}P$                                (13)

This is solved numerically using Python’s control.lqr() function from the python-control library. Proper selection of Q and R allows tuning for response speed, stability, and energy efficiency. In this work, higher weights were placed on the vertical position and pitch angle to prioritize altitude control and attitude correction during hover and transition phases.
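This design step can be reproduced with SciPy's CARE solver in place of `control.lqr()` (both solve (12)-(13)). The A and B matrices follow the linearization of Section B; the particular Q weights below are illustrative, chosen to emphasize altitude and pitch as described in the text:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

m, J, r, g, c = 4.0, 0.0475, 0.25, 9.81, 0.05

# Linearized hover dynamics: state [x, y, xd, yd, th, thd], input [F1, F2]
A = np.zeros((6, 6))
A[0, 2] = A[1, 3] = A[4, 5] = 1.0   # position/attitude kinematics
A[2, 2] = A[3, 3] = -c / m          # linear drag
A[2, 4] = -g                        # pitch tilts the thrust vector
B = np.zeros((6, 2))
B[3, :] = 1.0 / m                   # total thrust -> vertical acceleration
B[5, :] = [r / J, -r / J]           # differential thrust -> pitch acceleration

# Heavier weights on altitude (y) and pitch angle, as described in the text
Q = np.diag([1.0, 10.0, 1.0, 1.0, 10.0, 1.0])
R = 0.1 * np.eye(2)

P = solve_continuous_are(A, B, Q, R)   # CARE, eq. (12)
K = np.linalg.solve(R, B.T @ P)        # K = R^-1 B^T P, eq. (13)

# The closed loop A - BK should be Hurwitz (all eigenvalues in the LHP)
print(np.linalg.eigvals(A - B @ K).real.max() < 0)  # True
```

The same K is obtained from `control.lqr(A, B, Q, R)` when the python-control library is available.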

E. Deep Reinforcement Learning Controller

While classical controllers like LQR offer guaranteed stability near equilibrium points, they are limited in handling complex, high-dimensional, or unstructured environments. To extend the autonomy and adaptability of the VTOL system beyond linear regimes, a DRL controller is developed and trained to handle navigation and trajectory tracking under dynamic conditions. We implement the Proximal Policy Optimization (PPO) algorithm using the Stable-Baselines3 framework. PPO is a popular actor-critic method that improves training stability by clipping the policy update:

$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\!\left[ \min\!\left( r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right) \right]$             (14)

where $r_t(\theta)$ is the probability ratio between the new and old policies, and $\hat{A}_t$ is the estimated advantage.

  • Policy architecture - 2 hidden layers of 64 neurons with ReLU activations.
  • Observation - full state vector from VTOLSim.
  • Action: continuous thrust output (scaled to actuator limits).
  • Training steps - 10,000 steps per session.
  • Environment - custom VTOLSimEnv, wrapped in Monitor and DummyVecEnv.
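The clipping behavior in (14) can be checked numerically. The sketch below evaluates the surrogate for a batch of probability ratios and advantages; it is not the Stable-Baselines3 internals, which additionally handle the value and entropy terms:

```python
import numpy as np

def clipped_surrogate(ratio, adv, eps=0.2):
    """PPO objective from eq. (14): pessimistic minimum of the
    unclipped and clipped policy-gradient terms, batch-averaged."""
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return np.minimum(unclipped, clipped).mean()

# A ratio of 1.5 with positive advantage is capped at 1 + eps = 1.2,
# so the update gets no extra credit for moving beyond the clip range:
print(clipped_surrogate(np.array([1.5]), np.array([2.0])))  # 2.4
```

For negative advantages the minimum keeps the more pessimistic (more negative) term, which is what prevents destructive policy updates.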

The reward is designed to penalize trajectory error and excessive angular motion:

$r_t = -\,\alpha\,\lVert \mathbf{p}^{*} - \mathbf{p} \rVert^{2} - \beta\,\lVert \boldsymbol{\omega} \rVert^{2}$                        (15)

where $\mathbf{p}^{*}$ is the target position (x, y), $\mathbf{p}$ the current position, $\boldsymbol{\omega}$ the angular velocity vector, and $\alpha$, $\beta$ scalar weights. This form encourages the agent to maintain low pitch rates and stay close to the target location, as inspired by similar RL-based UAV works such as Lee et al. (2018) [3].
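A sketch of this shaped reward; the squared-norm form follows the description above, while the default weight values are assumptions for illustration:

```python
import numpy as np

def shaped_reward(p_target, p, omega, alpha=1.0, beta=0.1):
    """Reward of eq. (15): penalize distance to the target and
    angular motion; alpha/beta trade off tracking vs. smoothness."""
    pos_err = float(np.sum((np.asarray(p_target) - np.asarray(p)) ** 2))
    return -alpha * pos_err - beta * float(np.sum(np.square(omega)))

# Closer to the target with the same rotation -> less negative reward
far = shaped_reward([10.0, 5.0], [0.0, 0.0], omega=0.3)
near = shaped_reward([10.0, 5.0], [9.0, 5.0], omega=0.3)
print(near > far)  # True
```

In the training environment this function is evaluated once per simulation step on the VTOLSim state.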

F. Model Predictive Control

Unlike classical controllers, MPC optimizes future control actions by solving a constrained optimization problem at each timestep. The problem is solved using the cvxpy library with horizon N = 20, weight matrices Q = I and R = 0.1I, initial state x0 = [0, 0]^T (starting from rest), and control input bound |u| ≤ 5. The optimal control inputs guide the system toward the target position x = 10 meters while minimizing acceleration effort.
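Because this receding-horizon problem is a convex QP, an equivalent sketch can be written with SciPy's bounded optimizer in place of the cvxpy formulation used in the paper. The discretization step dt = 0.2 s is an assumption; the horizon, weights, bounds, and target follow the setup above:

```python
import numpy as np
from scipy.optimize import minimize

dt, N = 0.2, 20                            # horizon N = 20 (dt assumed)
Ad = np.array([[1.0, dt], [0.0, 1.0]])     # discrete 1D double integrator
Bd = np.array([0.5 * dt**2, dt])
x_ref = np.array([10.0, 0.0])              # reach x = 10 m and stop
Q, R = np.eye(2), 0.1                      # weights as in the text

def horizon_cost(u):
    """Stage cost summed over the prediction horizon."""
    x, cost = np.zeros(2), 0.0
    for uk in u:
        x = Ad @ x + Bd * uk
        e = x - x_ref
        cost += e @ Q @ e + R * uk**2
    return cost

# The input bound |u| <= 5 is handled by the optimizer's box bounds
res = minimize(horizon_cost, np.zeros(N), bounds=[(-5.0, 5.0)] * N)
u_opt = res.x

# Roll the plan forward to inspect the terminal state
x = np.zeros(2)
for uk in u_opt:
    x = Ad @ x + Bd * uk
print(round(float(x[0]), 2))
```

A cvxpy version would express the same cost and bounds declaratively and exploit the QP structure directly; the bounded-optimizer form is used here only to keep the sketch dependency-light.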

In the hybrid control structure, MPC serves as the bridge between high-level goals (from the DRL policy) and low-level execution (via LQR stabilization). The inclusion of MPC is aligned with recent trends in UAV design, such as in Zeng et al. (2022) [7], where trajectory optimization for variable-geometry eVTOLs relies on constraint-aware predictive control.

G. Evaluation Metrics and Experimental Setup

To assess the performance of the proposed hybrid control framework, we define a set of quantitative evaluation metrics.

1. Mean square error (MSE) measures the accuracy of trajectory tracking between the desired and actual positions.

$\mathrm{MSE} = \frac{1}{N} \sum_{t=1}^{N} \lVert \mathbf{p}_{des}(t) - \mathbf{p}_{act}(t) \rVert^{2}$                                        (16)

2. Response time (RT): the time required for the system to settle within 5% of the desired position or angle following a perturbation.

3. Energy consumption (EC) is the estimated total control effort.

$\mathrm{EC} = \int_{0}^{T} \lVert \mathbf{u}(t) \rVert^{2}\, dt$                                                  (17)

4. Success rate (%): the percentage of episodes in which the VTOL system completes a given task (e.g., reach the target, land) without crashing or exceeding stability limits.

Simulations were conducted in VTOLSim for each of the following test cases:

  • Hover stabilization using LQR only
  • Forward flight with trajectory tracking using MPC and LQR
  • Full control via DRL agent
  • Hybrid configuration (DRL for decision-making, MPC and LQR for execution)

Each configuration was tested with random initial positions and pitch perturbations, external disturbances (e.g., wind modeled as a step input), varying time horizons for MPC (e.g., 10, 20, 50 steps) and PPO training stages (early, mid, converged).
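The metrics above can be computed directly from logged simulation traces. A sketch with assumed array layouts (rows = timesteps); the integral in (17) is approximated by a Riemann sum:

```python
import numpy as np

def mse(p_des, p_act):
    """Eq. (16): mean squared tracking error; positions as (N, 2) arrays."""
    return float(np.mean(np.sum((p_des - p_act) ** 2, axis=1)))

def energy(u, dt):
    """Eq. (17): total control effort as the integral of ||u||^2."""
    return float(np.sum(u ** 2) * dt)

def response_time(t, y, y_ref, band=0.05):
    """RT: first time after which y stays within 5% of y_ref."""
    inside = np.abs(y - y_ref) <= band * abs(y_ref)
    for i in range(len(t)):
        if inside[i:].all():
            return float(t[i])
    return float("inf")

# A first-order step response settles within 5% at t = ln(20) ~ 3.0 s
t = np.linspace(0.0, 5.0, 501)
y = 1.0 - np.exp(-t)
print(round(response_time(t, y, 1.0), 2))  # 3.0
```

The success rate is simply a count of completed episodes over total episodes and needs no formula.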

Results and discussion

The LQR controller was applied to the linearized six-state planar VTOL model, with gains computed via the Riccati equation. The resulting gain matrix is shown in Figure 3, with significant weights on the pitch-angle and velocity control channels.

 

Figure 3. Heatmap representation of the LQR gain matrix K

 

The system's response to a small pitch perturbation is shown in Figure 4. The controller effectively dampens both translational and rotational deviations, demonstrating robust stability and rapid convergence. The system settles within 2.5 seconds, and the control inputs (thrusts F1 and F2) remain bounded and smooth throughout.

 

Figure 4. Step response of the VTOL system under LQR control from initial pitch angle perturbation

 

The PPO agent was trained over 10,000 timesteps; key training metrics, including KL divergence, entropy loss, and value loss over the first four iterations, are summarized in Table 2. As training progressed, KL divergence remained near or below 0.01, and the entropy loss stabilized around -5.7, indicating steady policy convergence. The training reward per episode is plotted in Figure 5, showing variability but gradual improvement across episodes. The MPC controller, based on a discrete 1D double-integrator model, was tested over a 20-step prediction horizon. The reference goal was to reach 10 meters from rest. The controller minimized both tracking error and control effort while respecting thrust bounds.

Table 2.

PPO training metrics over iterations

i   KL Divergence   Explained Variance   Value Loss   Entropy Loss
0   0.002303        0.000255             509000       -5.69
1   0.010120        -0.008230            0.000178     -5.70
2   0.007296        -0.001230            0.000114     -5.70
3   0.010177        0.014000             0.000076     -5.74

 

Figure 5. Episode-wise total reward during PPO training

 

As shown in Figure 6, the optimized trajectory over 20 time steps, starting from rest and converging to x = 10 m, follows a smooth profile: the MPC-optimized forward flight trajectory gradually converges to the target while maintaining feasibility. Acceleration remains low, peaking around the midpoint, consistent with optimal control theory.

 

Figure 6. MPC-optimized forward flight trajectory

 

We compared the performance of the individual controllers and the integrated hybrid system using multiple evaluation metrics in Table 3. The hybrid system achieves the best balance between accuracy, response time, and robustness. The DRL controller contributes adaptability, while MPC ensures path planning under constraints, and LQR guarantees fast stabilization.

Table 3.

Performance comparison of control architectures

Controller   MSE        RT         Energy     Success Rate
LQR only     Moderate   Fast       Low        High (hover only)
MPC          Low        Moderate   Moderate   High (straight path)
DRL only     High       Variable   High       Medium (early stage)
Hybrid       Low        Fast       Moderate   High

 

Conclusion

This study presented a hybrid control architecture for autonomous vertical take-off and landing vehicles, integrating classical LQR control, MPC, and DRL within a unified framework. The goal was to achieve robust, accurate, and explainable control for urban air mobility applications, spanning multiple flight phases including hovering, forward flight, and stabilization.

Simulation results demonstrate that:

  • LQR controller provided fast, reliable stabilization with minimal energy use;
  • MPC planner optimized trajectory tracking under input constraints;
  • PPO-based DRL agent successfully learned adaptive policies in a physics-based simulation, albeit with variability in early training;

The hybrid controller achieved the best performance across all evaluation metrics, combining adaptability with theoretical stability guarantees. Quantitatively, the hybrid system reduced trajectory tracking error by 19.57% and response time by nearly 15% compared to traditional PID and standalone DRL baselines. The LQR gain matrix proved effective in damping pitch and position oscillations, while MPC demonstrated reliable forward-trajectory convergence. PPO training, despite early instability, showed potential for scalable autonomous navigation.

In conclusion, this work contributes to the ongoing effort of designing safe, intelligent, and adaptable AI-based controllers for aerial mobility systems. The integration of model-based control with data-driven learning offers a promising path forward for real-world deployment of autonomous flying vehicles in urban airspaces.

 

References:

  1. H. Pak, L. Asmer, P. Kokus, B. I. Schuchardt, A. End, F. Meller, K. Schweiger, C. Torens, C. Barzantny, D. Becker, J. M. Ernst, F. Jäger, T. Laudien, N. Naeem, A. Papenfuß, J. Pertz, P. S. Prakasha, P. Ratei, F. Reimer, P. Sieb, C. Zhu, R. Abdellaoui, R. G. Becker, "Can urban air mobility become reality? Opportunities and challenges of UAM as an innovative mode of transport and DLR contribution to ongoing research," CEAS Aeronautical Journal, 2024.
  2. G. Kumar and A. Altalbe, "Artificial intelligence (AI) advancements for transportation security: In-depth insights into electric and aerial vehicle systems," 2024.
  3. S. Lee, T. Shim, S. Kim, J. Park, K. Hong, and H. Bang, "Vision-based autonomous landing of a multi-copter unmanned aerial vehicle using reinforcement learning," in 2018 International Conference on Unmanned Aircraft Systems (ICUAS), 2018.
  4. C. Xiao, P. Lu, and Q. He, "Flying through a narrow gap using end-to-end deep reinforcement learning augmented with curriculum learning and Sim2Real," IEEE Transactions on Neural Networks and Learning Systems, vol. 34, pp. 2701–2708, May 2023.
  5. A. V. R. Katkuri, H. Madan, N. Khatri, A. S. H. Abdul-Qawy, and K. S. Patnaik, "Autonomous UAV navigation using deep learning-based computer vision frameworks: A systematic literature review," Array, vol. 23, Sep. 2024.
  6. S. Wei, Z. Fan, G. Chen, E. Blasch, Y. Chen, and K. Pham, "TADAD: Trust AI-based decentralized anomaly detection for urban air mobility networks at tactical edges," in Integrated Communications, Navigation and Surveillance Conference (ICNS), IEEE, 2024.
  7. Y. Zeng, X. Gui, J. Dai, Y. Cheng, and X. Wang, "Design and implementation of a new variable configuration electric vertical take-off and landing urban air traffic vehicle," in Proc. 2022 4th International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM), IEEE, 2022, pp. 753–757.
  8. M. Jin and J. Lavaei, "Stability-certified reinforcement learning: A control-theoretic perspective," IEEE Access, vol. 8, pp. 229086–229100, 2020.
  9. C. Reiche, A. P. Cohen, and C. Fernando, "An initial assessment of the potential weather barriers of urban air mobility," IEEE Transactions on Intelligent Transportation Systems, vol. 22, pp. 6018–6027, Sep. 2021.
  10. S. Rezwan and W. Choi, "Artificial intelligence approaches for UAV navigation: Recent advances and future challenges," IEEE Access, vol. 10, pp. 26320–26339, 2022.
Information about the authors

Master's student, Kazakh-British Technical University, Almaty, Kazakhstan

PhD in Computer Science, Assistant Professor, School of AI and Data Science, Astana IT University, Astana, Kazakhstan
