STRATEGIC IT PROJECT MANAGEMENT FRAMEWORK FOR AI-POWERED EDTECH ECOSYSTEMS: A HYBRID AGILE-MLOPS APPROACH

Myrzakhan A.
To cite:
Myrzakhan A. STRATEGIC IT PROJECT MANAGEMENT FRAMEWORK FOR AI-POWERED EDTECH ECOSYSTEMS: A HYBRID AGILE-MLOPS APPROACH // Universum: технические науки: electronic scientific journal. 2026. 4(145). URL: https://7universum.com/ru/tech/archive/item/22514 (accessed: 07.05.2026).
DOI: 10.32743/UniTech.2026.145.4.22514
Received: 03.04.2026
Accepted for publication: 14.04.2026
Published: 28.04.2026

 

ABSTRACT

The integration of Artificial Intelligence (AI) into educational platforms has shifted from a competitive advantage to a fundamental necessity. However, the management of such projects often fails due to the rigid nature of traditional IT frameworks. This paper presents a specialized IT project management framework that harmonizes Agile software development with MLOps (Machine Learning Operations). We introduce a formal multi-objective optimization model to balance system quality, scalability, and ethical constraints. The study was conducted over 12 months involving 20 schools in Kazakhstan. Results indicate a 35% reduction in deployment latency, 73% improvement in deployment cycles (from 45 to 12 days), and 122% growth in user retention (from 35% to 78%). Model accuracy stabilized at 89.5% through bi-weekly retraining cycles. The proposed framework addresses key challenges including data quality issues, seasonal drift, and cultural adaptation, providing practical lessons for managing AI-powered educational systems.


 

Keywords: IT Project Management, EdTech, Artificial Intelligence, MLOps, Agile Methodologies, Microservices Governance, Digital Transformation, Kazakhstan.


 

Introduction

The contemporary landscape of K-12 and higher education is undergoing a paradigm shift, catalyzed by the democratization of adaptive learning algorithms and neural networks. Unlike standard enterprise resource planning (ERP) systems, AI-powered EdTech platforms are "living" entities that evolve based on the data they consume. As noted in recent digital transformation reports, this inherent volatility necessitates a departure from classical Waterfall or even standard Agile methodologies, which often assume a static codebase and deterministic outputs [1].

In the context of school education, the complexity is compounded by fragmented digital infrastructures and stringent data privacy regulations (e.g., GDPR, COPPA). Project managers often face the "Cold Start" problem: implementing sophisticated AI tools in environments where baseline data is noisy, biased, or non-existent. Traditional management approaches often ignore the long-term maintenance of model accuracy, leading to "model drift" and rapid system obsolescence within months of deployment [2, p. 2503].

The intersection of AI and pedagogy has been explored primarily through Intelligent Tutoring Systems (ITS). Holmes et al. emphasized that AI in education (AIEd) must be "pedagogically driven" rather than "technology-led" [3]. However, the main barrier to AI adoption in schools is not the lack of algorithms, but the lack of an integrated management framework that ensures data interoperability and ethical compliance [4; 5].

This paper contributes to the field of IT management by: (1) proposing a dual-track management lifecycle that synchronizes software engineering with data science cycles; (2) formalizing project success criteria through a multi-variable objective function; (3) analyzing the role of microservices in mitigating technical debt in AI systems; and (4) providing a comprehensive risk taxonomy for AI implementation in public schools based on a 12-month case study in Kazakhstan.

Materials and methods

Research Design and Theoretical Foundation

The study employed a mixed-methods approach combining mathematical modeling, framework development, and longitudinal case study evaluation. The research was conducted over 12 months (2024-2025) in a district-wide pilot involving 20 schools in Kazakhstan, with 87 educators and administrators participating in the study.

Mathematical Modeling Approach

Strategic management of an AI project can be modeled as a constrained optimization problem. We define the global success index J as:

maximize   J = ∫₀ᵀ [α·Q(t) + β·S(t) + γ·U(t)] dt,   subject to   R_i ≤ ε_i  ∀i                                   (1)

where Q(t) represents the accuracy of AI-driven recommendations, S(t) denotes architectural scalability [6], U(t) is the user adoption rate, R_i represents risk vectors including data leakage and algorithmic bias [5], α, β, γ are weighting coefficients reflecting institutional priorities, and ε_i are the admissible risk levels.

AI models are subject to performance degradation due to "concept drift" [7]. We model the quality Q(t) as:

Q(t) = Q₀ · e^(−λt) + Δ_retrain                                   (2)

where λ is the decay constant and Δ_retrain is the gain achieved through MLOps-driven updates [8].
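Equation (2) can be illustrated with a minimal numerical sketch. The initial accuracy, decay constant, and retraining gain below are illustrative assumptions, not the pilot's fitted values:

```python
import math

def model_quality(t_days, q0=0.76, decay=0.01, retrain_gain=0.0):
    """Quality decay per equation (2): Q(t) = Q0 * exp(-lambda*t) + delta_retrain."""
    return q0 * math.exp(-decay * t_days) + retrain_gain

# Without retraining, recommendation quality erodes over an academic term.
model_quality(0)    # 0.76 (initial accuracy)
model_quality(90)   # noticeably lower after ~3 months with no updates
# A retraining cycle contributes a recovery term on top of the decayed base,
# which is the mechanism the bi-weekly MLOps cadence exploits.
model_quality(14, retrain_gain=0.20)
```

The practical consequence is that the retraining interval, not the initial accuracy, dominates long-run quality.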

Following the logic of Sculley et al., we define technical debt D_tech in an EdTech AI system as a function of data coupling C_d and infrastructure complexity I_c [2]:

D_tech = f(C_d, I_c),   ∂f/∂C_d > 0,   ∂f/∂I_c > 0                                   (3)

Proposed Dual-Track Framework

We propose a synchronized framework where the project is split into two concurrent tracks. The Software Track follows standard Scrum [9], while the AI Track focuses on data engineering and experimental training. This approach addresses the "data work" challenges identified in recent research [10, p. 1].

The Software Track operates on two-week sprints with clearly defined deliverables: API endpoints, user interface components, and integration modules. In contrast, the AI Track runs on experiment cycles rather than sprints, as model training depends on data quality which often reveals itself only during experimentation.

Governance involves a tripartite committee: Pedagogical Experts, IT Managers, and Ethics Officers. Using principles of ITIL 4 [11] and ISO 38500 [12], our framework employs a service-mesh architecture (Figure 1).

 

Figure 1. Proposed Microservices Architecture for AI EdTech Integration

 

Risk Assessment Methodology

As emphasized by Floridi et al., ethical risks are paramount in education [5]. Our framework adopts a multi-tier risk assessment matrix developed through expert consultation and validated through the pilot implementation (Table 1).

Table 1.

Comprehensive Risk Matrix for AI EdTech

Risk Factor        | Probability | Impact   | Mitigation Strategy
-------------------|-------------|----------|----------------------
Data Silos         | High        | High     | Unified Data Lake API
Algorithmic Bias   | Medium      | Critical | Periodic XAI Audits
Teacher Resistance | High        | Medium   | Stakeholder Sprints
Technical Debt     | Medium      | High     | Automated Refactoring
Model Drift        | High        | High     | Continuous Monitoring
Privacy Violations | Low         | Critical | Differential Privacy

 

Technology Stack and Implementation

The technology stack was selected based on technical considerations and practical constraints. We implemented a microservices architecture using Python (FastAPI) for the backend, React for the frontend, and TensorFlow for model serving. For data storage, we built a multi-tier system with Kafka streams feeding both real-time dashboards and a data lake organized in a medallion architecture: a bronze layer for raw data, silver for cleaned data, and gold for model-ready features.
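The bronze-to-gold flow can be sketched in plain Python; the field names and transformations below are illustrative assumptions, not the pilot's actual pipeline:

```python
def to_silver(bronze_rows):
    """Bronze -> silver: drop malformed raw records, normalize types."""
    silver = []
    for row in bronze_rows:
        if row.get("student_id") is None or row.get("score") is None:
            continue  # quarantine incomplete raw records
        silver.append({"student_id": str(row["student_id"]),
                       "score": float(row["score"])})
    return silver

def to_gold(silver_rows):
    """Silver -> gold: aggregate cleaned rows into model-ready features."""
    per_student = {}
    for row in silver_rows:
        per_student.setdefault(row["student_id"], []).append(row["score"])
    # Gold feature here: mean score per student (illustrative).
    return {sid: sum(scores) / len(scores) for sid, scores in per_student.items()}

bronze = [{"student_id": 1, "score": "80"},
          {"student_id": 1, "score": "90"},
          {"student_id": None, "score": "50"}]   # malformed, dropped at silver
gold = to_gold(to_silver(bronze))                # {'1': 85.0}
```

In production these stages ran as separate services on the Kafka stream, so a bad raw batch never reached model-facing tables.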

Data Collection and Analysis

Data collection involved three sources: (1) system logs and performance metrics, (2) teacher surveys and interviews (n=87), and (3) student engagement data. The study began with an intensive six-week "Sprint 0" phase addressing data quality issues including duplicate entries, inconsistent naming conventions, and missing fields across different school systems.
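The Sprint 0 cleanup steps (deduplication, name normalization, dropping incomplete rows) can be sketched as follows; the record fields are illustrative assumptions:

```python
def clean_records(records):
    """Sprint-0 style cleanup: normalize names, drop incomplete rows, dedupe."""
    seen, cleaned = set(), []
    for rec in records:
        name = rec.get("teacher_name", "").strip().title()  # " aliya s. " -> "Aliya S."
        subject = rec.get("subject")
        if not name or not subject:
            continue                      # missing required fields
        key = (name, subject)
        if key in seen:
            continue                      # duplicate entry across school exports
        seen.add(key)
        cleaned.append({"teacher_name": name, "subject": subject})
    return cleaned

rows = [{"teacher_name": " aliya s. ", "subject": "math"},
        {"teacher_name": "Aliya S.", "subject": "math"},   # duplicate after normalization
        {"teacher_name": "", "subject": "physics"}]        # missing name, dropped
clean_records(rows)   # one cleaned record
```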

To handle deployment complexity, we utilized an adaptive scaling algorithm inspired by automated lifecycle management concepts in ModelOps [13]:

Algorithm 1. Adaptive AI Project Resource Scaling

Input: accuracy A, latency L, budget B
while the project is active do
    if A < A_threshold then
        increase Data Engineering resources by 20%
    end if
    if L > L_max then
        scale Cloud Infrastructure
    end if
    if teacher feedback flags exceed 50 per week then
        trigger an emergency model review
    end if
end while

The algorithm operates with human oversight to prevent cost overruns from automatic scaling. Budget guards require human approval for scaling beyond certain thresholds.
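One iteration of Algorithm 1, with the budget guard, can be sketched as below; the threshold values and action names are illustrative assumptions, not the pilot's configuration:

```python
def scaling_actions(accuracy, latency_ms, weekly_flags,
                    acc_threshold=0.85, latency_max=500, flag_limit=50):
    """One cycle of Algorithm 1: decide which scaling actions to take."""
    actions = []
    if accuracy < acc_threshold:
        actions.append("increase_data_engineering_20pct")
    if latency_ms > latency_max:
        actions.append("scale_cloud_infrastructure")
    if weekly_flags > flag_limit:
        actions.append("emergency_model_review")
    return actions

def approve(actions, projected_cost, budget_guard):
    """Budget guard: infrastructure scaling beyond the guard needs human sign-off.

    Model reviews carry no infrastructure cost, so they pass automatically.
    """
    if projected_cost > budget_guard:
        return [a for a in actions if a == "emergency_model_review"]
    return actions

pending = scaling_actions(accuracy=0.80, latency_ms=600, weekly_flags=60)
approve(pending, projected_cost=1000, budget_guard=500)  # only the review proceeds
```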

Results and discussion

Quantitative Outcomes

The transition from a monolithic legacy system to our proposed architecture over the 12-month pilot period resulted in significant improvements across multiple metrics:

  • Deployment Cycle: Reduced from 45 days to 12 days, representing a 73% improvement in time-to-production
  • User Retention: Increased from 35% to 78%, a 122% growth in sustained platform usage
  • Model Accuracy: Stabilized at 89.5% through bi-weekly retraining cycles, compared to initial accuracy of 76%
  • Teacher Time Savings: Average 3.2 hours per week on lesson planning activities
  • Student Engagement: Increased from 42% to 67% on AI-recommended content
  • System Uptime: Improved from 94% to 99.2%, critical for school operational reliability

These quantitative results demonstrate the effectiveness of the dual-track Agile-MLOps approach in addressing both software delivery and model performance requirements simultaneously.

Qualitative Insights from Educators

Beyond metrics, extensive qualitative feedback was gathered through 87 interviews and classroom observations. Teachers appreciated the reduction in administrative burden but expressed concerns about over-reliance on AI recommendations. One teacher remarked, "It’s great that the system suggests which students need help, but I already knew that from watching my classroom. What I really need is help with what specific interventions to use."

This feedback led to a mid-pilot pivot in AI focus from identifying struggling students (which experienced teachers already knew) to recommending specific learning resources matched to individual student needs. This demonstrates the value of maintaining framework flexibility even within structured processes.

Student feedback revealed that high-performing students appreciated personalized challenge materials, while struggling students sometimes felt overwhelmed by suggestions. We learned that recommendation timing significantly impacts effectiveness—presenting suggestions at lesson start differs from presenting them during independent work time.

Implementation Challenges and Adaptations

Several unexpected challenges emerged during implementation. Internet connectivity issues affected three schools severely, forcing implementation of aggressive local caching and offline-first design patterns not originally planned. While adding technical complexity, this benefited all schools, not just those with connectivity problems.

Seasonal data drift presented another challenge. Student engagement patterns differed dramatically across the academic year, with models trained on September data performing poorly in May. We addressed this by implementing seasonal model variants with automatic switching based on the academic calendar.
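The calendar-based switching can be sketched as a simple lookup; the period boundaries and variant names are illustrative assumptions, since the pilot's actual academic calendar is not published:

```python
import datetime

# Illustrative mapping of academic periods to model variants.
SEASONAL_VARIANTS = {
    (9, 10, 11, 12): "autumn_model",
    (1, 2, 3): "winter_model",
    (4, 5): "spring_model",
    (6, 7, 8): "summer_model",
}

def select_model(date: datetime.date) -> str:
    """Pick the model variant for the academic period containing `date`."""
    for months, variant in SEASONAL_VARIANTS.items():
        if date.month in months:
            return variant
    raise ValueError("unmapped month")

select_model(datetime.date(2024, 9, 2))   # 'autumn_model'
select_model(datetime.date(2025, 5, 10))  # 'spring_model'
```

The switch is deterministic and auditable, which matters in a school setting where behavior changes must be explainable to staff.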

Cultural events impacted model behavior in unanticipated ways. In schools with significant populations observing Ramadan, engagement patterns shifted noticeably during that month. Initial models misinterpreted this as requiring intervention, when it was normal and expected. We added cultural calendar awareness to prevent such misinterpretations.

Data quality issues proved more severe than anticipated. One school recorded grades as letters (A, B, C), another used percentages, and a third used a 5-point scale. Harmonizing these required extensive manual intervention during the extended Sprint 0 phase.
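The grade harmonization described above can be sketched as a conversion to a common 0-100 scale; the mapping values are illustrative assumptions, as the actual conversion tables were agreed with each school:

```python
def to_percent(grade):
    """Normalize letter, percentage, and 5-point grades to a 0-100 scale."""
    letters = {"A": 95, "B": 85, "C": 75, "D": 65, "F": 40}  # illustrative midpoints
    if isinstance(grade, str) and grade.upper() in letters:
        return letters[grade.upper()]
    value = float(grade)
    if value <= 5:                 # treat small values as the 5-point scale
        return value / 5 * 100
    return value                   # already a percentage

to_percent("B")   # 85
to_percent(4)     # 80.0 (5-point scale)
to_percent(72)    # 72.0 (percentage passes through)
```

Even this toy version shows why manual review was unavoidable: a raw value of 4 is ambiguous between a very low percentage and a good 5-point grade, so scale detection had to be per-school, not per-value.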

The teacher time savings metric, while impressive, masked significant variation. Some teachers saved six hours weekly, while others saw minimal benefit. This difference correlated with teaching style—teachers already using data-driven approaches benefited most, while traditional-style teachers struggled to incorporate AI recommendations effectively.

Shadow IT as User Feedback

We discovered "shadow IT" practices where teachers created workarounds when the official system didn’t meet their needs. This occurred when AI recommendations were too slow or inaccurate, leading teachers to use manual methods while still logging into the system to satisfy administrative requirements.

Rather than blocking these practices, we implemented a feedback loop where teachers could flag problematic AI recommendations directly in the interface. These flags triggered immediate human review and fed back into the retraining pipeline. This mechanism reduced shadow IT practices by approximately 60% over six months, demonstrating that shadow IT should be treated as valuable user feedback rather than a problem to eliminate.

Ethical Considerations and Algorithmic Authority

The ethical implications ran deeper than initially apparent. We encountered a case where the AI recommendation system, trained on historical data, began subtly reinforcing existing inequalities. Students who had historically received less challenging material continued receiving less challenging material because the model learned that pattern.

Breaking this cycle required actively debiasing the training data and implementing fairness constraints that sometimes reduced overall accuracy but produced more equitable outcomes. This raises important questions about the trade-off between model performance and social equity in educational contexts.

The question of algorithmic authority in classrooms proved philosophically complex. When AI recommendations conflicted with teacher judgment, establishing who should prevail created challenges. We established that teachers always have final authority, but if teachers frequently override the AI, the system learns its recommendations are untrustworthy, potentially creating a negative feedback loop.

We addressed this by distinguishing between "recommendation error" (the AI was genuinely wrong) and "context override" (the AI was reasonable, but the teacher had additional contextual information). Only the former feeds back into retraining, preserving model learning while respecting teacher expertise.
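The routing rule can be sketched as follows; the event fields and verdict labels are illustrative assumptions about how a reviewed override might be represented:

```python
def route_override(override_event, reviewer_verdict):
    """Route a teacher override after human review.

    `reviewer_verdict` is 'recommendation_error' (AI genuinely wrong) or
    'context_override' (AI reasonable, teacher had extra context).
    Only confirmed recommendation errors feed the retraining pipeline;
    context overrides are kept for audit but excluded from training data.
    """
    if reviewer_verdict == "recommendation_error":
        return {"destination": "retraining_queue", "event": override_event}
    return {"destination": "audit_log_only", "event": override_event}

event = {"student_id": "s-17",
         "ai_recommendation": "remedial_set_3",
         "teacher_action": "advanced_set_1"}
route_override(event, "recommendation_error")["destination"]   # 'retraining_queue'
```

The filtering step is what prevents the negative feedback loop: legitimate contextual overrides never teach the model that its reasonable recommendations were wrong.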

Key Lessons for Practitioners

Five critical lessons emerged that benefit others undertaking similar projects:

Start with Data, Not Models. Our biggest early mistake was focusing too quickly on sophisticated models before ensuring data quality. More time should be invested in data infrastructure during Sprint 0. Clean, reliable data provides greater benefit than slightly more accurate models working with messy data.

Build for Interpretability from Day One. Initially treating model interpretability as a "nice to have" feature was wrong. When teachers didn’t understand recommendations, they didn’t trust them. Retrofitting interpretability into black-box models proved far harder than building interpretable models initially.

Plan for Model Decay. Every ML model degrades over time as contexts change. Retraining should be budgeted as a maintenance requirement, not an optimization to improve performance. This is not optional for production systems.

Cultural Change Requires Time. Technical implementation was straightforward compared to changing how teachers think about and use data. No amount of technical sophistication compensates for inadequate change management and training programs.

Monitor Beyond Technical Metrics. Beyond standard metrics (accuracy, latency, error rates), we track prediction distribution shift, feature drift, teacher override rate, and student engagement following recommendations. These human-centered metrics often reveal problems that technical metrics miss.
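Two of these human-centered metrics can be sketched directly. Total variation distance is used here as a simple stand-in for prediction distribution shift, since the pilot's exact drift metric is not specified:

```python
def override_rate(decisions):
    """Fraction of AI recommendations that teachers overrode."""
    overrides = sum(1 for d in decisions if d["teacher_overrode"])
    return overrides / len(decisions)

def distribution_shift(baseline_counts, current_counts):
    """Total variation distance between two prediction class distributions."""
    classes = set(baseline_counts) | set(current_counts)
    base_total = sum(baseline_counts.values())
    cur_total = sum(current_counts.values())
    return 0.5 * sum(abs(baseline_counts.get(c, 0) / base_total -
                         current_counts.get(c, 0) / cur_total)
                     for c in classes)

decisions = [{"teacher_overrode": True}, {"teacher_overrode": False},
             {"teacher_overrode": False}, {"teacher_overrode": False}]
override_rate(decisions)                      # 0.25
distribution_shift({"easy": 50, "hard": 50},
                   {"easy": 80, "hard": 20})  # 0.3
```

A rising override rate with a flat technical accuracy score is exactly the kind of divergence these metrics are meant to surface.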

Conclusion

This study established a robust framework for managing AI-powered educational projects by integrating Agile software development with MLOps practices. The dual-track approach successfully addresses the unique challenges of AI projects while maintaining the structure needed for educational institutional contexts.

The 12-month case study involving 20 schools in Kazakhstan demonstrates that this approach delivers measurable improvements: 73% faster deployment cycles, 122% growth in user retention, and stabilized model accuracy at 89.5%. However, quantitative metrics tell only part of the story. Qualitative insights revealed that successful implementation requires cultural adaptation, flexible response to seasonal patterns, and careful navigation of ethical considerations around algorithmic authority.

Five key lessons emerged for practitioners: prioritize data quality over model sophistication, build interpretability from the start, budget for continuous model decay, allocate sufficient time for cultural change, and monitor human-centered metrics alongside technical performance indicators.

The integration of AI into education is inevitable, but success depends critically on project management approaches that accommodate both technological requirements and pedagogical realities. This framework provides a validated starting point, though each implementation will require adaptation to local contexts, constraints, and educational philosophies. Future research should explore federated learning approaches to enhance privacy protection, AutoML integration to democratize AI development for resource-limited schools, and multi-year longitudinal studies to assess long-term learning outcomes beyond immediate engagement metrics.

 

References:

  1. McKinsey & Company. The state of AI in 2022 // McKinsey Global Institute. – 2022. – Available at: https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
  2. Sculley D., Holt G., Golovin D., et al. Hidden Technical Debt in Machine Learning Systems // Proc. of NIPS. – 2015. – P. 2503–2511.
  3. Holmes W., Bialik M., Fadel C. Artificial Intelligence in Education: Promises and Implications for Teaching and Learning // Center for Curriculum Redesign. – Boston, 2019. – 172 p.
  4. Zawacki-Richter O., Marín V. I., Bond M., Gouverneur F. Systematic review of research on artificial intelligence applications in higher education // International Journal of Educational Technology in Higher Education. – 2019. – Vol. 16, No. 1. – P. 1–27.
  5. Floridi L., Cowls J., Beltrametti M., et al. AI4People—An Ethical Framework for a Good AI Society: Opportunities, Risks, Principles, and Recommendations // Minds and Machines. – 2018. – Vol. 28. – P. 689–707.
  6. Fowler M., Lewis J. Microservices: a definition of this new architectural term // MartinFowler.com. – 2015. – Available at: https://martinfowler.com/articles/microservices.html
  7. Baier L., Jöhren F., Seebacher S. Challenges in the Deployment and Operation of Machine Learning in Practice // Proc. of ECIS. – 2019. – Research Paper 163.
  8. Kreuzberger D., Kühl N., Hirschl S. Machine Learning Operations (MLOps): Overview, Definition, and Architecture // IEEE Access. – 2023. – Vol. 11. – P. 31866–31879.
  9. Schwaber K., Sutherland J. The Scrum Guide: The Definitive Guide to Scrum // Scrum.org. – 2020. – 14 p.
  10. Sambasivan N., Kapania S., Highfill H., et al. "Everyone wants to do the model work, not the data work": Data Cascades in High-Stakes AI // Proc. of CHI. – 2021. – P. 1–15.
  11. AXELOS. ITIL Foundation: ITIL 4 Edition // The Stationery Office. – London, 2019. – 216 p.
  12. ISO/IEC 38500:2015. Information technology — Governance of IT for the organization // International Organization for Standardization. – 2015.
  13. Hummer W., Muthusamy V., Rausch T., et al. ModelOps: Cloud-based lifecycle management for reliable and trusted AI // Proc. of IC2E. – 2019. – P. 113–120.
Information about the authors

Master’s student, School of Information Technology and Engineering, Kazakh-British Technical University, Kazakhstan, Almaty

