INTELLIGENT INFORMATION SYSTEM FOR CLASSIFYING STUDENT ACADEMIC ACHIEVEMENTS USING MACHINE LEARNING METHODS

Kabdyzhan Z.Z., Imankulov T.S.

Issue 4(145)

10. Informatics, computer engineering, and control

Cite as:

Kabdyzhan Z.Z., Imankulov T.S. INTELLIGENT INFORMATION SYSTEM FOR CLASSIFYING STUDENT ACADEMIC ACHIEVEMENTS USING MACHINE LEARNING METHODS // Universum: технические науки: electronic scientific journal. 2026. 4(145). URL: https://7universum.com/ru/tech/archive/item/22471 (accessed: 28.07.2026).

DOI: 10.32743/UniTech.2026.145.4.22471

Received: 05.04.2026

Accepted for publication: 14.04.2026

Published: 28.04.2026

ABSTRACT

Student academic achievement classification is crucial for evaluating performance and guiding educational decision-making. Traditional grading methods often fail to capture the full scope of a student’s abilities, motivating the use of intelligent systems. This paper proposes an intelligent information system that uses machine learning (ML), in particular decision trees and neural networks, to classify students’ academic achievements more accurately and fairly. The system integrates multiple ML models to analyze diverse features such as grades, attendance, assignments, and participation. We implemented and tested the approach on real student data, using accuracy, precision, recall, and F1-score as evaluation metrics. Preliminary results indicate that the decision tree model achieved the highest classification accuracy (~85%), effectively identifying high- and low-performing students, while the neural network and other models showed slightly lower accuracy. The system demonstrated high precision for top and bottom achievers, though moderate confusion occurred in classifying borderline (average) students. These findings suggest that combining interpretable models with powerful learners can improve the assessment of student performance. The proposed system can assist educators by providing timely, data-driven insights into student achievements, paving the way for more personalized and adaptive learning strategies.

Keywords: Student academic achievement, Classification, Machine learning (ML), Decision tree, Neural networks, Educational data analysis, Student performance evaluation

Introduction

Evaluating student performance is essential in education, yet traditional methods such as GPA and letter grading often fail to reflect the full range of student abilities [1]. These approaches typically rely on limited indicators and do not consider individual learning patterns or external factors [2]. With the rapid growth of digital educational data, there is an increasing need for intelligent, data-driven methods to improve assessment accuracy and fairness [1].

Machine learning (ML) techniques have shown strong potential in predicting student performance by identifying patterns in educational data [3]. They also enable early detection of underperforming students, supporting timely interventions and improved outcomes [4]. However, many existing models prioritize accuracy while lacking interpretability, making them difficult for educators to trust and apply in practice [5].

To address these limitations, this study proposes an intelligent information system that combines decision trees and neural networks. The hybrid approach aims to provide both interpretability and high predictive performance, enabling more accurate and adaptive classification of student academic achievements.

Materials and methods

The proposed system applies a hybrid supervised machine learning approach to classify students into achievement categories (High, Average, Low). The pipeline includes data collection, preprocessing, feature selection, model training, evaluation, and deployment. The core models are Decision Tree and Neural Network, supported by Logistic Regression as a baseline. This combination ensures a balance between interpretability and predictive performance.

Data Collection and Preprocessing:

Student data are collected from academic records, including grades, attendance, and participation. The dataset is preprocessed by handling missing values, normalizing numerical features, encoding categorical variables, and removing outliers to ensure data quality.
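
The preprocessing steps above can be illustrated with a minimal sketch. It assumes mean imputation for missing values, min-max scaling for normalization, and one-hot encoding for categorical variables; the paper does not state which specific techniques were used, and the column names and toy values below are hypothetical. Outlier removal is omitted for brevity.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 indicator list per row."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical toy records (not the study's data); None marks a missing value.
grades = [90, 70, None, 50]
attendance = [0.9, 0.8, 0.6, None]
participation = ["high", "low", "medium", "low"]

grades_scaled = min_max_scale(impute_mean(grades))
attendance_scaled = min_max_scale(impute_mean(attendance))
participation_encoded = one_hot(participation)
```

In this sketch the missing grade is filled with the mean of the observed grades (70) before scaling, so imputation does not shift the column's minimum or maximum.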

Feature Selection:

Relevant features are selected based on their contribution to prediction accuracy and interpretability. Redundant or weakly correlated variables are removed to reduce noise and improve model efficiency.
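
One simple way to drop weakly correlated variables is a correlation filter, sketched below. This is an illustration under assumptions, not the authors' procedure: it uses the Pearson correlation between each feature and a numerically encoded class label (Low = 0, Average = 1, High = 2), and the threshold of 0.3, the feature names, and the values are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:  # a constant column carries no signal
        return 0.0
    return cov / sqrt(vx * vy)


def select_features(columns, target, threshold=0.3):
    """Keep features whose absolute correlation with the target
    reaches the (assumed) threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Hypothetical toy data; target encodes Low=0, Average=1, High=2.
columns = {
    "grade": [55, 60, 72, 80, 91, 95],
    "attendance": [0.5, 0.7, 0.6, 0.8, 0.9, 0.95],
    "student_id_parity": [0, 1, 0, 1, 0, 1],
}
target = [0, 0, 1, 1, 2, 2]

selected = select_features(columns, target)
```

A filter of this kind only measures linear association with the label; it does not detect redundancy between features, which would require comparing features with each other.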

Model Training:

Three models are trained: Logistic Regression (baseline), Decision Tree, and Neural Network (MLP). The dataset is split into training (80%) and testing (20%) sets. The Decision Tree is optimized for interpretability, while the Neural Network captures complex patterns in the data.
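
To show why a decision tree is easy to interpret, the sketch below finds the single threshold on one feature that best separates the classes. It assumes the Gini impurity criterion used in CART-style trees; the paper does not specify the split criterion or the tree's hyperparameters, and the grades and labels here are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split of the form
    value <= threshold, or (None, root_gini) if no split helps."""
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data, not the study's records.
grades = [40, 45, 60, 65, 85, 90]
labels = ["Low", "Low", "Average", "Average", "High", "High"]

threshold, impurity = best_threshold(grades, labels)
```

A full tree repeats this search recursively on each resulting subset, and every learned rule reads as a plain condition such as "grade <= 52.5", which is what makes the model inspectable by educators.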

Model Evaluation:

Models are evaluated using accuracy, precision, recall, and F1-score. Cross-validation is applied to ensure generalization, and confusion matrices are analyzed to identify classification errors.
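
The cross-validation step can be sketched as follows. The number of folds is not given in the paper, so k = 5 is an assumption, as are the function name and the fixed random seed; the sketch only produces index splits and leaves model fitting to the caller.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs so that every sample
    appears in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# 200 matches the dataset size reported in the paper; k=5 is assumed.
splits = list(k_fold_indices(200, k=5))
```

Each model would be trained on `train_idx` and scored on `test_idx` for every split, and the k scores averaged to estimate how well it generalizes.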

Deployment:

The best-performing model is integrated into the system, which supports continuous updates using new data. The system provides interpretable results through a user-friendly interface, enabling educators to make data-driven decisions.

Figure 1. Feature importance as determined by the Decision Tree model. Grades and attendance had the highest impact on classification.

Results and discussion

Experimental Setup:

The proposed system was evaluated on a dataset of 200 undergraduate student records, including features such as grades, attendance, assignments, and participation. Students were classified into three categories (High, Average, Low) based on institutional criteria. The dataset was split into 80% training and 20% testing sets using stratified sampling to preserve class distribution. To address class imbalance, oversampling of minority classes was applied during training. All models were trained and evaluated under the same conditions.
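
The split and rebalancing described above can be sketched as follows. The per-class counts are hypothetical (the paper reports only the total of 200 records), and random duplication is assumed as the oversampling method since the paper does not name one. Oversampling is applied to the training indices only, so the test set keeps the original class distribution.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each class keeps roughly the same share
    in the training and test sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return train_idx, test_idx


def oversample(train_idx, labels, seed=0):
    """Randomly duplicate minority-class training indices until
    every class matches the size of the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in train_idx:
        by_class[labels[idx]].append(idx)
    target = max(len(idxs) for idxs in by_class.values())
    balanced = []
    for label in sorted(by_class):
        idxs = by_class[label]
        balanced.extend(idxs)
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    return balanced


# Hypothetical class counts that sum to the reported 200 records.
labels = ["High"] * 50 + ["Average"] * 110 + ["Low"] * 40

train_idx, test_idx = stratified_split(labels)
balanced_idx = oversample(train_idx, labels)

test_counts = Counter(labels[i] for i in test_idx)
balanced_counts = Counter(labels[i] for i in balanced_idx)
```

Splitting before oversampling matters: duplicating records first would let copies of the same student appear in both sets and inflate the test scores.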

Performance Metrics:

Model performance was assessed using accuracy, precision, recall, and F1-score. A confusion matrix was used to analyze misclassification patterns, particularly for borderline cases.
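
The relationship between the confusion matrix and the reported metrics can be made concrete with a short sketch. The matrix below contains hypothetical counts chosen for illustration only; it does not reproduce the study's results, and the function name is an assumption.

```python
def per_class_metrics(confusion, classes):
    """Compute overall accuracy plus per-class precision, recall and F1.

    confusion[i][j] is the number of samples whose true class is
    classes[i] and whose predicted class is classes[j].
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(classes)))
    report = {"accuracy": correct / total}
    for i, name in enumerate(classes):
        tp = confusion[i][i]
        predicted = sum(row[i] for row in confusion)  # column sum
        actual = sum(confusion[i])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    return report


classes = ["High", "Average", "Low"]
# Hypothetical counts for illustration; not the study's confusion matrix.
confusion = [
    [3, 1, 0],  # true High
    [1, 2, 1],  # true Average
    [0, 1, 3],  # true Low
]

report = per_class_metrics(confusion, classes)
```

In this toy matrix every error involves the Average class, either as the true or the predicted label, which is the pattern the analysis of borderline cases looks for.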

Results:

The Decision Tree model achieved the highest accuracy (~85%), outperforming the Neural Network (~80%) and Logistic Regression (~78%). It also demonstrated strong precision in identifying High and Low achievers; its slightly lower recall for these groups indicates that some High and Low students were misclassified as Average. The Average class showed lower precision and recall, reflecting the difficulty of classifying borderline students. Overall, the results suggest that combining interpretable and predictive models supports accurate classification while maintaining practical usability.

Figure 2. Performance comparison of machine learning models used for classifying student academic achievements. The Decision Tree model achieved the highest accuracy and balanced precision, recall, and F1-score.

Conclusion

This study presents an intelligent information system for classifying student academic achievements using a hybrid machine learning approach. By combining decision trees and neural networks, the system aims to balance interpretability and predictive accuracy. The results indicate that the proposed approach reaches up to ~85% accuracy and effectively identifies both high- and low-performing students. At the same time, the system provides interpretable insights that help educators understand the factors influencing student outcomes.

The proposed system supports data-driven decision-making in education by enabling early detection of struggling students and recognition of high achievers. Its adaptive design allows continuous improvement through new data, making it suitable for dynamic educational environments.

References:

1. Rahman A.M.J.M.Z. A review on data mining techniques and factors used in educational data mining to predict student amelioration // 2024.
2. Aziz A.A., Rizhan W.M., Idris W., Hassan H., Jusoh J.A. Intelligent system for personalizing students’ academic behaviors – a conceptual framework // International Journal on New Computer Architectures and Their Applications (IJNCAA). – 2024.
3. Wen X. Relevance vector machine – functional linear network for predicting student’s academic performance // 2024.
4. Alalawi K., Chiong R., Athauda R. Early detection of underperforming students using machine learning algorithms // Proceedings of the IEEE Conference on Innovative Technologies in Intelligent System and Industrial Application (CITISIA). – 2021.
5. International Conference on Information Systems and Computer Networks (ISCON) // IEEE. – 2019.

Information about the authors