CREATION OF AN INTELLIGENT ROBOT FOR BANKING CONTACT CENTER TASKS WITH COMMUNICATION IN KAZAKH AND RUSSIAN LANGUAGES

СОЗДАНИЕ ИНТЕЛЛЕКТУАЛЬНОГО РОБОТА ДЛЯ ЗАДАЧ БАНКОВСКОГО КОНТАКТ ЦЕНТРА, С ОБЩЕНИЕМ НА КАЗАХСКОМ И РУССКОМ ЯЗЫКАХ

Turdalin N.M.


4(145)

10. Computer science, computer engineering and control

Cite as:

Turdalin N.M. CREATION OF AN INTELLIGENT ROBOT FOR BANKING CONTACT CENTER TASKS WITH COMMUNICATION IN KAZAKH AND RUSSIAN LANGUAGES // Universum: технические науки: electronic scientific journal. 2026. 4(145). URL: https://7universum.com/ru/tech/archive/item/22547 (accessed: 28.07.2026).


Received by the editorial office: 12.04.2026

Accepted for publication: 14.04.2026

Published: 28.04.2026

ABSTRACT

Kazakhstan's digital banking services are expanding quickly, creating a need for scalable and affordable solutions to handle high volumes of client inquiries. This paper presents an architecture for an intelligent conversational robot for banking contact center operations that can converse in both Kazakh and Russian. The study reviews current multilingual NLP tools in the context of the Kazakh language and examines the main components of such a system: automatic speech recognition (ASR), natural language understanding (NLU), dialogue management, natural language generation (NLG), and text-to-speech synthesis (TTS). The proposed architecture combines transformer-based language models, retrieval-augmented generation, and a rule-governed dialogue state machine, integrated with core banking APIs. The paper covers the key design choices, the language-specific difficulties arising from Kazakh morphology, and directions for future empirical evaluation. The proposed approach addresses a practically important gap in the automation of bilingual financial services in the Central Asian fintech industry.

АННОТАЦИЯ

Стремительный рост цифровых банковских услуг в Казахстане требует масштабируемых и экономически эффективных решений для обработки большого количества обращений клиентов. В этой статье предлагается архитектура интеллектуального разговорного робота, который может вести разговор на русском и казахском языках для задач контакт-центра банка. В исследовании рассматриваются основные элементы системы, такие как автоматическое распознавание речи (ASR), понимание естественного языка (NLU), управление диалогом, генерация ответов (NLG) и синтез речи (TTS). Кроме того, рассматриваются многоязычные инструменты обработки естественного языка, используемые для казахского языка. Трансформерные языковые модели, метод дополненной генерации и конечный автомат состояний диалога, связанный с банковскими API, включены в предложенную архитектуру. Основные проектные решения, языковые особенности казахской морфологии и перспективы будущей эмпирической оценки системы обсуждаются.

Keywords: intelligent robot, conversational AI, natural language processing, Kazakh language, banking automation, contact center, speech recognition, multilingual NLP.

Ключевые слова: интеллектуальный робот, разговорный ИИ, обработка естественного языка, казахский язык, автоматизация банковского обслуживания, контакт-центр, распознавание речи, многоязычный NLP.

Introduction

Over the past ten years, Kazakhstan's banking industry has undergone a substantial digital transformation. Organizations such as Kaspi.kz have grown from traditional banks into all-inclusive financial super-apps that serve millions of users every day across the payment, lending, e-commerce, and marketplace verticals [1]. The resulting volume of client interactions, which includes balance inquiries, card and loan management, payment support, and complaint resolution, puts severe strain on contact center operations, where call counts can approach hundreds of thousands per day.

Traditional contact centers' reliance on human operators introduces three main limitations: high operating costs, restricted round-the-clock availability, and inconsistent service quality [2]. One well-researched way to overcome these limitations is to use intelligent conversational systems to automate repetitive, high-volume inquiries. However, deploying such systems in Kazakhstan requires addressing a linguistic reality that sets the local market apart from most previous work: a significant share of the clientele communicates daily in Kazakh, a Turkic language with agglutinative morphology and relatively little representation in mainstream NLP research [10].

In this regard, current AI-powered banking assistants have several significant drawbacks. First, most prioritize widely spoken world languages and provide little or no support for Kazakh [10]. Second, existing speech recognition models often struggle with contextual subtleties and banking-specific vocabulary, which leads to incorrect intent classification and failed self-service resolutions [5]. Third, AI-driven financial systems that handle sensitive client data must comply with data protection laws, which limits the use of third-party cloud-based natural language processing services [9].

This paper proposes an architecture for an intelligent conversational robot that addresses these limitations by providing high-accuracy bilingual support for Kazakh and Russian within a secure, modular system designed for integration into banking infrastructure. The following tasks are addressed: (1) review of existing multilingual NLP approaches relevant to Kazakh; (2) identification of core architectural components for the banking contact center domain; (3) proposal of an integrated system design; (4) discussion of language-specific challenges and directions for empirical validation.

Materials and methods

The research method combines an architectural design study with a literature review of multilingual conversational AI. Sources were identified through Google Scholar, Semantic Scholar, and arXiv, covering publications from 2015 to 2024, using search terms including "Kazakh NLP", "multilingual chatbot banking", "speech recognition low-resource languages", "retrieval-augmented generation", and "dialogue management". Ten publications were selected as directly relevant to the proposed architecture [1–10].

Three criteria were used to evaluate architectural choices: (a) compatibility with low-resource language support, particularly Kazakh; (b) viability of integration with core banking systems under enterprise security constraints; and (c) flexibility that allows for incremental deployment and enhancement. These criteria guided the design of the proposed architecture, which was cross-checked against requirements reported in published contact center automation studies [2, 7].

The system design rests on five methodological pillars: transformer-based neural architectures for natural language understanding [8]; retrieval-augmented generation (RAG) for knowledge-grounded response synthesis [4]; finite-state dialogue management for business logic enforcement [5]; transfer learning with domain-specific fine-tuning for Kazakh and Russian banking vocabulary [6]; and microservices-based deployment for scalability and fault isolation [9].

Results and discussion

As shown in Figure 1, the proposed system consists of six functional components linked in a processing pipeline. The ASR module receives incoming client calls and converts spoken input into text. The NLU module takes the transcription and performs intent classification and entity extraction. Based on the identified intent and the dialogue context, the Dialer component routes the conversation either to the automated Robot handler or to a human agent (for complex or escalated cases). The Robot module calls the NLG component to create a response, which the TTS module then converts to speech and delivers to the caller. To retrieve or update account information in real time, the Robot module interacts with core banking APIs.

Figure 1. Proposed architecture of the intelligent bilingual banking contact center robot
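As a rough illustration of the flow in Figure 1, the sketch below wires the six components together as plain Python callables. Everything named here is an assumption made for illustration and not part of the proposed system: the `NluResult` structure, the intent labels, the 0.7 confidence threshold, and the `fake_*` stubs, which stand in for the real ASR, NLU, Robot, NLG, and TTS components.

```python
from dataclasses import dataclass, field


@dataclass
class NluResult:
    intent: str
    confidence: float
    entities: dict = field(default_factory=dict)


# Hypothetical intents that the robot may handle without a human agent.
ROBOT_INTENTS = {"balance_inquiry", "card_block"}
CONFIDENCE_THRESHOLD = 0.7  # assumed value, to be tuned on real data


def dialer(result: NluResult) -> str:
    """Route to 'robot' or 'human' based on intent and confidence."""
    if result.intent in ROBOT_INTENTS and result.confidence >= CONFIDENCE_THRESHOLD:
        return "robot"
    return "human"


def handle_call(audio, asr, nlu, robot, nlg, tts) -> dict:
    """Run one caller turn through ASR -> NLU -> Dialer -> Robot -> NLG -> TTS."""
    text = asr(audio)
    result = nlu(text)
    if dialer(result) == "human":
        return {"route": "human", "transcript": text}
    action = robot(result)  # in a real system this would call core banking APIs
    reply = nlg(action)
    return {"route": "robot", "reply_text": reply, "reply_audio": tts(reply)}


# Stub components used only to make the sketch executable.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_nlu(text: str) -> NluResult:
    if "balance" in text:
        return NluResult("balance_inquiry", 0.93)
    return NluResult("unknown", 0.20)


def fake_robot(result: NluResult) -> dict:
    return {"intent": result.intent, "balance": 1500}


def fake_nlg(action: dict) -> str:
    return f"Your balance is {action['balance']} tenge."


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Passing the components in as arguments keeps each stage replaceable, which matches the modular, incrementally deployable design the paper argues for.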

The proposed design uses Whisper [3] as the base model for the ASR component, to be fine-tuned on domain-specific speech data in Russian and Kazakh. Whisper's multilingual pretraining offers a solid starting point for accurate transcription, but fine-tuning is necessary to handle proper nouns, banking-specific terminology, and the acoustic characteristics of telephone-quality audio. Kazakh ASR has fewer training corpora available than Russian, so robustness has to be increased through measures such as dialect labeling and data augmentation by background noise injection.
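Background noise injection can be sketched as mixing a noise recording into clean speech at a chosen signal-to-noise ratio (SNR). The example below is a minimal pure-Python version operating on lists of float samples. The function names and the choice of SNR values are assumptions, and a real pipeline would more likely work on audio arrays loaded with an audio library.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Return speech with noise added at the requested SNR (in dB)."""
    # Repeat the noise if it is shorter than the speech, then trim it.
    if len(noise) < len(speech):
        noise = noise * (len(speech) // len(noise) + 1)
    noise = noise[: len(speech)]

    speech_rms, noise_rms = rms(speech), rms(noise)
    if speech_rms == 0 or noise_rms == 0:
        return list(speech)  # nothing meaningful to scale

    # SNR_dB = 20 * log10(speech_rms / (gain * noise_rms)), solved for gain.
    gain = speech_rms / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]
```

Generating several copies of each utterance at different SNR levels, for example 5, 10, and 20 dB, is one common way to use such a function when clean training data are scarce.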

The NLU layer uses a multilingual BERT-based model fine-tuned for intent classification and named entity recognition (NER). Because its pretraining is tailored to Kazakh, KazBERT [6] is the model of choice for the Kazakh channel. XLM-RoBERTa [7] is used for the Russian channel and as a multilingual fallback. Sequence-level NER for entity types such as account identifiers, monetary amounts, dates, and product names is handled by a BiLSTM-CRF architecture. Morphological normalization is a crucial preprocessing step for Kazakh: the language's agglutinative structure generates numerous inflected surface forms from a single root, and these are normalized to base representations before classification to improve generalization on sparse training data.
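To make the normalization step concrete, the toy example below strips layered Kazakh suffixes (case, then possessive, then plural) and accepts a stem only if it appears in a lexicon. It is a deliberately small sketch and not the morphological analyzer the architecture would use: the suffix lists cover only a handful of forms, and `LEXICON` is a hypothetical three-word vocabulary (account, card, loan).

```python
# Suffix layers, outermost first. Illustrative subset only.
SUFFIX_LAYERS = [
    ["да", "де", "та", "те", "ға", "ге", "қа", "ке"],  # locative and dative case
    ["ым", "ім"],                                      # 1st person singular possessive
    ["лар", "лер", "дар", "дер", "тар", "тер"],        # plural
]

# Hypothetical mini-lexicon of known roots: account, card, loan.
LEXICON = {"шот", "карта", "несие"}


def candidate_stems(word):
    """Yield the word and each shorter form left after peeling one suffix per layer."""
    stem = word
    yield stem
    for layer in SUFFIX_LAYERS:
        for suffix in layer:
            # Keep at least two characters so very short words are not destroyed.
            if stem.endswith(suffix) and len(stem) > len(suffix) + 1:
                stem = stem[: -len(suffix)]
                yield stem
                break


def normalize(word):
    """Return the shortest candidate found in the lexicon, else the word unchanged."""
    word = word.lower()
    known = [stem for stem in candidate_stems(word) if stem in LEXICON]
    return known[-1] if known else word
```

With this sketch, `normalize("шоттарымда")` ("in my accounts") reduces to the root `шот`. The lexicon check is what stops the naive stripper from cutting `карта` down to `кар`, which is the kind of over-stripping a purely rule-based stemmer tends to produce on agglutinative languages.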

The dialogue management layer combines a finite-state machine (FSM) with a retrieval-augmented generation (RAG) component. The FSM enforces security constraints and business rules, such as requiring a verified session before disclosing account information, checking transaction limits, and escalating to a human agent when confidence scores drop below a set threshold. For open-domain product and regulatory queries, the RAG component retrieves relevant passages from a curated banking knowledge base, grounding generated responses in verified information and lowering the risk of hallucinations [4].
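The three FSM rules named above (verified session, transaction limit, confidence-based escalation) can be sketched as a small state machine. The states, intent labels, action names, and numeric thresholds below are all assumptions chosen for illustration. In particular, a real system would verify identity against banking credentials rather than trust a `verify_identity` intent.

```python
from enum import Enum, auto


class State(Enum):
    UNVERIFIED = auto()
    VERIFIED = auto()
    ESCALATED = auto()


PROTECTED_INTENTS = {"balance_inquiry", "transfer"}  # hypothetical labels
MIN_CONFIDENCE = 0.6      # assumed escalation threshold
TRANSFER_LIMIT = 500_000  # assumed per-transaction limit, in tenge


class DialogueFSM:
    def __init__(self):
        self.state = State.UNVERIFIED

    def step(self, intent, confidence, amount=0):
        """Return the action for one user turn and update the state."""
        if self.state is State.ESCALATED:
            return "handoff_to_agent"
        if confidence < MIN_CONFIDENCE:
            # Low NLU confidence: stop automating and escalate.
            self.state = State.ESCALATED
            return "handoff_to_agent"
        if intent == "verify_identity":
            # Placeholder: real verification would check credentials here.
            self.state = State.VERIFIED
            return "confirm_verification"
        if intent in PROTECTED_INTENTS and self.state is not State.VERIFIED:
            return "request_verification"
        if intent == "transfer" and amount > TRANSFER_LIMIT:
            return "reject_over_limit"
        if intent in PROTECTED_INTENTS:
            return "call_banking_api"
        # Anything else is treated as a knowledge query for the RAG component.
        return "answer_from_knowledge_base"
```

Because the rules are checked before any generated text is produced, a protected action cannot be reached from an unverified session regardless of what the language model outputs.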

For speech synthesis, the TTS module combines FastSpeech2 as the acoustic model with WaveGlow as the vocoder [3]. Russian synthesis benefits from an established ecosystem of high-quality pretrained voice models. Kazakh synthesis is constrained by the scarcity of studio-quality Kazakh voice recordings, which calls for dedicated voice data collection. The module can adjust the speech rate to the complexity of the message and switch between male and female voice profiles.

Table 1.

Comparison of NLP tools and models for the proposed bilingual pipeline

Tool / Model	Pipeline Role	Availability	Notes
Whisper (OpenAI)	ASR — speech to text	Open-source	Requires fine-tuning on Kazakh speech data
XLM-RoBERTa	NLU — multilingual fallback	Open-source	Strong cross-lingual transfer; lower Kazakh-specific accuracy
KazBERT	NLU — intent & NER	Open-source	Trained on Kazakh corpora; preferred for Kazakh channel
FastSpeech2 + WaveGlow	TTS — speech synthesis	Open-source	Russian: mature; Kazakh: limited voice data available
BiLSTM-CRF	NER — entity extraction	Open-source	Sequence labeling; effective with domain-specific training data
Morphological Analyzer	Preprocessing — normalization	Research impl.	Essential for Kazakh agglutinative morphology handling

The proposed hybrid FSM-RAG design addresses the shortcomings of the alternatives. Purely rule-based systems achieve excellent precision on specific, predefined intents but do not generalize across the lexical variety of spontaneous customer speech [2]. Purely LLM-based approaches generate fluent responses but introduce hallucination risks that are unacceptable in a financial setting, where inaccurate account information or improper regulatory advice could directly harm customers [4]. The hybrid design retains the flexibility required for natural, open-ended communication while constraining the system to validated business logic.
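The grounding idea behind the RAG side of the hybrid can be illustrated with a very small retriever that declines to answer when no passage matches. Note the substitution: this sketch uses plain keyword overlap in place of the dense neural retriever normally used in RAG. The passages, stopword list, and `min_overlap` threshold are hypothetical.

```python
import re

# Hypothetical curated passages; a real knowledge base would be far larger.
KNOWLEDGE_BASE = [
    "Card replacement takes up to five business days.",
    "Deposits can be opened in the mobile application.",
    "Loan repayment is due on the date shown in the payment schedule.",
]

STOPWORDS = {"the", "is", "in", "on", "a", "to", "what", "how", "does", "can", "be"}


def tokens(text):
    """Lowercased content words of a text, as a set."""
    return set(re.findall(r"\w+", text.lower())) - STOPWORDS


def retrieve(query, passages, min_overlap=2):
    """Return the passage sharing the most content words, or None if too few match."""
    query_tokens = tokens(query)
    scored = [(len(query_tokens & tokens(p)), p) for p in passages]
    best_score, best_passage = max(scored)
    return best_passage if best_score >= min_overlap else None


def build_prompt(query, passages):
    """Build a grounded prompt, or return None so the caller can escalate instead."""
    passage = retrieve(query, passages)
    if passage is None:
        return None
    return f"Answer using only this passage.\nPassage: {passage}\nQuestion: {query}"
```

Returning `None` rather than an ungrounded prompt is the design point: when retrieval finds nothing relevant, the dialogue manager can hand the call to a human agent instead of letting the generator improvise.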

For Russian, the proposed component selections draw on a well-established NLP ecosystem and are expected to deliver performance close to production standard. For Kazakh, data scarcity is expected to limit performance at every pipeline stage. Active learning, which uses system interaction logs to surface high-uncertainty cases for expert annotation, is identified as the main method for incremental improvement after deployment [10].
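One simple way to surface high-uncertainty cases from interaction logs is entropy-based uncertainty sampling over the intent classifier's output probabilities. The sketch below assumes, hypothetically, that each log entry is a pair of an utterance identifier and a probability list. The paper does not specify the selection criterion, so entropy is used here only as a common choice.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(logs, budget):
    """Return the ids of the `budget` most uncertain logged utterances.

    logs: list of (utterance_id, intent_probabilities) pairs.
    """
    ranked = sorted(logs, key=lambda item: entropy(item[1]), reverse=True)
    return [utterance_id for utterance_id, _ in ranked[:budget]]
```

Utterances whose probability mass is spread evenly across intents rank first, so a fixed annotation budget is spent where the model is least sure rather than on cases it already handles confidently.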

Conclusion

This paper has proposed a modular architecture for an intelligent bilingual conversational robot for banking contact center operations in Kazakhstan. The system combines multilingual ASR (Whisper), transformer-based NLU (KazBERT, XLM-RoBERTa) with BiLSTM-CRF entity extraction, hybrid FSM-RAG dialogue management, real-time banking API connectivity, and TTS synthesis (FastSpeech2 + WaveGlow) into a single pipeline designed to process standard customer inquiries in both Kazakh and Russian without human intervention.

The main contribution is a practical design framework that explicitly accounts for the linguistic characteristics and resource constraints of Kazakh, namely agglutinative morphology, limited annotated corpora, and scarce TTS voice resources, within the security and reliability constraints of a production banking environment. The architecture's modular design allows phased deployment, beginning with Russian coverage and gradually expanding to Kazakh as training data and model maturity increase.

Future research will concentrate on empirical evaluation using real interaction data: ASR word error rate, NLU intent classification accuracy, and NER F1-score for both languages, together with end-to-end customer satisfaction and first-contact resolution rates in a structured pilot. A later publication reporting measured system performance and refined deployment recommendations will be based on these findings.
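For reference, the word error rate planned for the ASR evaluation is the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch follows, assuming whitespace tokenization, which is a simplification for both Kazakh and Russian.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "block my card please" with the hypothesis "block card please now" gives one deletion and one insertion over four reference words, so the WER is 0.5.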

References:

1. Kaspi.kz Annual Report 2023. // Almaty: JSC Kaspi Bank. – 2024. – 112 p.
2. Adamopoulou E., Moussiades L. Chatbots: History, technology, and applications // Machine Learning with Applications. – 2020. – Vol. 2. – P. 100006.
3. Chorowski J., Bahdanau D., Serdyuk D., Cho K., Bengio Y. Attention-based models for speech recognition // Advances in Neural Information Processing Systems. – 2015. – Vol. 28.
4. Lewis P., Perez E., Piktus A. et al. Retrieval-augmented generation for knowledge-intensive NLP tasks // Advances in Neural Information Processing Systems. – 2020. – Vol. 33. – P. 9459–9474.
5. Malik M., Malik M. K., Mehmood K., Makhdoom I. Automatic speech recognition: a survey // Multimedia Tools and Applications. – 2021. – Vol. 80. – P. 9411–9457.
6. Yeshpanov R., Khassanov Y., Varol H. A. KazNERD: Kazakh named entity recognition dataset // arXiv preprint arXiv:2111.13419. – 2022.
7. Conneau A., Khandelwal K., Goyal N. et al. Unsupervised cross-lingual representation learning at scale // Proceedings of ACL. – 2020. – P. 8440–8451.
8. Devlin J., Chang M.-W., Lee K., Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding // Proceedings of NAACL-HLT. – 2019. – P. 4171–4186.
9. Kaur R., Sandhu R. S., Gera A., Kaur T., Gera P. Intelligent voice bots for digital banking // Smart Innovation, Systems and Technologies. – 2020. – Vol. 141.
10. Alharbi S., Alrazgan M., Alrashed A. et al. Automatic speech recognition: Systematic literature review // IEEE Access. – 2021. – Vol. 9. – P. 131858–131876.