IDENTIFY KEY AREAS FOR IMPROVING THE REPLICATION MECHANISM IN ISLAMIC INSTITUTIONS

Dadamuxamedov A.
Cite as:
Dadamuxamedov A. IDENTIFY KEY AREAS FOR IMPROVING THE REPLICATION MECHANISM IN ISLAMIC INSTITUTIONS // Universum: технические науки [Universum: Technical Sciences]: electronic scientific journal. 2022. No. 7(100). URL: https://7universum.com/ru/tech/archive/item/14082 (accessed: 20.04.2024).

ABSTRACT

In this article, we compare the replication methods available in database systems. The main problem they address is maintaining consistency between the actual real-time state of an object in the external environment and its images held in copies distributed across multiple nodes. Modern applications built on Internet-connected devices face rapidly growing and highly variable transactional workloads. Database replication should improve access to the database and, with it, the efficiency of computation. A replication algorithm propagates changes in the database quickly to all replicas, which keeps every replica reliable, while a fragmented routing algorithm continuously balances the load of incoming transactions across the available instances. We show how such a system can achieve near-linear scaling of the workload. To extend this idea to large-scale database modeling, we consider how the algorithms applied in the database can improve data consistency and scalability. Separate levels of iteration, which prevent over-use of resources, together help to solve the scalability problem of distributed real-time database systems.

Keywords: replicated database, replicated database design, replicated database protocols, transactional replication, data consistency and scalability, active and passive replication, recognition, method.

Introduction

Replication is one of the techniques for scaling databases. Data from one database server is continuously copied (replicated) to one or more other servers, called replicas. The application can then process requests on several servers instead of one, distributing the load among them.
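The replica scheme just described can be illustrated with a minimal, in-memory Python sketch. It assumes a single primary that accepts all writes and copies each change to every replica immediately, with reads spread over the replicas in round-robin order; the class and method names are hypothetical and do not come from any particular DBMS.

```python
from itertools import cycle
from typing import Dict, List, Optional, Tuple


class ReplicatedStore:
    """Toy primary/replica setup (hypothetical, for illustration only)."""

    def __init__(self, replica_count: int) -> None:
        self.primary: Dict[str, str] = {}
        self.replicas: List[Dict[str, str]] = [{} for _ in range(replica_count)]
        self._next_replica = cycle(range(replica_count))

    def write(self, key: str, value: str) -> None:
        # All writes go to the primary ...
        self.primary[key] = value
        # ... and are immediately copied (replicated) to every replica.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key: str) -> Tuple[int, Optional[str]]:
        # Reads rotate over the replicas, so no single server handles them all.
        index = next(self._next_replica)
        return index, self.replicas[index].get(key)


store = ReplicatedStore(replica_count=3)
store.write("schedule", "v1")
for _ in range(4):
    print(store.read("schedule"))
```

Real systems usually propagate changes asynchronously or through a log, so replicas can briefly lag the primary; the immediate copy here keeps the sketch simple.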

In Islamic institutions, data replication is understood as storing and processing large amounts of complex computational data in a distributed manner. As a result, the reliability and performance of the replication mechanism affect the efficiency of the entire system, and the mechanism must meet a number of requirements to ensure that the system functions properly.

High efficiency can be achieved by creating a centralized and distributed data storage and processing system for government and religious institutions, while centralizing and virtualizing the organization's IT infrastructure. In this way, the organization's total hardware resources can be used for a variety of tasks. Statistics show that distributed servers perform on a par with comparable hardware platforms [1].

The software architecture of the infrastructure that manages application flows in the replication mechanism determines how program components interact when they exchange information with a single system node. This article examines how a system with such an architecture functions when a server works in parallel with many client applications, clarifies the problems that arise in this case, and proposes ways to solve them.

The software components of the application-logic layer and the data-access layer implement the functions that the many applications making up the system need in order to operate. Therefore, when many client applications run on the server simultaneously, many instances of these components are created. The same applies when several applications use the functions of the replication mechanism at the same time. Each activated instance of an application-layer component in turn activates instances of data-access-layer components, so without a limit on the number of activated components, the number of active database connections grows uncontrollably.
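One common way to bound the number of active database connections is a fixed-size connection pool. The sketch below is illustrative only, not the author's implementation: `BoundedConnectionPool` and the `connect` factory are hypothetical names, and the limit is enforced with a `threading.BoundedSemaphore` so that extra callers wait instead of opening new connections.

```python
import threading
from contextlib import contextmanager


class BoundedConnectionPool:
    """Caps the number of simultaneously open database connections.

    `connect` is any zero-argument factory that opens a connection (a
    hypothetical stand-in for a real driver call). At most `max_connections`
    connections are ever created; extra callers block until one is returned.
    """

    def __init__(self, connect, max_connections: int) -> None:
        self._connect = connect
        self._slots = threading.BoundedSemaphore(max_connections)
        self._idle = []                 # connections ready for reuse
        self._lock = threading.Lock()   # protects _idle and created
        self.created = 0

    @contextmanager
    def connection(self):
        self._slots.acquire()           # wait here if the limit is reached
        try:
            with self._lock:
                if self._idle:
                    conn = self._idle.pop()
                else:
                    conn = self._connect()
                    self.created += 1
        except BaseException:
            self._slots.release()       # do not leak a slot if connect() fails
            raise
        try:
            yield conn
        finally:
            with self._lock:
                self._idle.append(conn)  # return for reuse instead of closing
            self._slots.release()
```

Usage: `with pool.connection() as conn: ...`. However many component instances are activated, the number of open connections never exceeds `max_connections`; the cost is that callers may have to wait, which is exactly the queueing behavior analyzed later in the article.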

Methods

Religious educational institutions in Uzbekistan and the International Islamic Academy of Uzbekistan have introduced centralized server systems for digitization. These systems ran into major problems with server interaction and with the process checks that ensure timely data delivery; in other words, data processing on the server slowed down over time (Figure 1).

 

Regional representations:

1. Tashkent city
2. Representation of Andijan region
3. Representation of Bukhara region
4. Representation of Jizzakh region
5. Representation of Kashkadarya region
6. Representation of Navoi region
7. Representation of Namangan region
8. Representation of Samarkand region
9. Representation of Surkhandarya region
10. Representation of Fergana region
11. Representation of Khorezm region
12. Representation of the Republic of Karakalpakstan

Figure 1. General location of servers across regions

 

1. Data storage must ensure that information in a distributed environment is reliable and consistent. Meeting this requirement implies solving several tasks at once [2].

First, information must be transmitted from the supplier to the consumer without errors.

Second, any new information entered into the data store of the supplier node must be delivered to every consumer node in a timely manner. This condition imposes a strict requirement on the uniformity of information analysis when the set of data to be replicated is formed, so that no segment of information can be skipped [3].

2. The replication mechanism must provide a high level of reliability and resistance to failures. This requirement implies the following.

First, the probability of errors during operation must stay within acceptable limits.

Second, the replication mechanism must behave correctly when an error occurs during operation. The server must initiate an emergency shutdown of the operation, and all incomplete transactions must be properly rolled back. The client application then receives a notification that the replication operation was terminated abnormally. Failures and errors that occur during replication must not degrade the performance of the system as a whole. After an emergency shutdown, the process must resume from the point of interruption. If for some reason this is not possible, the application is restarted after the interruption in an emergency recovery mode that restores its ability to work [4].

3. The replication process must meet requirements for speed of operation. Speed is one of the most important factors in evaluating the quality of many software products, since time is the most valuable resource in solving any practical task. The speed of the replication mechanism therefore plays a very important role: the timeliness of updating the information in the consumer node depends largely on it.

4. The software architecture of the replication mechanism should be designed with system scalability in mind. When developing a replication mechanism for a distributed computing environment, one must account for possible changes in the system's parameters, mainly an increase in its scale. The main condition for successfully scaling a distributed system is that its components can be easily adapted to new operating conditions through minor changes to the configuration of the distributed applications [5].

When developing the mechanism, one must also take into account that scaling may require support for the various systems used to organize local data stores. This should be considered when developing data replication mechanisms for distributed systems, especially when defining the solution architecture and choosing the software implementation technology [6].
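Requirement 2 above (abnormal termination, rollback of incomplete transactions, notification of the client, and resumption from the point of interruption) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumed conditions: source rows are `(id, value)` pairs, and the hypothetical `checkpoint` table records how many rows have already been committed, so each batch and its checkpoint are written in one transaction.

```python
import sqlite3


def replicate(source_rows, target: sqlite3.Connection, batch_size: int = 2) -> int:
    """Copy (id, value) rows into `target`, resuming after the last checkpoint.

    Each batch and its checkpoint are committed in one transaction, so a failure
    rolls back the incomplete batch and the next call resumes from the last
    committed position instead of starting over. Returns the new position.
    """
    target.execute("CREATE TABLE IF NOT EXISTS data (id INTEGER PRIMARY KEY, value TEXT)")
    target.execute("CREATE TABLE IF NOT EXISTS checkpoint (pos INTEGER NOT NULL)")
    row = target.execute("SELECT pos FROM checkpoint").fetchone()
    if row is None:
        with target:
            target.execute("INSERT INTO checkpoint (pos) VALUES (0)")
        pos = 0
    else:
        pos = row[0]

    while pos < len(source_rows):
        batch = source_rows[pos:pos + batch_size]
        try:
            with target:  # commit on success, roll back on any exception
                target.executemany("INSERT INTO data (id, value) VALUES (?, ?)", batch)
                target.execute("UPDATE checkpoint SET pos = ?", (pos + len(batch),))
        except sqlite3.Error as exc:
            # Report the abnormal termination to the caller (the "client").
            raise RuntimeError(f"replication interrupted at position {pos}") from exc
        pos += len(batch)
    return pos


conn = sqlite3.connect(":memory:")
print(replicate([(1, "a"), (2, "b"), (3, "c")], conn))
```

If a batch fails, none of its rows remain in `data`, the checkpoint still points at the last complete batch, and calling `replicate` again after the cause is fixed continues from there.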

An analysis of the above requirements allows us to identify the following indicators of the efficiency of the replication mechanism:

• time spent on synchronization of the contents of information storages (T);

• the amount of data transmitted over the network during replication (V);

• the amount of disk space required for the accumulation of data to be replicated (S);

• fault tolerance (R) (expressed as the time required to recover from failure);

• compatibility with data storage systems used for organizing local storage (C) (expressed as the ratio of the number of supported systems to the total number of systems used for local data storage on network nodes);

• stability of operation in multi-user mode (F) (expressed as the maximum number of client applications running in parallel, at which the required level of performance is maintained).

Let us denote the set of parameters describing the internal structure of the replication mechanism by the vector p. Since higher replication efficiency corresponds to smaller values of T, V, S and R and larger values of C and F, the task of increasing the efficiency of the replication mechanism reduces to finding the value of p at which an efficiency function E(p), which grows with T, V, S and R and falls with C and F, reaches its minimum:

$$p^{*} = \arg\min_{p} E(p).$$
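The article does not state the explicit form of E(p). Purely as an illustrative assumption, the sketch below takes E(p) = T·V·S·R / (C·F), which falls as T, V, S and R decrease and as C and F increase, and selects the best of a finite set of candidate parameter vectors. The candidate names and indicator values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicators:
    T: float  # time to synchronize the stores
    V: float  # data volume sent over the network
    S: float  # disk space for data awaiting replication
    R: float  # time to recover from a failure
    C: float  # share of supported local storage systems, in (0, 1]
    F: float  # max parallel clients at the required performance


def efficiency_cost(ind: Indicators) -> float:
    """Assumed E: grows with T, V, S, R and falls with C, F (not the article's formula)."""
    return (ind.T * ind.V * ind.S * ind.R) / (ind.C * ind.F)


def best_configuration(candidates: dict) -> str:
    """p* = argmin of E over a finite set of candidate configurations p."""
    return min(candidates, key=lambda p: efficiency_cost(candidates[p]))


# Hypothetical indicator values measured for two candidate configurations.
candidates = {
    "small batches": Indicators(T=10, V=5, S=2, R=30, C=0.5, F=20),
    "large batches": Indicators(T=8, V=5, S=4, R=60, C=0.5, F=20),
}
print(best_configuration(candidates))
```

A weighted sum of normalized indicators would be an equally plausible choice of E; the search over candidate vectors p stays the same.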

1. Compliance with these requirements is the key to successfully developing and operating an improved replication mechanism, and the choice of replication strategy must also take them into account [7].

2. The characteristics of replication are mainly determined by five properties:

  • method
  • type
  • direction
  • time of execution
  • method of analysis

Using these five key features, we now consider how the requirements relate to the replication mechanism.

3. In terms of replication method, application-level replication is the most effective choice for the developed mechanism. This choice follows from the requirements for reliability and speed of operation. Mass (bulk) replication is unacceptable for the task at hand, first, because the amount of data transmitted over the network must be minimized and, second, because reliable interaction between heterogeneous data sources must be established. At present, there are no universal software tools for replication between different databases. Developing application-level replication makes it possible to implement only the data exchange scheme that the application requires [8].

4. Synchronous replication is preferred because of its high reliability. The synchronous type of replication increases the reliability of data transmission to the consumer at the cost of more network traffic between remote software components [9]. Although in distributed environments built on low-bandwidth channels it is recommended to minimize the number of callbacks between remote components, ensuring the reliability of the replication mechanism is the more important factor here, so a small increase in network load is acceptable. The asynchronous type of replication, which aims to minimize traffic, does not provide the required level of reliability, because the server receives no notification from the client about whether the transmitted data were stored in the receiving repository [8]. In addition, developing a synchronous replication mechanism is made easier by distributed transaction management tools (Figure 2).

 

Figure 2. Server replication

 

5. In terms of direction, the task requires one-way replication. When the task calls for two-way data transmission, a detailed analysis shows that it can be handled by two opposing one-way replication streams. For example, a payment operator's automated workstation must, first, download reference information from the server and, second, upload payment information to the server. The program neither uploads reference data from the workstation to the server nor transfers previously received payments from the server back to the workstation, so each stream is one-way. Another argument in favor of one-way replication is the requirement for promptness [10].

6. In terms of execution time, delayed replication can be used for this task; it is performed after a certain time interval or when the network load is at its minimum. As a result, data consumers work with information that is current as of the last replication session. Delayed replication can also be initiated by the client module when the application needs updated information during operation. The choice of delayed replication follows from the conditions of the task, which require minimizing the use of the network for data transmission. Real-time replication would require the server to implement a mechanism that notifies all clients of changes in the database. This puts additional load on the server's computing resources and on the network, which contradicts the requirements for speed and scalability. Since the task does not require real-time replication, implementing such a mechanism is considered unjustified, which shows the advantage of delayed replication [11].

7. In terms of the method of information analysis, the improved replication mechanism forms the replicated data set by replicating the current situation. This means that the server stores information that identifies the small sets of data to be sent to a particular consumer when a replication operation starts. The most up-to-date and reliable information about these sets could be obtained by comparing the data in the supplier's and the consumer's stores, but this approach is unacceptable for the task at hand: such a time-consuming comparison is, in practice, equivalent to loading the full volume of information. In the context of the task, replicating the current situation is therefore the most effective solution. To implement it, the replicated information in the supplier's database must be accompanied by special data structures that store the relevant change information. Using this additional information to define the boundaries of the replicated data sets requires a mechanism that meets the requirements for reliability, speed, and scalability [12].
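Item 7 (identifying the small data set to send from change information kept on the supplier, instead of comparing both stores) can be sketched with a per-row version stamp. This is an assumed data structure, similar in spirit to a row-version column; `SupplierTable` and `changes_since` are hypothetical names, and the sketch ignores deletions, which a real mechanism would also have to track.

```python
from typing import Any, Dict, Tuple


class SupplierTable:
    """Supplier-side table that stamps every change with a version number."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[int, Any]] = {}  # key -> (version, value)
        self.version = 0                            # table-wide change counter

    def upsert(self, key: str, value: Any) -> None:
        self.version += 1
        self.rows[key] = (self.version, value)

    def changes_since(self, consumer_version: int) -> Tuple[int, Dict[str, Any]]:
        """Return the current version and only the rows changed after
        `consumer_version`; no row-by-row comparison with the consumer is needed."""
        delta = {k: v for k, (ver, v) in self.rows.items() if ver > consumer_version}
        return self.version, delta


table = SupplierTable()
table.upsert("record:1", "v1")
table.upsert("record:2", "v2")
seen, first = table.changes_since(0)      # first session: everything is new
table.upsert("record:1", "v1b")
seen, delta = table.changes_since(seen)   # next session: only the changed row
print(first, delta)
```

The consumer only has to remember the last version it received, so the cost of a session depends on the number of changes, not on the size of the stores.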

The correspondence between the replication requirements and the main features of the selected replication strategies is shown in Table 1.

Table 1.

Replication strategies

| Strategy | Activation | Reliability | Speed of operation | Scalability |
|---|---|---|---|---|
| Method: application-level replication | + | + | + | |
| Type: synchronous replication | | + | | |
| Direction: one-way replication | | | + | |
| Execution time: delayed replication | | | + | + |
| Method of analysis: replication of the current situation | + | + | + | + |
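Table 1 links synchronous replication to reliability. A standard way to build synchronous replication on top of distributed transaction management is two-phase commit; the in-memory sketch below illustrates that technique under this assumption and is not the mechanism described in the article. A change is committed on the replicas only if every replica acknowledges it in the prepare phase, which is exactly the notification that asynchronous replication lacks. `Replica` and `replicate_synchronously` are hypothetical names.

```python
from typing import Dict, List, Optional


class Replica:
    """In-memory participant; `fail_prepare` simulates a replica that cannot store data."""

    def __init__(self, name: str, fail_prepare: bool = False) -> None:
        self.name = name
        self.fail_prepare = fail_prepare
        self.data: Dict[str, int] = {}
        self._pending: Optional[Dict[str, int]] = None

    def prepare(self, change: Dict[str, int]) -> bool:
        """Phase 1: stage the change and vote (the acknowledgement to the server)."""
        if self.fail_prepare:
            return False
        self._pending = dict(change)
        return True

    def commit(self) -> None:
        if self._pending is not None:
            self.data.update(self._pending)
        self._pending = None

    def abort(self) -> None:
        self._pending = None


def replicate_synchronously(change: Dict[str, int], replicas: List[Replica]) -> bool:
    """Two-phase commit: apply `change` on every replica or on none of them."""
    votes = [replica.prepare(change) for replica in replicas]  # phase 1
    if all(votes):
        for replica in replicas:                                # phase 2: commit
            replica.commit()
        return True
    for replica in replicas:                                    # phase 2: abort
        replica.abort()
    return False
```

A production transaction manager would also need durable logs, timeouts and recovery of in-doubt transactions; the extra prepare round trip is the network cost that the text accepts in exchange for reliability.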

 

Results of experimental research. A correctly functioning algorithm is needed to determine the amount of data that must be transferred between two nodes for the replication system to work properly. This algorithm is one of the main components of any replication system, and the speed and reliability of the system depend directly on which algorithm is chosen and how it is implemented [13]. A distribution program was developed in Python to solve practical problems with the algorithms described above. To test the program, the capabilities of 11 servers were checked in a simulation.

The capabilities of the 11 selected servers made it possible to distribute and optimize the load efficiently on the basis of these indicators and to distribute the data across the servers in a short time [14].

Problem solving methods. The solution evaluates the speed of operation using exponential distribution laws and fragmentation algorithms. The total load on the 11 servers is 361. We optimized the overall state of all servers, that is, the distribution of the incoming load (Figure 3).

 

Figure 3. The overall load on the servers
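The article does not specify its fragmentation algorithm. As a stand-in, the sketch below uses the well-known greedy longest-processing-time (LPT) heuristic to spread load fragments over 11 servers, always placing the next-largest fragment on the currently least-loaded server. The fragment sizes are hypothetical values chosen only so that they sum to the total load of 361 mentioned above.

```python
import heapq
from typing import List


def distribute(fragment_loads: List[float], server_count: int) -> List[float]:
    """Greedy LPT: assign the largest remaining fragment to the least-loaded server."""
    heap = [(0.0, i) for i in range(server_count)]  # (current load, server index)
    heapq.heapify(heap)
    totals = [0.0] * server_count
    for load in sorted(fragment_loads, reverse=True):
        current, i = heapq.heappop(heap)
        totals[i] = current + load
        heapq.heappush(heap, (totals[i], i))
    return totals


# Hypothetical fragment loads; they sum to 361, the total load of the 11 servers.
fragments = [40, 35, 33, 30, 30, 28, 25, 25, 22, 20, 18, 15, 15, 12, 10, 3]
per_server = distribute(fragments, server_count=11)
print(per_server, sum(per_server))
```

With this rule, the most-loaded server never exceeds the least-loaded one by more than the size of a single fragment, which is why splitting the load into smaller fragments evens it out.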

 

The functions of a dispatcher channel are limited to placing requests in the queue of the appropriate service element. The time a request spends in the multichannel mass service system is the sum of its dispatching time and its service time:

$$T = T_d + T_s.$$

The dispatching time of a request, that is, its residence time in the dispatching element, is

$$T_d = \frac{L_d}{\lambda} + \frac{1}{\mu_d},$$

where $L_d$ is the average number of requests waiting in the dispatcher queue, $\lambda$ is the intensity of the incoming request flow, and $\mu_d$ is the service flow intensity of one channel of the multichannel dispatching element. After dispatching, the average time a request spends in the system is

$$T = T_d + \sum_{i} p_i T_i,$$

where $p_i$ is the probability that, after dispatching, the request is served by the $i$-th element of the mass service system and $T_i$ is the residence time in that element. Based on this algorithm, the overall distribution of the load was calculated.

The last expression gives the average residence time of a request in the system. The optimal configuration is the set of configuration parameter values for which this expression reaches its minimum.

The fixed parameters of the model are: the intensity of the incoming request flow; the service flow intensity of one dispatcher channel; the probabilities of redirecting a request to each service element; the service flow intensity of one channel of each service element; and the number of service elements [15].

Based on the data on the average duration of read and write operations, the number of service channels in Server1 is set to five, and in Server2 to two. The parameter values for each experiment are given in Table 2.

Table 2.

The parameter values for each experiment

| Parameter | Experiment 1 | Experiment 2 |
|---|---|---|
| Input flow intensity | 1/0.28 | 1/0.28 |
| Single-channel service flow intensity, dispatcher | 1/0.4 | 1/0.4 |
| Number of dispatcher channels | 2 | 2 |
| Flow intensity ratio, dispatcher | 0.71 | 0.71 |
| Dispatcher queue size | 30 | 30 |
| Single-channel service flow intensity, Server 1 | 1/2.5 | 1/2.5 |
| Number of Server 1 channels | 5 | 5 |
| Flow intensity ratio, Server 1 | 0.80 | 0.80 |
| Queue size, Server 1 | 50 | 50 |
| Single-channel service flow intensity, Server 2 | 1/0.95 | 1/0.95 |
| Number of Server 2 channels | 2 | 3 |
| Flow intensity ratio, Server 2 | 0.93 | 0.62 |
| Queue size, Server 2 | 50 | 50 |
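The average residence time T = T_d + Σ p_i T_i can be estimated for both experiments by treating the dispatcher and each server as an M/M/n element and using the Erlang C formula for the mean time in an element, W = L_q/λ + 1/μ. This is a sketch under stated assumptions: queues are treated as unlimited (Table 2 caps them at 30 and 50), and the routing probabilities p_1 = 0.45 and p_2 = 0.55 are not given in the article but were inferred here from the flow intensity ratios in Table 2. The figures it produces are model estimates, not the article's measured results.

```python
from math import factorial
from typing import Sequence, Tuple


def mmn_time(lam: float, mu: float, n: int) -> float:
    """Mean residence time W = Lq / lam + 1 / mu of an M/M/n element (Erlang C).

    lam: arrival intensity, mu: service intensity of one channel, n: channels.
    Assumes an unlimited queue and requires rho = lam / (n * mu) < 1.
    """
    a = lam / mu                     # offered load
    rho = a / n                      # load factor
    if rho >= 1:
        raise ValueError("element is unstable (rho >= 1)")
    p0 = 1.0 / (sum(a**k / factorial(k) for k in range(n))
                + a**n / (factorial(n) * (1 - rho)))
    lq = p0 * a**n * rho / (factorial(n) * (1 - rho) ** 2)  # mean queue length
    return lq / lam + 1 / mu


def system_time(lam: float, dispatcher: Tuple[float, int],
                servers: Sequence[Tuple[float, int]], probs: Sequence[float]) -> float:
    """T = T_d + sum_i p_i * T_i, where server i receives the flow p_i * lam."""
    t_d = mmn_time(lam, *dispatcher)
    return t_d + sum(p * mmn_time(p * lam, mu, n)
                     for p, (mu, n) in zip(probs, servers))


lam = 1 / 0.28                       # input flow intensity (Table 2)
dispatcher = (1 / 0.4, 2)            # (single-channel intensity, channels)
probs = (0.45, 0.55)                 # ASSUMED routing probabilities, inferred from Table 2
experiment_1 = [(1 / 2.5, 5), (1 / 0.95, 2)]
experiment_2 = [(1 / 2.5, 5), (1 / 0.95, 3)]
t1 = system_time(lam, dispatcher, experiment_1, probs)
t2 = system_time(lam, dispatcher, experiment_2, probs)
print(f"Experiment 1: {t1:.2f} s, Experiment 2: {t2:.2f} s")
```

Under these assumptions the first configuration gives an average residence time above 6 s, in line with the observation in the text, and the third Server 2 channel roughly halves it, with almost all of the gain coming from the shorter Server 2 queue.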

 

The results of Experiment 1 were as follows:

1) Time a request stays in the system:
2) Number of requests that left the system without service:
3) Number of requests in the dispatcher queue:
4) Number of requests in the Server 1 queue:
5) Number of requests in the Server 2 queue:
6) Time a request spends in the dispatcher queue:
7) Time a request spends in the Server 1 queue:
8) Time a request spends in the Server 2 queue:
9) Average number of busy dispatcher channels:
10) Average number of busy Server 1 channels:
11) Average number of busy Server 2 channels:

Analysis of these results shows that, for the given parameters, the system operates stably and the share of requests that leave the system without service is quite small. At the same time, the average time a request stays in the system exceeds 6 seconds, which is unacceptable as a response time for interactive applications [16]. The data on the time requests spend in each service element show that they stay longest in Server2.

One possible way to increase the intensity of the aggregate service flow of Server2 is to increase the number of its service channels. In Experiment 2, the number of Server2 channels was increased from 2 to 3 (see Table 2), which gave the following results:

1) Time a request stays in the system:
2) Number of requests that left the system without service:
3) Number of requests in the dispatcher queue:
4) Number of requests in the Server 1 queue:
5) Number of requests in the Server 2 queue:
6) Time a request spends in the dispatcher queue:
7) Time a request spends in the Server 1 queue:
8) Time a request spends in the Server 2 queue:
9) Average number of busy dispatcher channels:
10) Average number of busy Server 1 channels:
11) Average number of busy Server 2 channels:

The average number of busy Server 2 channels remained almost unchanged. This is because the intensity of the request flow to Server 2 was the same in both experiments, so Server 2 spent approximately the same amount of time operating under partial load [17].
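For a multichannel element without losses, the mean number of busy channels equals the offered load λ_i/μ_i = ρn, which does not depend on the number of channels n. Using the flow intensity ratios from Table 2 and assuming that losses are negligible, Server 2 gives the same value in both experiments:

$$\bar{n}_{\text{busy}} = \rho\, n:\qquad 0.93 \times 2 = 1.86 \ \text{(Experiment 1)}, \qquad 0.62 \times 3 = 1.86 \ \text{(Experiment 2)}.$$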

Results

The results of the experimental study of the designed system allow us to draw the following conclusions. The load factor ρ (the ratio of the incoming flow intensity to the total service intensity of an element) is critical to the stable functioning of any Poisson mass service system. As ρ approaches one, the number of requests in the queue grows, and with it the time a request stays in the system. Increasing the number of channels in a multichannel service element can, in some cases, reduce its ρ enough to change the dynamics of the system radically. The calculated values of ρ for each element of the designed system can therefore serve as a criterion for assessing the stability of the system: if the system performs unsatisfactorily, the element with the largest ρ should be identified and reconfigured.
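The reconfiguration rule stated above, find the element with the largest load factor ρ = λ/(nμ) and reconfigure it, can be sketched as a loop that adds one channel at a time to the current bottleneck. Offered loads are taken as ρ·n from Experiment 1 in Table 2; the stopping threshold of 0.85 and the channel cap are hypothetical choices, not values from the article.

```python
from typing import Dict


def reconfigure(offered: Dict[str, float], channels: Dict[str, int],
                threshold: float = 0.85, max_channels: int = 16) -> Dict[str, int]:
    """Add channels to the element with the largest load factor until all are below threshold.

    offered[name] is the offered load lambda_i / mu_i = rho_i * n_i of element `name`.
    """
    channels = dict(channels)                     # do not modify the caller's dict
    while True:
        rho = {name: offered[name] / channels[name] for name in offered}
        worst = max(rho, key=rho.get)             # current bottleneck
        if rho[worst] < threshold:
            return channels
        if channels[worst] >= max_channels:
            raise RuntimeError(f"cannot bring {worst} below the threshold")
        channels[worst] += 1


# Experiment 1 from Table 2: offered load = flow intensity ratio x channels.
offered = {"dispatcher": 0.71 * 2, "server1": 0.80 * 5, "server2": 0.93 * 2}
start = {"dispatcher": 2, "server1": 5, "server2": 2}
print(reconfigure(offered, start))
```

Starting from Experiment 1, the loop adds a third channel to Server 2 and then stops, which is the configuration tested in Experiment 2.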

Because many configuration parameters influence the load factor of each system element, as well as the performance and stability of the system as a whole, the designed system adapts well to different operating conditions. Thus, organizing the infrastructure as a mass service system is a universal and effective way to build the infrastructure of Islamic institutions.

Conclusion

This article has described a method of organizing access to the computing resources of the core of a distributed system. The principles of organizing the core infrastructure of a distributed system as a mass service system are described. The basic principles of organizing the application-level infrastructure of the system are formulated on the basis of role separation of components and the grouping of components into packages and server applications. The features of the proposed infrastructure were studied by comparing it with a mathematical model of a mass service system. The main parameters of the model that decisively affect the stability of the system were identified, and a theoretical justification of their influence on the stability of the system was given. The main directions for selecting system parameter values outlined above were checked, and the versatility and adaptability of the proposed model for building the system infrastructure were confirmed.

 

References:

  1. Таненбаум Э., Ван-Стеен М. Распределенные системы. Принципы и парадигмы [Distributed systems: principles and paradigms]. St. Petersburg: Piter, 2003. 877 p.
  2. Georgiou M., Panayiotou M., Odysseos L., Paphitis A., Sirivianos M., Herodotou H. Attaining workload scalability and strong consistency for replicated databases with Hihooi // Proceedings of the 2021 International Conference on Management of Data. 2021.
  3. Белоусов В. Е. Алгоритмы репликации данных в распределенных системах обработки информации [Data replication algorithms in distributed information processing systems]: doctoral dissertation. Penza: PGU, 2005. 184 p.
  4. Нишонбоев Т. Дастурий конфигурацияланган тармоқлар [Software-configured networks]: textbook (officially approved). Printing house of TUIT named after Muhammad al-Khwarizmi, 2017. 186 p.
  5. Nishanbayev T. N., Abdullayev M. M., Maxmudov S. O. The model of forming the structure of the 'cloud' data center // International Conference on Information Science and Communications Technologies: Applications, Trends and Opportunities (ICISCT 2019). 2019.
  6. Белоусов В. Е. Алгоритмы репликации данных в распределенных системах обработки информации [Data replication algorithms in distributed information processing systems]. Internet resource: http://diss.rsl.ru/diss/05/0591/050591031 pdf.
  7. Белоусов В. Е. Автоматизация процесса репликации между базами данных, имеющими сходную логическую структуру [Automating replication between databases with a similar logical structure] // Системный анализ, управление и обработка информации [System analysis, control and information processing]: collection of scientific and technical papers. 2005, No. 1. Penza: PGU, 2005. pp. 31–35.
  8. Белоусов В. Е. Особенности построения системы массового обслуживания в рамках среднего уровня распределенной системы обработки информации [Features of building a mass service system within the middle tier of a distributed information processing system] // Системный анализ, управление и обработка информации [System analysis, control and information processing]: collection of scientific and technical papers. 2005, No. 1. Penza: PGU, 2005. pp. 23–31.
  9. Башарин Г. П., Бочаров П. П., Коган Я. Н. Анализ очередей в вычислительных сетях: теория и методы расчета [Queue analysis in computer networks: theory and calculation methods]. Moscow: Nauka, 1989. 336 p.
  10. Irgashevich D. A. Development of national network and corporate networks (in the case of Tas-IX network) // International Journal of Human Computing Studies. 2019. Vol. 1, No. 1. pp. 1–5.
  11. Dadamuhamedov I. A. Cloud technologies in Islamic education institutions // The Light of Islam. 2020. No. 2(23).
  12. Dadamuxamedov A., Mavlyuda X., Turdali J. Cloud technologies in Islamic education institutions // ACADEMICIA: An International Multidisciplinary Research Journal. 2020. Vol. 10, No. 8. pp. 542–557.
  13. Феллер В. Введение в теорию вероятностей и её приложения. Том I [An introduction to probability theory and its applications. Vol. I]. Moscow: Mir, 1967. 498 p.
  14. Оберг Р. Дж. Технология COM+. Основы и программирование [COM+ technology: fundamentals and programming]: textbook, translated from English. Moscow: Williams, 2000. 480 p.
  15. Санблэд С., Санблэд П. Разработка масштабируемых приложений для Microsoft Windows. Мастер-класс [Developing scalable applications for Microsoft Windows: master class]: translated from English. Moscow: Russkaya Redaktsiya, 2002. 416 p.
  16. Kumar S., Sharma K., Swaroop V. Issues in replicated data for distributed real-time database systems // International Journal of Computer Science and Information Technologies (IJCSIT). 2011. Vol. 2(4). pp. 1364–1371.
  17. https://hevodata.com/learn/data-replication-in-distributed-system/
Information about the author

Senior Lecturer at the Department of “Modern information and communication technologies” of International Islamic Academy of Uzbekistan, Republic of Uzbekistan, Tashkent

The journal is registered by the Federal Service for Supervision of Communications, Information Technology and Mass Media (Roskomnadzor), registration number ЭЛ №ФС77-54434 dated 17.06.2013.
Founder of the journal: ООО «МЦНО».
Editor-in-chief: Ахметов Сайранбек Махсутович.