PhD student, Tashkent State Technical University named after Islam Karimov, Uzbekistan, Tashkent
HYBRID ARTIFICIAL NEURAL NETWORK MODEL FOR PREDICTING ETHYL ACETATE PURITY IN A REACTIVE–DISTILLATION SYSTEM
ABSTRACT
This paper presents a hybrid machine-learning approach for modeling and predicting the ethyl acetate purity in a Reactive–Distillation process. The process combines esterification and distillation operations, resulting in strong nonlinear interactions between feed composition, operating conditions, and separation efficiency. To address this complexity, a multilayer perceptron (MLP) neural network was developed using polynomially expanded input features derived from seven key process variables. A dataset of 120 steady-state operating points was used to train and test the model. The neural network, trained with the Adam optimizer and mean squared error loss, achieved high prediction accuracy with an R² of 0.926, RMSE of 0.367, and MAE of 0.296. The actual versus predicted results confirmed a strong correlation between model outputs and experimental data. Sensitivity analysis further revealed that the feed molar ratio and reaction-zone stages are the most influential factors affecting product purity. The results demonstrate that the proposed POLY+ANN framework effectively captures nonlinear relationships in complex chemical processes, offering a reliable foundation for intelligent control and optimization of reaction–distillation systems.
Keywords: esterification, distillation, artificial neural networks, ethanol, acetic acid, the technological scheme of ethyl acetate production.
Introduction
Industrial wastewater and solvent purification processes play a critical role in sustainable chemical manufacturing, where energy efficiency, product quality, and environmental impact must be optimized simultaneously. Among chemical products, ethyl acetate is one of the most widely used solvents in coatings, pharmaceuticals, and chemical synthesis. Its production and purification stages require precise control of operating parameters, such as feed composition, column temperature, pressure, reflux ratio, and reaction-zone configuration, to ensure high purity and yield. Even small deviations in these parameters can lead to significant quality losses, increased energy consumption, or off-spec products that compromise process efficiency and economic performance [1-2].
Traditional control strategies in separation and purification systems rely mainly on empirical operator experience or linear regulatory algorithms, which are often inadequate for highly nonlinear and coupled processes. In ethyl acetate purification, multiple process variables interact simultaneously: changes in the ethanol-to-acetic acid molar ratio or the reflux policy, for example, alter mass-transfer efficiency and equilibrium composition across the column. These interdependencies make it difficult to maintain product purity using fixed set points or PID-based control alone. Moreover, outlet purity is typically measured offline in a laboratory, which introduces delays that prevent real-time optimization and feedback correction [3].
To overcome these limitations, modern process industries increasingly adopt intelligent, data-driven modeling approaches. Instead of relying solely on detailed first-principles equations, data-driven methods such as artificial neural networks (ANNs) can learn the intrinsic nonlinear relationships between operating parameters and process outcomes directly from experimental or historical plant data. Such models act as soft sensors: they estimate key unmeasured variables (e.g., product purity) in real time from routinely available inputs. This enables faster decision-making, improved process stability, and predictive quality control without additional instrumentation costs. In this context, the present study develops a neural-network-based predictive model to estimate the final ethyl acetate purity (%) from seven key technological parameters: feed molar ratio of ethanol to acetic acid, number of reaction stages, total trays, column hold-up volume, operating pressure, bottom temperature, and reflux ratio. These parameters capture the chemical, hydraulic, and thermal behavior governing the separation process. The model aims to predict purity accurately under varying operating conditions, providing a foundation for future intelligent control and optimization strategies in industrial purification systems [4].
The motivation for choosing an ANN approach arises from the complex nonlinear behavior of the system, which cannot be captured effectively by linear regression or static correlations. Neural networks can approximate arbitrary nonlinear mappings between inputs and outputs, adapt to multidimensional process data, and generalize well within the operating domain covered by the training data. Furthermore, integrating such predictive models into a supervisory control system or digital twin framework can support real-time monitoring, anomaly detection, and adaptive optimization of operating parameters. Thus, this research contributes to the development of a robust, interpretable, and real-time predictive framework for ethyl acetate purification. By linking routinely measured process variables to the final product purity, the model not only improves process transparency but also serves as a key component of intelligent control of chemical processes, ensuring consistent quality, energy savings, and compliance with sustainable production goals [5].
Methodology
Process description and data preparation
Figure 1 shows the technological scheme of ethyl acetate production based on a reaction rectification sequence. In the first column, ethanol and acetic acid are brought into contact under conditions that allow esterification to proceed while, at the same time, part of the light components is separated.
Figure 1. The process of ethyl acetate production by the reaction rectification method
The top stream (Distillate 1) is pumped to the second column, where additional rectification is carried out to obtain the final ethyl acetate product, while heavier or unreacted components are returned to the system as recycle. Such a configuration is typical for solvent production units because it combines reaction, separation, and internal recycle in one integrated flowsheet. A consequence of this integration is that the purity of the final product is not controlled by a single variable; it depends simultaneously on feed composition, column hydrodynamics, internal reflux, pressure, and thermal regime.
To capture this multivariable dependence, seven process variables that can be measured or calculated online were selected as model inputs. Together they describe the composition side (feed molar ratio), the reactive/separation structure (number of reaction stages and total trays), the hold-up and residence characteristics (liquid hold-up volume), and the operating regime (column pressure, bottom temperature, reflux ratio). The single model output is the final ethyl acetate purity (%) at the outlet of the second column.
A data set of 120 steady-state operating points was compiled to represent feasible and technologically meaningful modes of the process shown in Figure 1. Each record in the data set corresponds to one mode of operation and contains the seven input variables from Table 1 and the corresponding measured (or calculated) outlet purity. The data were randomized and split into two parts: 80% for model training and 20% for model testing. Using a fixed random seed ensured that the split can be reproduced and that the reported procedure is deterministic.
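The reproducible 80/20 split described above can be sketched as follows. This is a minimal, dependency-free illustration rather than the authors' code: the function name `split_indices` and the seed value 42 are assumptions, since the paper states only that a fixed seed was used.

```python
import random

def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle sample indices with a fixed seed and split them into train/test."""
    indices = list(range(n_samples))
    # A local RNG keeps the split reproducible without touching global state.
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

# 120 operating points -> 96 for training, 24 for testing.
train_idx, test_idx = split_indices(120)
print(len(train_idx), len(test_idx))  # 96 24
```

Calling `split_indices(120)` again with the same seed returns the same partition, which is what makes the reported procedure repeatable.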
Table 1.
Model variables
| Symbol / Name | Description | Type | Units |
| --- | --- | --- | --- |
| Feed_Molar_Ratio_EtOH_AcOH | Ethanol/acetic acid molar ratio in the feed stream | Input | mol/mol |
| Rxn_Zone_Stages | Number of active stages in the reaction zone | Input | – |
| Total_Trays | Total number of trays (theoretical stages) in the column | Input | – |
| HoldUp_Volume_L | Liquid hold-up volume of the system | Input | L |
| Column_Pressure_bar | Operating pressure of the column | Input | bar |
| Bottom_Temperature_C | Temperature in the bottom section of the column | Input | °C |
| Reflux_Ratio | Ratio of returned condensate to distillate | Input | – |
| Ethyl_Acetate_Purity_pct | Final product purity at the outlet | Output | % |
Before training, all variables were standardized using the StandardScaler (zero mean, unit variance). This step is important when working with neural networks, because the original variables are expressed in different units (°C, bar, L, dimensionless ratios) and differ greatly in magnitude. Standardization puts all inputs on the same numerical scale and speeds up convergence. In addition, to let the model learn the coupled and nonlinear effects that are typical of integrated reactive distillation systems, the original seven inputs were passed through a second-order polynomial feature expansion.
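The two preprocessing operations can be illustrated with a short standard-library sketch. It is not the authors' pipeline: the helper names are hypothetical, the expansion is assumed to omit a constant bias column, and the order in which scaling and expansion were applied is not stated in the paper.

```python
from itertools import combinations_with_replacement
from statistics import fmean, pstdev

def standardize(columns):
    """Scale each column to zero mean and unit (population) variance."""
    scaled = []
    for col in columns:
        mu, sigma = fmean(col), pstdev(col)  # a constant column would give sigma = 0
        scaled.append([(v - mu) / sigma for v in col])
    return scaled

def poly2_features(row):
    """Return the linear terms followed by all squares and pairwise products."""
    quadratic = [row[i] * row[j]
                 for i, j in combinations_with_replacement(range(len(row)), 2)]
    return list(row) + quadratic

# Seven inputs expand to 7 linear + 7 squared + 21 cross terms.
expanded = poly2_features([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
print(len(expanded))  # 35
```

Under these assumptions the network sees 35 input features instead of 7, so interaction terms such as feed ratio times reflux ratio are available to the first layer directly.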
Neural-network modeling, training and evaluation
After the preprocessing stage, a multilayer perceptron (MLP) neural network was developed to establish a predictive relationship between the selected process parameters and the ethyl acetate purity. The model architecture was implemented in PyTorch and consisted of an input layer whose size matches the number of polynomially expanded features, several fully connected hidden layers with ReLU (Rectified Linear Unit) activation functions, and a single linear output neuron producing the predicted purity value. The ReLU activation provides the nonlinear mapping capability and mitigates vanishing gradients, while the linear output layer yields the continuous prediction required for regression.
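The layer structure described above (ReLU hidden layers followed by one linear output neuron) can be sketched as a plain-Python forward pass. This is an illustration only, not the PyTorch model used in the study: the hidden widths 64 and 32, the weight initialization, and all function names are assumptions, because the paper does not report them.

```python
import random

def relu(values):
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialization only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def mlp_forward(features, layers):
    """Apply ReLU after every hidden layer; keep the last layer linear."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return dense(activations, weights, biases)[0]

rng = random.Random(0)
sizes = [35, 64, 32, 1]  # hypothetical widths; 35 = degree-2 expansion of 7 inputs
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
prediction = mlp_forward([0.0] * 35, layers)
print(prediction)  # 0.0 for an all-zero input, because every bias is zero
```

Leaving the output neuron without an activation lets the prediction take any real value, which is the usual choice for a regression target such as purity.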
Model parameters were optimized using the Adam algorithm, which adaptively adjusts the learning rate of each weight during training, ensuring fast and stable convergence even with a limited number of samples. The mean squared error (MSE) was chosen as the loss function to penalize larger prediction errors more strongly and facilitate smooth optimization. The training process was conducted for up to 2000 epochs, with simultaneous monitoring of both training and validation losses. To prevent overfitting on the small dataset, an early-stopping mechanism was applied. When the validation loss stopped improving, the model state corresponding to the minimum validation error was stored as the final configuration. This approach ensured that the model not only performed well on the training data but also generalized effectively to unseen operating conditions.
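The early-stopping rule can be sketched as a small function over a recorded validation-loss curve. The patience-based stopping criterion and the value of `patience` are assumptions made for illustration; the paper states only that the model state with the minimum validation error was kept.

```python
def best_epoch_with_early_stopping(val_losses, patience=50):
    """Return (best_epoch, best_loss, stop_epoch) for a validation-loss sequence.

    Epochs are 1-indexed. Training stops once `patience` consecutive epochs
    pass without a new minimum; the weights from `best_epoch` are the ones kept.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            return best_epoch, best_loss, epoch
    return best_epoch, best_loss, len(val_losses)

# Hypothetical loss curve: the minimum is at epoch 4, training stops at epoch 7.
losses = [0.90, 0.40, 0.20, 0.10, 0.12, 0.11, 0.13, 0.15]
print(best_epoch_with_early_stopping(losses, patience=3))  # (4, 0.1, 7)
```

In a real training loop the same bookkeeping would run inside the epoch loop, with a copy of the model weights saved whenever a new minimum is found.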
The predictive performance of the final model was evaluated on the independent test set using three standard regression indicators: the coefficient of determination (R²), the root mean square error (RMSE), and the mean absolute error (MAE). The R² value measured the proportion of the variance in purity explained by the model, RMSE quantified the typical prediction error magnitude in the same units as purity (%), and MAE represented the average deviation between the actual and predicted values. In addition to these metrics, an Actual vs. Predicted plot was generated for all 120 data points to visually assess the model’s accuracy. A narrow scatter of points along the 45° reference line in this plot indicated a strong correlation between the predicted and experimental purities.
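The three indicators follow directly from their definitions, as the sketch below shows. The purity values in the example are invented for illustration and are not taken from the study's data set.

```python
from math import sqrt

def regression_metrics(actual, predicted):
    """Compute R², RMSE and MAE for paired actual/predicted values."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }

# Hypothetical purities (%): every prediction is off by 0.5 percentage points.
metrics = regression_metrics([95.0, 96.0, 97.0, 98.0], [95.5, 95.5, 97.5, 97.5])
print(metrics)  # r2 = 0.8, rmse = 0.5, mae = 0.5
```

RMSE and MAE coincide here because all absolute errors are equal; in general RMSE is at least as large as MAE and grows faster when a few errors are large.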
Finally, to interpret the trained neural model and assess the influence of each process parameter on the final product quality, a sensitivity analysis was conducted. In this analysis, each input variable was independently perturbed by one standard deviation around its mean while keeping the others constant. The resulting change in predicted purity was calculated to determine the system’s sensitivity to that variable. The sensitivity values were represented in a horizontal bar chart, highlighting which parameters exerted the most significant effect on ethyl acetate purity. This interpretation step links the data-driven neural model with the physical understanding of the process and helps identify which operational variables are most critical for process control and optimization.
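The one-at-a-time perturbation can be sketched as follows. The sketch assumes a shift of plus one standard deviation from the mean, and `toy_model` is a hypothetical stand-in for the trained network, used only so that the example runs on its own.

```python
def one_sigma_sensitivity(predict, means, stds):
    """Change in prediction when each input moves from its mean to mean + 1 std."""
    baseline = predict(means)
    effects = []
    for i, sigma in enumerate(stds):
        shifted = list(means)   # all other inputs stay at their mean values
        shifted[i] += sigma
        effects.append(predict(shifted) - baseline)
    return effects

def toy_model(x):
    """Hypothetical purity model with one interaction term (not the trained ANN)."""
    return 90.0 + 2.0 * x[0] - 0.5 * x[1] + 0.1 * x[0] * x[1]

effects = one_sigma_sensitivity(toy_model, means=[1.0, 10.0], stds=[0.5, 2.0])
print([round(e, 3) for e in effects])  # [1.5, -0.8]
```

Ranking the inputs by the absolute value of each effect gives the ordering shown in a horizontal bar chart. Because the other inputs are held at their means, this measure does not capture interaction effects away from the center of the data.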
Results and discussion
The developed artificial neural network (ANN) model demonstrated strong predictive capability in estimating the final ethyl acetate purity from the selected process parameters. The network was trained for up to 2000 epochs using the Adam optimizer, and the learning dynamics of the training and validation sets are presented in Figure 2. During the initial training iterations, both losses decreased rapidly, indicating efficient convergence and successful weight adaptation. The lowest validation loss was reached at epoch 54, corresponding to a mean squared error (MSE) of 0.0670. After this point, the validation curve stabilized, while the training curve continued a slight downward trend, suggesting that the model had reached a good bias–variance balance without overfitting. This indicates that the adopted early-stopping criterion captured the best-performing model.
Figure 2. Training and validation loss versus epoch
The correlation between actual and predicted ethyl acetate purities is illustrated in Figure 3. The data points closely follow the ideal 45° reference line, reflecting high consistency between experimental and model-predicted values. Quantitatively, the model achieved an R² of 0.926, indicating that over 92% of the variation in product purity was explained by the neural network. The RMSE of 0.367 and the MAE of 0.296 further confirm the model’s precision: the typical prediction error is below 0.4 percentage points of purity.
Figure 3. Actual versus predicted ethyl acetate purity (%)
These results demonstrate that the ANN effectively captured the complex nonlinear dependencies between the feed composition, operating conditions, and column hydrodynamics that determine separation efficiency and product quality. Minor scatter observed around the regression line can be attributed to experimental uncertainty and the limited number of training samples, which slightly restricts model generalization beyond the calibrated operating range.
To interpret the model and identify the most influential variables governing ethyl acetate purity, a sensitivity analysis was performed, as shown in Figure 4. Each process variable was perturbed by one standard deviation around its mean, and the resulting change in predicted purity was recorded. The feed molar ratio of ethanol to acetic acid (EtOH/AcOH) exhibited the highest sensitivity, producing a 0.72% increase in purity for a one-sigma increment. This highlights the crucial role of feed composition control in maintaining reaction completeness and minimizing unreacted components in the distillate. The number of reaction-zone stages (0.49%) and the liquid hold-up volume (0.41%) also had a strong impact, indicating that both reaction contact area and residence time significantly affect esterification performance. Moderate influence was observed for the bottom temperature (0.36%), suggesting its role in driving component vaporization and equilibrium shifts. By contrast, column pressure (0.21%), total trays (0.22%), and reflux ratio (0.15%) contributed less significantly, though they remain important for fine-tuning product purity and energy efficiency.
Figure 4. Input sensitivity analysis for ethyl acetate purity
The results confirm that the proposed hybrid approach combining polynomial feature expansion with a neural network model effectively enhances nonlinear representation of the process and provides accurate predictions of product purity. The high R² and low RMSE/MAE values demonstrate that the model is suitable not only for performance estimation but also as a foundation for intelligent process control and optimization in Reactive–Distillation systems.
Conclusion. This study presented the development of a hybrid data-driven model for predicting ethyl acetate purity in a Reactive–Distillation process. By combining polynomial feature expansion with a multilayer perceptron (MLP) neural network, the model captured nonlinear interactions among the key operating variables that govern product purity, such as feed molar ratio, reaction-zone configuration, hold-up volume, and column temperature. The proposed ANN achieved strong predictive performance with R² = 0.926, RMSE = 0.367, and MAE = 0.296, confirming its ability to reproduce process behavior with high accuracy. The learning dynamics revealed rapid and stable convergence, while the sensitivity analysis showed that feed composition and reaction-zone design have the most pronounced effects on purity. These findings provide valuable insight for optimizing the process and identifying critical variables for control system design. The developed model demonstrates that integrating feature engineering with artificial neural networks offers a powerful framework for intelligent monitoring and optimization of reaction–distillation systems. Future work will focus on extending this framework to real-time control applications and multi-objective optimization, integrating energy consumption and product selectivity into the model.
References:
1. A. Sahu, V. Kumar, and N. Kaistha. Conceptual design and plantwide control of an ethyl acetate process, Chem. Eng. Process. - Process Intensif., vol. 126, pp. 45–61, Apr. 2018, doi: 10.1016/j.cep.2018.02.010.
2. A. I. Shomurodov, A. G. Makhsumov, and B. M. Ismailov. Aminomethylation reaction of some propargyl esters of saturated carboxylic acids with diethanolamine (in Russian), Universum: Chemistry and Biology (electronic scientific journal), part 1, no. 2(104), pp. 59–65, Feb. 2023, doi: 10.32743/UniChem.2023.104.2.14927.
3. Z. Felfelian and M. Mahdavi. A new ZrC nano powder solid acid catalyst for the esterification synthesis of ethyl acetate, Catal. Commun., vol. 182, p. 106752, Sep. 2023, doi: 10.1016/j.catcom.2023.106752.
4. A. Tiwari, A. Keshav, S. Bhowmick, and O. Sahu. Liquid-liquid Equilibria (LLE) of the quaternary mixture (acetic acid + ethanol + ethyl acetate + water) arising out of esterification reaction: Optimization studies, J. Mol. Liq., vol. 231, pp. 86–93, Apr. 2017, doi: 10.1016/j.molliq.2017.01.092.
5. H. Gurav and V. V. Bokade. Synthesis of ethyl acetate by esterification of acetic acid with ethanol over a heteropolyacid on montmorillonite K10, J. Nat. Gas Chem., vol. 19, no. 2, pp. 161–164, Mar. 2010, doi: 10.1016/S1003-9953(09)60048-7.