Master's student, School of Information Technology and Engineering, Turan University, Almaty, Kazakhstan
RECOGNIZING DIFFERENT IMAGES IN PYTHON USING TENSORFLOW AND KERAS
ABSTRACT
This paper explores methodologies for image recognition using deep neural networks, implemented with TensorFlow and Keras — two powerful, open-source libraries widely adopted in the machine learning community. TensorFlow and Keras simplify the deployment of sophisticated machine learning algorithms, making it feasible to design and train neural networks for complex tasks, such as image processing and classification. Key components of the study include model architecture selection, data preprocessing, and training and evaluation strategies, each of which plays a critical role in optimizing performance and accuracy in image classification. The purpose of the study is to provide a practical demonstration of the development stages of these neural network-based solutions, showcasing their capabilities in handling real-world image recognition tasks. The research is valuable for practitioners and researchers interested in applying machine learning for image-based applications, offering insights into the workflow and techniques essential for effective model implementation.
АННОТАЦИЯ
В данной статье рассматриваются методики распознавания изображений с помощью глубоких нейронных сетей, реализованные с помощью TensorFlow и Keras - двух мощных библиотек с открытым исходным кодом, широко распространенных в сообществе специалистов по машинному обучению. TensorFlow и Keras упрощают развертывание сложных алгоритмов машинного обучения, делая возможным разработку и обучение нейронных сетей для сложных задач, таких как обработка и классификация изображений. Ключевые компоненты исследования включают выбор архитектуры модели, предварительную обработку данных, стратегии обучения и оценки, каждая из которых играет важную роль в оптимизации производительности и точности классификации изображений. Цель исследования - практическая демонстрация этапов разработки этих решений на основе нейронных сетей и их возможностей в решении реальных задач распознавания изображений. Исследование представляет ценность для специалистов-практиков и исследователей, заинтересованных в применении машинного обучения для приложений, основанных на распознавании изображений, предлагая понимание рабочего процесса и методов, необходимых для эффективной реализации моделей.
Keywords: Deep Neural Networks, TensorFlow, Keras, Image Recognition, Image Classification, Machine Learning Workflow, Data Preparation, Model Creation, Model Training, Model Evaluation, Convolutional Neural Network (CNN), Activation Function, Dropout Layer, Batch Normalization, Pooling Layer, Flatten Function, Dense Layer, Softmax Activation, Adam Optimizer, Epochs.
Ключевые слова: Глубокие нейронные сети, TensorFlow, Keras, распознавание изображений, классификация изображений, рабочий процесс машинного обучения, подготовка данных, создание модели, обучение модели, оценка модели, конволюционная нейронная сеть (CNN), функция активации, выпадающий слой, пакетная нормализация, пулирующий слой, функция Flatten, плотный слой, активация Softmax, оптимизатор Adam, эпохи.
Introduction
In an era of rapid progress in artificial intelligence, image recognition and classification have become integral to many fields. TensorFlow and Keras are well suited to these tasks: they offer flexible, easy-to-use, high-performance tools for building deep learning models. The libraries are particularly popular among data scientists because they make it possible to build and train neural networks with minimal effort [1].
Definitions
A basic grasp of image recognition terminology is needed to follow the main part of this work, so we define the key terms first.
TensorFlow / Keras
TensorFlow is an open-source library developed by Google for building and training deep neural networks to solve a wide range of problems. Keras is a high-level API that uses TensorFlow as a backend to implement powerful machine learning models. Keras emphasizes usability, so you can focus on solving the problem at hand.
As a high-level API (Application Programming Interface), Keras uses functions from TensorFlow and, historically, from other ML libraries (Theano is one example). Keras was designed with user-friendliness and modularity as its guiding principles [2].
From a practical standpoint, Keras exposes many of TensorFlow's powerful but complex functions in the simplest way possible, and it works with the Python programming language without any customization or modification [4].
Machine Learning Workflow
Before turning to the image classifier training example, let us step back to review the workflow and pipeline of machine learning.
Training a neural network model is a fairly standard process that consists of four distinct steps: data preparation, model creation, model training, and model evaluation.
Once an image classifier has been trained, an image can be passed to the CNN, which makes predictions about the content of that image [5].
Data Preparation
First, you need to collect your data and format it so that the neural network can learn from it. This usually involves collecting images and labeling them. If you download a dataset prepared by someone else, it may still need to be pre-processed before it can be used for training.
Data preparation is a task in its own right and may involve handling missing values, corrupted data, distorted data, wrong labels, and similar problems.
In this work we use a pre-processed dataset.
Model Creation
Building a neural network model involves choosing a number of parameters and hyperparameters. We need to decide how many layers the model will have, how large the input and output layers will be, which activation functions to use, and whether to use dropout.
Knowing which parameters and hyperparameters to choose comes with experience. There are a few basic heuristics to start from, and some of them appear in the example considered in this paper [6].
Model Training
Once the model is built, all that remains is to create an instance of it and fit it to the training data. A major consideration when training a model is the time required. The length of training is set by specifying the number of training epochs. The longer the model is trained, the better its performance tends to become, but with too many epochs the model may overfit.
Choosing the number of epochs is something you learn to judge over time. It is customary to save the weights of the neural network between training sessions so that you do not have to start over after making some progress in training.
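As a minimal sketch of the trade-off between training longer and overfitting, the snippet below picks the epoch after which further training stopped helping, given a list of validation losses. The loss values and the helper name best_epoch are hypothetical and serve only as an illustration; in Keras itself, weights are typically saved with model.save_weights() or a checkpoint callback.

```python
def best_epoch(val_losses, patience=2):
    """Return (epoch_index, loss) of the best epoch, stopping early once the
    validation loss has not improved for `patience` consecutive epochs."""
    best_idx, best_loss = 0, float("inf")
    epochs_without_improvement = 0
    for idx, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best result: this is where weights would be saved.
            best_idx, best_loss = idx, loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # further epochs are likely to overfit
    return best_idx, best_loss


# Hypothetical validation losses, one per epoch.
val_losses = [1.20, 0.95, 0.80, 0.78, 0.83, 0.90, 0.97]
epoch, loss = best_epoch(val_losses)
print(f"best epoch: {epoch + 1}, validation loss: {loss:.2f}")
```

The patience parameter is an assumption of this sketch: it controls how many non-improving epochs are tolerated before training stops.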
Model Evaluation
Evaluating a model involves several steps. The first is to compare the model's performance against a validation dataset: data on which the model has not been trained. You can then analyze the model's performance on this dataset using different metrics.
There are several ways to measure the performance of a neural network model; the most common is accuracy, which is the number of correctly classified images divided by the total number of images in the dataset.
After measuring the model's accuracy on the validation dataset, you will typically go back and retrain the network with slightly modified parameters, since the first training run rarely gives satisfactory performance. You then continue to tweak the parameters, retrain the network, and measure its performance until the accuracy is satisfactory.
Finally, the performance of the network is measured on a test set.
You may wonder why a separate test set is needed. After all, the validation set has already given an idea of the model's accuracy.
The problem is that all the parameter changes made during tuning, combined with repeated re-evaluation on the validation set, may cause the network to learn characteristics of that set that do not generalize to out-of-sample data. Therefore, completely new test data must be fed into the network.
The purpose of the test set is to check for problems such as overfitting and to give more confidence that the model works in the real world [6].
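The accuracy metric described in this section can be written down directly. The following is a minimal sketch in plain Python; the function name accuracy and the label lists are hypothetical and used only for illustration.

```python
def accuracy(predicted_labels, true_labels):
    """Fraction of samples whose predicted class equals the true class."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(1 for p, t in zip(predicted_labels, true_labels) if p == t)
    return correct / len(true_labels)


# Hypothetical class indices for six images.
predicted = [3, 8, 8, 0, 6, 1]
actual = [3, 8, 0, 0, 6, 6]
print(f"accuracy: {accuracy(predicted, actual):.3f}")
```

The same function applies to the validation set during tuning and to the test set at the end; only the data passed in changes.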
Data preparation
One more import is needed next: the dataset itself.
Figure 1. Importing cifar10 dataset
Figure 2. The cifar10 dataset
Next, let's load the dataset. This is done simply by specifying which variables to load the data into and then calling the load_data() function:
Figure 3. Loading the data
In many cases, data must be preprocessed before it can be used. Because we are using a pre-packaged dataset, very little preprocessing is required here. One step that still needs to be performed is normalizing the input data.
If the input values span too wide a range, network performance may be adversely affected. In our example, the input values are image pixels with values between 0 and 255.
To normalize the data, we divide the image values by 255. To do this, the data must first be converted to floating point format, since it is currently stored as integers. This is done with the NumPy astype() method, passing the desired data type:
Figure 4. Import needed libraries from Keras
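The normalization step can be illustrated without any library. The sketch below is an assumption-laden stand-in for the original figure: it uses a tiny nested list in place of the real image array, and the helper name normalize_pixels is hypothetical.

```python
def normalize_pixels(image):
    """Scale integer pixel values in [0, 255] to floats in [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


# A hypothetical 2x3 single-channel "image" with integer pixel values.
image = [[0, 51, 102],
         [153, 204, 255]]
normalized = normalize_pixels(image)
print(normalized)

# With a NumPy array X, the equivalent step would be:
#   X = X.astype('float32') / 255.0
```

Dividing by a float performs the integer-to-float conversion implicitly here; with NumPy arrays the conversion is made explicit through astype().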
Another step needed to prepare the data for the network is one-hot encoding of the class labels. We do not cover the details of one-hot encoding here; the key point is that the network cannot use the labels as they are, so they must be encoded first. One-hot encoding is best suited to cases where membership in a class is a binary question.
A binary treatment is appropriate here because an image either belongs to a given class or it does not: it can never be in between. The encoding is performed with the to_categorical() function, which is a Keras utility rather than part of NumPy. We therefore import the np_utils module from Keras, because it contains to_categorical().
We also need to set the number of classes in the dataset, because it determines how many neurons the final layer must be reduced to:
Figure 5. Test and Train codes
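To make the encoding concrete, here is a minimal plain-Python sketch of what one-hot encoding does to integer class labels. It mimics the behavior of to_categorical() but is not the Keras implementation; the function name one_hot and the sample labels are hypothetical.

```python
def one_hot(labels, num_classes=None):
    """Turn integer class labels into one-hot rows of length num_classes."""
    if num_classes is None:
        num_classes = max(labels) + 1  # infer from the largest label
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # exactly one position is "hot"
        encoded.append(row)
    return encoded


# Hypothetical labels for four images drawn from three classes.
labels = [0, 2, 1, 2]
y = one_hot(labels)
class_num = len(y[0])  # number of neurons needed in the output layer
print(y)
print("classes:", class_num)
```

The length of each encoded row gives the number of classes, which is why the class count can be read from the encoded labels.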
Model Design
When designing a CNN model, the first decision is the format the model will use. Keras offers several formats (blueprints) for building models, but Sequential is the most widely used, so we import it from Keras.
Model generation
Figure 6. Sequential model
The first layer of the model is a convolutional layer. It takes the input data and passes it through its convolutional filters.
When implementing this layer in Keras, we specify the number of filters (32), the filter size (3x3 in this case), the input shape (for the first layer only), the padding, and the activation function.
The most common activation function is relu, and padding = 'same' sets the padding so that the image size is not changed:
Figure 7. Activation function
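The effect of 'same' padding and the relu activation can be seen in a small plain-Python sketch. This is an illustration of the operation for a single filter and a single channel, not the Keras implementation; the function names and the sample data are hypothetical.

```python
def relu(x):
    """Rectified linear unit: negative values become zero."""
    return max(0.0, x)


def conv2d_same(image, kernel):
    """Apply one odd-sized square filter to a 2D image with zero padding
    ('same'), followed by relu. As in most deep learning libraries, the
    kernel is not flipped (this is cross-correlation)."""
    height, width = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    output = []
    for i in range(height):
        row = []
        for j in range(width):
            total = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    # Positions outside the image count as zeros.
                    if 0 <= y < height and 0 <= x < width:
                        total += image[y][x] * kernel[di][dj]
            row.append(relu(total))
        output.append(row)
    return output


image = [[1, 2, 3, 0],
         [4, 5, 6, 0],
         [7, 8, 9, 0]]
identity = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]
feature_map = conv2d_same(image, identity)
print(len(feature_map), "x", len(feature_map[0]))  # same size as the input
```

A real convolutional layer applies many such filters (32 in the text) across all input channels and learns the kernel values during training.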
Next, a dropout layer is added to counter overfitting; it works by randomly dropping a fraction of the connections between layers. This is followed by batch normalization, which normalizes the inputs passed to the next layer so that the network always produces activations with the same distribution. Another convolutional layer follows, with a larger number of filters so that the network can learn more complex representations. Finally, a pooling layer makes the image classifier more robust, so that the relevant patterns are learned properly.
Figure 8. Dropout layer and batch normalization
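The dropout (exclusion) and batch normalization steps can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: dropout is shown in its common "inverted" form, and batch normalization omits the learned scale and shift parameters that a real layer also has. The function names, the rate of 0.2, and the sample values are hypothetical.

```python
import random


def dropout(activations, rate, rng):
    """Inverted dropout: zero each activation with probability `rate` and
    scale the survivors by 1 / (1 - rate) to keep the expected sum."""
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() >= rate else 0.0
            for a in activations]


def batch_normalize(values, epsilon=1e-5):
    """Shift and scale a batch of values to zero mean and unit variance."""
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return [(x - mean) / (variance + epsilon) ** 0.5 for x in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
activations = [2.0, 4.0, 6.0, 8.0]
dropped = dropout(activations, rate=0.2, rng=rng)
normalized = batch_normalize(activations)
print(dropped)
print(normalized)
```

Dropout is applied only during training; at prediction time all connections are kept.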
These steps form the basic flow of a CNN implementation: convolution, activation, dropout, and pooling.
The number of convolutional layers can be varied, but each added layer increases the computational cost. When adding layers, it is common to increase the number of filters gradually, using powers of 2 (2^n), which can help when training the model on a GPU.
After these steps, the data must be flattened with the Flatten function, and the Dense function is then used to create the first densely connected layer, specifying its number of neurons:
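Pooling (the merging step in the sequence above) can be illustrated with a small plain-Python sketch of 2x2 max pooling. The function name and sample feature map are hypothetical, and the window size and stride of 2 are assumptions chosen because they are the most common setting.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2. Any odd trailing row or column
    is dropped."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(feature_map[i][j], feature_map[i][j + 1],
                           feature_map[i + 1][j], feature_map[i + 1][j + 1]))
        pooled.append(row)
    return pooled


feature_map = [[1, 3, 2, 4],
               [5, 6, 7, 8],
               [9, 2, 1, 0],
               [3, 4, 5, 6]]
pooled = max_pool_2x2(feature_map)
print(pooled)
```

Each output value keeps only the strongest response in its 2x2 window, so the height and width of the feature map are halved.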
Figure 9. Flatten function
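What flattening and a densely connected layer compute can be shown in a few lines of plain Python. This is a sketch of the operations rather than the Keras layers themselves; the function names, weights, and biases are hypothetical, and relu is assumed as the activation.

```python
def flatten(feature_maps):
    """Flatten nested lists of any depth into one flat list."""
    flat = []
    for item in feature_maps:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def dense(inputs, weights, biases):
    """Fully connected layer with relu: one output neuron per weight row."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


feature_maps = [[[1.0, 2.0], [3.0, 4.0]]]  # shape 1 x 2 x 2
vector = flatten(feature_maps)             # four values in one list
weights = [[0.5, 0.5, 0.5, 0.5],           # neuron 1
           [-1.0, 0.0, 0.0, 0.0]]          # neuron 2
biases = [0.0, 0.5]
outputs = dense(vector, weights, biases)
print(vector)
print(outputs)
```

Every neuron in a dense layer sees every value of the flattened vector, which is why the data must be one-dimensional at this point.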
Next, the softmax activation function is used in the final layer. It assigns a probability to each class, and the neuron with the highest probability is taken as the predicted class of the image:
Figure 10. Softmax activation
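The softmax function itself is short enough to write out. The sketch below is a plain-Python illustration; the raw output values (logits) are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of the final layer for three classes.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
predicted_class = probs.index(max(probs))
print([round(p, 3) for p in probs])
print("predicted class:", predicted_class)
```

The predicted class is simply the index of the largest probability.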
It remains to compile the model, specifying the number of training epochs and the optimizer to be used. The optimizer adjusts the weights of the network so as to reach a point of minimal loss. For good performance, the widely used Adam algorithm is chosen, and a metric for evaluation is specified:
Figure 11. Optimizer and model compiling
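How an optimizer adjusts a weight to reduce the loss can be illustrated with a single-weight sketch of the Adam update rule. This is a plain-Python illustration, not the Keras optimizer; the function name adam_step, the toy loss w^2, and the learning rate of 0.1 are assumptions made for the example.

```python
import math


def adam_step(weight, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single weight at time step t (starting at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squares
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    weight -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return weight, m, v


# Toy problem: minimize loss(w) = w^2, whose gradient is 2w.
w, m, v = 1.0, 0.0, 0.0
history = []
for t in range(1, 4):
    grad = 2.0 * w
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
    history.append(w)
print(history)
```

Each step moves the weight toward the minimum at zero; in a real network the same rule is applied to every weight using gradients obtained by backpropagation.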
The fit() function is called to train the model:
Figure 12. Fit function
Training on a set of 50000 samples and validating on a set of 10000 samples gives the following output:
Figure 13. Result
It remains to evaluate the model's performance by calling model.evaluate():
Figure 14. Model evaluation
The following result was obtained:
Figure 15. Model result
Conclusion
The experiment demonstrated that TensorFlow and Keras are effective and convenient tools for image recognition and for building powerful deep learning models. The libraries provide all the tools needed to implement modern neural network architectures, which makes it possible to solve image classification problems with high accuracy.
References:
- Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning. MIT Press, 2016.
- Aurelien Geron. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O’Reilly Media, 2019.
- S.Haykin. Neural Networks and Learning Machines. 3rd Edition. Pearson, 2018.
- Francois Chollet. Deep Learning with Python. Manning Publications, 2018.
- David Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
- Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
- L.G.Komartsova, A.V.Maksimov. Neurocomputers. M., Bauman Moscow State Technical University, 2004. 400 p. ISBN 5-7038-2554-7 [in Russian].
- A.I.Galushkin. Neural networks. Fundamentals of the theory. M., Hot Line - Telecom, 2017. 496 p. ISBN 978-5-9912-0082-0 [in Russian].
- V.A.Golovko. Neural networks: training, organization and application. M., IPRZHR, 2002. 256 p. ISBN 5-93108-05-8 [in Russian].
- G.E.Yakhyaeva. Fundamentals of neural network theory. (Free distance self-study course.) National Open University INTUIT [in Russian].
- V.V.Kruglov, M.I.Dli, R.Yu.Golunov. Fuzzy logic and artificial neural networks. Fizmatlit, 2001. 224 p. ISBN 5-94052-027-8 [in Russian].
- I.Goodfellow, Y.Bengio, A.Courville. Deep Learning. MIT Press, 2016.
Translation: I.Goodfellow, Y.Bengio, A.Courville. Deep Learning. Translated from English by A.A.Slinkin. 2nd edition, revised. Moscow, DMK Press, 2018. 652 p., colored ill. ISBN 978-5-97060-618-6