FPGA IMPLEMENTATIONS OF NEURAL NETWORKS

Safir P.
Citation: Safir P. FPGA IMPLEMENTATIONS OF NEURAL NETWORKS // Universum: технические науки: electronic scientific journal. 2023. 2(107). URL: https://7universum.com/ru/tech/archive/item/14945 (accessed: 13.05.2024).
DOI: 10.32743/UniTech.2023.107.2.14945

 

ABSTRACT

In this article, I discuss the implementation and use of neural networks on FPGA-based systems[1], the topology of neural networks deployed on FPGAs, and the advantages of FPGA-based neural networks over CPU-based ones. We also discuss the requirements for neural networks deployed in embedded systems, the advantages and disadvantages of the ready-made neural network solutions offered by FPGA manufacturers, and the benefits of HLS[2] (High Level Synthesis) technology. Finally, we consider why an SoC together with an FPGA makes an ideal technological solution for embedded systems, and which high-level programming languages make it easier to develop neural networks on FPGAs.


Keywords: FPGA, HLS, embedded system, neural networks, SoC.


 

Introduction

Neural networks have found applications in many areas, from noise filtering to object classification. The range of applications is very wide, from real-time object classification to data processing on embedded systems, but the computing requirements of neural networks are very high. Deploying neural networks on FPGAs and embedded systems is said to face a number of problems: limited computing power on many hardware platforms, the absence of ready-made libraries, and the high cost and complexity of development. But is this true, and how much difficulty is there really in developing the right system? In this article we attempt to find out and give an objective overview.

FPGA and embedded systems

Modern video processing[3] and computer vision software is developing very fast and consequently requires huge computing power. Managing city streets and traffic, government needs and many other everyday uses require fast real-time neural networks that classify objects captured by video cameras. The main task of image classification is to assign an image to a specific class. With the introduction of convolutional neural networks, the recognition of objects in images has improved significantly, and in recent years its quality has become almost indistinguishable from human vision. The quality of transmitted video has also improved, but processing this endless stream of data naturally requires massive computing power. There are two types of system. Stationary systems receive power from the mains or from generators and can offload data processing to cloud servers. The second type of device is the embedded system. These systems work on autonomous devices such as robots, unmanned autonomous systems and other devices where wired access is difficult. For such systems, specific requirements and characteristics must be defined, including the following:

  • Lightweight design of the installed device.
  • Minimal power consumption, as the system is powered only by its built-in battery.
  • The ability to cope with the amount of heat generated by the FPGA.

Because all data processing takes place within the system as data is gathered in real time, an important requirement is powerful computing resources with limited power consumption and cooling. GPUs[4] are widely used for data processing in neural networks[3], but they are unsuitable for embedded systems due to several factors. Namely:

  • GPUs cannot operate in extreme environments such as high temperature, high humidity and strong vibration.
  • GPUs have very high power consumption, which is inappropriate for embedded systems.
  • GPUs require additional heat dissipation circuitry, which is not always possible in the limited space of an embedded system.

To implement neural networks on embedded systems, it is better and more optimal to use an FPGA together with an SoC (System on a Chip)[5]. An SoC integrates the electronic circuits of various computer components on a single integrated circuit (IC). In an FPGA SoC, an FPGA and an ARM[6] software processor are co-located on one chip, which combines the power of the FPGA's parallel hardware computing with the flexibility of the ARM software processor. The ARM processor supports standard interfaces and protocols such as I2C, SPI, UART, GPIO and many more. Most importantly, on the ARM processor we can run code developed in C/C++ and Python, which reduces the start-up costs of a project and shortens programming time. The SoC is therefore the computing model most often used in embedded systems running neural networks. And because a system on a chip includes both hardware and software, it consumes less power, performs better, requires less space and is more reliable than multi-chip systems.
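To make the ARM-to-FPGA interaction concrete, here is a minimal C++ sketch of how a Linux program running on the ARM core might access a control register of an accelerator implemented in the FPGA fabric. The base address ACCEL_BASE and the register layout are hypothetical values chosen for illustration; on a real SoC they come from the hardware design's address map.

// Minimal sketch: accessing a memory-mapped FPGA register from the ARM core
// under Linux. ACCEL_BASE and the register offsets are hypothetical; the real
// addresses are defined by the FPGA design's address map.
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

constexpr off_t  ACCEL_BASE = 0x43C00000;  // assumed accelerator base address
constexpr size_t MAP_SIZE   = 0x1000;      // one 4 KiB page

int main() {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    // Map the accelerator's register page into this process's address space.
    void* base = mmap(nullptr, MAP_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, ACCEL_BASE);
    if (base == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    volatile uint32_t* regs = static_cast<volatile uint32_t*>(base);
    regs[0] = 1;                // hypothetical "start" register
    uint32_t status = regs[1];  // hypothetical "status" register
    std::printf("accelerator status: 0x%08x\n", status);

    munmap(base, MAP_SIZE);
    close(fd);
    return 0;
}

Vendor frameworks wrap register accesses like these in friendlier APIs, but the underlying mechanism on Linux-based SoCs is essentially this.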

Deployment of neural networks on FPGAs

Feedforward artificial neural networks

What is a neuron? A neuron can be thought of as a function: it takes a set of input values and returns a single output value.

 

Figure 1. Neuron

 

A neuron is connected to a series of signal connections, and its output forms one signal value, which is transmitted to the other neurons it is connected to. All inputs to the neuron are summed, taking into account the weight of each connection:

E = w1·x1 + w2·x2 + … + wn·xn,

where x1 … xn are the input signals and w1 … wn are the weights of the corresponding connections.

The resulting value E is then substituted into some function F(x).

What the function F(x) returns is the neuron's output signal Y, i.e. Y = F(E). The function F(x) is called the neuron's activation function. There are several common activation functions: ReLU, softmax, sigmoid and tanh.
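As an illustration of the model just described, here is a minimal C++ sketch of a single neuron: a weighted sum followed by an activation function. The inputs, weights and bias are arbitrary example values, not taken from any particular network.

// Minimal sketch of one artificial neuron: weighted sum plus activation.
#include <cmath>
#include <cstdio>
#include <vector>

// ReLU activation: returns max(0, e).
double relu(double e) { return e > 0.0 ? e : 0.0; }

// Sigmoid activation: squashes e into the range (0, 1).
double sigmoid(double e) { return 1.0 / (1.0 + std::exp(-e)); }

// One neuron: E = sum(w_i * x_i) + bias, Y = F(E).
double neuron(const std::vector<double>& x,
              const std::vector<double>& w,
              double bias,
              double (*activation)(double)) {
    double e = bias;
    for (size_t i = 0; i < x.size(); ++i)
        e += w[i] * x[i];
    return activation(e);
}

int main() {
    std::vector<double> x = {0.5, -1.0, 2.0};  // example inputs
    std::vector<double> w = {0.8, 0.2, -0.5};  // example weights
    std::printf("Y (ReLU)    = %f\n", neuron(x, w, 0.1, relu));
    std::printf("Y (sigmoid) = %f\n", neuron(x, w, 0.1, sigmoid));
    return 0;
}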

A feedforward artificial neural network consists of several layers: an input layer, hidden layers and an output layer. In a feedforward network, each neuron of the current layer is connected to the neurons of the next layer, and so on. Forward propagation means that the input signal travels from input to output without any feedback. The input layer therefore does not need to do anything; it simply passes the inputs on to the neurons of the next layer. Any layers between the input layer and the output layer are called hidden layers. Usually the number of neurons in the output layer corresponds to the number of object classes the neural network distinguishes. A minimal sketch of such a forward pass is given after Figure 2.

 

Figure 2. Feedforward artificial neural network
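Continuing the neuron sketch above, here is a hedged C++ illustration of a forward pass through fully connected layers. The layer sizes, weights and the use of ReLU throughout are assumptions made for the example; a real network would load trained parameters.

// Minimal sketch of a feedforward pass through fully connected layers.
#include <cstdio>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // weights[j][i]: input i -> neuron j

double relu(double e) { return e > 0.0 ? e : 0.0; }

// One layer: every output neuron sees every input (fully connected).
Vec layer(const Vec& in, const Mat& w, const Vec& bias) {
    Vec out(w.size());
    for (size_t j = 0; j < w.size(); ++j) {
        double e = bias[j];
        for (size_t i = 0; i < in.size(); ++i)
            e += w[j][i] * in[i];
        out[j] = relu(e);
    }
    return out;
}

int main() {
    Vec x  = {1.0, 0.5};                               // input layer (2 values)
    Mat w1 = {{0.4, -0.6}, {0.3, 0.8}, {-0.2, 0.1}};   // 2 inputs -> 3 hidden
    Vec b1 = {0.0, 0.1, -0.1};
    Mat w2 = {{0.5, -0.4, 0.9}, {0.7, 0.2, -0.3}};     // 3 hidden -> 2 outputs
    Vec b2 = {0.05, -0.05};

    Vec y = layer(layer(x, w1, b1), w2, b2);           // hidden, then output
    for (double v : y) std::printf("%f\n", v);
    return 0;
}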

 

FPGA

To implement neural networks on an FPGA, it is necessary to take into account the specifics of the hardware implementation. A neural network consists of a huge number of parallel neurons, so it is very important to maximize the parallel execution of simple operations at the hardware implementation stage. Usually, for maximal parallel computing, several CPUs are combined to solve one task, each operating in parallel with the others. In this configuration the processors are independent and communicate only via agreed communication channels.

What is the main problem with this parallel architecture? Consider an example: a 32-bit processor with an operating frequency of 400 MHz connected by 32-bit communication channels operating at 40 MHz. A 32-bit channel at 40 MHz carries roughly 1.28 Gbit/s, while a 32-bit processor at 400 MHz can consume up to roughly 12.8 Gbit/s, so each channel delivers only about a tenth of what the processor can handle. The slow communication channels therefore slow down or even negate the work of this configuration. It is a bit like road traffic where many lanes suddenly converge into one, forming congestion and a giant traffic jam. In this configuration, the bottleneck is the communication channels between the parallel processors. As a result, a single CPU often gives better results than several CPUs connected by external communication channels.

An FPGA overcomes this problem because it contains a large number of logic elements, sufficient for deploying a large number of neurons working in parallel. At the same time, the clock frequency of an FPGA is high, which allows neurons to compute at high speed, and because the neurons have separate internal communication channels within the same FPGA, data transfer between them is fast. The "traffic jam" problem, almost always present in parallel-connected CPUs with external communication channels, is thus resolved. A large number of logic elements allows us to build many physical neurons working in parallel and communicating inside the FPGA, and the channels between the logic elements and the FPGA's internal memory have good speed characteristics; in some FPGA families the clock frequency can reach several gigahertz. This parallelism of hardware logic, together with good internal and external communication channels, yields high computation speed for many neurons working in parallel.

The FPGA also contains ready-made DSP[7] (Digital Signal Processor) blocks. These blocks perform very fast arithmetic on real numbers, but an FPGA handles logical operations even better than operations involving real numbers. We must therefore conclude that the FPGA, with its logic-based hardware platform, is particularly well suited to implementing binary neural networks, because these networks rely on logical operations; binary neural networks make it possible to implement neural networks on systems with limited resources.

To work with FPGAs, hardware description languages (HDL)[8] are used, the most widespread being VHDL and Verilog. From these languages the RTL[8] (Register Transfer Level) description is synthesized. The disadvantage of these languages is the difficulty of transferring algorithms from high-level languages such as C/C++ or Python.
Manufacturers have therefore developed a technology called HLS (High Level Synthesis), which generates RTL code from high-level languages such as C/C++. This has made code development much easier and less time-consuming. Commercial HLS compilers are available that can, if required, translate code written in high-level languages such as C/C++ or Python into RTL. The resulting code is not always optimal, but it can be optimized manually.
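As an illustration of what HLS-style source code looks like, the sketch below expresses a neuron's dot product in C++ with Xilinx Vitis HLS directives. ARRAY_PARTITION and UNROLL are real Vitis HLS pragmas, but the vector size N and the interface details are assumptions made for this example.

// Sketch of an HLS-style dot product: the inner operation of one neuron.
// The #pragma HLS directives ask the HLS compiler to unroll the loop and
// split the arrays across registers so the multiplications run in parallel.
#define N 16

float neuron_dot(const float x[N], const float w[N]) {
#pragma HLS ARRAY_PARTITION variable=x complete
#pragma HLS ARRAY_PARTITION variable=w complete
    float acc = 0.0f;
dot:
    for (int i = 0; i < N; ++i) {
#pragma HLS UNROLL
        acc += x[i] * w[i];
    }
    return acc;
}

With the loop fully unrolled and the arrays partitioned, the N multiplications can be scheduled in parallel in the fabric, which is exactly the hardware parallelism discussed above.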

Review of ready-to-use neural network implementations

Many FPGA vendors, such as Xilinx and Intel, provide ready-to-use solutions for implementing neural networks on FPGAs. For example, PyNQ [9], for embedded systems built on Xilinx FPGAs, has ready-made open source code available for everyone to use. Similarly, the excellent open source FINN [10] project from Xilinx allows you to run binary neural networks on your FPGA. Intel has also released OpenVINO, its neural network software for use with its FPGAs, which is likewise an open source project available to all users.

Conclusion

Using an FPGA to implement a neural network increases the speed of signal processing compared with a software implementation. Effective and efficient parallel hardware processing significantly reduces processing time and increases the efficiency of the whole neural network, and the FPGA solves the problem of data transfer speed between neurons within a single chip. Applying high-level languages such as C/C++ or Python with code generation to RTL reduces development time and simplifies neural network development. In summary, the increasing power of FPGAs allows them to be used not only for simple controllers and interface units, but also for digital signal processing, complex intelligent controllers and neural networks. The development of fast FPGAs with ultra-low power consumption opens up great opportunities for their use in mobile communication systems, DSP and much else. SoCs are much more power-efficient than stationary processors; an SoC can run on batteries for a long period of time, making it the right choice for any embedded system.

 

References:

  1. Amos R. Omondi, Jagath C. Rajapakse. FPGA Implementations of Neural Networks. Springer, 2006. P. 87–121.
  2. Roger Woods, John McAllister, Gaye Lightbody, Ying Yi. FPGA-Based Implementation of Signal Processing Systems, 2nd Edition. Wiley, 2017. P. 56–64.
  3. Uwe Meyer-Baese. Digital Signal Processing with Field Programmable Gate Arrays (Signals and Communication Technology), 3rd Edition. Springer, 2017. P. 27–87.
  4. Charu C. Aggarwal. Neural Networks and Deep Learning. Springer, 2018. P. 78–98.
  5. Cem Unsalan, Bora Tar. Digital System Design with FPGA: Implementation Using Verilog and VHDL, 1st Edition. McGraw Hill, 2017. P. 45–67.
  6. Volnei A. Pedroni. Circuit Design with VHDL, 3rd Edition. The MIT Press, 2020. P. 89–112.
  7. Pong P. Chu. RTL Hardware Design Using VHDL: Coding for Efficiency, Portability, and Scalability, 1st Edition. Wiley-IEEE Press, 2006. P. 38–49.
  8. Steve Kilts. Advanced FPGA Design: Architecture, Implementation, and Optimization, 1st Edition. Wiley-IEEE Press, 2017. P. 89–101.
  9. Ross K. Snider. Advanced Digital System Design using SoC FPGAs: An Integrated Hardware/Software Approach. Springer, 2023. P. 34–89.
  10. Frank Vahid. Digital Design, 1st Edition. Wiley, 2006. P. 48–67.
Information about the authors

Bachelor of Science, The Azrieli College of Engineering in Jerusalem (JCE), Israel, Jerusalem

