Ultra Fast CNN Based Hardware Computing Platform Concepts for ADAS Visual Sensors and Evolutionary Mobile Robots

Alireza Fasih

8.1 CNN Based High Performance Computing for Real Time Image Processing on GPU

Many basic image processing tasks incur a high processing overhead because they operate over
the whole image. In real-time applications, this processing time is a major obstacle to
implementation. A High Performance Computing (HPC) platform is needed to overcome it, and a
hardware accelerator reduces the processing time. In recent years, the Graphics Processing
Unit (GPU) has been adopted as an accelerator in many applications. Alongside the hardware
accelerator, a suitable choice of computing algorithm brings a further gain in image
processing speed. The Cellular Neural Network (CNN) is a large-scale nonlinear analog circuit
able to process signals in real time [12]. In this research, we develop a new design for
evaluating image processing algorithms on massively parallel GPUs, with the CNN implemented
using the Open Computing Language (OpenCL) programming model. The implementation uses the
Discrete Time CNN (DT-CNN) model, which is derived from the originally proposed CNN model.
The inherent massive parallelism of the CNN, combined with that of GPUs, makes the pair well
suited as a high performance computing platform [131]. OpenCL makes the design portable
across the available graphics processing devices and multi-core processors.
Performance is evaluated in terms of execution time on both the device (i.e. the GPU) and
the host (i.e. the CPU).

8.2 Introduction
Image processing is an ever-expanding and dynamic area with applications reaching into
everyday life, such as medicine, space exploration, surveillance, authentication,
automated industrial inspection and many more [132]. Real-time image processing
on modern processors is limited [52]. Problems in computer vision are computationally
intensive [133]. The tremendous amount of data required for image processing and
computer vision applications presents a significant problem for conventional
microprocessors [52]. Consider a sequence of images at medium resolution (512 × 512
pixels) and standard frame rate (30 frames per second) in color (3 bytes per pixel). This
represents a rate of almost 24 million bytes of data per second. A simple feature extraction
algorithm may require thousands of basic operations per pixel, and a typical vision system
requires significantly more complex computations.
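The data rate quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `raw_data_rate` is hypothetical, and the figure of 1000 operations per pixel is only an assumed stand-in for the "thousands" mentioned in the text.

```python
def raw_data_rate(width, height, bytes_per_pixel, frames_per_second):
    """Return the uncompressed video data rate in bytes per second."""
    return width * height * bytes_per_pixel * frames_per_second


# Values taken from the text: 512 x 512 pixels, 3 bytes per pixel, 30 frames/s.
rate = raw_data_rate(512, 512, 3, 30)
print(rate)  # 23592960, i.e. almost 24 million bytes per second

# Assumption for illustration only: 1000 basic operations per pixel.
pixels_per_second = 512 * 512 * 30
ops_per_second = pixels_per_second * 1000
print(ops_per_second)  # 7864320000 basic operations per second
```

Under that assumption, even a simple feature extraction step would need several billion basic operations per second.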
Parallel computing is therefore essential to solve such problems [133]. In fact, the
need to speed up image processing computations brought parallel processing into the
computer vision domain. Most image processing algorithms are inherently parallel because,
apart from some special cases, they involve similar computations for all pixels in an image
[133]. Conventional general-purpose machines cannot manage the distinctive I/O
requirements of most image processing tasks; nor do they take advantage of the
opportunity for parallel computation present in many vision-related applications [121].
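The per-pixel independence described above can be illustrated with a small data-parallel decomposition. This is a sketch under stated assumptions: the inversion operation, the function names and the row-block split are chosen for illustration and are not taken from the source. Threads in standard CPython do not speed up this kind of CPU-bound work, so the example shows only that the image can be split into independent blocks without changing the result.

```python
from concurrent.futures import ThreadPoolExecutor


def invert_rows(rows):
    """Apply the same per-pixel operation (8-bit inversion) to a block of rows."""
    return [[255 - pixel for pixel in row] for row in rows]


def invert_parallel(image, workers=4):
    """Split the image into row blocks and process each block independently."""
    block = max(1, (len(image) + workers - 1) // workers)
    blocks = [image[i:i + block] for i in range(0, len(image), block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the results in the same order as the input blocks
        results = list(pool.map(invert_rows, blocks))
    return [row for part in results for row in part]


# Hypothetical 8 x 8 test image with 8-bit grey values.
image = [[(7 * r + 13 * c) % 256 for c in range(8)] for r in range(8)]
sequential = invert_rows(image)
parallel = invert_parallel(image)
print(parallel == sequential)  # True
```

Because no pixel depends on the result of another, the blocks can be processed in any order, which is the property that GPU work-items exploit.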
Many research efforts have shifted to Commercial-Off-The-Shelf (COTS)-based platforms in
recent years, such as Symmetric Multiprocessors (SMP) or clusters of PCs. However, these
approaches often do not deliver the highest level of performance because of many inherent
disadvantages of the underlying sequential platforms and “the divergence problem”. The
recent advent of multi-million-gate Field Programmable Gate Arrays (FPGAs) with
richer embedded feature sets, such as plentiful on-chip memory, DSP blocks and embedded
hardware microprocessor IP cores, facilitates high performance, low power consumption
and high density [134].
However, the development of a dedicated processor is usually expensive, the limited
availability of such processors restricts their widespread use, and the complexity of
design and implementation also makes the FPGA less attractive. In the last few years,
graphics cards with impressive performance have been introduced to the market at lower
cost, and their design flexibility makes them a better choice. Although they were initially
released for gaming, they are also used in scientific applications that require a high
degree of parallel processing. Alongside these hardware platforms, there are software
platforms such as Compute Unified Device Architecture (CUDA) and OpenCL for designing and
developing parallel programs on GPUs [135]. Of these, the OpenCL framework was developed
more recently for writing programs that can be executed across multicore heterogeneous
platforms. For instance, such programs can be executed on multicore CPUs, on GPUs, and on
combinations of the two. Using this framework also provides portability; that is, the
developed kernel is compatible with other devices. Along with these hardware and software
platforms, we used the CNN parallel computing paradigm for some image processing
applications.

The idea of the CNN was derived from the architectures of artificial neural networks and
cellular automata. In contrast to ordinary neural networks, a CNN has the property of local
connectivity. The weights of the cells are set by parameters called the template, and the
functionality of the CNN depends on this template. With a single common computing model,
we can therefore achieve the desired functionality by calculating the appropriate
templates. The CNN has been used successfully in various high-speed parallel signal
processing applications such as image processing, visual computing and pattern
recognition, as well as computer vision [91]. We therefore chose to implement it in
hardware to meet the need for HPC in real-time image processing. The parallel processing
capability of the CNN also motivates implementing the CNN architecture on a hardware
platform for its efficient visualization.
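The role of the template can be made concrete with a small discrete-time sketch. It assumes a commonly used textbook form of the update, x(k+1) = sum(A * y(k)) + sum(B * u) + z over each 3 × 3 neighbourhood, with the piecewise-linear output f(x) = 0.5 (|x + 1| - |x - 1|). The function names, the fixed boundary value and the edge-style template values are illustrative assumptions; this is not the OpenCL kernel developed in this work.

```python
def saturate(x):
    """Piecewise-linear CNN output: f(x) = 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def neighbourhood_sum(template, grid, i, j, boundary):
    """Weighted sum of the 3 x 3 neighbourhood of cell (i, j)."""
    rows, cols = len(grid), len(grid[0])
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r, c = i + di, j + dj
            inside = 0 <= r < rows and 0 <= c < cols
            value = grid[r][c] if inside else boundary  # assumed fixed boundary
            total += template[di + 1][dj + 1] * value
    return total


def dtcnn_step(y, u, A, B, z, boundary=-1.0):
    """One synchronous update of every cell from the previous outputs y."""
    return [[saturate(neighbourhood_sum(A, y, i, j, boundary)
                      + neighbourhood_sum(B, u, i, j, boundary) + z)
             for j in range(len(u[0]))] for i in range(len(u))]


def run_dtcnn(u, A, B, z, steps=5):
    """Iterate the network, starting from the input image as initial output."""
    y = [row[:] for row in u]
    for _ in range(steps):
        y = dtcnn_step(y, u, A, B, z)
    return y


# Illustrative edge-style template (assumed values): +1 is black, -1 is white.
A = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
B = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
z = -1.0

# 5 x 5 input: a 3 x 3 black square on a white background.
u = [[1.0 if 1 <= r <= 3 and 1 <= c <= 3 else -1.0 for c in range(5)]
     for r in range(5)]
edges = run_dtcnn(u, A, B, z)
for row in edges:
    print(row)
```

With these assumed values only the outline of the square stays black, while its interior and the background turn white. Changing A, B and z changes the function computed by the same update rule, which is the sense in which the template determines the functionality.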
In this research, we develop a DT-CNN model on graphics processing units with the OpenCL
framework. We aim to implement the DT-CNN entirely in the kernel, which makes it
executable on every platform. Note, however, that the GPU is a coprocessor that supports
the processor in our system. Hence, the CPU still executes several tasks, such as
transferring the data to the local memory of the graphics card and retrieving the results.
Finally, a GPU-based Universal Machine - CNN (UM-CNN) was implemented using the OpenCL
framework on an NVIDIA GPU. A benchmark of the GPU-based CNN model for image processing is
provided in comparison with the CPU.
The chapter is structured as follows: Section II describes the theory of parallel
computing. Section III introduces the concepts of the CNN, the system diagram and its
functionality, and the system design methodology, which uses OpenCL. Section IV concludes
the chapter and outlines future work.
