algorithm may require thousands of basic operations per pixel, and a typical vision system
requires significantly more complex computations.
As we can see, parallel computing is essential to solve such problems [133]. In fact, the
need to speed up image processing computations brought parallel processing into the
computer vision domain. Most image processing algorithms are inherently parallel because,
apart from some special cases, they apply similar computations to every pixel in an image
[133]. Conventional general-purpose machines cannot manage the distinctive I/O
requirements of most image processing tasks, nor do they take advantage of the
opportunity for parallel computation present in many vision-related applications [121].
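As a minimal sketch of why such per-pixel operations parallelize well, the following Python
example applies a thresholding point operation to a small synthetic image, once sequentially
and once by splitting the image into independent row blocks. The function names, the
threshold level and the test image are assumptions made for illustration and do not come
from the cited works. The thread pool only demonstrates the decomposition and is not meant
as a performance claim.

```python
from concurrent.futures import ThreadPoolExecutor


def threshold_pixel(value, level=128):
    """Point operation: the result depends only on the pixel itself."""
    return 255 if value >= level else 0


def process_rows(rows):
    """Apply the same per-pixel operation to a block of image rows."""
    return [[threshold_pixel(p) for p in row] for row in rows]


def process_sequential(image):
    """Reference result: visit every pixel in a single pass."""
    return process_rows(image)


def process_in_blocks(image, n_workers=4):
    """Split the image into row blocks and process each block independently."""
    block = max(1, -(-len(image) // n_workers))  # ceiling division
    blocks = [image[i:i + block] for i in range(0, len(image), block)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(process_rows, blocks))
    # Blocks are returned in submission order, so the rows stay in order.
    return [row for part in results for row in part]


# Hypothetical 8x8 grey-level test image (values 0..255).
image = [[(17 * r + 31 * c) % 256 for c in range(8)] for r in range(8)]
blocked = process_in_blocks(image)
sequential = process_sequential(image)
```

Because no pixel depends on the result of any other pixel, the blocked and sequential
versions produce the same image regardless of how the rows are divided among workers.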
Many research efforts have shifted to Commercial-Off-The-Shelf (COTS)-based platforms in
recent years, such as Symmetric Multiprocessors (SMP) or clusters of PCs. However, these
approaches often do not deliver the highest level of performance due to many inherent
disadvantages of the underlying sequential platforms and “the divergence problem”. The
recent advent of multi-million-gate Field Programmable Gate Arrays (FPGAs) with richer
embedded feature sets, such as plentiful on-chip memory, DSP blocks and embedded
hardware microprocessor IP cores, facilitates high performance, low power consumption
and high density [134].
However, developing dedicated processors is usually expensive, and their limited
availability restricts their widespread use; the complexity of FPGA design and
implementation likewise makes the FPGA a less attractive option. In the last few years,
graphics cards with impressive performance have come onto the market at lower cost, and
their design flexibility makes them a better choice. Although they were initially released
for gaming, they are also used in scientific applications that demand a great deal of
parallel processing. Alongside these hardware platforms, software platforms such as
Compute Unified Device Architecture (CUDA) and OpenCL are available for designing and
developing parallel programs on the GPU [135]. Of these, OpenCL is a recently developed
framework for writing programs that can be executed across heterogeneous multicore
platforms: for instance, on multicore CPUs, on GPUs, and on combinations of the two. Using
this framework also provides portability; that is, the developed kernel is compatible with
other devices. Building on these hardware and software platforms, we used the CNN
parallel computing paradigm for some image processing applications.
The idea of the CNN was taken from the architecture of artificial neural networks and
cellular automata. In contrast to ordinary neural networks, the CNN has the property of
local connectivity. The weights of the cells are established by a set of parameters called
the template, and the functionality of the CNN depends on this template. Hence, with a
single common computing model, we can achieve the desired functionality by choosing
appropriate templates. The CNN has been successfully used for various high-speed parallel
signal processing applications such as image processing, visual computing and pattern
recognition, as well as computer vision [91]. We therefore decided to implement it in
hardware to meet the need for HPC in real-time image processing. The parallel processing
capability of the CNN also motivates implementing the CNN architecture on a hardware
platform for its efficient visualization.
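To illustrate how a template fixes the function of an otherwise unchanged model, the
following Python sketch runs one discrete-time update with two different templates. The
text above does not state the update rule, so the sketch assumes a commonly used form,
x_ij(k+1) = sum(A * y) + sum(B * u) + z over a 3x3 neighbourhood with a hard-limiter
output y = sign(x), pixel values +1 (black) and -1 (white), and a fixed white boundary.
The function names and both templates are illustrative assumptions, not the templates
used in this work.

```python
def sign(v):
    """Hard-limiter output: +1 for v >= 0, otherwise -1 (convention at 0 assumed)."""
    return 1 if v >= 0 else -1


def neighbourhood_sum(template, grid, i, j, boundary):
    """Weighted sum of the 3x3 neighbourhood of cell (i, j)."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r, c = i + di, j + dj
            inside = 0 <= r < rows and 0 <= c < cols
            value = grid[r][c] if inside else boundary  # fixed boundary cells
            total += template[di + 1][dj + 1] * value
    return total


def dtcnn_step(y, u, A, B, z, boundary=-1):
    """One synchronous update of all cells: x = A*y + B*u + z, y_next = sign(x)."""
    rows, cols = len(u), len(u[0])
    return [[sign(neighbourhood_sum(A, y, i, j, boundary)
                  + neighbourhood_sum(B, u, i, j, boundary) + z)
             for j in range(cols)] for i in range(rows)]


ZERO = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

# Illustrative edge-detection-style template (feed-forward only, A = 0).
B_EDGE = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
Z_EDGE = -1

# Illustrative inversion template: only the centre input weight is non-zero.
B_INVERT = [[0, 0, 0], [0, -1, 0], [0, 0, 0]]
Z_INVERT = 0

# Hypothetical 5x5 input: a 3x3 black square on a white background.
u = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else -1 for c in range(5)]
     for r in range(5)]
y0 = [[-1] * 5 for _ in range(5)]

edges = dtcnn_step(y0, u, ZERO, B_EDGE, Z_EDGE)
inverted = dtcnn_step(y0, u, ZERO, B_INVERT, Z_INVERT)
```

With the first template, the outline of the square stays black while its interior pixel
turns white; with the second, every pixel is inverted. The update code is identical in
both cases and only the template values differ. Each cell reads only its own
neighbourhood, so all cells can be updated at the same time.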
In this research, an effort is made to develop a DT-CNN model on graphics processing
units with the OpenCL framework. The DT-CNN is developed entirely in the kernel, which
makes it executable on any platform that supports OpenCL. It should be noted, however,
that the GPU is a coprocessor that supports the processor in our system. Hence, the CPU
still executes several tasks, such as transferring the data to the local memory of the
graphics card and retrieving the results. Finally, a GPU-based Universal Machine CNN
(UM-CNN) was implemented using the OpenCL framework on an NVIDIA GPU. A benchmark of the
GPU-based CNN model for image processing is provided in comparison with the CPU.
The chapter is structured as follows: Section II describes the theory involved in
parallel computing. Section III introduces the concepts of the CNN, the system diagram
and its functionality, and the system design methodology, which is implemented using
OpenCL. Section IV concludes the chapter and outlines future work.