dc.description.abstract | The fourth industrial revolution uses modern technologies that produce a continuous flow of data. This large amount of data cannot be analyzed with traditional technologies to detect and diagnose problems without human intervention. Deep learning consists of a set of methods based on neural networks that can process and extract information from such an amount of data.
Deep learning frameworks provide a high-level programming interface that enables fast design and implementation of deep learning tasks. Based on them, new models and applications are developed and perform better and better. Nevertheless, a framework running on a single computer cannot handle this huge flow of data on its own. Clusters of computers can be used to quickly deliver a model or to enable the design of a complex neural network spread across machines. Edge artificial intelligence and cloud computing are other technologies in which deep learning tasks can be distributed among the available computing nodes. The advantage of cloud computing over these other technologies is its elasticity: the ability to scale its infrastructure depending on the resource requirements.
To design a framework that scales compute nodes according to the deep learning task, we review and analyze state-of-the-art frameworks. In this work, we collect data on how frameworks use the CPU, the RAM, and the GPU, with and without multi-threading, on convolutional neural networks that predict labels on a small and a large dataset. Moreover, we discuss the management of data collection when using GPU frameworks.
We consider five frameworks, namely MXNet, Paddle, PyTorch, Singa, and TensorFlow 2. All of them have a native implementation with a Python binding, and they support both the CUDA and the OpenCL libraries. We noted that Singa computes results quickly but does not take the available resources into account, which results in a crash. MXNet and Paddle also cannot handle some running configurations and are not able to adapt their behavior to accomplish the task. The other frameworks achieve the goal, with a difference in response time in favor of PyTorch. Moreover, we show that PyTorch uses 100% of the CPU with a steady number of threads, unlike TensorFlow.
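As a hedged sketch of how the single-threaded and multi-threaded runs can be configured (the exact settings used in this work may differ), both PyTorch and TensorFlow 2 expose their intra-op thread counts through the Python API:

    import torch
    import tensorflow as tf

    # Limit PyTorch's intra-op parallelism to one thread, then read it back.
    torch.set_num_threads(1)
    print("PyTorch intra-op threads:", torch.get_num_threads())

    # TensorFlow 2: thread pools must be set before any operation is executed.
    tf.config.threading.set_intra_op_parallelism_threads(1)
    tf.config.threading.set_inter_op_parallelism_threads(1)
    print("TensorFlow intra-op threads:",
          tf.config.threading.get_intra_op_parallelism_threads())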
TensorFlow requires less RAM than PyTorch with the stochastic gradient descent method. When it comes to the mini-batch learning process, PyTorch has RAM needs quite similar to TensorFlow's.
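To make the two settings concrete, the following minimal PyTorch sketch (a toy model and random data, not the networks or datasets evaluated here) runs the same loop with a batch size of 1 for stochastic gradient descent and a larger batch size for mini-batch learning, which is what changes the per-iteration memory footprint:

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    # Toy data standing in for the image datasets used in the study.
    data = TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,)))
    model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Flatten(), nn.Linear(8 * 30 * 30, 10))
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    def train(batch_size):
        # batch_size=1 corresponds to stochastic gradient descent,
        # larger values to the mini-batch learning process.
        loader = DataLoader(data, batch_size=batch_size, shuffle=True)
        for images, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()

    train(batch_size=1)   # stochastic gradient descent
    train(batch_size=64)  # mini-batch learning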
We also infer the GPU behavior from the time spent in CUDA functions.
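One way to obtain such per-function CUDA timings is PyTorch's built-in autograd profiler, shown below as a minimal sketch on a toy model (a CUDA device is required; the actual measurement procedure of this work may rely on other tools):

    import torch
    from torch import nn

    model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).cuda()
    images = torch.randn(64, 3, 32, 32, device="cuda")

    # Profile one forward/backward pass and report time spent in CUDA kernels.
    with torch.autograd.profiler.profile(use_cuda=True) as prof:
        out = model(images)
        out.sum().backward()

    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))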
These different analyses show that PyTorch makes better use of the available resources, which is why it outperforms TensorFlow. | en_US |