Single node deep learning frameworks: Comparative study and CPU/GPU performance analysis
Deep learning provides an efficient set of methods for learning from massive volumes of data using complex deep neural networks. To ease the design and implementation of such algorithms, deep learning frameworks expose a high-level programming interface, on which new models and applications deliver increasingly accurate predictions. One class of deep learning applications, the Internet of Things, gathers a continuous flow of data and thus causes an explosion in data volume. To handle this data management issue, distributed computation technologies offer new perspectives for analyzing more data with more complex models. In this context, a cluster of computers can be used either to deliver a trained model quickly or to train a complex neural network spread across several machines. An alternative is to distribute the deep learning task over HPC cloud computing resources and to scale the cluster so that a neural network is trained quickly and efficiently. As a first step toward designing an infrastructure-aware framework able to scale the number of computing nodes, this work reviews and analyzes state-of-the-art frameworks by collecting device utilization data during the training task. We gather CPU, RAM, and GPU utilization figures for deep learning algorithms with and without multi-threading. The behavior of each framework is discussed and analyzed in order to shed light on the strengths and weaknesses of the different deep learning frameworks.
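The abstract does not describe the authors' actual instrumentation, but the methodology it outlines (sampling CPU, RAM, and GPU utilization while a training task runs) can be sketched as follows. This is a minimal, stdlib-only illustration assuming a Unix host: CPU pressure is approximated by the 1-minute load average, and GPU utilization is read by shelling out to `nvidia-smi` when it is available; the class name `UtilizationSampler` and its interface are hypothetical.

```python
import os
import shutil
import subprocess
import threading
import time


class UtilizationSampler:
    """Background sampler for CPU load and (optionally) GPU utilization.

    Hypothetical sketch: stdlib-only, Unix-oriented. GPU sampling falls back
    to None when `nvidia-smi` is not on PATH or returns unparsable output.
    """

    def __init__(self, interval=1.0):
        self.interval = interval
        self.samples = []  # list of (timestamp, 1-min load avg, gpu % or None)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._has_smi = shutil.which("nvidia-smi") is not None

    def _gpu_util(self):
        # Query GPU 0 utilization as a bare number, e.g. "37".
        if not self._has_smi:
            return None
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True)
        try:
            return float(out.stdout.strip().splitlines()[0])
        except (ValueError, IndexError):
            return None

    def _run(self):
        while not self._stop.is_set():
            load1 = os.getloadavg()[0]  # 1-minute load average (Unix only)
            self.samples.append((time.time(), load1, self._gpu_util()))
            self._stop.wait(self.interval)  # sleep, but wake early on stop

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
```

A training run would then be wrapped in the sampler, e.g. `with UtilizationSampler(0.5) as s: train()`, after which `s.samples` holds the per-interval utilization trace for later analysis. A production setup would more likely use `psutil` and NVML bindings for finer-grained, cross-platform metrics.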