This workshop will be delivered as an ONLINE COURSE for remote participation due to the COVID-19 measures enforced by most European governments. The workshop will take place ONLINE via Zoom on MONDAY, 7 - THURSDAY, 10 SEPTEMBER 2020 at 10:00-12:00 and 13:00-16:00 EEST [09:00-11:00 and 12:00-15:00 CEST] each day.
Overview
NVIDIA Deep Learning Institute (DLI) offers hands-on training for developers, data scientists, and researchers looking to solve challenging problems with deep learning.
Learn how to train and deploy a neural network to solve real-world problems, how to generate effective descriptions of content within images and video clips, how to effectively parallelize the training of deep neural networks on multiple GPUs, and how to accelerate your applications with CUDA C/C++ and OpenACC.
This 4-day workshop combines lectures on the fundamentals of deep learning for multiple data types and for multiple GPUs with lectures on accelerated computing with CUDA C/C++ and OpenACC.
The lectures are interleaved with many hands-on sessions using Jupyter Notebooks. The exercises will be done on a fully configured GPU-accelerated workstation in the cloud.
The workshop is part of PRACE Training Centres activity and co-organized by LRZ – Leibniz Supercomputing Centre (Garching near Munich) as part of Gauss Centre for Supercomputing (Germany), IT4I – National Supercomputing Center VSB Technical University of Ostrava (Czech Republic), CSC – IT Center for Science Ltd (Finland) and NVIDIA Deep Learning Institute (DLI) for the Partnership for Advanced Computing in Europe (PRACE).
Lecturers: Dr. Momme Allalen, Dr. Juan Durillo Barrionuevo, Dr. Volker Weinberg (LRZ and NVIDIA University Ambassadors), Georg Zitzlsberger (IT4Innovations and NVIDIA University Ambassador)
Language: English
Price: Free of charge (4 training days)
Prerequisites and content level
Please note that the workshop is exclusively for verifiable students, staff, and researchers from any academic institution (industrial participants, please contact NVIDIA for industry-specific training).
A technical background, a basic understanding of machine learning concepts, and basic C/C++ or Fortran programming skills are required. In addition, basic Python knowledge will be helpful. Since Python 2.7 is used, the following tutorial can be used to learn the syntax: docs.python.org/2.7/tutorial/index.html.
For the 1st day, familiarity with TensorFlow will be a plus, as all hands-on sessions use TensorFlow. If you do not program in TensorFlow, please work through the TensorFlow tutorials (especially the "Learn and use ML" section): www.tensorflow.org/tutorials/.
The content level of the course is broken down as: beginner - 5.2 h (20%), intermediate - 14.3 h (55%), advanced - 6.5 h (25%), community-targeted content - 0.0 h (0%).
Important information
After you are accepted, please create an account under courses.nvidia.com/join.
Ensure your laptop/PC will run smoothly by going to http://websocketstest.com/
Make sure that WebSockets work for you: under Environment, "WebSockets supported" should be checked, and under WebSockets (Port 80), the Data Receive, Send, and Echo tests should all show Yes. If there are issues with WebSockets, try updating your browser. If you have any questions, please contact Marjut Dieringer at: mdieringer"at"nvidia.com.
AGENDA / Description and learning outcomes
Day 1: Fundamentals of Deep Learning for Multiple Data Types
This day explores how convolutional and recurrent neural networks can be combined to generate effective descriptions of content within images and video clips.
Learn how to train a network using TensorFlow and the Microsoft Common Objects in Context (COCO) dataset to generate captions from images and video by:
- Implementing deep learning workflows like image segmentation and text generation
- Comparing and contrasting data types, workflows, and frameworks
- Combining computer vision and natural language processing
Upon completion, you’ll be able to solve deep learning problems that require multiple types of data inputs.
Day 2: Fundamentals of Accelerated Computing with OpenACC
On the 2nd day you learn the basics of OpenACC, a high-level, directive-based programming model for GPUs. Discover how to accelerate the performance of your applications beyond the limits of CPU-only programming with simple pragmas. You’ll learn:
- How to profile and optimize your CPU-only applications to identify hot spots for acceleration
- How to use OpenACC directives to GPU-accelerate your codebase
- How to optimize data movement between the CPU and GPU accelerator
Upon completion, you'll be ready to use OpenACC to GPU-accelerate CPU-only applications.
Day 3: Fundamentals of Accelerated Computing with CUDA C/C++
The CUDA computing platform enables the acceleration of CPU-only applications to run on the world’s fastest massively parallel GPUs. On the 3rd day you experience C/C++ application acceleration by:
- Accelerating CPU-only applications by running their latent parallelism on GPUs
- Utilizing essential CUDA memory management techniques to optimize accelerated applications
- Exposing accelerated application potential for concurrency and exploiting it with CUDA streams
- Leveraging command line and visual profiling to guide and check your work
Upon completion, you’ll be able to accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques. You’ll understand an iterative style of CUDA development that will allow you to ship accelerated applications fast.
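A typical first exercise in this material is a CUDA vector addition; the sketch below (names are illustrative, compiled with nvcc) shows two of the day's themes: launching a kernel across many GPU threads, and using Unified Memory as a memory-management technique:

```cuda
#include <cstdio>

// Each GPU thread computes one element of the result.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Unified Memory: accessible from both CPU and GPU code,
    // so no explicit cudaMemcpy calls are needed here.
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);

    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();   // wait for the kernel to finish

    printf("c[0] = %.1f, c[n-1] = %.1f\n", c[0], c[n - 1]);

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The course then iterates on exactly this kind of code with the profiler: checking occupancy, refining memory management, and overlapping work with CUDA streams.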
Day 4: Fundamentals of Deep Learning for Multi-GPUs
The computational requirements of deep neural networks used to enable AI applications like self-driving cars are enormous. A single training cycle can take weeks on a single GPU, or even years for larger datasets like those used in self-driving car research. Using multiple GPUs for deep learning can significantly shorten the time required to train on large amounts of data, making solving complex problems with deep learning feasible.
On the last day we will teach you how to use multiple GPUs to train neural networks. You'll learn:
- Approaches to multi-GPU training
- Algorithmic and engineering challenges to large-scale training
- Key techniques used to overcome the challenges mentioned above
Upon completion, you'll be able to effectively parallelize training of deep neural networks using TensorFlow.
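The core idea behind the data-parallel approach covered on this day can be illustrated without any GPUs: each replica computes gradients on its own shard of the batch, the gradients are averaged across replicas (an all-reduce), and every replica applies the same update so the weights stay in sync. A minimal pure-Python sketch of that idea, using a toy one-parameter model (the model and names are illustrative, not taken from the course material):

```python
# Data-parallel training sketch for a toy linear model y = w*x.
# The "replicas" stand in for GPUs; each sees one shard of the batch.

def grad(w, shard):
    """Mean gradient of the squared error 0.5*(w*x - y)**2 over a shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def train_step(w, shards, lr=0.1):
    # 1. Each replica computes its local gradient (in parallel on real GPUs).
    local_grads = [grad(w, shard) for shard in shards]
    # 2. All-reduce: average the gradients across replicas.
    g = sum(local_grads) / len(local_grads)
    # 3. Every replica applies the identical update, keeping weights in sync.
    return w - lr * g

# Synthetic data for the true relation y = 3*x, split across 2 "GPUs".
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
shards = [data[:2], data[2:]]

w = 0.0
for _ in range(100):
    w = train_step(w, shards)
print(round(w, 3))  # converges towards 3.0
```

In the course itself, this pattern is applied with TensorFlow across real GPUs, where step 2 becomes a hardware all-reduce and the engineering challenges (communication cost, large-batch effects) come to the fore.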