Learn how to train and deploy a neural network to solve real-world problems, how to generate effective descriptions of content within images and video clips, how to effectively parallelize the training of deep neural networks across multiple GPUs, and how to accelerate your applications with CUDA C/C++ and OpenACC.
This new 4-day workshop, offered for the first time at LRZ, combines lectures on the fundamentals of Deep Learning for Multiple Data Types and Multi-GPUs with lectures on Accelerated Computing with CUDA C/C++ and OpenACC.
The lectures are interleaved with many hands-on sessions using Jupyter Notebooks. The exercises will be done on a fully configured GPU-accelerated workstation in the cloud.
The workshop is co-organized by LRZ and the NVIDIA Deep Learning Institute (DLI) for the Partnership for Advanced Computing in Europe (PRACE). Since 2012, LRZ, as part of GCS, has been one of currently 10 PRACE Training Centres, which serve as European hubs and key drivers of advanced high-quality training for researchers working in the computational sciences.
NVIDIA DLI offers hands-on training for developers, data scientists, and researchers looking to solve challenging problems with deep learning.
All instructors are NVIDIA-certified University Ambassadors.
1st day: Fundamentals of Deep Learning for Multiple Data Types
This day explores how convolutional and recurrent neural networks can be combined to generate effective descriptions of content within images and video clips.
Learn how to train a network using TensorFlow and the Microsoft Common Objects in Context (COCO) dataset to generate captions from images and video by:
- Implementing deep learning workflows like image segmentation and text generation
- Comparing and contrasting data types, workflows, and frameworks
- Combining computer vision and natural language processing
Upon completion, you’ll be able to solve deep learning problems that require multiple types of data inputs.
2nd day: Fundamentals of Deep Learning for Multi-GPUs
The computational requirements of deep neural networks used to enable AI applications like self-driving cars are enormous. A single training cycle can take weeks on a single GPU, or even years for larger datasets like those used in self-driving car research. Using multiple GPUs for deep learning can significantly shorten the time required to train on large amounts of data, making it feasible to solve complex problems with deep learning.
On the 2nd day we will teach you how to use multiple GPUs to train neural networks. You'll learn:
- Approaches to multi-GPU training
- Algorithmic and engineering challenges to large-scale training
- Key techniques used to overcome the challenges mentioned above
Upon completion, you'll be able to effectively parallelize training of deep neural networks using TensorFlow.
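One widely used approach to multi-GPU training is synchronous data parallelism: each GPU computes gradients on its own shard of a batch, the gradients are averaged, and every replica applies the same update. The following is a minimal sketch of that idea in plain Python, not the workshop's TensorFlow material. The "workers" are ordinary function calls standing in for GPUs, and all names (gradient, shard, data_parallel_step) and the toy linear model are assumptions made for illustration.

```python
# Simulated synchronous data-parallel training (no GPU or framework needed).
# "Workers" stand in for GPUs; every name here is illustrative only.

def gradient(w, b, xs, ys):
    """Gradient of the mean squared error of y ~ w*x + b over one shard."""
    n = len(xs)
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb

def shard(data, num_workers):
    """Split data into equal contiguous shards, one per worker.

    Assumes len(data) is divisible by num_workers.
    """
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]

def data_parallel_step(w, b, xs, ys, num_workers, lr):
    """One synchronous step: per-worker gradients, then an average.

    On real hardware the averaging would be an all-reduce across GPUs.
    """
    grads = [gradient(w, b, sx, sy)
             for sx, sy in zip(shard(xs, num_workers), shard(ys, num_workers))]
    gw = sum(g[0] for g in grads) / num_workers
    gb = sum(g[1] for g in grads) / num_workers
    return w - lr * gw, b - lr * gb

# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [2 * x + 1 for x in xs]

w, b = 0.0, 0.0
for _ in range(2000):
    w, b = data_parallel_step(w, b, xs, ys, num_workers=4, lr=0.02)

print(f"w = {w:.4f}, b = {b:.4f}")
```

With equal-sized shards, the averaged gradient matches the full-batch gradient up to floating-point rounding, so four simulated workers follow the same training trajectory as one. In practice, the engineering difficulty lies in doing this averaging efficiently across devices and in keeping accuracy stable as the effective batch size grows.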
3rd day: Fundamentals of Accelerated Computing with CUDA C/C++
The CUDA computing platform enables the acceleration of CPU-only applications to run on the world’s fastest massively parallel GPUs. On the 3rd day you will experience C/C++ application acceleration by:
- Accelerating CPU-only applications to run their latent parallelism on GPUs
- Utilizing essential CUDA memory management techniques to optimize accelerated applications
- Exposing accelerated application potential for concurrency and exploiting it with CUDA streams
- Leveraging command line and visual profiling to guide and check your work
Upon completion, you’ll be able to accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques. You’ll understand an iterative style of CUDA development that will allow you to ship accelerated applications fast.
4th day: Fundamentals of Accelerated Computing with OpenACC
On the last day you will learn the basics of OpenACC, a high-level, directive-based programming model for GPUs. Discover how to accelerate the performance of your applications beyond the limits of CPU-only programming with simple pragmas. You’ll learn:
- How to profile and optimize your CPU-only applications to identify hot spots for acceleration
- How to use OpenACC directives to GPU accelerate your codebase
- How to optimize data movement between the CPU and GPU accelerator
Upon completion, you'll be ready to use OpenACC to GPU accelerate CPU-only applications.
You must bring your own laptop to this workshop!
After you are accepted, please create an account at courses.nvidia.com/join.
Ensure your laptop will run smoothly by going to http://websocketstest.com/. Make sure that WebSockets work for you: under Environment, WebSockets should be listed as supported, and under WebSockets (Port 80), Data Receive, Send and Echo Test should all show Yes. If there are issues with WebSockets, try updating your browser. If you have any questions, please contact Marjut Dieringer at mdieringer"at"nvidia.com.
PRACE Training and Education
The mission of PRACE (Partnership for Advanced Computing in Europe) is to enable high-impact scientific discovery and engineering research and development across all disciplines to enhance European competitiveness for the benefit of society. PRACE has an extensive education and training effort through seasonal schools, workshops and scientific and industrial seminars throughout Europe. Seasonal Schools target broad HPC audiences, whereas workshops focus on particular technologies, tools, disciplines or research areas.
NVIDIA Deep Learning Institute
The NVIDIA Deep Learning Institute delivers hands-on training for developers, data scientists, and engineers. The program is designed to help you get started with training, optimizing, and deploying neural networks to solve real-world problems across diverse industries such as self-driving cars, healthcare, online services, and robotics.