Fundamentals of Accelerated Computing with CUDA C/C++ @ IT4Innovations

207 (VŠB - Technical University Ostrava, IT4Innovations building)



Studentská 6231/1B 708 33 Ostrava–Poruba Czech Republic


The CUDA computing platform enables the acceleration of CPU-only applications to run on the world’s fastest massively parallel GPUs. In this course you will experience C/C++ application acceleration by:

  • Accelerating CPU-only applications to run their latent parallelism on GPUs
  • Utilizing essential CUDA memory management techniques to optimize accelerated applications
  • Exposing accelerated application potential for concurrency and exploiting it with CUDA streams
  • Leveraging command line and visual profiling to guide and check your work.

This training is a part of NVIDIA AI & HPC ACADEMY 2020.

The lectures are interleaved with many hands-on sessions using Jupyter Notebooks. The exercises will be done on a fully configured GPU-accelerated workstation in the cloud.

The workshop is co-organized by LRZ, IT4Innovations and NVIDIA Deep Learning Institute (DLI) for the Partnership for Advanced Computing in Europe (PRACE). Both IT4Innovations and LRZ, as part of GCS, are PRACE Training Centres, which serve as European hubs and key drivers of advanced high-quality training for researchers working in the computational sciences.

NVIDIA DLI offers hands-on training for developers, data scientists, and researchers looking to solve challenging problems with deep learning.

All instructors are NVIDIA-certified University Ambassadors.





Purpose of the course

Upon completion, you will be able to accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques. You will understand an iterative style of CUDA development that will allow you to ship accelerated applications fast.

About the tutor

Dr. Momme Allalen received his Ph.D. in theoretical physics from the University of Osnabrück in 2006. He worked in the field of molecular magnetics using modelling techniques such as the exact numerical diagonalisation of the Heisenberg model. He joined the Leibniz Supercomputing Centre (LRZ) in 2007, where he works in the High Performance Computing group. His tasks include user support, optimisation and parallelisation of scientific application codes, and benchmarking for characterising and evaluating the performance of high-end supercomputers. Momme is an NVIDIA DLI certified instructor for Fundamentals of Accelerated Computing with CUDA C/C++. His research interests include various aspects of parallel computing as well as new programming languages and paradigms on novel HPC architectures.

NVIDIA Deep Learning Institute

The NVIDIA Deep Learning Institute delivers hands-on training for developers, data scientists, and engineers. The program is designed to help you get started with training, optimizing, and deploying neural networks to solve real-world problems across diverse industries such as self-driving cars, healthcare, online services, and robotics.


This event was partially supported by the Ministry of Education, Youth and Sports from the Large Infrastructures for Research, Experimental Development and Innovations project "e-Infrastruktura CZ – LM2018140" and partially by the PRACE-6IP project under the European Union’s Horizon 2020 research and innovation programme, grant agreement No. 823767. We would also like to thank Bayncore Labs for their contributions to this event.


Fundamentals of Accelerated Computing with CUDA C/C++
    • 08:30 - 09:00
    • 09:00 - 09:20
    • 09:20 - 11:00
      Part 1: Accelerating Applications with CUDA C/C++

      Introduction to the Singularity Containers on the Salomon cluster
      How to bootstrap a Singularity image from Docker

      • 09:20
        Coffee 30m
    • 11:00 - 11:15
      Coffee 15m
    • 11:15 - 13:00
      Part 1 continued
    • 13:00 - 14:00
      Lunch 1h
    • 14:00 - 15:30
      Part 2: Managing Accelerated Application Memory with CUDA Unified Memory and nvprof
    • 15:30 - 15:45
      Coffee 15m
    • 15:45 - 17:00
      Part 3: Asynchronous Streaming and Visual Profiling for Accelerated Applications with CUDA C/C++