2-6 June 2014
BSC, Barcelona UPC, Campus Nord
CET timezone
Objectives: 

The aim of this course is to provide students with knowledge and hands-on experience in developing application software for processors with massively parallel computing resources. In general, we refer to a processor as massively parallel if it can complete more than 64 arithmetic operations per clock cycle. Many commercial offerings from NVIDIA, AMD, and Intel already provide this level of concurrency. Programming these processors effectively requires in-depth knowledge of parallel programming principles, as well as of the parallelism models, communication models, and resource limitations of these processors. The course is aimed at students who want to develop exciting applications for these processors, as well as those who want to develop programming tools and future implementations for them.

Learning Outcomes:

Students who finish this course will have learned how to program massively parallel processors to achieve high performance, functionality, maintainability, and scalability across future processor generations.
 

Requirements: 

Basic knowledge of C/C++ programming

Attendees must bring their own laptop with an SSH client installed.

Students who finish this course will acquire the technical knowledge required to achieve the above goals by learning the principles and patterns of parallel algorithms, processor architecture features and constraints, and programming APIs, tools, and techniques.

Level: BEGINNERS: for trainees from different backgrounds or with very little prior knowledge (all courses are designed for specialists who have completed at least a first-cycle degree)

BSC, Barcelona UPC, Campus Nord
Vertex Building, room VS 219
http://bsc.es/education
Course Outline:

Day 1

Session 1 / 9am - 1pm: (3h lectures with 5 min breaks on the hour)

  1. Introduction to CUDA
  2. CUDA Threading Model (I)
  3. CUDA Threading Model (II)

Lunch Break (1pm to 2pm)

Session 2 / 2pm - 6pm: (3h practical session)

Lab exercises

Day 2

Session 3 / 9am - 1pm: (3h lectures with 5 min breaks on the hour)

  1. CUDA Memory Model
  2. Matrix Multiplication – Shared Memory
  3. 2D Convolution – Constant Memory

Lunch Break (1pm to 2pm)

Session 4 / 2pm - 6pm: (3h practical session)

Lab exercises

Day 3

Session 5 / 9am - 1pm: (3h lectures with 5 min breaks on the hour)

  1. CUDA Memory Model
  2. Matrix Multiplication – Shared Memory
  3. 2D Convolution – Constant Memory

Lunch Break (1pm to 2pm)

Session 6 / 2pm - 6pm: (3h practical session)

Lab exercises

Day 4

Session 7 / 9am - 1pm: (3h lectures with 5 min breaks on the hour)

  1. Parallel Reductions
  2. Memory Bandwidth Considerations
  3. Prefix Scan

Lunch Break (1pm to 2pm)

Session 8 / 2pm - 6pm: (3h practical session)

Lab exercises

Day 5

Session 9 / 9am - 1pm: (3h lectures with 5 min breaks on the hour)

MPI and Multi-GPU programming

Lunch Break (1pm to 2pm)

Session 10 / 2pm - 6pm: (3h practical session)

Lab exercises

END of COURSE