[ONLINE] Introduction to parallel programming @ UL

Europe/Ljubljana
Faculty of Mechanical Engineering, University of Ljubljana

Aškerčeva 6, 1000 Ljubljana, Slovenia
Description

The first part of this course (Days 1 and 2) is a Slovenian EuroHPC Competence Centre training event.

The second part of this course (Days 3 and 4) is a PRACE training event.

The event is organized by the LECAD Laboratory, University of Ljubljana, Slovenia.

Participation is free of charge! Registration closes on Monday, 12 October, at midnight.

In light of the COVID-19 situation, this will be an online event.

Organized by

Laboratory LECAD
Faculty of Mechanical Engineering
University of Ljubljana

    • 09:00–12:00
      Parallel programming in C using MPI: Introduction, point-to-point communication, basic collective communication
      • Introduction
      • Point-to-point communication
      • Basic collective communication
      Convener: Timotej Hrga (University of Ljubljana, Faculty of Mechanical Engineering)
    • 10:40–11:00
      Coffee break 20m
    • 12:00–13:00
      Lunch break 1h
    • 13:00–17:00
      Parallel programming in C using MPI: Advanced collective communication, non-blocking communication, one-sided communication
      • Advanced collective communication
      • Non-blocking communication
      • One-sided communication. The essential concepts of one-sided communication in MPI, as well as the advantages of the MPI communication model.
      Convener: Timotej Hrga (University of Ljubljana, Faculty of Mechanical Engineering)
    • 14:40–15:00
      Coffee break 20m
    • 09:00–10:40
      Introduction to PRACE; How to write efficient OpenMP programs; Hybrid MPI + OpenMP programming

      How to identify performance bottlenecks and perform numerical computations efficiently. Hybrid application programs using MPI + OpenMP are now commonplace on large HPC systems.
      There are two main motivations for this combination of programming models:
      - Reduction in memory footprint
      - Improved performance

      Convener: Leon Kos (University of Ljubljana, Faculty of Mechanical Engineering)

      To unpack and run the exercises:

      cd $HOME
      wget https://fs.hlrs.de/projects/par/par_prog_ws/practical/MPI31.tar.gz
      tar xvzf MPI31.tar.gz
      cd MPI/03
      cp ~/MPI/tasks/C/Ch11/solutions/ring-1sided-put-win-alloc-shared.c $PWD
      cp ~/MPI/tasks/C/Ch11/ring-1sided-put-win-alloc-shared-skel.c $PWD 
      gedit ring-1sided-put-win-alloc-shared-skel.c
      module load OpenMPI/3.1.4-GCC-8.3.0
      mpicc ring-1sided-put-win-alloc-shared-skel.c
      env --unset=LD_PRELOAD srun -n 3 --partition=haswell ./a.out


    • 10:40–11:00
      Break 20m
    • 11:00–12:00
      Profiling OpenMP and MPI applications; performance evaluation and optimization of OpenMP applications

      Design: choosing a parallel algorithm, discussing the paradigms, starting from a serial code and moving towards parallelization, testing!
      Optimization: premature optimization, unnecessary optimization, optimizing communication versus computation, data transfer, MPI collective operations.

      Convener: Leon Kos (University of Ljubljana, Faculty of Mechanical Engineering)

      Compiling and running with TAU

      module load tau
      cp -r /home/leon/PTC_OpenMPI-MP_profiling $HOME/PTC_OpenMPI-MP_profiling
      cd $HOME/PTC_OpenMPI-MP_profiling/examples/openmpi/simple-work
      tau_cc.sh -tau_makefile=/opt/pkg/software/tau/2.29.1/x86_64/lib/Makefile.tau-mpi-openmp -tau_options=-optCompInst simple.c
      mpirun -np 4 tau_exec -io ./a.out

      pprof
      paraprof

    • 12:00–13:00
      Lunch break 1h
    • 13:00–14:40
      Advanced MPI: User-defined datatypes

      Explaining user-defined datatypes, used for communication and required for advanced use of MPI I/O. This feature is particularly useful to library writers.

      Convener: Leon Kos (University of Ljubljana, Faculty of Mechanical Engineering)

      Exercise 1

      cd MPI
      cp tasks/C/Ch12/derived-contiguous-skel.c 04
      cd 04
      gedit derived-contiguous-skel.c
      mpicc derived-contiguous-skel.c
      srun -n 4 --partition=haswell ./a.out

      Exercise 2

      ls tasks/C/Ch12/
      cp tasks/C/Ch12/derived-contiguous-skel.c 04
      ls 04
      cd 04
      gedit derived-contiguous-skel.c &
      man MPI_Type_contiguous
      mpicc derived-contiguous-skel.c
      env --unset=LD_PRELOAD srun -n 3 --partition=haswell ./a.out
      cd ..
      cp tasks/C/Ch12/derived-struct-skel.c 04
      cp tasks/C/Ch12/solutions/derived-struct.c 04
      cd 04
      diff -u derived-struct-skel.c derived-struct.c | less
      emacs derived-struct-skel.c derived-struct.c &


    • 14:40–15:00
      Break 20m
    • 15:00–17:00
      Parallel File I/O with MPI

      MPI I/O is an API standard for parallel I/O that allows multiple processes of a parallel program to access data in a common file simultaneously. MPI I/O maps I/O reads and writes to message-passing sends and receives. Implementing parallel I/O can improve the performance of your parallel application.

      Convener: Leon Kos (University of Ljubljana, Faculty of Mechanical Engineering)

      Exercises

      cd $HOME/MPI
      ls tasks/C/Ch13/
      cp tasks/C/Ch13/mpi_io_exa1_skel.c 05
      cp tasks/C/Ch13/mpi_io_exa[23]_skel.c 05
      cp tasks/C/Ch13/solutions/mpi_io_exa[123].c 05
      cd 05
      ls -la
      gedit mpi_io_exa1_skel.c &
      man MPI_File_open
      man MPI_File_write_at
      mpicc mpi_io_exa1_skel.c
      srun -n 4 --partition=haswell ./a.out
      ls -la
      cat my_test_file
      sinfo
      srun -n 4 -N 4 --partition=haswell ./a.out
      cat my_test_file
      srun -n 40 -N 4 --partition=haswell ./a.out
      cat my_test_file

    • 09:00–10:40
      Parallel programming: accelerators

      Transformation of a serial code containing a loop into a parallel version, where the loop iterations are distributed among accelerated parallel processes. Study of the code and explanation of the how-tos.

      Convener: Leon Bogdanović (University of Ljubljana, Faculty of Mechanical Engineering, LECAD lab)
    • 10:40–11:00
      Break 20m
    • 11:00–12:00
      CUDA and OpenCL optimization

      Improving GPU code performance; profiling and debugging.

      Convener: Leon Bogdanović (University of Ljubljana, Faculty of Mechanical Engineering, LECAD lab)
    • 12:00–13:00
      Lunch break 1h
    • 13:00–17:00
      Hands-on exercises: simulations, optimization, and visualization in the natural and technical sciences (physics, chemistry, biology, mathematics, mechanical engineering, materials science)

      Refactoring and benchmarking of selected examples (a heat-transfer and a kinetic particle code) on a heterogeneous architecture.
      Implementation of parallel processing inside the finite element method.
      Installing particle-in-cell codes on HPC, simulating small plasma devices, working with input plasma parameters, benchmark cases with CPU and GPU.

      Conveners: Borut Černe (University of Ljubljana, Faculty of Mechanical Engineering), Ivona Vasileska (University of Ljubljana, Faculty of Mechanical Engineering)

      SIMPIC 

      git clone https://bitbucket.org/lecad-peg/simpic.git

      # in the home directory:
      module load Python/3.7.4-GCCcore-8.3.0
      module load OpenMPI/3.1.4-gcccuda-2019b
      python3 -m venv simpyenv
      source simpyenv/bin/activate
      python3 -m pip install --upgrade pip sphinx_rtd_theme
      pip install numpy scipy pandas matplotlib
      pip list

      cd simpic
      cd GPU_mover_fields   # GPU version
      make
      ./runsimpic.sh 2> run.log

      For the CPU versions, StarPU must first be installed:

      wget https://files.inria.fr/starpu/starpu-1.3.4/starpu-1.3.4.tar.gz
      tar xvf starpu-1.3.4.tar.gz
      cd starpu-1.3.4
      mkdir build
      cd build
      ../configure --prefix=$HOME/starpu
      make
      make install

      # in the simpic directory:
      export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$HOME/starpu/lib/pkgconfig
      export LD_LIBRARY_PATH=$HOME/starpu/lib:$LD_LIBRARY_PATH
      export PATH=$PATH:$HOME/starpu/bin

    • 14:40–15:00
      Break 20m