This course focuses on the development and execution of bioinformatics pipelines and on their optimization with regard to computing time and disk space. In an era where the data produced per analysis is on the order of terabytes, simple serial bioinformatics pipelines are no longer feasible. This creates a need for scalable, high-performance parallelization and analysis tools that can cope with large-scale datasets. To this end, we will study the common performance bottlenecks that emerge in everyday bioinformatics pipelines and see how to cut execution times for effective data analysis on current and future supercomputers.
As a case study, two different bioinformatics pipelines (whole-exome and transcriptome analysis) will be presented and re-implemented on the supercomputers of Cineca, in dedicated hands-on sessions aimed at applying the concepts explained in the course.
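To give a flavour of the serial-versus-parallel contrast described above, here is a minimal sketch in Python. It is not taken from the course material: the sample names, the toy reads and the helper functions (`gc_fraction`, `process_sample`, `run_serial`, `run_parallel`) are all hypothetical, and a standard-library process pool stands in for the MPI-based tools used on a real supercomputer. The only point is that independent per-sample steps can be distributed across workers without changing the result.

```python
from concurrent.futures import ProcessPoolExecutor

# Hypothetical toy input: sample name -> reads (a stand-in for real NGS files).
SAMPLES = {
    "sample_A": ["ACGT", "GGCC", "ATAT"],
    "sample_B": ["GGGG", "CCCC"],
    "sample_C": ["ATTA", "TTAA", "ACGT", "GCGC"],
}


def gc_fraction(reads):
    """Return the fraction of G/C bases over all reads of one sample."""
    total = sum(len(read) for read in reads)
    gc = sum(read.count("G") + read.count("C") for read in reads)
    return gc / total if total else 0.0


def process_sample(item):
    """Per-sample pipeline step; each sample is independent of the others."""
    name, reads = item
    return name, gc_fraction(reads)


def run_serial(samples):
    """Process the samples one after another."""
    return dict(process_sample(item) for item in samples.items())


def run_parallel(samples, workers=2):
    """Process the samples concurrently in separate worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_sample, samples.items()))


if __name__ == "__main__":
    # Both strategies must give the same answer; only the wall time differs.
    assert run_parallel(SAMPLES) == run_serial(SAMPLES)
    print(run_parallel(SAMPLES))
```

On inputs this small the parallel version is slower, because starting worker processes costs more than the computation itself. The benefit only appears when each per-sample step is expensive, which is the situation with real large-scale datasets.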
NGS-data, big-data analysis, code parallelization, MPI, running a bioinformatics pipeline, large-scale sample datasets.
Biologists, bioinformaticians and computer scientists interested in approaching large-scale NGS-data analysis for the first time.
Basic knowledge of Python and the command line. A very basic knowledge of biology is recommended but not required.
Lunch will be offered to all participants on each of the three days, and some grants are available. To be eligible, you must not be funded by your institution to attend the course, and you must work at an institute or live outside the Roma area. The grant will be 300 euros for students working and living outside Italy and 150 euros for students working and living in Italy (except Milano area). Some documentation will be required, and the grant will be paid only after certified attendance at a minimum of 80% of the lectures.
Further information on how to request a grant will be provided when the course is confirmed, about 3 weeks before the starting date.