Parallel and Scalable Machine Learning @ JSC

All times are in CET.
Description

The course offers the basics of analyzing data with machine learning and data mining algorithms in order to understand the foundations of learning from large quantities of data. It is especially oriented towards beginners who have no previous knowledge of machine learning techniques. The course covers general methods for data analysis, with a focus on clustering, classification, and regression. This includes a thorough discussion of the test, training, and validation datasets required to learn from data with high accuracy. Simple application examples will reinforce the theoretical course elements and illustrate problems such as overfitting, together with mechanisms such as validation and regularization that prevent them.

The tutorial will start from a very simple application example in order to teach foundations such as the role of features in data, linear separability, and decision boundaries for machine learning models. In particular, the course will point to key challenges in analyzing large quantities of data (aka 'big data') in order to motivate the parallel and scalable machine learning algorithms used in the course. It targets specific challenges in analyzing large datasets that cannot be handled with traditional serial methods provided by tools such as R, SAS, or Matlab. These challenges arise in the machine learning algorithms themselves, in the distribution of data, and in the validation process. The course will introduce selected solutions that overcome these challenges using parallel and scalable computing techniques based on the Message Passing Interface (MPI) and OpenMP, running on massively parallel High Performance Computing (HPC) platforms. The course ends with a more recent machine learning method known as deep learning, which has emerged as a promising disruptive approach that allows knowledge discovery from large datasets with unprecedented effectiveness and efficiency.
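To illustrate the data-parallel pattern that underlies scalable clustering on HPC systems, the sketch below runs one k-means iteration in parallel: each worker computes partial sums for its chunk of the data, and a reduction combines them into new centroids. This is a hedged illustration only, using a Python process pool in place of the MPI/OpenMP techniques taught in the course; the function names and toy data are assumptions for the example.

```python
# Minimal sketch (NOT the course's MPI code) of the data-parallel pattern
# behind scalable clustering: workers compute local partial sums, then a
# reduction step (the role of MPI_Allreduce) combines them into new
# k-means centroids. Uses 1-D points for brevity.
from concurrent.futures import ProcessPoolExecutor

def partial_sums(args):
    """Assign each point in a chunk to its nearest centroid and return
    per-centroid [sum, count] pairs -- the local work of one rank."""
    chunk, centroids = args
    sums = [[0.0, 0] for _ in centroids]
    for x in chunk:
        nearest = min(range(len(centroids)), key=lambda i: (x - centroids[i]) ** 2)
        sums[nearest][0] += x
        sums[nearest][1] += 1
    return sums

def kmeans_step(data, centroids, workers=4):
    """One parallel k-means iteration: scatter chunks, reduce partial sums."""
    chunks = [data[i::workers] for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(partial_sums, [(c, centroids) for c in chunks])
    totals = [[0.0, 0] for _ in centroids]
    for local in results:                       # the "Allreduce" step
        for i, (s, n) in enumerate(local):
            totals[i][0] += s
            totals[i][1] += n
    # Keep the old centroid if no points were assigned to it.
    return [s / n if n else centroids[i] for i, (s, n) in enumerate(totals)]

if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 101.0, 102.0, 103.0]
    print(kmeans_step(data, [0.0, 100.0]))  # -> [2.0, 102.0]
```

Because each chunk is processed independently and only small per-centroid sums are exchanged, the same structure maps directly onto MPI ranks on an HPC machine.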

Prerequisites:
Knowledge of job submission to large HPC machines using batch scripts; knowledge of mathematical basics in linear algebra is helpful.

Participants should bring their own notebooks (with an SSH client).

Learning outcome:
After this course, participants will have a general understanding of how to approach data analysis problems in a systematic way. In particular, the course provides insights into key benefits of parallelization, for example during n-fold cross-validation, where significant speed-ups can be obtained compared to serial methods. Participants will gain a detailed understanding of why and how parallelization benefits a scalable data analysis process that applies machine learning methods to big data, as well as a general understanding of which problems deep learning algorithms are useful for and how parallel and scalable computing facilitates the learning process on big datasets. Participants will also learn that deep learning can perform 'feature learning', which bears the potential to significantly speed up data analysis processes that previously required extensive feature engineering.
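The parallelization benefit mentioned above rests on a simple observation: the folds of an n-fold cross-validation are independent, so they can be evaluated concurrently. The sketch below illustrates this with a hand-written 1-nearest-neighbour classifier and a process pool; it is a minimal illustration only, not the course's MPI-based material, and the classifier, data, and function names are assumptions for the example.

```python
# Illustrative sketch (not the course's MPI code): the k folds of an
# n-fold cross-validation are independent, so each fold can be trained
# and tested in parallel. Here a tiny 1-nearest-neighbour classifier on
# 2-D points (x, y, label) is validated with 5-fold cross-validation.
from concurrent.futures import ProcessPoolExecutor

def one_nn_predict(train, query):
    """Predict the label of `query` with a 1-nearest-neighbour rule."""
    nearest = min(train, key=lambda p: (p[0] - query[0]) ** 2 + (p[1] - query[1]) ** 2)
    return nearest[2]

def fold_accuracy(args):
    """Train on all folds except fold `i`, test on fold `i` (one CV iteration)."""
    data, k, i = args
    test = data[i::k]                                  # every k-th sample, offset i
    train = [p for j, p in enumerate(data) if j % k != i]
    correct = sum(one_nn_predict(train, p) == p[2] for p in test)
    return correct / len(test)

def parallel_cross_validation(data, k=5):
    """Run the k independent folds concurrently and average the accuracies."""
    with ProcessPoolExecutor() as pool:
        accs = list(pool.map(fold_accuracy, [(data, k, i) for i in range(k)]))
    return sum(accs) / k

if __name__ == "__main__":
    # Two well-separated synthetic classes with labels 0 and 1.
    data = [(x, y, 0) for x in range(5) for y in range(5)] + \
           [(x + 10, y + 10, 1) for x in range(5) for y in range(5)]
    print(f"mean accuracy: {parallel_cross_validation(data):.2f}")
```

With k workers, the wall-clock time of the validation loop approaches that of a single fold, which is the speed-up the course demonstrates at scale with MPI.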

Application
Registration is closed. Applicants will be notified whether they have been accepted for participation.

Instructor: Prof. Dr. Morris Riedel, JSC

Contact
For any questions concerning the course please send an e-mail to m.riedel@fz-juelich.de

Day 1:

    • 09:00-10:30  Lecture 1 - Introduction to Machine Learning Fundamentals (Rotunda)
    • 10:30-11:00  Coffee Break (30m)
    • 11:00-12:00  Lecture 2 - PRACE and Parallel Computing Basics (Rotunda)
    • 12:00-13:30  Lunch Break (1h 30m, Seecasino)
    • 13:30-15:00  Lecture 3 - Unsupervised Clustering and Applications (Rotunda)
    • 15:00-15:30  Coffee Break (30m)
    • 15:30-16:30  Lecture 4 - Unsupervised Clustering Challenges and Solutions (Rotunda)
    • 19:00-22:20  Dinner (3h 20m, Restaurant "Am Hexentrum", Grosse Rurstr. 94, Jülich)

Day 2:

    • 09:00-10:30  Lecture 5 - Supervised Classification and Learning Theory Basics (Rotunda)
    • 10:30-11:00  Coffee Break (30m)
    • 11:00-12:00  Lecture 6 - Classification Applications, Challenges and Solutions (Rotunda)
    • 12:00-13:30  Lunch Break (1h 30m, Seecasino)
    • 13:30-15:00  Lecture 7 - Support Vector Machines and Kernel Methods (Rotunda)
    • 15:00-15:30  Coffee Break (30m)
    • 15:30-16:30  Lecture 8 - Practicals with SVMs (Rotunda)

Day 3:

    • 09:00-10:30  Lecture 9 - Validation and Regularization Techniques (Rotunda)
    • 10:30-11:00  Coffee Break (30m)
    • 11:00-12:00  Lecture 10 - Practicals with Validation and Regularization (Rotunda)
    • 12:00-13:30  Lunch Break (1h 30m, Seecasino)
    • 13:30-15:00  Lecture 11 - Parallelization Benefits (Rotunda)
    • 15:00-15:30  Coffee Break (30m)
    • 15:30-16:30  Lecture 12 - Cross-Validation Practicals (Rotunda)
