October 31, 2019 to November 1, 2019
EPCC at The Alan Turing Institute
Europe/London timezone

Apache Spark is an open-source cluster-computing framework for large-scale parallel data processing, designed for performance and ease of use. It is generally faster and simpler to use than Hadoop MapReduce, and it provides a rich set of APIs in Python, Java and Scala.

This hands-on course will cover the following topics:

  • Introduction to Spark
  • Map, Filter and Reduce
  • Running on a Spark Cluster
  • Key-value pairs
  • Correlations, logistic regression
  • Decision trees, K-means

Sessions

10:00 - 17:30 (Thu)
10:00 - 15:30 (Fri)

Attendees will be given access to EPCC's Tier-2 Cirrus system for all practical exercises.

The practicals use Jupyter notebooks, so a basic knowledge of Python would be extremely useful.

Registration: Closed. The course is full and has a long waiting list.

Timetable

Full timetable and course materials

Enigma
The Alan Turing Institute, 2QR, 96 Euston Rd, London NW1 2DB

This course is part-funded by the PRACE project and is free to all. Please register using the online form. If you have any questions, please consult the course forum page or contact support@archer.ac.uk.