Introduction to Spark for Data Scientists @ EPCC at The Alan Turing Institute, London

Enigma (EPCC at The Alan Turing Institute)


The Alan Turing Institute, 2QR, 96 Euston Rd, London NW1 2DB

Apache Spark is an open-source cluster-computing framework for large-scale parallel data processing, designed for performance and ease of use. It is typically faster than Hadoop MapReduce, largely because it can keep intermediate data in memory rather than writing it to disk between stages, and it is simpler to use, with a rich set of APIs in Python, Java and Scala.

This hands-on course will cover the following topics:

  • Introduction to Spark
  • Map, Filter and Reduce
  • Running on a Spark Cluster
  • Key-value pairs
  • Correlations, logistic regression
  • Decision trees, K-means
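
As a rough illustration of the "Map, Filter and Reduce" and "Key-value pairs" topics listed above, the sketch below shows the same pattern in plain Python, with no Spark installation required. It is not taken from the course materials: the sample data and all variable names are hypothetical, and the standard-library functions stand in for the Spark operations of the same name.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample data, chosen only for illustration.
numbers = [1, 2, 3, 4, 5, 6]

# Map: transform every element (here, square it).
squares = map(lambda x: x * x, numbers)

# Filter: keep only the elements that pass a test (here, even values).
even_squares = filter(lambda x: x % 2 == 0, squares)

# Reduce: combine the remaining elements into one value (here, their sum).
total = reduce(lambda a, b: a + b, even_squares)

# Key-value pairs: emit (key, 1) for each word, then sum the values per key.
words = ["spark", "map", "spark", "reduce", "map", "spark"]
pairs = [(word, 1) for word in words]

counts = defaultdict(int)
for key, value in pairs:
    counts[key] += value
word_counts = dict(counts)

print(total)        # sum of the even squares
print(word_counts)  # occurrences of each word
```

In PySpark the same steps would usually be written against an RDD, for example with `sc.parallelize(numbers)` followed by `map`, `filter` and `reduce`, and with `reduceByKey` for the per-key sum. The difference is that Spark distributes the work across the cluster instead of running it in a single Python process.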


10:00 - 17:30 (Thu)
10:00 - 15:30 (Fri)

Attendees will be given access to EPCC's Tier-2 Cirrus system for all practical exercises.

The practicals will use Jupyter notebooks, so a basic knowledge of Python would be extremely useful.

Registration: Closed. The course is full and has a long waiting list.


Full timetable and course materials

