Jan 10 – 11, 2019
Europe/London timezone

Apache Spark is an open-source cluster-computing framework for large-scale parallel data processing, designed for performance and ease of use. It is generally faster and simpler to use than Hadoop MapReduce, and provides a rich set of APIs in Python, Java and Scala.

This hands-on course will cover the following topics:

  • Introduction to Spark
  • Map, Filter and Reduce
  • Running on a Spark Cluster
  • Key-value pairs
  • Correlations, logistic regression
  • Decision trees, K-means
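
As a rough illustration of the map, filter and reduce pattern listed above, here is a minimal sketch in plain Python. It is not Spark code and is not taken from the course materials; the sample word list is made up. Spark's RDD API provides `map`, `filter` and `reduce` methods that apply the same pattern to data distributed across a cluster.

```python
from functools import reduce

# Hypothetical sample data, for illustration only.
words = ["spark", "map", "filter", "reduce", "rdd"]

# map: transform each element (here, word -> word length)
lengths = map(len, words)

# filter: keep only the elements that pass a test (lengths greater than 3)
long_lengths = filter(lambda n: n > 3, lengths)

# reduce: combine the remaining elements into a single value (their sum)
total = reduce(lambda a, b: a + b, long_lengths)

print(total)
```

The three steps are chained lazily in plain Python; in Spark the equivalent transformations are also evaluated lazily, and only the final reduce triggers computation.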


09:30 - 17:30 (Thu)
09:30 - 15:30 (Fri)

Attendees will be given access to EPCC's Tier-2 Cirrus system for all practical exercises.

The practicals use Jupyter notebooks, so a basic knowledge of Python would be extremely useful.


Full timetable and course materials

Bayes G.03
The University of Edinburgh, Bayes Centre, 47 Potterrow, Edinburgh, EH8 9BT
This course is part-funded by the PRACE project and is free to all. Please register using the online form. If you have any questions, please consult the course forum page or contact support@archer.ac.uk.