Introduction to Spark for Data Scientists @ EPCC

Bayes G.03 (EPCC)

The University of Edinburgh Bayes Centre 47 Potterrow Edinburgh EH8 9BT

Apache Spark is an open-source cluster-computing framework for large-scale parallel data processing, designed for performance and ease of use. It is typically faster and simpler to use than Hadoop MapReduce, and provides a rich set of APIs in Python, Java and Scala.

This hands-on course will cover the following topics:

  • Introduction to Spark
  • Map, Filter and Reduce
  • Running on a Spark Cluster
  • Key-value pairs
  • Correlations, logistic regression
  • Decision trees, K-means
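For readers new to the map, filter and reduce pattern and to key-value pairs, the sketch below illustrates the ideas in plain Python, without Spark. The word list and variable names are made up for illustration and are not taken from the course materials. In PySpark, the analogous RDD methods are `map`, `filter`, `reduce` and `reduceByKey`, which apply the same operations across a cluster.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample data, used only for illustration.
words = ["spark", "map", "filter", "reduce", "spark", "map", "spark"]

# Map: transform every element (here, word -> word length).
lengths = list(map(len, words))

# Filter: keep only the elements that satisfy a predicate.
long_words = list(filter(lambda w: len(w) > 3, words))

# Reduce: combine all elements into a single value (total characters).
total_chars = reduce(lambda a, b: a + b, lengths)

# Key-value pairs: emit (word, 1) pairs, then sum the values per key.
pairs = [(w, 1) for w in words]
counts = defaultdict(int)
for key, value in pairs:
    counts[key] += value
word_counts = dict(counts)

print(total_chars)
print(long_words)
print(word_counts)
```

The per-key summation at the end is the local equivalent of a word count, a common first exercise with key-value pairs.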


09:30 - 17:30 (Thu)
09:30 - 15:30 (Fri)

Attendees will be given access to EPCC's Tier-2 Cirrus system for all practical exercises.

The practicals use Jupyter notebooks, so a basic knowledge of Python would be extremely useful.


Full timetable and course materials
