Introduction to Spark for Data Scientists @ EPCC at Alan Turing Institute London

Europe/London
Enigma (EPCC at The Alan Turing Institute)

The Alan Turing Institute, 2QR, 96 Euston Rd, London NW1 2DB
Description

Apache Spark is an open-source cluster-computing framework for large-scale parallel data processing, designed for performance and ease of use. It is generally faster and simpler to use than Hadoop MapReduce, and it provides a rich set of APIs in Python, Java and Scala.

This hands-on course will cover the following topics:

  • Introduction to Spark
  • Map, Filter and Reduce
  • Running on a Spark Cluster
  • Key-value pairs
  • Correlations, logistic regression
  • Decision trees, K-means
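
As a taste of the first few topics, here is a minimal sketch of map, filter, reduce and key-value aggregation in plain Python, so it runs without a Spark installation. The sample data and variable names are hypothetical and are not taken from the course materials. Spark's Python RDD API offers operations with matching names (`map`, `filter`, `reduce`, `reduceByKey`), but those are applied in parallel across a cluster rather than to a local list.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample data, not taken from the course materials.
words = ["spark", "map", "filter", "reduce", "spark", "map", "spark"]

# Map: transform every element (here, word -> word length).
lengths = list(map(len, words))

# Filter: keep only the elements that satisfy a predicate.
long_words = list(filter(lambda w: len(w) > 3, words))

# Reduce: combine all elements into a single value.
total_length = reduce(lambda a, b: a + b, lengths)

# Key-value pairs: emit (word, 1) pairs, then sum the values per key.
pairs = [(w, 1) for w in words]
totals = defaultdict(int)
for key, value in pairs:
    totals[key] += value
counts = dict(totals)

print(lengths)
print(long_words)
print(total_length)
print(counts)
```

The final step is the classic word count: each word becomes a key with the value 1, and values that share a key are added together.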

Sessions

10:00 - 17:30 (Thu)
10:00 - 15:30 (Fri)

Attendees will be given access to EPCC's Tier-2 Cirrus system for all practical exercises.

The practicals will use Jupyter notebooks, so a basic knowledge of Python would be extremely useful.

Registration: Closed. The course is full and has a long waiting list.

Timetable

Full timetable and course materials

 
