Introduction to Spark for Data Scientists @ EPCC at Alan Turing Institute London
Europe/London
Enigma (EPCC at The Alan Turing Institute)
The Alan Turing Institute,
2QR,
96 Euston Rd,
London
NW1 2DB
Description
Apache Spark is an open-source cluster-computing framework for large-scale parallel data processing, designed for performance and ease of use. It is typically faster than Hadoop MapReduce, largely because it can keep intermediate data in memory rather than writing it to disk between steps, and it is simpler to use, providing a rich set of APIs in Python, Java and Scala.
This hands-on course will cover the following topics:
- Introduction to Spark
- Map, Filter and Reduce
- Running on a Spark Cluster
- Key-value pairs
- Correlations, logistic regression
- Decision trees, K-means
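As a rough illustration of the "Map, Filter and Reduce" and "Key-value pairs" topics, the sketch below mimics these operations in plain Python so that it runs without a Spark installation. It is not taken from the course materials: the word list and the reduce_by_key helper are hypothetical, and the PySpark calls shown in the comments assume an existing SparkContext named sc.

```python
from functools import reduce
from operator import add

# Hypothetical sample data, standing in for a distributed dataset (RDD).
words = ["spark", "hadoop", "spark", "rdd", "map", "spark", "rdd"]

# Map: transform every element (here, word -> word length).
lengths = list(map(len, words))

# Filter: keep only the elements that satisfy a predicate.
long_lengths = list(filter(lambda n: n > 3, lengths))

# Reduce: combine the remaining elements pairwise into a single value.
total = reduce(add, long_lengths)

# Rough PySpark equivalent (assumes a SparkContext called `sc`):
#   sc.parallelize(words).map(len).filter(lambda n: n > 3).reduce(add)


def reduce_by_key(pairs, func):
    """Merge the values of each key with func: a local, single-machine
    stand-in for what reduceByKey does on a pair RDD (hypothetical helper)."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Key-value pairs: emit (word, 1) for each word, then sum the values per key.
pairs = [(word, 1) for word in words]
counts = reduce_by_key(pairs, add)

# Rough PySpark equivalent (assumes a SparkContext called `sc`):
#   sc.parallelize(words).map(lambda w: (w, 1)).reduceByKey(add).collect()

print(total)
print(counts)
```

On a real cluster the same chain of operations is evaluated lazily and split across workers; the local version only shows the shape of the computation.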
Sessions
10:00 - 17:30 (Thu)
10:00 - 15:30 (Fri)
Attendees will be given access to EPCC's Tier-2 Cirrus system for all practical exercises.
The practicals use Jupyter notebooks, so a basic knowledge of Python would be extremely useful.
Registration: closed. The course is full and has a long waiting list.
Timetable
Full timetable and course materials.