Introduction to Spark for Data Scientists @ EPCC

Europe/London
Bayes G.03 (EPCC)

The University of Edinburgh, Bayes Centre, 47 Potterrow, Edinburgh EH8 9BT

Description

Apache Spark is an open-source cluster-computing framework for large-scale parallel data processing, designed for performance and ease of use. It is generally faster and simpler to use than Hadoop MapReduce, and provides a rich set of APIs in Python, Java and Scala.

This hands-on course will cover the following topics:

  • Introduction to Spark
  • Map, Filter and Reduce
  • Running on a Spark Cluster
  • Key-value pairs
  • Correlations, logistic regression
  • Decision trees, K-means
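
For a flavour of the first topics, here is a minimal sketch of the map, filter and reduce pattern and of key-value aggregation. It uses plain Python only, so it runs without a Spark installation. It is not taken from the course materials, and the sample text and the helper `reduce_by_key` are made up for illustration. In PySpark the corresponding RDD methods are `flatMap`, `filter`, `map`, `reduceByKey` and `reduce`.

```python
from functools import reduce

# Hypothetical sample data: a few lines of text.
lines = [
    "spark makes cluster computing simple",
    "spark runs on a cluster",
    "python works with spark",
]

# Map (flattened): split every line into words (compare rdd.flatMap).
words = [word for line in lines for word in line.split()]

# Filter: keep only words longer than two characters (compare rdd.filter).
long_words = list(filter(lambda w: len(w) > 2, words))

# Map: turn each word into a (key, value) pair (compare rdd.map).
pairs = list(map(lambda w: (w, 1), long_words))


def reduce_by_key(pairs, func):
    """Combine the values of each key with func (compare rdd.reduceByKey).

    Illustrative helper, not a Spark API.
    """
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


# Key-value aggregation: count the occurrences of each word.
counts = reduce_by_key(pairs, lambda a, b: a + b)

# Reduce: fold all counts into a single total (compare rdd.reduce).
total = reduce(lambda a, b: a + b, counts.values())

print(counts)
print(total)
```

On a Spark cluster the same steps are distributed across the worker nodes, whereas this sketch runs in a single process.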

Sessions

09:30 - 17:30 (Thu)
09:30 - 15:30 (Fri)

Attendees will be provided with access to EPCC's Tier-2 Cirrus system for all practical exercises.

The practicals will be done using Jupyter notebooks, so a basic knowledge of Python would be extremely useful.

Timetable

Full timetable and course materials

Surveys
Feedback: Intro to Spark for Data Scientists 10-11 Jan 2019 EPCC