This course will take place at the University of Portsmouth.
Data Analytics, Data Science and Big Data are just a few of the many terms used in business and academic research. These refer to the manipulation, processing and analysis of data and are concerned with the extraction of knowledge from data, whether for competitive advantage or to provide scientific insight. In recent years, this area has undergone a revolution in which high-performance computing (HPC) has been a key driver.
This course provides an overview of data science and the analytical techniques that form its basis, and explores how HPC provides the power that has driven their adoption. The course will cover key data analytical techniques, such as classification, optimisation and unsupervised learning, and key parallel patterns, such as MapReduce, for implementing analytical techniques.
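For a flavour of the MapReduce pattern mentioned above, here is a minimal single-process sketch in Python that counts words. It is an illustration only, not taken from the course material: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a real framework such as Hadoop or Spark would distribute each phase across many nodes rather than run them in one process.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single total."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(records):
    """Run map, shuffle and reduce in sequence (hypothetical helper)."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "data science"]))
```

Because each call to map_phase depends on a single record and each call to reduce_phase on a single key, both steps can in principle run in parallel, which is what makes the pattern attractive on HPC systems.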
Attendees should be familiar with basic Linux bash shell commands and have some previous experience with Python programming.
Attendees will be given temporary access to the Data Analytics Cluster on ARCHER, so they will not need Python installed on their laptops, but they will need to be able to make an ssh connection (for example, using a terminal on Mac/Linux or PuTTY on Windows).
Timetable
Thursday 29th June 2017
09:00 – 09:30 Arrival/set-up/Welcome
09:30 – 10:30 What are data analytics, big data and data science?
10:30 – 11:00 COFFEE
11:00 – 12:00 Data Cleaning
12:00 – 13:00 Practical: Data Cleaning
13:00 – 14:00 LUNCH
14:00 – 14:45 Supervised Learning, feature selection, trees, forests
14:45 – 15:30 Naïve Bayes
15:30 – 16:00 COFFEE
16:00 – 17:00 Practical: Naïve Bayes
17:00 CLOSE OF DAY
Friday 30th June 2017
09:00 – 10:30 MapReduce/Hadoop
10:30 – 11:00 COFFEE
11:00 – 11:30 Hadoop demonstrations
11:30 – 12:30 Unsupervised learning
12:30 – 13:30 LUNCH
13:30 – 14:15 Spark
14:15 – 15:00 Data streaming
15:00 – 15:30 COFFEE
15:30 – 16:00 Spark, Data streaming demonstrations
16:00 CLOSE OF COURSE
Course material: http://www.archer.ac.uk/training/course-material/2017/06/dataan-port/index.php