This course will take place at Queen's University Belfast.
Data Analytics, Data Science and Big Data are just a few of the many terms used in business and academic research. These refer to the manipulation, processing and analysis of data and are concerned with the extraction of knowledge from data, whether for competitive advantage or to provide scientific insight. In recent years, this area has undergone a revolution in which HPC has been a key driver.
This course provides an overview of data science and the analytical techniques that form its basis, and explores how HPC provides the power that has driven their adoption. The course will cover key data analytical techniques, such as classification, optimisation and unsupervised learning, and key parallel patterns, such as MapReduce, for implementing analytical techniques.
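As a rough illustration of the MapReduce pattern mentioned above, the following is a minimal single-process sketch in Python that counts words across a few short documents. It is not taken from the course material: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample documents are hypothetical, and a real framework such as Hadoop or Spark would distribute these phases across many nodes rather than run them in one process.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle and reduce in sequence (illustrative, not distributed)."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical sample input, chosen only for illustration.
counts = word_count(["big data", "data science", "Big Data analytics"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both steps can in principle be run in parallel, which is what makes the pattern a good fit for HPC and cluster environments.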
Attendees should be familiar with basic Linux bash shell commands and have some previous experience with Python programming.
Attendees will be given temporary access to the Data Analytics Cluster on ARCHER, so they do not need Python installed on their laptops. They will need to be able to make an ssh connection, for example using a terminal (Mac/Linux) or PuTTY (Windows).
Below is a timetable from a previous run of this course - details may be subject to change.
Day 1
09:00 – 09:30 Arrival/set-up/Welcome
09:30 – 10:30 What are data analytics, big data, data science
10:30 – 11:00 COFFEE
11:00 – 12:00 Data Cleaning
12:00 – 13:00 Practical: Data Cleaning
13:00 – 14:00 LUNCH
14:00 – 14:45 Supervised Learning, feature selection, trees, forests
14:45 – 15:30 Naïve Bayes
15:30 – 16:00 COFFEE
16:00 – 17:00 Naïve Bayes Practical
17:00 CLOSE OF DAY
Day 2
09:00 – 10:30 MapReduce/Hadoop
10:30 – 11:00 COFFEE
11:00 – 11:30 Hadoop demonstrations
11:30 – 12:30 Unsupervised learning
12:30 – 13:30 LUNCH
13:30 – 14:15 Spark
14:15 – 15:00 Data streaming
15:00 – 15:30 COFFEE
15:30 – 16:00 Spark, Data streaming demonstrations
16:00 CLOSE OF COURSE
Location details including travel directions and maps: http://www.archer.ac.uk/training/locations/ubelfast.php
Course material: http://www.archer.ac.uk/training/course-material/2018/02/data-an-belfast/index.php
Terry Sloan
Terry originally joined EPCC in 1994 and holds the position of Group Manager within EPCC’s Software Development Group. He has extensive experience of managing novel, HPC and Grid projects for Scottish SMEs, UK corporations, European and global collaborations.