Registration is now open. Please bring your own laptop. All the PATC courses at BSC are free of charge.
Course Convener: Maria-Ribera Sancho
Objectives: The course brings together key information technologies used in manipulating, storing, and analysing data including:
- the basic tools for statistical analysis
- techniques for parallel processing
- tools for access to unstructured data
- storage solutions
Learning outcomes: Students will be introduced to systems that can accept, store, and analyse large volumes of unstructured data. The skills learned can be applied in data-intensive application areas.
Level: For trainees with some theoretical and practical knowledge
AGENDA:
Day 1 (Feb 5)
9:30 – 13:00 Introduction to Big Data (Vassil Alexandrov)
The session on current Data Science trends will focus on the results of the latest key studies in both Europe and the USA and will outline the major trends, findings and recommendations.
11:00 - 11:30 Coffee break
Introduction to Data Science definitions and mathematical foundations.
Elementary or standard statistical approaches often fail when applied to Big Data problems, so new research methods need to be developed. This session will therefore focus on key research methods and approaches for Data Science, ranging from theory-creating and theory-testing approaches to conceptual-analytical and experimental ones, that can lead to the discovery of global properties of data. These will be mainly deterministic and hybrid (stochastic/deterministic) methods and algorithms.
13:00 – 14:00 Lunch Break
14:00 – 16:00 Introduction to Big Data (Vassil Alexandrov)
This session will focus on several key methods and algorithms (both serial and parallel) that make it possible to discover global properties of data when dealing with Big Data:
- Network Science
- Multi-Constrained and Multi-Objective Optimization
- Examples using the above approaches and some hands-on exercises
16:00 – 16:30 Coffee break
16:30 – 18:00 Harnessing the Power of Big Data and Simulation for Societal Challenges (Josep Casanovas and Isa Romanowska)
The scale of the challenges our societies face today calls for innovative and creative methods and solutions. Here, we present the role of simulation in modern scientific practice and how it complements data-driven applications. Focusing on a relatively novel technique, agent-based modelling, we show how High Performance Computing-enabled methods can simulate the large-scale processes and mechanisms that drive human societies at different scales. The marriage between Big Data and agent-based modelling is a particularly promising avenue of research that should be explored further to advance our social science toolset.
Day 2 (Feb 6)
9:30 – 13:00 Data Analytics with Apache Spark (Josep Lluis Berral)
11:00 - 11:30 Coffee break
Apache Spark has become a consolidated technology for fast, general-purpose large-scale data processing, with “programmer-friendly” interfaces, official bindings for many of the most widely used languages (Java, Scala, Python and R), extensive documentation and development tools. This course introduces Apache Spark, as well as some of its core libraries for data manipulation, machine learning, data streams and graph analytics.
13:00 – 14:00 Lunch Break
14:00 – 16:00 Data Analytics with Apache Spark. Part 2 (Josep Lluis Berral)
16:00 – 16:30 Coffee break
16:30 – 18:00 Big IoT Project (Dr. Ernest Teniente)
Day 3 (Feb 7)
9:30 – 13:00 Big Data Management (Albert Abelló and Petar Jovanovic)
Big Data has many definitions and facets. We will pay attention to the problems we have to face in storing it and to how we can process it. More specifically, we will focus on the Apache Hadoop ecosystem and its two basic components, namely HBase and the MapReduce engine.
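As a rough illustration of the MapReduce processing model mentioned above, the following sketch counts words in plain Python. It is not taken from the course materials and does not use Hadoop; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample documents are hypothetical, chosen only to show the map, shuffle and reduce steps on a single machine.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    docs = ["big data needs storage", "big data needs processing"]
    print(word_count(docs))
```

In a real cluster the map and reduce functions would run in parallel on different nodes, with the framework handling the shuffle step; here everything runs sequentially to keep the idea visible.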
11:00 - 11:30 Coffee break
Hands-on exercise
13:00 – 14:00 Lunch Break
14:00 - 16:00 NoSQL databases (Oscar Romero)
The relational model has dominated data storage systems since the mid-1970s. However, changing storage needs over the past decade have given rise to new models for storing data, collectively known as NoSQL. In this presentation, we will focus on two of the most common types of NoSQL databases, document-oriented databases and graph databases, and explain the use cases suitable for each of them.
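To make the contrast between the two models concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: it does not use any real NoSQL database, and the sample data and the helper names (orders_from_city, reachable) are hypothetical. Documents are modelled as nested dictionaries, and the graph as an adjacency list.

```python
from collections import deque

# Document model: each record is a self-contained, nested document.
# Hypothetical sample data, for illustration only.
orders = [
    {"id": 1, "customer": {"name": "Ana", "city": "Barcelona"}, "items": ["laptop", "mouse"]},
    {"id": 2, "customer": {"name": "Ben", "city": "Madrid"}, "items": ["monitor"]},
]


def orders_from_city(docs, city):
    """Filter documents on a nested field, a typical document-store query."""
    return [doc["id"] for doc in docs if doc["customer"]["city"] == city]


# Graph model: the relationships themselves are the data.
follows = {"Ana": ["Ben"], "Ben": ["Cara"], "Cara": [], "Dan": ["Ana"]}


def reachable(graph, start):
    """Breadth-first traversal: every node reachable from start along edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


if __name__ == "__main__":
    print(orders_from_city(orders, "Barcelona"))
    print(sorted(reachable(follows, "Ana")))
```

The first query reads fields inside a single record, which suits document stores; the second follows relationships across records, which is where graph databases tend to fit better.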
16:00 - 16:30 Coffee break
16:30 - 18:00 Multidisciplinary research and data analytics: Smart Cities (Maria Cristina Marinescu)
Day 4 (Feb 8)
9:30 – 13:00 Practical Data Analytics for Solving Real World Problems (Carlos Carrasco)
Data analytics has changed the way we make decisions. Thanks to the integration of advanced data analytics, we see benefits and advances in many fields, ranging from finance to medical and industrial applications. In this course we will share practical tips gained through our experience at BSC in big data analytics projects. We will also discover how to overcome some of the most challenging tasks in practical data analytics.
11:00 - 11:30 Coffee break
END of COURSE