8-11 February 2016
BSC, Barcelona UPC, Campus Nord
CET timezone

NOTICE: From 14 January, all applicants will be placed on the waiting list and places cannot be confirmed. If any places become available, we will inform you on a first-come, first-served basis.

Please do not forget to bring your laptop to the course.


Course Convener: Maria-Ribera Sancho

Objectives: The course brings together key information technologies used in manipulating, storing, and analysing data including:

  • the basic tools for statistical analysis,
  • techniques for parallel processing,
  • tools for access to unstructured data,
  • storage solutions.

Learning outcomes: Students will be introduced to systems that can accept, store, and analyse large volumes of unstructured data. The skills learned can be used in data-intensive application areas.

Level: For trainees with some theoretical and practical knowledge


Day 1 08/02:  Introduction (Vassil Alexandrov)
Session 1: 9:30am – 1pm

  1. Data Science current trends: this session will focus on the results of the latest key studies in the area of Data Science, both in Europe and the USA, and outline the major trends, findings and recommendations.

Coffee break 11:00- 11:30

  1. Data Science definitions and mathematical foundations introduction. 

When tackling Big Data problems, elementary or standard statistical approaches fail in many cases, and new research methods need to be developed. This session will therefore focus on key research methods and approaches for Data Science, ranging from theory-creating and theory-testing approaches to conceptual-analytical and experimental ones, that can lead to the discovery of global properties of data. These will be mainly deterministic and hybrid (stochastic/deterministic) methods and algorithms.

Session 2: 2pm – 6pm

  1. This session will focus on several key methods and algorithms (both serial and parallel) that make it possible to discover global properties of data when dealing with Big Data:
    • Network Science
    • Multi Constrained and Multi-Objective Optimization
    • Examples of using the above approaches
  2. Examples using the above approaches and a hands-on exercise

Coffee break 16:00 – 16:30

  1. Social Simulation Applications (Josep Casanovas)

Day 2 09/02:
Session 1: 9:30am – 1pm Data sharing (Anna Queralt)

  1. In this session we will provide an overview of current Open Data and data sharing approaches.

Usually, when talking about Big Data, the emphasis is put on how to efficiently store and analyse huge amounts of data. However, only when data from independent sources is combined is it possible to gain insights that could not be obtained by analysing each dataset separately. Thus, it is essential that data, whether public or private, is shared so that researchers, students, app developers or citizens in general can extract as much value as possible from it.

Coffee break 11:00- 11:30

  1. Hands-on exercise

Session 2: 2pm – 5pm Data analytics with Apache Spark - part 1 (Mario Macias)

In recent years, Apache Spark has emerged as one of the most promising technologies for fast, general-purpose, large-scale data processing, with “programmer-friendly” interfaces and official bindings for many of the most widely used languages (Java, Scala, Python and R), as well as extensive documentation and development tools. In addition, it is reported to outperform other MapReduce engines by 10x to 100x. This course introduces Apache Spark, as well as some of its core libraries for data manipulation, machine learning, graph analytics, etc.

  1. Introduction to the core concepts of Apache Spark: RDDs and Basic Data Access.
  2. Hands on: get the most frequent term from a text.
  3. Processing semi-structured data with Spark SQL.
  4. Hands on: statistical processing from Data Sheets.
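
The hands-on item on finding the most frequent term in a text can be sketched in plain Python, without Spark itself. This is only an illustration, not the course's exercise material: the function name `most_frequent_term` and the sample lines are hypothetical, and the steps merely mimic the flatMap, map and reduceByKey operations offered by Spark's RDD API.

```python
import re
from collections import defaultdict


def most_frequent_term(lines):
    """Return (term, count) for the most frequent word, or None if there are no words."""
    # flatMap-like step: split every line into lowercase words
    words = (w for line in lines for w in re.findall(r"[a-z]+", line.lower()))
    # map-like step: emit a (word, 1) pair for every word
    pairs = ((w, 1) for w in words)
    # reduceByKey-like step: sum the counts of each distinct word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    if not counts:
        return None
    # pick the (word, count) pair with the highest count
    return max(counts.items(), key=lambda kv: kv[1])


# Hypothetical sample input, standing in for the lines of a text file
sample = ["spark makes big data simple", "big data needs big tools"]
result = most_frequent_term(sample)
print(result)
```

In Spark the same pipeline would run distributed over the partitions of an RDD; here everything happens in a single process, which is enough to show the shape of the computation.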

Coffee break 16:00 - 16:30

  1. Multidisciplinary research and data analytics: Smart Cities (Maria Cristina Marinescu)

Day 3 10/02
Session 1: 9:30am – 1pm Data analytics with Apache Spark - part 2 (Mario Macias)

  1. Machine learning with Spark ML.

Coffee break 11:00- 11:30

  1. Hands on: clustering images according to their tags.

Session 2: 2pm – 6pm (Jordi Torres)

  1. Hello World in TensorFlow

If you want to learn how to program deep neural networks, working with TensorFlow is an excellent way to start. TensorFlow is a machine learning library, open-sourced by Google last November, which aims to bring large-scale, distributed machine learning and deep learning to everyone. This tutorial will take you through the TensorFlow programming model one step at a time.

  1. Hands-on exercises: beginning with basic machine learning models before moving on to a deep neural network, you will try out programming concepts as you learn them.

Coffee break 16:00 – 16:30

Day 4 11/02:
Session 1: 9:30am – 1pm (Alberto Abello)

  1. Big Data Management

Big Data has many definitions and facets. We will pay attention to the problems we face in storing it and how we can process it. More specifically, we will focus on the Apache Hadoop ecosystem and its two basic components, namely HBase and the MapReduce engine.

Coffee break 11:00- 11:30

  1. Hands-on exercise

Session 2: 2pm – 6pm Big Data Visualisation (Javier Espinosa)

  1. Data Visualisation

Data visualizations are everywhere and are more important than ever. From creating a visual representation of data points as part of an executive presentation, to showcasing progress, or visualizing concepts for customer segments, data visualizations are a critical and valuable tool in many different situations. When it comes to big data, weak tools with basic features do not cut it, so specific techniques should be applied. This course will address different techniques for visualizing big data collections, including a view of the visualization process first as a complex and greedy task and then as an out-of-the-box solution that can help to analyse and interpret big data collections.

Coffee break 16:00 – 16:30

  1. Hands-on exercise


VX208, Vertex building
details at http://bsc.es/education