Description
The growing volume of scientific data collected through sensors or produced by computational simulations can benefit from new analytics techniques that extract meaning from raw data. The purpose of this workshop is to present to scientists the tools and techniques, open issues, recent developments, applications and enhancements of MapReduce and similar systems. Over the years, MapReduce has become one of the main programming models for processing large data sets. Although it was originally developed for processing web information, it has gained a lot of attention from the scientific community for its applicability to large-scale parallel data analysis. Participants will learn how to combine tools and techniques from statistics and computer science to solve their problems more efficiently. The course will consist of introductory lectures given by guest data-analysis experts, and hands-on sessions.
Target Audience
Students, PhD candidates, and researchers in computational sciences and other scientific areas, with different backgrounds, who are looking for new technologies and methods to process and analyse large amounts of data. Participants must have basic knowledge of programming in Python.
Topics
● Basic principles of Python, MapReduce, and technologies like Hadoop and Spark.
● Basic understanding of problem analysis and optimization.
● Project design and strategies for building a scalable data analysis application.
About half of the course will consist of practical hands-on sessions. The programme will include one invited talk from a guest speaker working in the field.
Benefits
After the course, participants should be able to work with Hadoop and related libraries, writing Python applications that use the basic features needed to execute and optimize them on such a system.
By the end of this course students should be able to:
● understand the MapReduce algorithm
● run a Python MapReduce program
● improve their Python development skills
● improve their prospects when submitting project applications to request resources from providers such as PRACE or other agencies.
Long description
The data deluge is a central problem for data analytics. Big Data is real and is spreading across every data-driven field: science, commerce and any other information-handling activity.
Many solutions have been proposed by the most advanced data-oriented companies (e.g. Google, Amazon, Yahoo, etc.) and some open-source projects have reached a level of maturity high enough to be integrated into your own data cluster or infrastructure. This workshop is aimed at anyone with development skills and experience with UNIX systems who would like to explore the new paradigms and leading-edge technologies available in the data analytics field.
The agenda includes two days of sessions. Solving Big Data problems first requires understanding the new challenges and the real background of data analytics. For this reason the first session gives an introduction and basic definitions of the topics. As Python is recognized as one of the most powerful high-level programming languages available for data science, it will be used for the hands-on sessions and examples.
Running data analytics collaboratively to process Big Data requires knowledge of the MapReduce algorithm and its best-known implementation, Apache Hadoop. The second day of the workshop introduces real Python implementations of MapReduce examples. YARN and HARP will also be described as fundamental building blocks of the basic stack of a Big Data application.
Case studies covering different scientific fields, including Genomics and Bioinformatics, will be presented and further discussed.
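As a rough illustration of the MapReduce model mentioned above, the following is a minimal, single-process sketch of a word count in plain Python. It is not taken from the course material and does not use Hadoop; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions chosen only for this example. On a real cluster the framework would distribute the map and reduce steps across machines and perform the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, used only for illustration.
    sample = ["big data is big", "data analytics"]
    print(word_count(sample))
```

The three steps are kept as separate functions so that each one corresponds to a single phase of the model; only the map and reduce functions would normally be written by the user of a MapReduce framework.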
Grant
WARNING: UNFORTUNATELY WE HAVE REACHED THE MAXIMUM NUMBER OF ADMITTED STUDENTS. PLEASE WRITE TO SUPPORT TO BE ADDED TO THE RESERVE LIST.
Lunch on both days will be offered to all participants, and some grants are available.
The only requirements for eligibility are that your institution does not fund your attendance at the course and that you work in an institute outside the Bologna area.
The grant will be 200 euros for students working outside Italy and 100 euros for students working in Italy.
Some documentation will be required, and the grant will be paid only after certified attendance at a minimum of 80% of the lessons.
Further information about how to request the grant will be provided when the course is confirmed, about 3 weeks before the starting date.