Description
The age of "Big Data" has reached high performance computing, and new tools and technologies must be integrated into the software ecosystem of scientists in order to extract knowledge from data. New challenges emerge from the complexity of simulations that integrate more physics at higher fidelity, while at the same time memory and storage hierarchies have made it dramatically more difficult to cope with the large volume and high velocity of data. In this tutorial, students will learn the best practices and techniques that are crucial for working with exponentially growing data sizes. The tutorial will teach students the basics of high performance I/O, analytics, and visualization.
Part I of this tutorial will introduce parallel I/O in general, along with the middleware libraries that were created to work with "Big Scientific Data". First, we will teach serial and parallel HDF5 and show how to incorporate it into serial and parallel simulations. Next, we will summarize the lessons and key techniques that our team has learned through years of collaboration with domain scientists working in areas such as fusion, combustion, astrophysics, materials science, and seismology. This experience and knowledge resulted in the creation of ADIOS, a tool that makes scaling I/O easy, portable, and efficient.
Part II of the tutorial will discuss ADIOS and how it has helped many applications move from I/O-dominated to compute-dominated simulations. We will show the APIs that allow ADIOS to use different methods to write self-describing files and achieve high performance I/O. This will be followed by a hands-on session on how to write and read data, and how to use the different I/O components inside the ADIOS framework. Part III will teach students how to take advantage of the ADIOS framework to do topology-aware data movement, compression, and data staging/streaming using advanced methods.
The session will be conducted by Jeremy Logan and Norbert Podhorszki.
Jeremy Logan is a Computational Scientist at the University of Tennessee and works closely with the Scientific Data Group at Oak Ridge National Laboratory. Jeremy’s research interests include I/O performance, data and workflow management, and the application of domain-specific generative techniques to high performance computing.
Norbert Podhorszki is a Research Scientist in the Scientific Data Group at Oak Ridge National Laboratory. He is the lead developer of ADIOS. He works with application users of the Oak Ridge Leadership Computing Facility to improve their I/O performance using ADIOS. His research interests center on enabling on-the-fly data processing using memory-to-memory data movement, e.g., for speeding up I/O, coupling simulation codes, and building in-situ workflows.