18 May 2017
T-Systems
Europe/Zurich timezone
hpc-ch forum

Session

Big Data tools for Astrophysics

18 May 2017, 11:45
Room: New York (T-Systems)

T-Systems Kloten (Balsberg), Balz-Zimmermann-Strasse 7, CH-8302 Kloten

Conveners

  • Rok Roskar (ETH Zurich)

Description

Astrophysical simulations have been a constant presence on HPC clusters around the world for many years. Computational power has grown to the point that the biggest production runs can generate datasets of hundreds of TB in just a few days. Post-processing such data efficiently is a significant challenge: the data no longer fit in memory, and the usual domain tools become very cumbersome to use. To address this problem, and the problem of “big data” analysis on HPC clusters in general, we have undertaken a project to bring together the benefits of HPC with the ease of use of Big Data frameworks. We have developed an analysis code built on top of Apache Spark to analyze 200+ TB of output from a recent state-of-the-art cosmological simulation run on Piz Daint at CSCS. Spark is used for orchestrating the work and collecting intermediate results; highly optimized domain code performs the main part of the computation. In addition, we have developed a tool that allows quick and easy deployment and monitoring of Spark clusters on HPC infrastructure. I will discuss the issues inherent in combining scientific codes with Big Data frameworks and the approaches we used to overcome them, from both a software and a hardware perspective.
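The division of labor described above — Spark orchestrating partitions of the simulation output while an optimized domain kernel does the per-partition computation, with intermediate results merged by a reduce step — can be sketched in plain Python without a Spark installation. All names here (`analyze_partition`, `merge`) are hypothetical illustrations, not the actual analysis code; in the real system the kernel would be highly optimized native code and the map/reduce would run distributed via Spark.

```python
# Plain-Python stand-in for the Spark orchestration pattern:
# roughly sc.parallelize(partitions).map(analyze_partition).reduce(merge).
from functools import reduce

def analyze_partition(particles):
    """Hypothetical domain kernel. In the real analysis this would be an
    optimized native routine over a simulation snapshot chunk; here it just
    counts particles and sums their masses."""
    return {"count": len(particles), "mass": sum(particles)}

def merge(a, b):
    """Combine two intermediate results, as Spark's reduce step would."""
    return {"count": a["count"] + b["count"], "mass": a["mass"] + b["mass"]}

# Toy partitions standing in for chunks of a multi-TB snapshot.
partitions = [[1.0, 2.0], [0.5], [3.0, 1.5, 0.25]]
result = reduce(merge, map(analyze_partition, partitions))
# result == {"count": 6, "mass": 8.25}
```

The key design point is that only small intermediate dictionaries flow back to the driver, so the orchestration layer never needs to hold the full dataset in memory.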
