Storage Technologies and Data Management

Europe/Zurich
CSCS Conference Room Ground Floor (CSCS - Swiss National Supercomputing Centre)

Via Trevano 131 CH-6900 Lugano Switzerland
Michele De Lorenzi (CSCS), Stefano Gorini (CSCS)
Description
Introduction
Storage and data management are central to most IT services, and especially to HPC. The amount of data created and consumed in science increases almost exponentially year on year, and storage capacity never lasts long enough. As data center administrators, we face on a daily basis the need for a continuous technology life cycle in order to handle upgrades and to fulfill new requirements, for example those related to implementing the FAIR principles (findable, accessible, interoperable, reusable).
In this topic forum we would like to get an overview of the trends in storage technologies (online, offline) and better understand the new requirements related to the management of data.

Key Questions
We invite you to discuss with us the following questions:
  • FAIR principles: How will they impact us?
  • Persistent Identifiers: Science, Research or a much wider scope?
  • Old-fashioned POSIX versus the more popular Object Storage: Both, One or Neither of them in the near future?
  • Online Storage, Offline/Tape or new upcoming storage technologies? What do we foresee?
  • Cloud, Delocalisation or Centralisation of storage? Is it really important where the real data is?
  • Distributed Storage and Security, are these two compatible?
  • Bandwidth versus Storage Capacity, is it a problem?
Participants
  • Alex Upton
  • Alexander Kashev
  • Allen Neeser
  • Caterina Barillari
  • Christian Bolliger
  • Colin McMurtrie
  • Derek Heinrich Feichtinger
  • Diana Coman Schmid
  • Diego Moreno
  • Gianfranco Sciacca
  • Giuseppe Lo Re
  • Hardik Kothari
  • Henry Luetcke
  • Jani Heikkinen
  • Jiri Kuncar
  • Luca Cervigni
  • Mario Valle
  • Markus Reinhardt
  • Martin Jacquot
  • Mattia Belluco
  • Michael Rolli
  • Michele De Lorenzi
  • Nick Holway
  • Nico Faerber
  • Nicolas Buchschacher
  • Nikolaos Apostolakos
  • Olivier Byrde
  • Pablo Fernandez
  • Patrick Zosso
  • Pierre Berthier
  • Pierre Dubath
  • Raluca Hodoroaba
  • Roberto Aielli
  • Roberto Fabbretti
  • Rémy Ressegaire
  • Silvan Hostettler
  • Simon Leinen
  • Sofiane Sarni
  • Stefano Gorini
  • Szymon Gadomski
  • Thomas Kramer
  • Xavier Espinal
  • Yves Revaz
    • 09:30
      Coffee and registration
    • Welcome and introduction
      Conveners: Michele De Lorenzi (CSCS), Stefano Gorini (CSCS)
    • Keynote Presentation. Research Data Management at ETH Zurich: Covering the Data Life Cycle from Planning to Publication

      Research data management has become a challenge for many scientists, due to the increase in size and complexity of data sets as well as more stringent requirements from funding agencies, journals and institutions. Tools, education and support for data management in academic institutions are thus becoming a necessity.

      This presentation will describe the data management services provided to scientists at ETH Zurich by Scientific IT Services (SIS) and the ETH Library. To cover the complete data life cycle, SIS supports scientists in their daily data management activities, while the ETH Library supports the later stages of data publication, long-term preservation and sharing. Both units also provide joint training and consulting on data management planning.

      In addition, we will provide an overview of openBIS, the comprehensive data management solution developed by SIS, and describe how openBIS integrates with the HPC infrastructure at ETH Zurich.

      Conveners: Caterina Barillari (ETH Zurich), Henry Lütcke (ETH Zurich)
    • Keynote Presentation. Your Data Deserves a Permanent Identifier (PID)

      Too often, data is considered second-class compared to publications and papers when it comes to academic recognition. Too often, scientists treat their data as fire-and-forget weapons: useful for the publication and then thrown away. However, data does not have to be treated this way. There are interesting examples of discoveries made by browsing existing data, and of surprising results obtained by reanalysing old data. The common prerequisite for these data recycling successes is being able to find and unambiguously identify the data.

      CSCS has been appointed as the Swiss administrator and resolver for Permanent Identifiers (PIDs). For data, PIDs will play a role similar to that of DOIs in the publication world. Scientists will be able to use this new CSCS service, available soon, to identify data objects regardless of their location, associate metadata with them and claim authorship.

      Convener: Mario Valle (CSCS)
    • Data Storage, Management and Access Evolution: a Glimpse into the Future

      Storage is one of the main challenges for the next decade in Scientific Computing. Evolving the current data management and access models is fundamental to continue providing the infrastructure that the scientific community will need in the future.
      The “storage challenge” is multi-dimensional, and computing facilities foresee a change of scale in data growth and data access that is forcing us to re-evaluate current deployments and models.
      Most of today's hot topics are common to all of us, and they are the constituent parts of this presentation: data redundancy costs, bandwidth and networking optimisation, data federations and data caching, catering for application access patterns (from high throughput to high I/O), distributed file systems, the future of backup technologies (tape market in the spotlight), possible roles of IaaS (cloud), and the evolution of Auth/Authz.

      Convener: Xavier Espinal (CERN)
    • 12:15
      Lunch and networking
    • Simplified Multi-Tenant for Data Driven Personalized Health Research

      Personalized Health Research is data driven and requires powerful data and IT infrastructures to securely store, manage, compute on and share sensitive personal data. To this end, Scientific IT Services at ETH Zurich has built Leonhard Med - a secure, high-performance IT platform to support data-driven biomedical research.

      In this context, system administrators and vendors need to implement technical solutions that preserve the confidentiality and privacy of the data. At the same time, some common resources, such as the file system or the network, have to be shared. Lustre provides multi-tenancy, subdirectory mounts, the SSK feature and the great flexibility of LNET. This set of features can appropriately address some of the security layers required in the context of personalized health.

      In this presentation we will show how the Leonhard Med platform at ETH Zurich tries to make use of some of these Lustre features. The focus is on keeping the setup and administration of the file system as simple as possible while retaining all of the advantages provided by Lustre.

      Convener: Diego Moreno (ETH Zurich)
    • BeeGFS: the HPC Storage Solution Adopted at the Geneva Observatory

      BeeGFS is a parallel cluster file system initially developed at the Fraunhofer-Institut für Techno- und Wirtschaftsmathematik. Owing to its scalable performance, free availability and ease of use, it offers an interesting alternative to well-established parallel file systems like Lustre or GPFS.
      In this talk, after a short description of the BeeGFS file system, I will present in more detail the BeeGFS-based technical solution recently adopted at the Geneva Observatory. This solution is planned to be used both as efficient HPC storage and as long-term storage.

      Convener: Yves Revaz (EPFL)
    • Community Development
    • 14:40
      Coffee Break
    • Farewell and end of the meeting