HPC Configuration Management

Europe/Zurich
Akademie I (Empa)

Überlandstrasse 129 8600 Dübendorf
Carlo Pignedoli (Empa), Christian Bolliger (ETH Zurich), Daniele Passerone (Empa), Edoardo Baldi (Empa), Michele De Lorenzi (CSCS)
Description

Introduction

The often untold story of HPC is configuration management. Even for small HPC clusters, configuration management is key to administration. Since it happens in the background, most users are unaware of the effort that goes into keeping a cluster up to date and all nodes in sync, and into responding to users' demands for specific settings.

There are almost as many ways to accomplish this task as there are HPC systems.

Key Questions

In the next hpc-ch forum we will discuss questions like:

  • which configuration management tools have proven useful?
  • which pitfalls arise when managing HPC clusters?
  • is it preferable to use ready-made configurations or to define the configuration management from scratch? Is there a middle ground?
  • which role do configuration management databases (CMDB) play?
  • which new aspects of configuration management have been introduced by containerization and container orchestration (e.g. Kubernetes)?
  • what are your experiences with configuration management systems?

 

Participants
  • Adam Henderson
  • Adrien ALBERT
  • Albert Glensk
  • Alexander Kashev
  • Alexandre Wetzel
  • Alvise Dorigo
  • Andrei Plamada
  • Arnaud Fortier
  • Azar Feyziyev
  • Bastian Bukatz
  • Carlo Antonio Pignedoli
  • Christian Bolliger
  • Daniele Passerone
  • Diego Moreno
  • Edoardo Baldi
  • Enrico Favero
  • Filippo Stenico
  • Gianfranco Sciacca
  • Guillermo Losilla
  • Hardik Kothari
  • Heinrich Billich
  • Jani Heikkinen
  • Jean-Baptiste Aubort
  • Johann FLEURY
  • Jonas Liechti
  • Julia Gustavsen
  • Jörg Ruppe-Tanner
  • Lento Manickathan
  • Marco Maslon
  • Maria Grazia Giuffreda
  • Mario Jurcevic
  • Martin Jacquot
  • Michael Rolli
  • Michele De Lorenzi
  • Miguel Gila
  • Nick Holway
  • Olivier Byrde
  • Pablo Fernandez
  • Pablo Llopis Sanmillan
  • Pierre Berthier
  • Radim Janalik
  • Remy Ressegaire
  • Ricardo Silva
  • Roman Briskine
  • Simba Nyamudzanga
  • Sofiane Sarni
  • Stefan Weber
  • Tayebeh Khoshroonemati
  • Thomas Kramer
  • Tiziano Barbari
  • Ulrich Tehrani
  • Urban Borštnik
  • Victor Holanda Rusu
  • Yann Sagon
Timetable
    • 09:30 10:00
      Welcome Coffee & Registration
    • 10:00 10:15
      Welcome and Introduction 15m
      Speakers: Daniele Passerone (Empa), Michele De Lorenzi (CSCS)
    • 10:15 10:45
      Using SaltStack to Manage a Cluster Centrally 30m

      SaltStack applies "states" to a group of servers. States are easy to customize based on central configuration files (pillar) or per-node properties (grains). We coupled SaltStack with Kickstart and clush so that a node or server can be reinstalled unattended. This allows us to reinstall the whole cluster on a regular basis instead of updating it in place.

      Speaker: Yann Sagon (University of Geneva)
    • 10:45 11:15
      Experiences Deploying Slurm Clusters on OpenStack using Magic Castle 30m

      Magic Castle is a project that aims to replicate Compute Canada’s user experience in public clouds. Here we present the four configuration management tools used in the project and what each one is used for.

      Speaker: Victor Holanda Rusu (CSCS)
    • 11:15 11:45
      A Look at Some Scalable HPC Configuration Management Tools 30m

      Going through the history of CSCS’s recent large supercomputers, we look at how the CM tools have evolved and how we work with them to configure our systems efficiently at scale.

      Speaker: Miguel Gila (CSCS)
    • 11:45 12:15
      Configuration Management on a Secure OpenStack Environment 30m

      Managing secure scientific infrastructure is challenging. We present the constraints encountered during the deployment of a secure OpenStack environment and the configuration management tools introduced to address them.

      Speaker: Jani Heikkinen (University of Basel)
    • 12:15 13:30
      Lunch and Networking 1h 15m
    • 13:30 14:30
      Community Development 1h

      The session will be dedicated to:

      • Members' Update after Two Years of Pandemic: Latest Developments in our Organization.
        Members are kindly invited to give a 10-minute presentation on the current status of their activities.

      • Selection of topics of interest, themes and location for future forums.

      Speakers: Christian Bolliger (ETH Zurich), Michele De Lorenzi (CSCS)
    • 14:30 15:30
      Guided Tour - Automated Driving Sensor Testing Vehicle 1h

      Autonomous vehicles have the potential to positively influence future traffic behavior. Nevertheless, the new technology raises social, legal, economic and technical questions. A significant technical question concerns the minimum technical requirements that self-driving cars must meet to be street legal. We focus on testing autonomous driving sensors in real-world driving under different weather, light and contamination conditions. Storage and computational resources are important for our research, since we produce a lot of data and need computational power to run the perception algorithms for AD and to analyze their performance.

      Speakers: Christian Hohl (Empa), Dejan Milojevic (Empa)
    • 15:30 15:45
      Farewell and End of the Meeting 15m