October 3, 2019
WSL Birmensdorf
Europe/Zurich timezone
hpc-ch forum


High Performance Computing systems generate a huge amount of log and metric data during operation: information about resource utilization, performance, failures, errors and so on is worth storing and analyzing.

This kind of data is often unstructured and hard to interpret: finding correlations, recognizing meaningful events and discarding false positives are common challenges that all HPC centers face.

The reward is worth the effort: post-mortem investigation, troubleshooting of problems and incidents, security threat hunting, early warning and alerting, application performance analysis and evaluation of resource utilization all benefit from careful processing of log and metric data.

A thorough understanding of the underlying infrastructure that produces this information is essential to make sense of it, especially considering the complex hardware and software stacks that modern large-scale systems comprise.

Key Questions

  1. What are the benefits of collecting log and metric data?
  2. How to correlate logs from different systems?
  3. Centralized collection of logs and metrics: challenges and returns
  4. Are logs and metrics Big Data?
  5. How to tackle the increasing complexity of multi-layered architectures (virtualization, containers, etc.)?
  6. Threat Intelligence: proactively identifying unusual network activity and unauthorized access

WSL Birmensdorf
Room Englersaal
Zürcherstrasse 111, 8903 Birmensdorf