Logging and Monitoring

Name: Logging and Monitoring
Start: 2019-10-03T09:00:00+02:00
End: 2019-10-03T17:00:00+02:00
Location: WSL Birmensdorf


hpc-ch forum

Support

raluca.hodoroaba@cscs.ch

Introduction

High Performance Computing systems generate huge amounts of log and metric data during operation: information about resource utilization, performance, failures, errors and so on is worth storing and analyzing.

This kind of data is often unstructured and hard to interpret: finding correlations, recognizing meaningful events and discarding false positives are common challenges that all HPC centers face.

The reward is worth the effort: post-mortem investigation, troubleshooting of problems and incidents, security threat hunting, early warning and alerting, application performance analysis and evaluation of resource utilization all benefit from careful processing of log and metric data.

A thorough understanding of the underlying infrastructure producing this information is essential to make sense of it, especially considering the complex hardware and software stacks that modern large-scale systems comprise.

Key Questions

What are the benefits of collecting log and metric data?
How to correlate logs from different systems?
Centralized collection of logs and metrics: challenges and returns
Are logs and metrics Big Data?
How to tackle the increasing complexity of multi-layered architectures (virtualization, containers, etc.)?
Threat Intelligence: proactively identifying unusual network activity and unauthorized access

Conference information

Date/Time

Starts 3 Oct 2019, 09:00

Ends 3 Oct 2019, 17:00

All times are in Europe/Zurich

Location

WSL Birmensdorf

Room Englersaal

Zürcherstrasse 111, 8903 Birmensdorf
