Data Science & Machine Learning

Name: Data Science & Machine Learning
Start: 2017-05-18T09:30:00+02:00
End: 2017-05-18T17:05:00+02:00
Location: T-Systems

Thursday 18 May 2017, 09:30 → 17:05 Europe/Zurich

Room: New York (T-Systems)

T-Systems Kloten (Balsberg), Balz-Zimmermann-Strasse 7, CH-8302 Kloten

Christian Bolliger (ETH Zurich), Valerio Zanetti-Überwasser (T-Systems Schweiz AG)

Description

Introduction

Data Science and Machine Learning have become relevant in many research areas and industries. The volume of collected data and the speed at which it arrives repeatedly exceed previously known limits, which creates the need for automated data processing. Before automated data processing can take place, a machine or algorithm has to be trained for the intended task, such as information search and retrieval, gaining insights, or taking actions. The training phase can take a long time and occupy a large part of the available infrastructure, especially if it has to be repeated on newly arriving data. To minimize infrastructure costs, machine learning workloads tend to be offloaded to specialized hardware.

Machines are trained to take semantic action in specific domains. To act successfully in such a context, machine learning cannot rely solely on data and general-purpose algorithms; domain models play an important role in generating accurate results. Data Science, which can be seen as a combination of mathematics, heuristics and domain knowledge, helps discover patterns and regularities in data, which ideally give rise to new models that help us understand the ocean of digitized data.

Key questions

What algorithms and computational models are best suited for machine learning at scale?
How scalable are the current implementations of support vector machines and deep neural networks?
Which type of hardware is a good match for offloading machine learning workloads: DSPs, ASICs, GPUs?
What deployment models and data flows best support machine learning applications?
How does machine learning impact resource usage in an HPC cluster?
What can Data Science adopt from HPC and vice versa?

Participants

52

- 09:30 → 10:00
  
  Coffee and registration (30m)
  
  Room: New York (T-Systems)
- 10:00 → 10:15
  
  Welcome and introduction
  
  Room: New York (T-Systems)
  
  Conveners: Christian Bolliger (ETH Zurich), Valerio Zanetti-Überwasser (T-Systems Schweiz AG)
- 10:15 → 11:00
  
  Keynote presentation: Application of Machine Learning Approaches to Real-Time Prediction of Train Arrivals
  
  Room: New York (T-Systems)
  
  The talk will present a solution that T-Systems has created for Deutsche Bahn to improve passenger information by predicting train arrival times in real time, based on the trains' current positions.
  
  It will show how "classical" statistical machine learning approaches can be combined with artificial neural networks to solve the problem. The solution is designed to scale horizontally on a Hadoop-based HPC platform.
  
  Furthermore, an outlook on new data types and compute approaches in industrial HPC applications will be given.
  
  Convener: Ingo Elsen (T-Systems Schweiz AG)
- 11:00 → 11:45
  
  Keynote presentation: Near Real-Time Optimization of Train Traffic in Densely Used Network Areas at SBB
  
  Room: New York (T-Systems)
  
  SBB operates one of the busiest railway networks in the world. In densely used parts of the network, the planned headway of 2 minutes between trains requires strict control of train sequence and train speed to avoid unnecessary stops and additional delays.
  
  A near real-time optimization based on mixed-integer programming recalculates the optimal solution every 6 seconds. This optimization algorithm is an integrated component of SBB's centralized dispatching system and has been in operation since 2013.
  
  Convener: Steffen Oettich (SBB)
- 11:45 → 12:15
  
  Big Data tools for Astrophysics
  
  Room: New York (T-Systems)
  
  Astrophysical simulations have been a constant presence on HPC clusters around the world for many years. Computational power is now so large that the biggest production runs can easily generate datasets of hundreds of TB in just a few days. Efficiently post-processing this data is a significant challenge, because it no longer fits in memory and the usual domain tools are very cumbersome to use. To address this problem, and the problem of “big data” analysis on HPC clusters in general, we have undertaken a project to bring together the benefits of HPC with the ease of use of Big Data frameworks. We have developed an analysis code built on top of Apache Spark to analyze outputs of more than 200 TB from a recent state-of-the-art cosmological simulation run on Piz Daint at CSCS. Spark is used to orchestrate the work and collect intermediate results, while highly optimized domain code performs the main part of the computation. In addition, we have developed a tool for quick and easy deployment and monitoring of Spark clusters on HPC infrastructure. I will discuss the issues inherent in combining scientific codes with Big Data frameworks and the approaches we used to overcome them, from the perspective of both software and hardware.
  
  Convener: Rok Roskar (ETH Zurich)
- 12:15 → 13:15
  
  Lunch and networking (1h)
  
  Room: New York (T-Systems)
- 13:15 → 13:45
  
  Data Science Services at CSCS
  
  Room: New York (T-Systems)
  
  Convener: Marcel Schoengens (CSCS)
- 13:45 → 14:10
  
  Analytics on Health Data: Ethical Considerations
  
  Room: New York (T-Systems)
  
  Convener: Christian Bolliger (ETH Zurich)
- 14:10 → 14:25
  
  Community Development
  
  Room: New York (T-Systems)
  
  Convener: Michele De Lorenzi (CSCS)
- 14:25 → 15:00
  
  Transfer to Bombardier Transportation
  
  Bus station
  
  Bombardier Transportation (Switzerland) Ltd, Brown Boveri-Strasse 5, 8050 Zurich
- 15:00 → 15:10
  
  Welcome at Bombardier Transportation
  
  Room: Toro 1/ K2
  
  Convener: Stéphane Wettstein (Bombardier Transportation)
- 15:10 → 16:00
  
  Condition Monitoring and Condition-Based Maintenance on ICN (SBB) and ETR 1000 (Trenitalia)
  
  Room: Toro 1/ K2 (Bombardier Transportation)
  
  Conveners: Hanspeter Krieger (Bombardier Transportation), Stefano Ritter (Bombardier Transportation)
- 16:00 → 17:00
  
  Hardware and Software Testing for High-Power Traction Systems
  
  Room: New York (T-Systems)
  
  Convener: Markus Jörg (Bombardier Transportation)
- 17:00 → 17:05
  
  Farewell and end of the meeting
  
  Room: Lab
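For a sense of scale in the SBB keynote (11:00): a 2-minute headway allows at most 60 / 2 = 30 trains per hour on a single track, and recomputing every 6 seconds means 3600 / 6 = 600 optimization runs per hour. The Python sketch below illustrates only the train-sequencing part of such a problem, on a toy scale and under stated assumptions. It is not SBB's model: it replaces mixed-integer programming with exhaustive enumeration of all train orders at one hypothetical shared junction, it ignores speed control, and the train names, earliest passing times (in minutes) and priority weights are invented for illustration.

```python
from itertools import permutations

# Planned minimum headway between consecutive trains, in minutes (from the abstract).
HEADWAY_MIN = 2.0


def schedule_for_order(order, earliest, headway=HEADWAY_MIN):
    """Return the earliest feasible passing time of each train for a fixed order.

    A train passes no earlier than its own earliest time and at least `headway`
    minutes after the train before it.
    """
    times = {}
    prev = None
    for train in order:
        if prev is None:
            lower_bound = earliest[train]
        else:
            lower_bound = max(earliest[train], prev + headway)
        times[train] = lower_bound
        prev = lower_bound
    return times


def best_sequence(earliest, weight, headway=HEADWAY_MIN):
    """Search every train order for the minimum weighted total delay.

    Exhaustive search over n! orders: a toy stand-in for a MIP solver,
    only usable for a handful of trains.
    """
    best = None
    for order in permutations(sorted(earliest)):
        times = schedule_for_order(order, earliest, headway)
        cost = sum(weight[t] * (times[t] - earliest[t]) for t in order)
        if best is None or cost < best[0]:  # ties keep the first order found
            best = (cost, order, times)
    return best


# Hypothetical example: one long-distance train (IC) and two suburban trains.
earliest = {"IC": 1.0, "S1": 0.0, "S2": 0.5}  # earliest passing times, minutes
weight = {"IC": 5, "S1": 1, "S2": 1}           # hypothetical priority weights

cost, order, times = best_sequence(earliest, weight)
print(order, cost)  # ('IC', 'S1', 'S2') 7.5
```

Serving the trains strictly in order of arrival (S1, S2, IC) would cost 16.5 in this toy model, because the heavily weighted IC would wait behind both suburban trains. For a fixed order, pushing every passing time down to its lower bound is optimal because the weights are non-negative, so only the order has to be searched; a MIP formulation typically encodes that order with binary precedence variables instead of enumerating all n! orders explicitly.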
