Learn more about LinuxCon + ContainerCon + CloudOpen China, happening June 19-20. 

Presentation Language: Sessions are categorized as [C] Chinese, [C,E] Chinese with English Slides, or [E] English at the end of each talk title.
Tuesday, June 20 • 14:55 - 15:25
Challenge of HPC Data Center: When HPC Meets the ML/DL and Container [C] - Yong Feng, IBM Canada Ltd.

As AI technology gains momentum, HPC data centers face the challenge of developing and running ML/DL workloads on their systems in a container runtime environment.
Existing HPC job schedulers are not usually chosen to run the ML/DL stack because they lack support for long-running services. However, the popular container platforms used to run the ML/DL stack cannot meet the requirements of traditional HPC workloads because they lack non-Docker support and scheduling policies such as backfill and CPU binding.
This session introduces an HPC data center architecture based on Kubernetes, an HPC job scheduler, and TensorFlow. It runs MPI jobs and the DL stack together, isolated by containers and dynamically sharing resources with each other, and is shown in a demo. The session explains the technical issues encountered during development and how they were resolved by enhancing those open source components.


Yong Feng

Senior Product Architect, IBM Canada Ltd.
Yong Feng is a Senior Product Architect at IBM Spectrum Computing Canada. He has more than 10 years of experience in resource scheduling and management in the areas of HPC, virtual machine management, analytics/big data platforms, and container cloud. Yong Feng is currently leading a...

Tuesday June 20, 2017 14:55 - 15:25 HKT
Room 309B
  Cloud Native & Containers, Developer