Arm Architecture HPC Workshop by Linaro and HiSilicon
Thursday 26th July - Huawei, 2330 Central Expressway, Santa Clara, CA 95050
Arm Architecture HPC Workshop by Linaro and HiSilicon on 26th July 2018 – At the Huawei campus in Santa Clara, CA.
Logistics: The workshop will be held in the Cafeteria Conference Room in Building C. All outside visitors should head directly to the Linaro registration area, which will be signposted. There is no need to go through Building B to sign in.
About: What does the Arm-Powered supercomputing future look like, and how can you prepare for it? The Arm Architecture HPC Workshop will bring together the leading Arm vendors, end users and the open source development community in the Bay Area to discuss the latest products, developments and open source software support. Topics of focus include, but are not limited to:
- Compilers including GCC, LLVM, C++, Fortran, optimisations, benchmarking and general support
- OS and Runtime
- Math Libraries
- Machine Learning
For End Users: At the Arm Architecture HPC Workshop you will hear from Arm Members and their Partners about new trends, technologies and products for the planning and operation of an Arm-Powered supercomputer.
For Arm Members and Partners: The Arm Architecture HPC Workshop is a one-day conference that provides training and discussion panels on Arm-Powered solutions and offerings for all aspects of HPC, including Server, Networking, Storage and Development.
You can expect:
- Exciting best practices and technology outlooks with peers from the Arm ecosystem
- Unique opportunities to network with current Arm users, vendors and the developer community
We are looking forward to your visit!
Arm Architecture HPC Workshop
- Cost (Free)
- Sponsorship options available
Call for Papers is open
Deadline for submissions 28th June
Renato Golin Full Abstract/Bio
OpenHPC Automation with Ansible
In order to test OpenHPC packages and components and to use them as a platform to benchmark HPC applications, Linaro is developing an automated deployment strategy, using Ansible, Mr-Provisioner and Jenkins, to install the OS and OpenHPC and to prepare the environment on varied architectures (Arm, x86). This work is meant to replace the existing ageing Bash-based recipes upstream while still keeping the documents intact. Our aim is to make it easier to vary hardware configuration, allow for different provisioning techniques and adapt internal infrastructure logic to different labs, while still using the same recipes. We hope this will help more people use OpenHPC with a better out-of-the-box experience and with more robust results.
He started programming in the late 80s in C for PCs after a few years playing with 8-bit computers, but he only started programming professionally in the late 90s during the .com bubble. After many years working on Internet back-ends, he moved to the UK and worked for a few years on bioinformatics at EBI before joining ARM, where he worked on the DS-5 debugger and on the EDG-to-LLVM bridge, and where he became the LLVM Tech Lead. More recently, he worked with large clusters and big data at HPCC before moving to Linaro.
Alex Bennée Full Abstract/Bio
Setting up an SVE developer environment
ARM's SVE instructions provide a whole new set of features for HPC computing. Engineers are understandably keen to start making sure their software is ready to take advantage as soon as hardware arrives. Up until now that has meant building and running in slow system emulation models. Fortunately, the latest QEMU now supports SVE instructions in its Linux user-emulation mode. Alex will talk through how to set this up so that building and testing your SVE-enabled code is as easy as running make.
Long-time systems and embedded developer with a side of Dynamic Binary Translation. Alex started learning to program in the 80s, in an era of classic home computers that allowed you to get down and dirty at the system level. After graduating with a degree in Chemistry, he worked on a variety of projects including Fruit Machines, Line Cards, CCTV recorders and point-to-multipoint wireless microwave systems. Since the turn of the century his primary focus has been working with FLOSS platforms, especially Linux. An alumnus of Transitive, he has broad experience of cross-platform virtualization as well as a strong background in telecommunications and networking. A keen Emacs user, he will happily answer questions and proselytise for the One True Editor (tm).
Andrew J Younge Full Abstract/Bio
Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Supercomputing
The Vanguard program looks to expand the potential technology choices for leadership-class High Performance Computing (HPC) platforms, not only for the National Nuclear Security Administration (NNSA) but for the Department of Energy (DOE) and wider HPC community. Specifically, there is a need to expand the supercomputing ecosystem by investing and developing emerging, yet-to-be-proven technologies and address both hardware and software challenges together, as well as to prove-out the viability of such novel platforms for production HPC workloads.
The first deployment of the Vanguard program will be Astra, a prototype Petascale Arm supercomputer to be sited at Sandia National Laboratories during 2018. This talk will focus on the architectural details of Astra and the significant investments being made towards maturing the Arm software ecosystem. Furthermore, we will share initial performance results based on our pre-general availability testbed system and outline several planned research activities for the machine.
Andrew Younge is an R&D Computer Scientist at Sandia National Laboratories with the Scalable System Software group. His research interests include Cloud Computing, Virtualization, Distributed Systems, and energy efficient computing. Andrew has a Ph.D. in Computer Science from Indiana University, where he was the Persistent Systems fellow and a member of the FutureGrid project, an NSF-funded experimental cyberinfrastructure test-bed. Over the years, Andrew has held visiting positions at the MITRE Corporation, the University of Southern California / Information Sciences Institute, and the University of Maryland, College Park. He received his Bachelor's and Master's of Science from the Computer Science Department at Rochester Institute of Technology (RIT) in 2008 and 2010, respectively.
Simon McIntosh-Smith Full Abstract/Bio
Isambard: the world's first production Arm-based supercomputer
The Isambard supercomputer is due to be the world's first production, Arm-based system when it goes online in the summer of 2018. The project, run by the GW4 alliance and the UK's Met Office, has already produced significant results using early access systems based on Cavium ThunderX2 CPUs. The production Isambard system will be a Cray XC50, complete with Cray's software toolchain ported to Arm. Isambard's early results compare the two most common open-source compilers - GNU and LLVM - with Cray's compiler. Early results suggest that Cavium ThunderX2 CPUs are performance competitive with the latest x86 CPUs, while delivering compelling performance per dollar advantages. Our experience has also been that porting complex production codes of millions of lines of legacy code is relatively painless, a significant achievement for the Arm software ecosystem. However, there are still some areas where improvement is required, and we shall highlight those in this talk.
Simon McIntosh-Smith is a full Professor of High Performance Computing at the University of Bristol in the UK. He began his career as a microprocessor architect at Inmos and STMicroelectronics in the early 1990s, before co-designing the world's first fully programmable GPU at Pixelfusion in 1999.
In 2002 he co-founded ClearSpeed Technology where, as Director of Architecture and Applications, he co-developed the first modern many-core HPC accelerators. He now leads the High Performance Computing Research Group at the University of Bristol, where his research focuses on performance portability and application based fault tolerance. He plays a key role in designing and procuring HPC services at the local, regional and national level, including the UK’s national HPC server, Archer.
In 2016 he led the successful bid by the GW4 Alliance along with the UK’s Met Office and Cray, to design and build ‘Isambard’, the world’s first large-scale production ARMv8-based supercomputer.
Gilad Shainer and Scot Schultz Full Abstract/Bio
Intelligent Interconnect Architecture to Enable Next Generation HPC
The latest revolution in HPC interconnect architecture is the development of In-Network Computing, a technology that enables handling and accelerating application workloads at the network level. By placing data-related algorithms on an intelligent network, we can overcome the new performance bottlenecks and improve data center and application performance. The combination of In-Network Computing and Arm-based processors offers a rich set of capabilities and opportunities to build the next generation of HPC platforms.
Gilad Shainer Bio
Gilad Shainer has served as Mellanox's vice president of marketing. Mr. Shainer serves as the chairman of the HPC Advisory Council organization and as a board member in the OpenPOWER, CCIX, OpenCAPI and UCF organizations. He is also a member of IBTA and a contributor to the PCISIG PCI-X and PCIe specifications.
Mr. Shainer holds multiple patents in the field of high-speed networking. He is also a recipient of the 2015 R&D100 award for his contribution to the CORE-Direct collective offload technology. Gilad Shainer holds an MSc degree and a BSc degree in Electrical Engineering from the Technion Institute of Technology in Israel.
Scot Schultz Bio
Scot Schultz is an HPC technology specialist with broad knowledge of operating systems, high-speed interconnects and processor technologies. Schultz, who joined the Mellanox team in 2013, is a 30-year veteran of the computing industry. Prior to joining Mellanox, he spent 17 years at AMD in various engineering and leadership roles in the area of high performance computing. Scot has also been instrumental in the growth and development of various industry organizations, including the Open Fabrics Alliance, and continues to serve as a founding board member of the OpenPOWER Foundation and as Director of Educational Outreach and founding member of the HPC-AI Advisory Council.
Yutaka Ishikawa Full Abstract/Bio
Post-K and Arm HPC Ecosystem
Post-K, a flagship supercomputer in Japan, is being developed by RIKEN and Fujitsu. It will be the first supercomputer with Armv8-A+SVE. This talk will give an overview of Post-K and of how RIKEN and Fujitsu are currently working on the software stack for the Arm architecture.
Yutaka Ishikawa is the project leader of the FLAGSHIP 2020 project at the RIKEN Center for Computational Science, Japan. Ishikawa received his PhD degree in electrical engineering from Keio University. From 1987 to 2001, he was a member of AIST (former Electrotechnical Laboratory), METI. From 1993 to 2001, he was the chief of the Parallel and Distributed System Software Laboratory at the Real World Computing Partnership. From 2002 to 2006 and from 2006 to 2014, he was an associate professor and a professor at the University of Tokyo, respectively. From 2010 to 2014, he was also the director of the Information Technology Center at the University of Tokyo.
Pavel Shamis Full Abstract/Bio
HPC network stack on Arm
Applications, programming languages, and libraries that leverage sophisticated network hardware capabilities have a natural advantage when used in today's and tomorrow's high-performance and data center computer environments. Modern RDMA-based network interconnects provide incredibly rich functionality (RDMA, Atomics, OS-bypass, etc.) that enables low-latency and high-bandwidth communication services. The functionality is supported by a variety of interconnect technologies such as InfiniBand, RoCE, iWARP, Intel OPA, Cray's Aries/Gemini, and others. Over the last decade, the HPC community has developed a variety of user- and kernel-level protocols and libraries that enable a range of high-performance applications over RDMA interconnects, including MPI, SHMEM, UPC, etc. With the emerging availability of HPC solutions based on the Arm CPU architecture, it is important to understand how Arm integrates with the RDMA hardware and the HPC network software stack. In this talk, we will give an overview of the Arm architecture and system software stack, including MPI runtimes, OpenSHMEM, and OpenUCX.
Pavel is a Principal Research Engineer at Arm with over 16 years of experience in developing HPC solutions. His work is focused on co-design of software and hardware building blocks for high-performance interconnect technologies, and on the development of communication middleware and novel programming models. Prior to joining ARM, he spent five years at Oak Ridge National Laboratory (ORNL) as a research scientist in the Computer Science and Math Division (CSMD). In this role, Pavel was responsible for research and development on multiple projects in the high-performance communication domain, including Collective Communication Offload (CORE-Direct & Cheetah), OpenSHMEM, and OpenUCX. Before joining ORNL, Pavel spent ten years at Mellanox Technologies, where he led the Mellanox HPC team and was one of the key drivers in the enablement of the Mellanox HPC software stack, including the OFA software stack, OpenMPI, MVAPICH, OpenSHMEM, and others.
Pavel is a recipient of the prestigious R&D100 award for his contribution to the development of the CORE-Direct collective offload technology, and he has published in excess of 20 research papers.
Joel Jones Full Abstract/Bio
Optimizing for ARM64—A Toolchain Perspective
Optimizing for ARM64 has some unique features that optimization for x86 does not. We will describe what those features are, including:
- Importance of having latest tools and libraries
- Using existing optimizations as a guide to optimizing for ARM64
- Configuring tools for best optimization results
- Current status of HPC library and application performance
- Effects of hardware and configuration considerations
- Our experiences with various optimization techniques on numerous HPC applications and libraries
Joel Jones has worked at Cavium for five years and currently leads the toolchain team. He has worked for Apple, Transcella, Coverity, Wind River, and others. He has been a professor of computer science, and has a PhD in Computer Science from the University of Illinois.
Jay Kruemcke Full Abstract/Bio
It just keeps getting better - SUSE enablement for Arm
SUSE has been delivering commercial Linux support for Arm-based servers since 2016. Initially the focus was on high-end servers for HPC and Ceph-based software-defined storage, but we have since enabled a number of other Arm SoCs and are even supporting the Raspberry Pi. This session will cover the SUSE products that are available for the Arm platform and look to the future.
Jay is responsible for the SUSE Linux server products for High Performance Computing, 64-bit Arm systems, and SUSE Linux for IBM Power servers.
Jay has built an extensive career in product management including using social media for client collaboration, product positioning, driving future product directions, and evangelizing the capabilities and future directions for dozens of enterprise products.
Ryan Hulguin Full Abstract
Cross Platform Performance Engineering
High performance application tuning -- performance engineering -- relies heavily on tools for profiling, debugging, and visualization. This talk will present a methodology for porting HPC applications to Arm, and the ecosystem of cross platform performance engineering toolkits and libraries that is currently available on Arm. An overview and use cases for Arm Forge, PAPI, ScoreP, TAU and others will be provided along with guidance and advice for HPC performance engineering on the latest Arm-based CPU offerings.
Joshua Mora, PhD - Full Abstract
Huawei’s requirements for the ARM based HPC solution readiness
A high-level review of the wide range of requirements for architecting a competitive ARM-based HPC solution is provided. The review combines both industry views and Huawei's unique views, with the intent to communicate openly not only Huawei's alignment with and support for ongoing efforts carried out by other key ARM players, but also the areas of differentiation in which Huawei is investing for the research, development and deployment of homegrown ARM-based HPC solution(s).
20 years of experience in research and development of both software and hardware for high performance computing. Currently leading the architecture definition and development of ARM-based HPC solutions, both hardware and software, all the way to the applications (i.e. turnkey HPC solutions for different compute-intensive markets where ARM will succeed!).
|Andrew Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Supercomputing||Slides|
|Yutaka Ishikawa - Post-K and Arm HPC Ecosystem||Slides|
|Gilad Shainer and Scot Schultz - Intelligent Interconnect Architecture to Enable Next Generation HPC||Slides|
|Alex Bennée - Setting up an SVE developer environment||Slides|
|Jay Kruemcke - It just keeps getting better - SUSE enablement for Arm||Slides|
|Pavel Shamis - HPC network stack on ARM||Slides|
|Renato Golin - OpenHPC Automation with Ansible||Slides|
|Kanta Vekaria - Welcome Note||Slides|
|Joshua Mora - Huawei’s requirements for the ARM based HPC solution readiness||Slides|