Arm HPC Workshop Sessions and Speakers

We are pleased to announce the sessions and speakers for the first-ever Arm HPC Workshop, taking place in Tokyo on 12-13 December 2017. The sold-out event has attracted over 100 attendees from Japan, the UK, the USA, China and other parts of the world. Joining us are speakers from RIKEN AICS, Fujitsu, Arm and several research institutions. Please find the detailed schedule below. For more information on Linaro's High Performance Computing (HPC) work, click here.


Session Details

Welcome Note

By David Rusling, CTO, Linaro

Introduction of Post-K development

By Yutaka Ishikawa, RIKEN AICS

Post-K is Japan's next flagship supercomputer, the replacement for the K supercomputer. Its node architecture and interconnect are based on ARMv8 SVE and a 6-D mesh/torus network, respectively. A three-level hierarchical storage system will be installed alongside the compute nodes. The system software developed for the Post-K supercomputer includes a novel operating system for general-purpose manycore architectures, low-level communication and MPI libraries, and file I/O middleware.
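
As a rough illustration of what a torus interconnect means for a node's connectivity (an editorial sketch, not Post-K's actual topology or routing), the following Python snippet lists the one-hop neighbors of a node on a 6-D torus. The dimension sizes in `DIMS` and the function name `torus_neighbors` are hypothetical, and every axis is treated as a wraparound torus, whereas a real mesh/torus network may leave some axes without wraparound links.

```python
def torus_neighbors(coord, dims):
    """Return the sorted one-hop neighbors of `coord` on a torus of shape `dims`.

    Assumption: every axis wraps around (pure torus). On an axis of size 2
    the +1 and -1 steps reach the same node, so it is counted only once.
    """
    coord = tuple(coord)
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            moved = list(coord)
            moved[axis] = (moved[axis] + step) % size  # wraparound link
            moved = tuple(moved)
            if moved != coord:  # an axis of size 1 has no neighbor
                neighbors.add(moved)
    return sorted(neighbors)


# Hypothetical dimension sizes, chosen only for illustration.
DIMS = (4, 4, 4, 2, 3, 2)

if __name__ == "__main__":
    origin = (0, 0, 0, 0, 0, 0)
    for node in torus_neighbors(origin, DIMS):
        print(node)
```

The wraparound is what distinguishes a torus from a mesh: on the first axis of this example, node 0 is directly linked to node 3 as well as node 1.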

Yutaka Ishikawa Bio

Project leader of the Flagship 2020 project

Post-K: Building the Arm HPC Ecosystem

By Koichi Hirai, Fujitsu

Post-K is an Arm-based supercomputer, but there are not yet many Arm-based servers for HPC. We therefore believe we need to build an Arm HPC ecosystem before Post-K is released. In this presentation, we describe our collaboration efforts to build that ecosystem.

Arm tools and roadmap for SVE compiler support

By Richard Sandiford and Florian Hahn, Arm

This presentation will give an overview of what Arm is doing to develop the HPC ecosystem, with a particular focus on SVE. It will include a brief synopsis of both the commercial and open-source tools and libraries that Arm is developing and a description of the various community initiatives that Arm is involved in. The bulk of the talk will describe the roadmap for SVE compiler support in both GCC and LLVM. It will cover the work that has already been done to support both hand-optimised and automatically-vectorised code, and the plans for future improvements.

HCQC: HPC Compiler Quality Checker

By Masaki Arai, Fujitsu Laboratories Ltd.

For numerical calculation programs on supercomputers, the kernel part accounts for 80% or more of the execution time in many cases. The quality of the code that the compiler generates for these kernel parts therefore matters greatly. We created a tool called HCQC to aid in investigating the quality of the code generated by the compiler for the kernel part. In this presentation, we report the details of HCQC and the results of using it to evaluate the quality of GCC and LLVM when compiling the kernel parts of benchmark programs.
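
A back-of-the-envelope estimate shows why this matters. This is an editorial illustration using Amdahl's law, not a result from the talk, and it assumes the kernel takes exactly a fraction $p = 0.8$ of the run time and that better code generation speeds up only the kernel, by a factor $s$:

$$
S(s) = \frac{1}{(1 - p) + \dfrac{p}{s}}, \qquad
S(2) = \frac{1}{0.2 + 0.4} \approx 1.67, \qquad
\lim_{s \to \infty} S(s) = \frac{1}{0.2} = 5 .
$$

Under these assumptions, doubling kernel speed already shortens the whole run by about 40%, while no amount of kernel optimization can deliver more than a 5x overall speedup.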

Masaki Arai Bio

He joined Fujitsu Laboratories Ltd. in 1992. His research interests are in the area of compiler optimizations and computer architectures. He joined Linaro as a member engineer in 2017.


State of the Scalasca Toolset

By Itaru Kitayama, RIKEN AICS

Scalasca is a standardized toolset for evaluating the performance of parallel applications on HPC systems. In this talk, starting from a general introduction to the toolset, we’ll review the current state of Scalasca, focusing on arm64 support. As of today, Scalasca works out of the box on arm64, except for sampling mode, which is only available on x86 systems. The ongoing porting work to address this missing feature is presented in detail, and a major upgrade to its trace format, the Open Trace Format (OTF), is also summarized. As time permits, a project outlook and slides from a demo carried out on a Cavium ThunderX system will be presented.

Itaru Kitayama Bio

Itaru Kitayama has been working on HPC tools for supercomputers at AICS since 2013.


Porting and Optimization of Numerical Libraries for ARM SVE

By Toshiyuki Imamura, RIKEN AICS

RIKEN and Fujitsu are developing ARM-based numerical libraries optimized for the new features of ARM SVE. We present the porting status of netlib+SSL-II for ARM SVE and other OSS. We also demonstrate some optimization policies and techniques, especially for the basic numerical linear algebra kernels.

Toshiyuki Imamura Bio

Toshiyuki Imamura is currently team leader of Large-scale Parallel Numerical Computing Technology at the Advanced Institute for Computational Science (AICS), RIKEN. He is in charge of the development of numerical libraries for the Post-K project. His research interests include high-performance computing, automatic-tuning technology, and eigenvalue computation (algorithms, software, and applications). He and his colleagues (the Japan Atomic Energy Agency (JAEA) team) were nominated as finalists for the Gordon Bell Prize at SC05 and SC06. He is a member of IPSJ, JSIAM, and SIAM.


An Evaluation of EasyBuild for Open Source Software Deployment

By Takahiro Ogura, RIKEN

Sharing build procedures for Open Source Software (OSS) is critical to quick OSS deployment. This is difficult for us because our target architecture is ARM, and public know-how is scarce since ARM-based HPC machines are not yet prevalent. We will share the lessons learned from our evaluation of EasyBuild, which facilitates the formulation and sharing of build recipes.

Takahiro Ogura Bio

Research & Development Scientist, Advanced Institute for Computational Science


An Overview of the IHK/McKernel Multi-kernel Operating System

By Balazs Gerofi, RIKEN Advanced Institute For Computational Science

RIKEN Advanced Institute for Computational Science is in charge of leading the development of Japan's next-generation flagship supercomputer, the successor to the K computer. Part of this effort is to design and develop a system software stack that suits the needs of future extreme-scale computing. In this talk, we focus on operating system (OS) requirements for HPC and discuss IHK/McKernel, a multi-kernel based operating system framework. IHK/McKernel runs Linux and a light-weight kernel (LWK) side by side on compute nodes, with the primary motivation of providing scalable, consistent performance for large-scale HPC simulations while retaining a fully Linux-compatible execution environment. We provide an overview of the project and discuss the status of its support for the ARM architecture.

Balazs Gerofi Bio

Research Scientist at RIKEN Advanced Institute For Computational Science.


Compilation of COSMO for GPU using LLVM

By Tobias Grosser, Scalable Parallel Computing Laboratory (SPCL)

The COSMO climate and weather model delivers daily forecasts for Switzerland and many other nations. As a traditional HPC application, it was developed with SIMD CPUs in mind, and large manual efforts were required to enable the 2016 move to GPU acceleration. As today's high-performance computer systems increasingly rely on accelerators to reach peak performance, and manual translation to accelerators is both costly and difficult to maintain, we propose a fully automatic accelerator compiler for translating scientific Fortran codes to CUDA GPU-accelerated systems. Several challenges had to be overcome to make this a reality: 1) improved scalability, 2) automatic data placement using unified memory, 3) loop rescheduling to expose coarse-grained parallelism, 4) inter-procedural loop optimization, and 5) plenty of performance tuning. Our evaluation shows that end-to-end automatic accelerator compilation is possible for non-trivial portions of the COSMO climate model, despite the lack of complete static information. Non-trivial loop optimizations previously implemented manually are performed fully automatically, and memory management happens fully transparently using unified memory. Our preliminary results show notable performance improvements over sequential CPU code (a reduction in execution time from 40s to 8s), and we are currently working on closing the remaining gap to hand-tuned GPU code. This talk is a status update on our most recent efforts and is also intended to gather feedback on future research plans towards automatically mapping COSMO to FPGAs.

Tobias Grosser Bio

Tobias Grosser is a senior researcher in the Scalable Parallel Computing Laboratory (SPCL) of Torsten Hoefler at the Computer Science Department of ETH Zürich. Supported by a Google PhD Fellowship, he received his doctoral degree from Université Pierre et Marie Curie under the supervision of Albert Cohen. Tobias' research takes place at the border of low-level compilers and high-level program transformations, with the goal of enabling complex - but highly beneficial - program transformations in a production compiler environment. With the Polly loop optimizer, he develops a loop transformation framework that is today a community project supported through the Polly Labs research laboratory. Tobias also developed advanced tiling schemes for the efficient execution of iterated stencils. Today Tobias leads the heterogeneous compute efforts in the Swiss University funded ComPASC project and is about to start a three-year NSF Ambizione project on advancing automatic compilation and heterogenization techniques at ETH Zürich.


Involvement in OpenHPC

By Takeharu Kato, Fujitsu

OpenHPC is gradually spreading as a software stack standard for HPC. It is one of the most promising software stacks for achieving interoperability among HPC systems, and it is designed and developed to make building HPC systems easier. In this presentation, we explain the current status of OpenHPC and our involvement in OpenHPC to establish the Arm HPC ecosystem.

Cyber-physical System and Industrial Applications of Large-Scale Graph Analysis and Optimization Problem

By Katsuki Fujisawa, The Institute of Mathematics for Industry, Kyushu University & The Artificial Intelligence Research Center, Advanced Industrial Science and Technology

In this talk, we present our ongoing research project. The objective of many ongoing research projects in high performance computing (HPC) is to develop an advanced computing and optimization infrastructure for extremely large-scale graphs on peta-scale supercomputers. The extremely large-scale graphs that have recently emerged in various application fields, such as transportation, social networks, cyber-security, and bioinformatics, require fast and scalable analysis. The number of vertices in these graph networks has grown from billions to trillions, and the number of edges from hundreds of billions to tens of trillions. The Graph500 and Green Graph 500 benchmarks are designed to measure the performance of a computer system for applications that require irregular memory and network access patterns. Following its announcement in June 2010, the Graph500 list was first released in November 2010 and has been updated semiannually since. The Graph500 benchmark measures the performance of a supercomputer performing a breadth-first search (BFS) in terms of traversed edges per second (TEPS). From 2014 to 2017, our project team was a winner of the eighth and the 10th to 15th Graph500 benchmarks. We commenced our research project for developing the Urban OS (Operating System) for a large-scale city in 2013. The Urban OS, which is regarded as one of the emerging applications of the cyber-physical system (CPS), gathers big data sets on the distribution of people and transportation movements by utilizing sensor technologies and stores them in a cloud storage system. In the next step, we apply optimization and simulation techniques to solve the resulting problems and check the validity of the solutions obtained in cyberspace. The Urban OS employs the graph analysis system developed by this research project and provides feedback to a prediction and control center to optimize many social systems and services. We briefly explain our ongoing research project for realizing the Urban OS.
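
To make the TEPS metric concrete, here is a minimal single-node sketch in Python of a breadth-first search that reports a traversed-edges-per-second figure. It is an editorial illustration, not the project's implementation or the official benchmark code: the function names `bfs_parents` and `measure_teps` are hypothetical, and the edge counting is simplified (it counts every adjacency entry scanned, so an undirected edge is counted twice), whereas the Graph500 specification defines its own edge-counting and validation rules.

```python
import time
from collections import deque


def bfs_parents(adj, root):
    """Breadth-first search from `root` over an adjacency-list graph.

    Returns (parent, edges_scanned): `parent` maps each reached vertex to
    its BFS parent (the root maps to itself); `edges_scanned` counts every
    adjacency entry examined (a simplified stand-in for traversed edges).
    """
    parent = {root: root}
    frontier = deque([root])
    edges_scanned = 0
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            edges_scanned += 1
            if v not in parent:  # first visit: record the BFS tree edge
                parent[v] = u
                frontier.append(v)
    return parent, edges_scanned


def measure_teps(adj, root):
    """Time one BFS and return (parent, edges scanned per second)."""
    start = time.perf_counter()
    parent, edges_scanned = bfs_parents(adj, root)
    elapsed = time.perf_counter() - start
    teps = edges_scanned / elapsed if elapsed > 0 else float("inf")
    return parent, teps


if __name__ == "__main__":
    # Tiny example graph; vertex 4 is isolated and is never reached.
    graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2], 4: []}
    tree, teps = measure_teps(graph, root=0)
    print(f"reached {len(tree)} vertices at {teps:.0f} TEPS")
```

The inner loop touches vertices in a data-dependent order, which is the irregular memory access pattern the benchmark is designed to stress.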

Katsuki Fujisawa Bio

Fujisawa is a Full Professor at the Institute of Mathematics for Industry (IMI) of Kyushu University, Japan. He was also a research director of the JST (Japan Science and Technology Agency) CREST (Core Research for Evolutional Science and Technology) post-Peta High Performance Computing project from 2011 to 2017. He received his Ph.D. from the Tokyo Institute of Technology in 1998. The objective of the JST CREST project is to develop an advanced computing and optimization infrastructure for extremely large-scale graphs on post peta-scale supercomputers. His project team has taken on the Graph500 benchmark, which is designed to measure the performance of a computer system for applications that require irregular memory and network access patterns. From 2014 to 2017, his project team was a winner of the eighth and the 10th to 14th Graph500 benchmarks. In 2017, he received the Prize for Science and Technology (Research Category), Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology, Japan.


New Process/Thread Runtime

By Atsushi Hori, RIKEN

A new portable and practical parallel execution model, Process in Process (PiP for short), will be presented. PiP tasks share the same virtual address space, as in the multi-thread model, and have privatized variables, as in the multi-process model. Because of this, PiP provides the best of both worlds: multi-process (MPI) and multi-thread (OpenMP).

Atsushi Hori Bio

Researcher, System Software Development Team, RIKEN


An evaluation of LLVM compiler for SVE with fairly complicated loops

By Hiroshi Nakashima, Kyoto University / RIKEN AICS

As part of the evaluation of Post-K’s compilers, we have been investigating the compiled code of vectorizable kernel loops in a particle-in-cell simulation program. This talk will show how the latest version of the LLVM compiler (v1.4) handles these loops, together with a qualitative and quantitative comparison against the code generated by Intel’s compiler for KNL.

Hiroshi Nakashima Bio

Currently working as a professor at Kyoto University’s supercomputer center (ACCMS) on R&D in HPC programming and supercomputer system architecture, and as a visiting senior researcher at RIKEN AICS on the evaluation of the Post-K computer and its compilers.



By Renato Golin, Linaro

Programming Languages & Tools for Higher Performance & Productivity

By Hitoshi Murai, RIKEN AICS

For higher performance and productivity on HPC systems, it is important to provide users with a good programming environment, including languages, compilers, and tools. In this talk, the programming model of the Post-K supercomputer will be shown.

Hitoshi Murai Bio

Hitoshi Murai received a master's degree in information science from Kyoto University in 1996. He worked as a software developer at NEC from 1996 to 2010. He received a Ph.D. degree in computer science from the University of Tsukuba in 2010. He is currently a research scientist in the programming environment research team and the Flagship 2020 project at the Advanced Institute for Computational Science, RIKEN. His research interests include compilers and parallel programming languages.


Advantages of the Compiler for Post-K computer

By Shun Kamatsuka, Fujitsu

Fujitsu is developing the compiler for the Post-K computer to achieve high performance and productivity. The compiler utilizes Arm SVE and supports new features of the C/C++ and Fortran language standards. In this presentation, I will show the advantages of the Post-K compiler and Fujitsu's technologies, focusing on SVE and the coarray features of Fortran.

Application Development Tools for Post-K Supercomputer

By Tomotake Nakamura, Fujitsu

RIKEN and Fujitsu are developing programming assistance tools for the Post-K computer that provide new value. New features, such as the use of Eclipse PTP and the output of performance data in XML form, are shown in this presentation.

The perfect mix: SUSE's HPC, ARM and Containers

By Vojtech Pavlik, SUSE

SUSE's operating system is well established in the HPC market as a solid and flexible foundation to build on. SUSE is complementing that with strong ARM expertise and container skills and tools. This is a perfect mix for the next generation of ARM-based supercomputers with scalable management. The talk discusses the details of what SUSE offers, including how it was achieved on the technical level, from ARM enablement to its HPCaaS - High Performance Computing as a Service.

Vojtech Pavlik Bio

Vojtěch Pavlík is the director of SUSE Labs, a department of SUSE R&D focusing on core Linux technologies (kernel and compiler) as well as specific applications of those, such as Real Time and High Performance Computing. In his past as a kernel developer, Vojtěch Pavlík worked on support for USB and human input devices in Linux, work that is used today on every Linux and Android device. He enjoys solving interesting problems facing Linux, most recently working on Linux live patching technology.

OpenMP Extension for Explicit SIMD Programming using ARM SVE

By Jinpil Lee, RIKEN AICS

Recent trends in processor design accommodate wide vector extensions. SIMD vectorization is more important than before to exploit the potential performance of the target architecture. The latest OpenMP specification provides new directives which help compilers produce better code for SIMD auto-vectorization. However, it is hard to optimize the SIMD code performance in OpenMP since the target SIMD code generation mostly relies on the compiler implementation. In this research, we propose a new directive that specifies user-defined SIMD variants of functions used in SIMD loops. The compiler can then use the user-defined SIMD variants when it encounters OpenMP loops instead of auto-vectorized SIMD variants. The user can optimize the SIMD performance by implementing highly-optimized SIMD code with intrinsic functions.

Jinpil Lee Bio

Jinpil Lee received his PhD degree in computer science from the University of Tsukuba in 2013, under the supervision of Prof. Mitsuhisa Sato. From 2013 to 2015, he worked at KISTI, the national supercomputing center in Korea. He is currently working at RIKEN AICS in Japan, doing research on directive-based parallel programming models.


Performance evaluation with Arm HPC tools for SVE

By Miwako Tsuji, RIKEN AICS

"Co-design" is a bidirectional approach in which a system is designed according to the demands of applications and the applications are optimized for the system. Performance estimation and evaluation of applications are important for co-design. In this talk, we focus on performance evaluation with Arm HPC tools for SVE.

Miwako Tsuji Bio

Miwako Tsuji received master's and PhD degrees from Information Science and Technology, Hokkaido University. From 2007 to 2013, she worked at Hokkaido University, the University of Tokyo, the University of Tsukuba and Université de Versailles Saint-Quentin-en-Yvelines. She has been a research scientist at RIKEN Advanced Institute for Computational Science since 2013. She has been a member of the architecture development team of the Flagship 2020 project, i.e. the Post-K computer project, since the project started in 2014. She is a coauthor of work awarded the ACM Gordon Bell Prize in 2011.