What’s new in QEMU 2.9

QEMU is an interesting multi-faceted open source project. It is a standard component of the Linux virtualisation stack, used by both the KVM and Xen hypervisors for device emulation. Thanks to its dynamic just-in-time recompilation engine, known as the Tiny Code Generator (TCG), it can also emulate other architectures on a number of hosts. This takes the form of either full system emulation or the lighter-weight user-mode emulation, which allows foreign user-space binaries to run alongside the rest of the host system.

Started in 2003 by Fabrice Bellard, QEMU is now maintained by a community of mostly corporate-sponsored engineers, although unaffiliated individuals are still the second-largest group of contributors. The project's codebase has continued to grow over the years, and it has now reached the point of making around three stable releases a year, typically one each in April, August and December.

Linaro engineers take an active part in the development and maintenance of the project, and we thought it would be useful to talk about some of the ARM-related features in the upcoming 2.9 release.

1 AArch64 EL2 Support for TCG

Building on previous work to enable EL3 (otherwise known as TrustZone), we now fully support the hypervisor exception level, EL2. As virtualisation support in the interrupt controller is an important part of EL2, you need to define a GICv3 as part of your machine definition:

qemu-system-aarch64 ${QEMU_OPTS} \
  -machine gic-version=3 \
  -machine virtualization=true

This is especially useful if you want to debug hypervisor code while working on KVM, as it is often easier to attach to QEMU's GDB stub than to debug real hardware with a hardware-assisted debugger over JTAG.

While it is still slow compared to real KVM support, it is faster than running nested TCG emulations. It also means you can use QEMU instead of ARM's Fast Models to test hypervisors, which is useful given the next feature…

2 Multi-threaded TCG for System Emulation

Previously, system emulation in QEMU was single-threaded, with a single host thread emulating all of the guest's vCPUs. As the number of SMP systems has grown, this has slowly become more of a bottleneck in QEMU's performance. The multi-threaded TCG project (also known as MTTCG) is the culmination of several years of shared effort between commercial, community and academic contributors. Linaro is proud to have been heavily involved in coding, reviewing and helping get this feature accepted upstream.

While the work has focused on system emulation, a number of the updates have also benefited the rest of TCG emulation, including the efficient QHT translation-cache lookup algorithm and a complete overhaul of how TCG emulates atomic operations. If you are interested in a more detailed write-up of the technical choices made, we wrote an article for LWN last year.
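As a rough illustration of what emulating atomic operations involves once every vCPU runs in its own host thread, the sketch below models a guest load-exclusive/store-exclusive pair (as used by ARM LDREX/STREX) on top of a host compare-and-swap. It is a simplified model under our own assumptions, not QEMU's implementation; `vcpu_state`, `emu_load_excl`, `emu_store_excl` and `run_vcpus` are hypothetical names.

```c
/*
 * Simplified model, not QEMU's code: a guest load-exclusive /
 * store-exclusive pair (as in ARM LDREX/STREX) emulated with a host
 * compare-and-swap, running one host thread per emulated vCPU.
 * All names are hypothetical. Build with: cc -std=c11 -pthread
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU exclusive-monitor state. */
typedef struct {
    _Atomic uint32_t *excl_addr; /* address of the last load-exclusive */
    uint32_t excl_val;           /* value that load returned */
} vcpu_state;

/* Guest load-exclusive: an ordinary load that records address and value. */
static uint32_t emu_load_excl(vcpu_state *cpu, _Atomic uint32_t *addr)
{
    cpu->excl_addr = addr;
    cpu->excl_val = atomic_load(addr);
    return cpu->excl_val;
}

/*
 * Guest store-exclusive: succeeds (returns 0) only if memory still holds
 * the value seen by the load-exclusive, otherwise returns 1, mirroring the
 * ARM status result. The host CAS makes the check-and-store atomic across
 * vCPU threads. Note that a CAS cannot detect an A-B-A change that a real
 * exclusive monitor would reject.
 */
static int emu_store_excl(vcpu_state *cpu, _Atomic uint32_t *addr, uint32_t newval)
{
    int fail = 1;
    if (cpu->excl_addr == addr) {
        uint32_t expected = cpu->excl_val;
        fail = atomic_compare_exchange_strong(addr, &expected, newval) ? 0 : 1;
    }
    cpu->excl_addr = NULL; /* the monitor is cleared either way */
    return fail;
}

#define MAX_VCPUS 64

static _Atomic uint32_t guest_counter; /* shared guest memory word */
static int iterations_per_vcpu;

/* One host thread per vCPU, each running a guest atomic-increment loop. */
static void *vcpu_thread(void *arg)
{
    (void)arg;
    vcpu_state cpu = { NULL, 0 };
    for (int i = 0; i < iterations_per_vcpu; i++) {
        uint32_t old;
        do {
            old = emu_load_excl(&cpu, &guest_counter);
        } while (emu_store_excl(&cpu, &guest_counter, old + 1) != 0);
    }
    return NULL;
}

/* Runs num_vcpus racing vCPU threads and returns the final counter value. */
static uint32_t run_vcpus(int num_vcpus, int iterations)
{
    pthread_t threads[MAX_VCPUS];
    assert(num_vcpus > 0 && num_vcpus <= MAX_VCPUS);
    atomic_store(&guest_counter, 0);
    iterations_per_vcpu = iterations;
    for (int i = 0; i < num_vcpus; i++)
        pthread_create(&threads[i], NULL, vcpu_thread, NULL);
    for (int i = 0; i < num_vcpus; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&guest_counter);
}
```

Calling `run_vcpus(4, 100000)` should return 400000: a store-exclusive that loses a race simply fails and the guest loop retries, so no increment is lost even though the four vCPU threads run concurrently.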

While this work finally removes the single-threaded bottlenecks from system emulation, it is not a performance panacea. As long as you have unused CPU cores on your host machine, you should see a performance improvement for each new vCPU you add to your guest, up to around 8 cores. Beyond that point, the cost of keeping the system's behaviour coherent will eventually catch up with you.

The core technology on which MTTCG relies is target-agnostic and designed so that all the architectures QEMU emulates can take advantage of it. However, each front-end needs changes to its emulation so that it uses the new TCG facilities for modelling atomic and barrier operations.

Currently MTTCG is enabled by default for both 32- and 64-bit ARM chips, as well as the Alpha architecture, when running on an x86_64 host. This is by far the most common use case for ARM emulation.
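As we understand it, the host restriction largely comes down to memory models. ARM and Alpha are weakly ordered, while x86 hosts provide strong (TSO) ordering, so running the guest's memory accesses directly on the host never reorders them in a way the guest architecture does not already permit. The sketch below is a simplified, hypothetical model of that kind of compatibility check; the flag names and per-architecture values are our own illustration, not QEMU's definitions.

```c
/*
 * Simplified model of a check for whether multi-threaded TCG can be on
 * by default: every memory ordering the guest architecture guarantees
 * must already be guaranteed by the host, so no extra barriers are needed.
 * Flag names and values are illustrative assumptions, not copied from QEMU.
 */
#include <assert.h>
#include <stdbool.h>

enum {
    MO_LD_LD = 1 << 0, /* a load is not reordered before an earlier load */
    MO_ST_LD = 1 << 1, /* a load is not reordered before an earlier store */
    MO_LD_ST = 1 << 2, /* a store is not reordered before an earlier load */
    MO_ST_ST = 1 << 3, /* a store is not reordered before an earlier store */
    MO_ALL = MO_LD_LD | MO_ST_LD | MO_LD_ST | MO_ST_ST
};

/* Weakly ordered ARM guarantees none of these without explicit barriers;
 * x86 (TSO) guarantees all of them except store-then-load ordering. */
static const unsigned ARM_DEFAULT_MO = 0;
static const unsigned X86_DEFAULT_MO = MO_ALL & ~MO_ST_LD;

/* True if no ordering guaranteed by the guest is missing on the host. */
static bool memory_orders_compatible(unsigned guest_mo, unsigned host_mo)
{
    return (guest_mo & ~host_mo) == 0;
}
```

With these values an ARM guest passes the check on an x86 host, but an x86 guest on an ARM host does not, because the host would need extra barriers to honour the guest's stronger ordering.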

3 Cortex-M Fixes

In the last few years Linaro has mostly concentrated on the A-profile (Application profile) ARM processors. These are the ones designed to run full-stack operating systems like Linux. With the growing interest in the Internet of Things (IoT), we are starting to turn our attention to the M-profile. The Microcontroller-profile processors are targeted at much more constrained, low-latency, low-power, deeply embedded applications. Their memory is usually measured in kilobytes (kB) rather than megabytes (MB), so they tend to run custom run-loops or highly constrained real-time operating systems (RTOS) like Zephyr.

While QEMU nominally supports the Cortex-M3 processor, support for boards using it has been sporadic, which has resulted in long-standing unfixed bugs and missing important features. As the architecture has progressed, support for the newer M-profile CPUs has also lagged.

The 2.9 release includes a number of fixes to the Cortex-M series emulation as we ramp up our efforts to improve QEMU's microcontroller support. So far the fixes have targeted architectural aspects that were known to be broken, such as the NVIC emulation. Looking ahead, part of the discussion at our recent BUD17 session covered which features we should prioritise for future QEMU releases.

This summary is not intended to be exhaustive and concentrates on ARM-specific features; for example, we have not covered updates to the common sub-systems shared by all architectures. For those interested in all the details, the full changelog is worth a read.
