This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
Journal of Supercomputing, 2025, 81(3), 452
Kluwer Academic Publishers
Heterogeneous computing
Hybrid parallel computing
Co-execution
SYCL
OpenCL
CUDA
oneAPI
Performance portability
LLVM
Usability
Load balancing
The performance and energy efficiency offered by heterogeneous systems are valuable for modern C++ applications, but the variety of underlying technologies demands adequate portability and programmability. Initiatives such as Intel oneAPI facilitate the exploitation of Intel CPUs and GPUs, but not of NVIDIA GPUs, which are present in systems of all kinds and are necessarily exploited through CUDA. Frequently, only the GPU is used for computation while the CPU is left to management tasks, which wastes energy and lowers system utilization. In this work, the CoexecutorRuntime system design and API are extended to transparently integrate backends of diverse technologies, unifying their offloading mechanisms under a consistent co-execution API and scheduling runtime. Moreover, CPU-GPU co-execution across hybrid technologies is enabled to ensure performance portability. Experimental results show performance improvements for all programs studied, achieving average efficiencies of 0.91 and speedups of 1.31 with respect to GPU-only execution.
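To make the notion of co-execution concrete, the following minimal C++ sketch splits a data-parallel index range between two concurrent workers that stand in for a CPU backend and a GPU backend. It is an illustration only and does not reproduce the CoexecutorRuntime API: the names `split_point`, `coexecute` and `saxpy_coexec`, the static proportional split, and the relative-speed parameters are all assumptions introduced here, and plain `std::thread` workers replace real device queues.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not part of CoexecutorRuntime): returns the index that
// divides [0, n) so that the first device receives a share proportional to
// its assumed relative throughput.
std::size_t split_point(std::size_t n, double cpu_speed, double gpu_speed) {
    const double share = cpu_speed / (cpu_speed + gpu_speed);
    return static_cast<std::size_t>(static_cast<double>(n) * share);
}

// Runs kernel(begin, end) on two concurrent workers. Each worker stands in
// for one backend; a real runtime would submit each chunk to a device queue.
void coexecute(std::size_t n, double cpu_speed, double gpu_speed,
               const std::function<void(std::size_t, std::size_t)>& kernel) {
    const std::size_t cut = split_point(n, cpu_speed, gpu_speed);
    std::thread cpu_worker(kernel, std::size_t{0}, cut);  // "CPU" chunk
    std::thread gpu_worker(kernel, cut, n);               // "GPU" chunk
    cpu_worker.join();
    gpu_worker.join();
}

// Example kernel: y = a * x + y, computed cooperatively by both workers.
// The chunks are disjoint, so no synchronization is needed inside the kernel.
std::vector<float> saxpy_coexec(float a, const std::vector<float>& x,
                                std::vector<float> y,
                                double cpu_speed, double gpu_speed) {
    coexecute(x.size(), cpu_speed, gpu_speed,
              [&](std::size_t begin, std::size_t end) {
                  for (std::size_t i = begin; i < end; ++i) {
                      y[i] = a * x[i] + y[i];
                  }
              });
    return y;
}
```

Calling `saxpy_coexec(3.0f, x, y, 1.0, 3.0)` assigns one quarter of the elements to the first worker and the rest to the second. A static split of this kind is only one possible load-balancing policy; it is used here because it is the simplest to show, not because the work described above is known to rely on it.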