Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems

Pérez Pavón, Borja; Stafford Fernández, Esteban; Bosque Orero, José Luis; Beivide Palacio, Ramón; Mateo, S.; Teruel, X.; Martorell, X.; Ayguadé, E.

dc.contributor.author	Pérez Pavón, Borja
dc.contributor.author	Stafford Fernández, Esteban
dc.contributor.author	Bosque Orero, José Luis
dc.contributor.author	Beivide Palacio, Ramón
dc.contributor.author	Mateo, S.
dc.contributor.author	Teruel, X.
dc.contributor.author	Martorell, X.
dc.contributor.author	Ayguadé, E.
dc.contributor.other	Universidad de Cantabria	es_ES
dc.date.accessioned	2020-10-13T18:25:57Z
dc.date.available	2020-10-13T18:25:57Z
dc.date.issued	2019
dc.identifier.issn	0743-7315
dc.identifier.issn	1096-0848
dc.identifier.other	CVE-2014-18166 ; TIN2016-76635-C2-2-R (AEI/FEDER, UE) ; TIN2015-65316-P	es_ES
dc.identifier.uri	http://hdl.handle.net/10902/19331
dc.description.abstract	The emergence of heterogeneous systems has been very notable recently. The nodes of the most powerful computers integrate several compute accelerators, like GPUs. Profiting from such node configurations is not a trivial endeavour. OmpSs is a framework for task-based parallel applications that allows the execution of OpenCL kernels on different compute devices. However, it does not support the co-execution of a single kernel on several devices. This paper presents an extension of OmpSs that rises to this challenge, and presents Auto-Tune, a load balancing algorithm that automatically adjusts its internal parameters to suit the hardware capabilities and application behavior. The extension allows programmers to take full advantage of the computing devices with negligible impact on the code. It takes care of two main issues: first, the automatic distribution of datasets and the management of device memory address spaces; second, the implementation of a set of load balancing algorithms to adapt to the particularities of applications and systems. Experimental results reveal that the co-execution of single kernels on all the devices in the node is beneficial in terms of performance and energy consumption, and that Auto-Tune gives the best overall results.	es_ES
dc.description.sponsorship	This work has been supported by the University of Cantabria with grant CVE-2014-18166, the Generalitat de Catalunya under grant 2014-SGR-1051, the Spanish Ministry of Economy, Industry and Competitiveness under contracts TIN2016-76635-C2-2-R (AEI/FEDER, UE) and TIN2015-65316-P, and the Spanish Government through the Programa Severo Ochoa (SEV-2015-0493).	es_ES
dc.format.extent	13 p.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Elsevier	es_ES
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.source	Journal of Parallel and Distributed Computing, Volume 125, March 2019, Pages 45-57	es_ES
dc.subject.other	Heterogeneous systems	es_ES
dc.subject.other	OmpSs programming model	es_ES
dc.subject.other	OpenCL	es_ES
dc.subject.other	Co-execution	es_ES
dc.title	Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.relation.publisherVersion	https://doi.org/10.1016/j.jpdc.2018.11.001	es_ES
dc.rights.accessRights	openAccess	es_ES
dc.type.version	acceptedVersion	es_ES