A simulator for intelligent workload managers in heterogeneous clusters
Authors
Herrera Arcila, Adrián; Ibáñez Bolado, Mario


Date
2021
Rights
© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Published in
21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing CCGrid 2021, Los Alamitos, CA, IEEE, 2021
Keywords
Resource Management
Reinforcement Learning
Scheduling Simulation
Heterogeneous Systems
Abstract
Modern High Performance Computing (HPC) clusters often comprise a large number of computing resources with different capabilities, which makes them heterogeneous and difficult to manage. In addition, they must handle a wide range of applications with different requirements. All this poses a major challenge to the workload managers that assign applications to resources. Many recent proposals aim to overcome this challenge, including some that employ Deep Reinforcement Learning (DRL) techniques. This paper proposes a novel simulation framework for the study of workload managers, conceived specifically to foster research on workload managers based on DRL techniques. Its main features include the simulation of heterogeneous clusters based on multicore architectures, taking into account both contention in shared memory access and energy consumption. The accuracy and performance of the simulator were validated against a real environment based on Slurm. The results show good accuracy, with relative errors below 5% in makespan and 10% in energy consumption, and speedups of up to 200.
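The validation figures quoted in the abstract (relative error in makespan and energy, and speedup) can be illustrated with a minimal Python sketch. It assumes that relative error is |simulated - real| / real, that makespan spans from the first job start to the last job end, and that speedup is the wall-clock time of the real run divided by the wall-clock time of the simulation. These definitions, the `RunResult` structure, and the helper names are hypothetical illustrations, not the simulator's actual interface, and the numbers in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Summary of one workload execution (hypothetical structure)."""
    makespan_s: float   # time from first job start to last job end, in seconds
    energy_j: float     # total energy consumed by the run, in joules
    wall_time_s: float  # wall-clock time needed to obtain this result, in seconds


def makespan(jobs: list[tuple[float, float]]) -> float:
    """Makespan of a schedule given (start, end) times for each job.

    Assumption: measured from the earliest job start to the latest job end.
    """
    if not jobs:
        return 0.0
    return max(end for _, end in jobs) - min(start for start, _ in jobs)


def relative_error(simulated: float, real: float) -> float:
    """Relative error of a simulated metric with respect to the real measurement."""
    if real == 0:
        raise ValueError("real value must be non-zero")
    return abs(simulated - real) / abs(real)


def compare(sim: RunResult, real: RunResult) -> dict[str, float]:
    """Compare a simulated run against a real run of the same workload."""
    return {
        "makespan_error": relative_error(sim.makespan_s, real.makespan_s),
        "energy_error": relative_error(sim.energy_j, real.energy_j),
        # Assumed speedup definition: how many times faster the simulator
        # produces the result than executing the workload on the real cluster.
        "speedup": real.wall_time_s / sim.wall_time_s,
    }


if __name__ == "__main__":
    # Illustrative, made-up values (not results from the paper).
    real_run = RunResult(makespan_s=3600.0, energy_j=2.0e6, wall_time_s=3600.0)
    sim_run = RunResult(makespan_s=3672.0, energy_j=2.1e6, wall_time_s=36.0)
    print(compare(sim_run, real_run))
```

With these hypothetical inputs the makespan error is 2%, the energy error is 5%, and the speedup is 100. Keeping makespan and energy as separate errors mirrors the abstract, which reports a separate bound for each metric.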