Defining the boundaries for endpoint congestion management in networks for high-performance computing
Authors
Postigo Díaz, Daniel; Herreros Cerro, David; Barón, Eloy; Camarero Coterillo, Cristobal

Date
2024
Rights
© 2024 Copyright held by the owner/author(s). This work is licensed under a Creative Commons Attribution International 4.0 License.
Published in
SNTA '24: Proceedings of the Seventh International Workshop on Systems and Network Telemetry and Analytics, New York, Association for Computing Machinery, 2024. Pisa, 15-23
Publisher
Association for Computing Machinery
Keywords
Network congestion
Hotspot pattern
Endpoint congestion
High-performance interconnection networks
Abstract
Hotspot communication patterns can be common in HPC interconnection networks and cause significant, lasting degradation of network performance. The degradation persists over time, and because it continues even after the offending traffic is no longer being injected into the network, its impact is intensified. To understand its causes and effects, we analyze network behavior under different hotspot traffic scenarios and compare performance across several topologies. We examine both the performance drop caused by traffic flows with endpoint contention and the network's recovery after the phenomenon has occurred, assuming swift action is taken to mitigate it. Our results show that some topologies are more resilient to hotspot traffic than others, by limiting the performance drop, by accelerating recovery, or both. In particular, Flattened Butterfly is more resilient to congestion and consistently recovers rapidly. These results reinforce the need for mechanisms that act effectively and quickly to reduce the magnitude and duration of the performance drop. They also highlight behavioral differences between topologies that can affect the effectiveness of mechanisms relying on congestion-based metrics.
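Why the degradation outlives the hotspot can be illustrated with a deliberately simplified fluid model. This is a sketch under stated assumptions, not the method used in the work above: it assumes a single hotspot endpoint with a fixed ejection rate, a number of sources injecting toward it at a constant rate, and unbounded buffers. It ignores tree saturation and head-of-line blocking, which are what spread the congestion to unrelated flows in a real network. The function name `hotspot_backlog` and its parameters are hypothetical.

```python
def hotspot_backlog(num_sources: int, rate_per_source: float,
                    ejection_rate: float, hotspot_cycles: int) -> tuple[float, float]:
    """Fluid-model sketch of one hotspot endpoint (hypothetical helper).

    Assumptions: every source injects `rate_per_source` packets/cycle toward
    a single destination whose ejection port drains `ejection_rate`
    packets/cycle; buffers are unbounded; no other traffic is modeled.
    Returns (backlog after the hotspot phase, cycles needed to drain it
    once injection toward the hotspot stops).
    """
    if ejection_rate <= 0:
        raise ValueError("ejection_rate must be positive")
    offered = num_sources * rate_per_source       # aggregate load aimed at the hotspot
    excess = max(0.0, offered - ejection_rate)    # packets/cycle the endpoint cannot accept
    backlog = excess * hotspot_cycles             # packets left queued inside the network
    drain_cycles = backlog / ejection_rate        # time to empty that queue afterwards
    return backlog, drain_cycles


if __name__ == "__main__":
    # 16 sources at 0.25 packets/cycle each toward one endpoint that
    # ejects 1 packet/cycle, sustained for 1000 cycles.
    backlog, drain = hotspot_backlog(16, 0.25, 1.0, 1000)
    print(f"backlog={backlog:.0f} packets, drain time={drain:.0f} cycles")
```

In this example the endpoint is oversubscribed fourfold, so 3000 packets accumulate during the hotspot phase, and even with injection stopped immediately the endpoint needs another 3000 cycles to drain them. This is the minimum recovery time under these assumptions; contention spreading through shared buffers would only lengthen it.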