Towards A Modular End-To-End Machine Learning Benchmarking Framework
Robert Bayer, Ties Robroek, Pınar Tözün
Accepted to TDIS 2025
Abstract
Machine learning (ML) benchmarks are crucial for evaluating the performance, efficiency, and scalability of ML systems,
especially as the adoption of complex ML pipelines, such as retrieval-augmented generation (RAG), continues to grow.
These pipelines introduce intricate execution graphs that require more advanced benchmarking approaches. Additionally,
collocating workloads can improve resource efficiency but may introduce contention that must be carefully
managed. Detailed insights into resource utilization are necessary for effective collocation and optimized edge
deployments. However, existing benchmarking frameworks often fail to capture these critical aspects.
We introduce a modular end-to-end ML benchmarking framework designed to address these gaps. The framework emphasizes
modularity: pipeline stages are reusable across workloads, which facilitates flexible benchmarking of diverse ML
workflows. It supports complex workloads and measures their end-to-end performance. Workloads can be collocated,
with the framework providing insights into resource utilization and contention between the concurrent workloads.
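
The abstract's core ideas, reusable pipeline stages, end-to-end measurement, and collocated execution, can be made concrete with a minimal sketch. The interface below (Stage, Pipeline, run_collocated) is a hypothetical illustration written for this summary, not the framework's actual API; it only shows how shared stages compose into workloads that can run concurrently.

```python
import time
import threading
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Stage:
    """A reusable pipeline stage: a named function from input to output.
    (Illustrative assumption, not the framework's real abstraction.)"""
    name: str
    fn: Callable[[Any], Any]

@dataclass
class Pipeline:
    """An end-to-end workload expressed as an ordered list of stages."""
    name: str
    stages: list[Stage]

    def run(self, payload: Any) -> dict[str, float]:
        # Time each stage individually to get per-stage and end-to-end latency.
        timings: dict[str, float] = {}
        for stage in self.stages:
            start = time.perf_counter()
            payload = stage.fn(payload)
            timings[stage.name] = time.perf_counter() - start
        return timings

def run_collocated(pipelines: list[Pipeline], payload: Any) -> dict[str, dict[str, float]]:
    """Run several pipelines concurrently so contention between the
    collocated workloads shows up in their measured stage timings."""
    results: dict[str, dict[str, float]] = {}

    def worker(p: Pipeline) -> None:
        results[p.name] = p.run(payload)

    threads = [threading.Thread(target=worker, args=(p,)) for p in pipelines]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Example: two RAG-like workloads built from the same shared stage objects.
retrieve = Stage("retrieve", lambda q: q + " + retrieved context")
generate = Stage("generate", lambda ctx: ctx + " -> answer")
rag_a = Pipeline("rag-a", [retrieve, generate])
rag_b = Pipeline("rag-b", [retrieve, generate])  # reuses the same stages
print(run_collocated([rag_a, rag_b], "query"))
```

Because both pipelines reference the same Stage objects, each stage is defined once and reused across workloads, which is the reusability the abstract describes; a full framework would additionally sample hardware utilization (CPU, GPU, memory) while the collocated workloads run, so that contention can be attributed to specific resources.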