
The variety and complexity of data-intensive applications and systems have increased drastically over the past decade. The tasks in a SQL-based big data analytics query can be very different from those in deep learning training. Nevertheless, these data-intensive applications run on shared, powerful hardware resources in data centers and high-performance computing (HPC) centers, or on resource-constrained edge/Internet-of-Things (IoT) devices. These hardware resources are increasingly diverse: (1) compute resources range from general-purpose CPUs and GPUs to specialized hardware such as TPUs, and (2) the modern storage hierarchy is growing more complex with the variety of interfaces to NVMe SSDs and novel interconnect protocols such as CXL. There is a pressing need for a more resource-aware infrastructure that effectively orchestrates different data-intensive tasks over the available hardware. To achieve this, our approach is to first characterize modern hardware and the hardware needs of different data-intensive workloads, and then to establish and implement guidelines for hardware resource management in data-intensive systems. Currently, our team focuses specifically on resource-aware and resource-constrained machine learning, GPU-centric storage access, and leveraging NVMe SSDs and CXL for database systems.
This research has been supported by the Independent Research Fund Denmark, the Novo Nordisk Foundation, Innovation Fund Denmark, and the Swiss National Science Foundation.