4th International Workshop on
RESource DISaggregation
in High-Performance Computing

Held in conjunction with The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24)

November 17th (Sun) 2024, Atlanta, GA, USA

Introduction

Disaggregation is an emerging compute paradigm that splits existing monolithic servers into a number of consolidated single-resource pools that communicate over a fast interconnect. This model decouples individual hardware resources, including tightly coupled ones such as processors and memory, and enables the composition of logical compute platforms with flexible and dynamic hardware configurations.

The concept of disaggregation is driven by several recent trends in computing. From an application perspective, the growing importance of data analytics and machine learning workloads in HPC centers creates an unprecedented need for memory capacity, which stands in stark contrast with the growing imbalance in the peak compute-to-memory capacity ratio of traditional system-board-based server platforms, where memory modules are co-located with processors. Meanwhile, traditional simulation workloads leave memory underutilized. On the hardware front, the proliferation of heterogeneous, special-purpose computing elements promotes the need for configurable compute platforms, while at the same time the increasing maturity of optical interconnects raises the prospect of greater distance independence in networking infrastructure.

The workshop intends to explore various aspects of resource disaggregation, composability, and their implications for high performance computing, both in dedicated HPC centers and in cloud environments. RESDIS aims to bring together researchers and industrial practitioners to foster discussion, collaboration, and mutual exchange of knowledge and experience related to future disaggregated systems.



Call for Papers

The RESDIS program committee solicits original, high-quality submissions of unpublished results related to the theme of resource disaggregation and composable systems. Topics of interest include, but are not limited to:

- Disaggregated hardware in high-performance computing
- Operating systems and runtime support for disaggregated platforms
- Simulation of disaggregated platforms with existing infrastructure
- Runtime systems and programming abstractions for disaggregation and composability
- Networking for disaggregation, including silicon photonics and optical interconnects
- Implications of resource disaggregation for scientific computing and HPC applications
- Algorithm design for disaggregated and composable systems
- Disaggregated high throughput storage
- Disaggregated heterogeneous accelerators (GPUs, FPGAs, AI Accelerators, etc.)
- Resource management in disaggregated and composable platforms

The workshop proceedings will be published electronically via the IEEE Computer Society Digital Library. Submitted manuscripts must use the proceedings templates at: https://www.ieee.org/conferences/publishing/templates.html. Submissions must be between 5 and 8 pages, including references and figures. Prospective authors should submit their papers in PDF format through Linklings’ submission site:

Submissions Closed



Important Dates

- Submission deadline (EXTENDED): August 16th (Fri), AoE
- Acceptance notification
- Camera-ready paper deadline
- Workshop date: November 17th (Sun), 2024

Organization

Workshop Chairs


Intel Corporation, USA & RIKEN, Japan

Lawrence Berkeley National Laboratory, USA

IBM Research Europe, Ireland

Program Committee


Michael Aguilar

Sandia National Laboratories, USA

Larry Dennison

Nvidia, USA

Thaleia Dimitra Doudali

IMDEA Software Institute, Spain

Kyle Hale

Illinois Institute of Technology, USA

John (Jack) Lange

Oak Ridge National Laboratory, USA

Ivy Peng

KTH Royal Institute of Technology, Sweden

Yu Tanaka

Fujitsu, Japan

Gaël Thomas

Télécom SudParis, France

Agenda

All times in Eastern Time (UTC-5)

Welcome and Introduction

Keynote: Open Chiplet Ecosystems and Economies for Disaggregated Systems

Cliff Grossner (Open Compute Project Foundation)

Abstract: The demand for compute performance at scale has never been greater, and the current silicon supply chain, delivering monolithic and generic SoCs, can no longer provide the generation-over-generation increase in performance needed to satisfy this demand. Chiplets offer the promise of diversity to better match workloads to computational infrastructure, creating large-scale high-performance computers with pools of heterogeneous processors that can be dynamically composed into virtual compute nodes specialized for particular workloads. Unfortunately, chiplet technology today is mostly used in proprietary settings by a few large SoC suppliers, limiting the ability of the market to innovate. The Open Compute Project (OCP) Community believes that opening the silicon supply chain, enabling innovation in specialized silicon processing elements by smaller companies, is necessary to meet the future demands of high-performance computing. This talk will cover the ongoing work at the OCP Open Chiplet Economy Project focused on enabling a new silicon supply chain with an open chiplet marketplace, intended to foster innovation and an emerging market for specialized chiplet-based System in Package (SiP) SoCs, enabling composable systems.

Coffee Break

Multi-Host Sharing of a Single-Function NVMe Device in a PCIe Cluster

Jonas Markussen (Dolphin Interconnect Solutions), Lars Bjørlykke Kristiansen (Dolphin Interconnect Solutions), Håkon Stensland (Simula Research Laboratory / University of Oslo), Pål Halvorsen (SimulaMet / Oslo Metropolitan University)

Abstract: Distributed cluster applications, including machine learning tasks, database applications, and HPC workloads, often rely on NVMe-oF using RDMA for fast, block-level access to storage devices over a network. However, RDMA solutions add extra latency by requiring software on the critical path. In this paper, we present a distributed NVMe driver for sharing NVMe storage devices across hosts in a PCIe cluster. By building on PCIe shared memory capabilities, we demonstrate disaggregation of NVMe controllers at the I/O queue level, allowing them to be used in parallel by remote hosts without relying on RDMA. Our experimental results show that our PCIe-based solution reduces network latency and achieves performance comparable to local access.

Examining the Viability of Row-Scale Disaggregation for Production Applications

Curtis Shorts (Queen's University), Ryan Eric Grant (Queen's University)

Abstract: Row-scale Composable Disaggregated Infrastructure (CDI) is a heterogeneous high performance computing (HPC) architecture that relocates GPUs to a single chassis from which CPU nodes can request compute resources. This architecture is distinctly different from rack-scale CDI, as the GPUs are accessed over a network rather than residing in the same PCIe domain as the CPUs. We compare the kernel and data transfer characteristics of two production applications against a slack proxy application, which allowed us to develop a mathematical model predicting the performance penalty that general applications can face as a result of slack. Our proposed method found that, pessimistically, the applications tested would see less than a 1% performance penalty beyond the effects of crossing the network in an environment that induced 100 µs of slack, thus demonstrating that row-scale CDI is a viable technology from an application performance perspective.

Granularity and Interference-Aware GPU Sharing with MPS

Alex Weaver (University of North Texas), Krishna Kavi (University of North Texas), Dejan Milojicic (Hewlett Packard Enterprise), Rolando Pablo Hong Enriquez (Hewlett Packard Enterprise), Ninad Hogade (Hewlett Packard Enterprise), Alok Mishra (Hewlett Packard Enterprise), Gayatri Mehta (University of North Texas)

Abstract: With the advent of exascale computing, GPU acceleration has become central to the performance of supercomputers. Even at this extreme scale, most scientific and HPC-scale DNN applications underutilize GPU resources. Existing GPU sharing mechanisms can be used to increase utilization, throughput, and energy efficiency. However, naively co-scheduling workflows often does not yield optimal results. Scheduling multiple high-utilization workloads on the same set of GPUs, for example, leads to performance degradation due to high resource contention. In short, GPU sharing must be granularity- and interference-aware to maximize the benefit. We propose a scheduling approach that optimizes workflow scheduling configurations for given system metrics (i.e., throughput and energy efficiency), uses workload profiling data to right-size GPU resources for combinations of HPC workflows, and collocates workflows using existing concurrency mechanisms. We show that choosing the right arrangement of workflows to collocate can increase throughput by as much as 2x and energy efficiency by 1.6x.

A Software Platform to Support Disaggregated Quantum Accelerators

Ercüment Kaya (Leibniz Supercomputing Centre / TU Munich), Jorge Echavarria (Leibniz Supercomputing Centre), Muhammad Nufail Farooqi (Leibniz Supercomputing Centre), Aleksandra Swierkowska (Leibniz Supercomputing Centre / TU Munich), Patrick Hopf (Leibniz Supercomputing Centre / TU Munich), Burak Mete (Leibniz Supercomputing Centre / TU Munich), Laura Schulz (Leibniz Supercomputing Centre), Martin Schulz (TU Munich)

Abstract: Quantum computers are making their way into High Performance Computing centers in the form of accelerators. Because they are physically implemented mostly as large appliances in separate racks, their number in typical data centers is significantly lower than the number of nodes offloading work to them, unlike the case with GPU accelerators. As a consequence, they form large-scale disaggregated infrastructures that pose a number of integration challenges due to their diverse implementation technologies and their need to be shared for optimal utilization. Running hybrid High Performance Computing-Quantum Computing (HPCQC) applications in HPC environments, where the quantum portion is offloaded to the quantum processing units, requires sophisticated resource management strategies to optimize resource utilization and performance. In this paper, we present the Munich Quantum Software Stack (MQSS), a Just-In-Time (JIT) compilation and execution software stack tailored for integrating disaggregated quantum accelerators into traditional HPC workflows.

Towards Disaggregated NDP Architectures for Large-scale Graph Analytics

Suyeon Lee (Georgia Institute of Technology), Vishal Rao (Georgia Institute of Technology), Ada Gavrilovska (Georgia Institute of Technology)

Abstract: The performance of large-scale graph analytics is limited by the capacity and performance of the memory subsystem on the platforms on which they execute. In this paper, we first discuss the limitations of existing approaches to scaling graph processing, and describe how they can be addressed via the use of disaggregated solutions with near-data processing (NDP) capabilities. Using observations from experimental analysis of the tradeoffs for different types of graphs and analytics kernels, we then identify the systems-level mechanisms that will be required by future graph analytics frameworks for disaggregated NDP architectures.

Paper Q&A and Panel Discussion

Curtis Shorts (Queen's University), Alex Weaver (University of North Texas), Vishal Rao (Georgia Institute of Technology), Jonas Markussen (Dolphin Interconnect Solutions), Håkon Stensland (Simula Research Laboratory / University of Oslo), Ercüment Kaya (Leibniz Supercomputing Centre / TU Munich)

Adjourn

Event Venue

The Georgia World Congress Center

285 Andrew Young International Blvd NW, Atlanta, GA 30313

Located in the heart of downtown, the Georgia World Congress Center is top-ranked in the nation and close to hotels, dining, attractions, shopping, and public transportation.


Contact Us

If you have any comments or questions, do not hesitate to contact us.
