RESDIS'25

Introduction

Disaggregation is an emerging compute paradigm that splits existing monolithic servers into a number of consolidated single-resource pools that communicate over a fast interconnect. This model decouples individual hardware resources, including tightly coupled ones such as processors and memory, and enables the composition of logical compute platforms with flexible and dynamic hardware configurations.

The concept of disaggregation is driven by various recent trends in computation. From an application perspective, the increasing importance of data analytics and machine learning workloads in HPC centers brings unprecedented need for memory capacity, which is in stark contrast with the growing imbalance in the peak compute-to-memory capacity ratio of traditional system board based server platforms where memory modules are co-located with processors. Meanwhile, traditional simulation workloads leave memory underutilized. At the hardware front, the proliferation of heterogeneous, special purpose computing elements promotes the need for configurable compute platforms, while at the same time, the increasing maturity of optical interconnects raises the prospects of better distance independence in networking infrastructure.

The workshop intends to explore various aspects of resource disgregation, composability and their implications for high performance computing, both in dedicated HPC centers as well as in cloud environments. RESDIS aims to bring together researchers and industrial practitioners to foster discussion, collaboration, mutual exchange of knowledge and experience related to future disaggregated systems.

Call for Papers

The RESDIS program committee solicits original, high-quality submissions of unpublished results related to the theme of resource disaggregation and composable systems. Topics of interest include, but not limited to:

- Disaggregated hardware in high-performance computing
- Operating systems and runtime support for disaggregated platforms
- Simulation of disaggregated platforms with existing infrastructure
- Runtime systems and programming abstractions for disaggregation and composability
- Networking for disaggregation, including silicon photonics and optical interconnects
- Implications of resource disaggregation for scientific computing and HPC applications
- Algorithm design for disaggregated and composable systems
- Disaggregated high throughput storage
- Disaggregated heterogeneous accelerators (GPUs, FPGAs, AI Accelerators, etc.)
- Resource management in disaggregated and composable platforms

Workshop papers will be published in the SC Workshops Proceedings volume.
Submitted manuscripts must use the ACM proceedings template, two-column:

https://www.acm.org/publications/proceedings-template.

Submissions must be at least 5 pages (no upper limit), including references and figures. Prospective authors should submit their papers in PDF format through Linklings’ submission site:

~~Submissions Closed~~

Important Dates

August 8th 15th (Fri) AoE

~~Submission deadline (EXTENDED, FINAL)~~

September 5th (Fri)

Acceptance notification

September 26th (Fri)

Camera ready paper deadline

November 16th (Sun)

Workshop date

Organization

Workshop Chairs

Balazs Gerofi

Intel Corporation, USA & RIKEN, Japan

John Shalf

Lawrence Berkeley National Laboratory, USA

Christian Pinto

IBM Research Europe, Ireland

Program Committee

Michael Aguilar

Sandia National Laboratories, USA

Larry Dennison

Nvidia, USA

Aadesh Deshmukh

AMD, USA

Kyle Hale

Illinois Institute of Technology, USA

John (Jack) Lange

Oak Ridge National Laboratory, USA

George Michelogiannakis

Lawrence Berkeley National Laboratory, USA

Ivy Peng

KTH Royal Institute of Technology, Sweden

Yu Tanaka

Fujitsu, Japan

Gaël Thomas

Télécom SudParis, France

Agenda

All times in Central Time Zone (UTC-5)

09:00 AM

Welcome and Introduction

09:10 AM

Opening Keynote: Photonics-Enabled Systems for the Disaggregated Era of Supercomputing and AI

Fabrizio Petrini (Intel Corporation)

Abstract: As AI and HPC workloads push the limits of performance, power, and scalability, the industry faces a fundamental architectural shift. The traditional boundaries between compute, memory, and networking are dissolving, giving rise to disaggregated systems—composed from modular chiplets and accelerators that must communicate with near-monolithic efficiency. In this landscape, photonics is emerging not just as an I/O technology, but as the foundation for next-generation system architecture. This keynote will explore how advances in integrated photonics—including silicon photonic interposers, co-packaged optics, and optical circuit fabrics—are transforming the design of large-scale AI and supercomputing platforms. By delivering orders-of-magnitude improvements in bandwidth density, latency uniformity, and energy per bit, photonics interconnects enable the flexible composition of heterogeneous compute elements across dies, packages, and system racks. The talk will outline the evolution from electrical domain limits to photonic-domain scalability, highlighting how photonic fabrics can unify on-package, board-level, and cluster-scale communication. It will examine the interplay between photonics, packaging, and network topology, and discuss emerging opportunities in optically reconfigurable architectures for AI model training and HPC workflows. Ultimately, it will argue that photonics is not just a bandwidth solution—but the enabler of a new class of composable, memory-centric, and energy-efficient supercomputing systems.

10:00 AM

Coffee Break

10:30 AM

Fast on-demand Memory Mapping for Shared Memory and Disaggregated Systems

Yuang Yan (Queen's University), Ryan Grant (Queen's University)

Abstract: Efficient synchronization of memory mapping information is increasingly important as systems evolve toward greater resource disaggregation and heterogeneity. When memory is exported between processes, establishing shared mapping often requires costly page table walk and updates, particularly in fault-driven models. To study these costs, we implement an XPMEM-inspired shared-memory driver and evaluate techniques to reduce mapping overhead. Our approach combines parallel batched on-demand pinning, bypassing unnecessary cache-policy lookups in PFN mapping, and dynamic re-registration to expand registered regions without tearing down existing mappings. In our evaluation, these optimizations reduce cold-start memory copy by up to 13.22x over XPMEM in multi-process workloads, with particular benefits for collective communication patterns and rapidly resizing buffers. While developed in a shared-memory context, the results highlight general strategies—avoiding redundant translation work, enabling parallel mapping operations, and preserving mapping state—that can inform the design of memory management in disaggregated systems, including GPU disaggregation and heterogeneous memory environments.

10:50 AM

TEGRA - Scaling Up Graph Processing with Disaggregated Computing

William Shaddix (University of California, Davis), Mahyar Samani (University of California, Davis), Jason Lowe-Power (Google LLC, University of California, Davis), Venkatesh Akella (University of California, Davis)

Abstract: Graph processing workloads continue to grow in scale and complexity, demanding architectures that can adapt to diverse compute and memory requirements. Traditional scale-out accelerators couple compute and memory resources, resulting in resource underutilization when executing workloads with varying compute-to-memory intensities. In this paper, we present TEGRA, a composable, scale-up architecture for large-scale graph processing. TEGRA leverages disaggregated memory via CXL and a message-passing communication model to decouple compute and memory, enabling independent scaling of each. Through detailed evaluation using the gem5 simulator, we show that TEGRA improves memory bandwidth utilization by up to 15\% over state-of-the-art accelerators by dynamically provisioning compute based on workload demands. Our results demonstrate that TEGRA provides a flexible and efficient foundation for supporting emerging graph analytics workloads across a wide range of arithmetic intensities.

11:10 AM

An RDMA-First Object Storage System with SmartNIC Offload

Yu Zhu (ETH Zurich), Aditya Dhakal (Hewlett Packard Labs), Pedro Bruel (Hewlett Packard Labs), Gourav Rattihalli (Hewlett Packard Labs), Yunming Xiao (The Chinese University of Hong Kong, Shenzhen), Johann Lombardi (Hewlett Packard Labs), Dejan Milojicic (Hewlett Packard Labs)

Abstract: AI training and inference impose sustained, fine-grained I/O that stresses host-mediated, TCP-based storage paths. We revisit POSIX-compatible object storage for GPU-centric pipelines and present ROS2, an RDMA-first design that offloads the DAOS client to an NVIDIA BlueField-3 SmartNIC while leaving the server-side DAOS I/O engine unchanged. ROS2 splits a lightweight gRPC control plane from a high-throughput data plane (UCX/libfabric over RDMA or TCP), removing host mediation from the data path. Using FIO/DFS across local and remote settings, we show that on server-grade CPUs RDMA consistently outperforms TCP for large sequential and small random I/O. When the client is offloaded to BlueField-3, RDMA performance matches the host; TCP on the SmartNIC lags, underscoring RDMA’s advantage for offloaded deployments. We conclude that an RDMA-first, SmartNIC-offloaded object store is a practical foundation for LLM data delivery; optional GPUDirect placement is left for future work.

11:30 AM

DoCeph: DPU-Offloaded Messaging in Ceph for Reduced Host CPU Utilization

Kyuli Park (Sogang University, South Korea), Sungmin Yoon (Sogang University, South Korea), Farid Talibli (Sogang University, South Korea), Sungyong Park (Sogang University, South Korea), Jae-Hyuck Kwak (Korea Institute of Science and Technology Information), Kimoon Jeong (Korea Institute of Science and Technology Information), Awais Khan (Oak Ridge National Laboratory), Youngjae Kim (Sogang University, South Korea)

Abstract: Ceph is a widely used distributed object store, but its messenger layer imposes substantial CPU overhead on the host. To address this limitation, we propose DoCeph, a DPU-offloaded storage architecture for Ceph that disaggregates the system by offloading the communication-intensive messaging component to the DPU while retaining the storage backend on the host. The DPU efficiently manages communication, using lightweight RPC for metadata operations and DMA for data transfer. Moreover, DoCeph introduces a pipelining technique that overlaps data transmission with buffer preparation, mitigating hardware-imposed transfer size limitations. We implemented DoCeph on a Ceph cluster with NVIDIA BlueField-3 DPUs. Evaluation results indicate that DoCeph cuts host CPU usage by up to 92% while sustaining stable throughput and providing larger performance benefits for object writes over 1 MB.

11:50 AM

Closing Invited Talk: Fast and Scalable Inference with NVIDIA Dynamo

Benjamin Glick (NVIDIA), Arnav Goel (NVIDIA)

Abstract: As large language models move into production at unprecedented scale, the requirements for efficient, reliable, and cost-effective inference have diverged from those of training. Modern deployments must meet diverse SLAs, support rapidly growing GPU fleets, and include workloads with different performance characteristics. NVIDIA Dynamo is a production-grade framework for distributed inference at scale that addresses these challenges through modular disaggregation, topology-aware scheduling, and intelligent memory and KV-cache management. This presentation covers Dynamo’s design for high-performance inference at scale, detailing how disaggregating inference across prefill and decode phases increases utilization. We highlight advancements such as KV-cache-aware routing and offloading strategies that leverage the full memory hierarchy, from HBM to networked storage. Together, these strategies enable a cohesive platform that enables efficient and scalable LLM inference in real-world production environments.

12:30 PM

Adjourn

Event Venue

America's Center Convention Complex, 701 Convention Plaza, St. Louis, MO 63101

America's Center Convention Complex

Located in the heart of downtown, America’s Center has completed phase one of a major expansion and is close to hotels, dining, attractions, shopping, and public transportation.

A STATE OF THE ART VENUE

Explore America’s Center's website, including information on their AC Next Gen expansion and renovation project.

Contact Us

If you have any comments/questions, do not hesitate to contact us.

Address

2501 NE Century Blvd, Hillsboro,
OR 97124, USA

Email

bgerofi_AT_gmail.com

RESDIS'25

5th International Workshop on
RESource DISaggregation
in High-Performance Computing

Held together with The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'25)

Introduction

Call for Papers

Important Dates

August 8th 15th (Fri) AoE

September 5th (Fri)

September 26th (Fri)

November 16th (Sun)

Organization

Workshop Chairs

Program Committee

Agenda

Welcome and Introduction

Opening Keynote: Photonics-Enabled Systems for the Disaggregated Era of Supercomputing and AI

Coffee Break

Fast on-demand Memory Mapping for Shared Memory and Disaggregated Systems

TEGRA - Scaling Up Graph Processing with Disaggregated Computing

An RDMA-First Object Storage System with SmartNIC Offload

DoCeph: DPU-Offloaded Messaging in Ceph for Reduced Host CPU Utilization

Closing Invited Talk: Fast and Scalable Inference with NVIDIA Dynamo

Adjourn

Event Venue

America's Center Convention Complex

Contact Us

Address

Email

5th International Workshop on RESource DISaggregationin High-Performance Computing

Held together with The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'25)

Introduction

Call for Papers

Important Dates

August 8th 15th (Fri) AoE

September 5th (Fri)

September 26th (Fri)

November 16th (Sun)

Organization

Workshop Chairs

Program Committee

Agenda

Welcome and Introduction

Opening Keynote: Photonics-Enabled Systems for the Disaggregated Era of Supercomputing and AI

Coffee Break

Fast on-demand Memory Mapping for Shared Memory and Disaggregated Systems

TEGRA - Scaling Up Graph Processing with Disaggregated Computing

An RDMA-First Object Storage System with SmartNIC Offload

DoCeph: DPU-Offloaded Messaging in Ceph for Reduced Host CPU Utilization

Closing Invited Talk: Fast and Scalable Inference with NVIDIA Dynamo

Adjourn

Event Venue

America's Center Convention Complex

Contact Us

Address

Email

5th International Workshop on
RESource DISaggregation
in High-Performance Computing