SRC: A Scalable Reliable Connection for RDMA with Decoupled QPs and Connections

APNet'25

Published by ACM | Organized by ACM

Remote Direct Memory Access (RDMA) is increasingly adopted in data centers to achieve high throughput and low latency, thanks to its hardware-offloaded network stack. However, RDMA cannot provide consistently high performance at scale because RDMA NIC (RNIC) hardware state capacity is limited. We observe that efficient RDMA systems require a large number of RDMA connections because the channels between the host and the RNIC are coupled with the network connections. In this paper, we propose a novel RDMA transport concept, SRC, which decouples network connections from queue pairs (QPs). SRC introduces a lightweight mapping scheme for efficient forwarding between QPs and connections on the RNIC. In addition, SRC lets software manage the mapping between QPs and connections. Results show that SRC can reduce the RDMA state size from 146.20 MB to 0.19 MB for a 512-server cluster running RDMA applications.
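
To make the decoupling idea concrete, the following is a minimal illustrative sketch, not SRC's actual data structures. It assumes a hypothetical software-managed table (`MappingTable`, `attach`, and `lookup` are invented names) in which all QPs that talk to the same remote host share one connection. The sharing policy is an assumption made for illustration only. The last line computes the state-size ratio implied by the two figures reported above.

```python
from dataclasses import dataclass, field


@dataclass
class MappingTable:
    """Hypothetical software-managed QP-to-connection mapping (illustrative only)."""
    conn_by_remote: dict = field(default_factory=dict)  # remote host -> connection id
    conn_by_qp: dict = field(default_factory=dict)      # QP number -> connection id

    def attach(self, qpn: int, remote: str) -> int:
        # Assumed policy: reuse the connection to this remote host if one
        # already exists, otherwise allocate the next connection id.
        conn = self.conn_by_remote.setdefault(remote, len(self.conn_by_remote))
        self.conn_by_qp[qpn] = conn
        return conn

    def lookup(self, qpn: int) -> int:
        # Forwarding step: find the connection that carries this QP's traffic.
        return self.conn_by_qp[qpn]


def build_example(num_remotes: int, qps_per_remote: int) -> MappingTable:
    """Attach qps_per_remote QPs to each of num_remotes hypothetical hosts."""
    table = MappingTable()
    qpn = 0
    for r in range(num_remotes):
        for _ in range(qps_per_remote):
            table.attach(qpn, f"host{r}")
            qpn += 1
    return table


table = build_example(num_remotes=4, qps_per_remote=8)
coupled_connections = len(table.conn_by_qp)        # coupled design: one connection per QP
decoupled_connections = len(table.conn_by_remote)  # this sketch: one connection per remote host
reduction_factor = 146.20 / 0.19                   # ratio of the reported state sizes
```

In this toy setup the coupled design would track one connection for every QP, while the shared table tracks one per remote host; the reported figures correspond to a reduction of roughly 770x in state size.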