Scaling Out Chip Interconnect Networks with Implicit Sequence Numbers
Journal:
arXiv
Published Date:
Jun 28, 2025
Abstract
As AI models outpace the capabilities of single processors, interconnects
across chips have become a critical enabler for scalable computing. These
processors exchange massive amounts of data at cache-line granularity,
prompting the adoption of new interconnect protocols like CXL, NVLink, and
UALink, designed for high bandwidth and small payloads. However, the increasing
transfer rates of these protocols heighten susceptibility to errors. While
mechanisms like Cyclic Redundancy Check (CRC) and Forward Error Correction
(FEC) are standard for reliable data transmission, scaling chip interconnects
to multi-node configurations introduces new challenges, particularly in
managing silently dropped flits in switching devices. This paper introduces
Implicit Sequence Number (ISN), a novel mechanism that ensures precise flit
drop detection and in-order delivery without adding header overhead.
Additionally, we propose Reliability Extended Link (RXL), an extension of CXL
that incorporates ISN to support scalable, reliable multi-node interconnects
while maintaining compatibility with the existing flit structure. By elevating
CRC to a transport-layer mechanism for end-to-end data and sequence integrity,
and relying on FEC for link-layer error correction and detection, RXL delivers
robust reliability and scalability without compromising bandwidth efficiency.