There's a lingering belief in the industry that scaling up AI systems is difficult: that doing so means locking into a proprietary interconnect or creating an entirely new interconnect standard, managing tightly constrained topologies, and living with higher costs and a new set of operational tools.
The truth is: scale-up is simpler than scale-out. And with the right foundation, such as the Scale-Up Ethernet (SUE) framework Broadcom announced last month, it's far more flexible.
Over the past few years, Ethernet has proven itself as the best technology for scale-out AI networking — connecting XPU nodes within a data center and across data centers. Now it’s time to bring Ethernet’s advantages to scale-up: enabling fast, reliable, and open communication within a node or rack, across tens to hundreds of tightly coupled XPUs.
When AI models run at massive scale, performance bottlenecks often emerge inside the system, where XPUs need to share memory, coordinate collectives, or route outputs for Mixture-of-Experts inference.
In large-scale AI deployments, particularly scale-up architectures, multiple computing elements (XPUs and CPUs) operate closely together, often within a single server or tightly integrated system. As models become extremely large — potentially trillions of parameters — the compute and memory requirements surpass what any single processing unit can handle efficiently. This necessitates highly efficient intra-system communication, enabling XPUs to rapidly share data, synchronize computation, and collectively execute operations like all-reduce, broadcast, and gather, which are common in distributed model training and inference.
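To make the collective operations concrete, here is a minimal sketch using PyTorch's `torch.distributed` API (chosen purely for illustration; SUE sits below the framework layer, so the application sees the same collective calls regardless of the interconnect carrying the traffic). It shows an all-reduce, the pattern a scale-up fabric must carry at high bandwidth and low latency during gradient synchronization:

```python
# Minimal all-reduce sketch (illustrative only; SUE operates below this
# layer -- the framework issues the same collective regardless of fabric).
import torch
import torch.distributed as dist

def main():
    # Each process drives one XPU; rank and world size come from the
    # launcher (e.g., torchrun). "gloo" lets the sketch run on CPU;
    # a real deployment would pick the backend matching its hardware.
    dist.init_process_group(backend="gloo")
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Each rank holds a local gradient shard; all-reduce sums it across
    # all ranks so every XPU ends up with the same aggregated tensor.
    local_grad = torch.full((4,), float(rank))
    dist.all_reduce(local_grad, op=dist.ReduceOp.SUM)

    # After the collective, every rank sees sum(0..world_size-1) per element.
    print(f"rank {rank}/{world_size}: {local_grad.tolist()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, say, `torchrun --nproc_per_node=4 allreduce_sketch.py`, every rank prints the same summed tensor; the interconnect's job is to make that exchange fast and predictable at scale.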
These scale-up domains demand extremely high bandwidth, low latency, and reliable communication among tens to hundreds of tightly coupled XPUs. These requirements are pushing interconnect technologies to their limits, and it's here that Ethernet stands apart.
Introducing the Scale-Up Ethernet (SUE) framework
To bring Ethernet to scale-up domains, Broadcom developed the Scale-Up Ethernet (SUE) framework — a clean, standards-based specification for high-performance XPU interconnect.
We’ve contributed the full SUE specification to the Open Compute Project, making it available for others to build upon.
At a high level, SUE defines a standards-based framework for XPU-to-XPU communication over Ethernet within a scale-up domain. It allows flexibility in bandwidth, with 800G per SUE instance and support for multiple network planes for load balancing, fault isolation, and increased aggregate throughput, and it supports scale-up systems of up to 1,024 XPUs in a single domain.
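As a back-of-the-envelope illustration of the plane model, the sketch below computes aggregate per-XPU bandwidth from 800G SUE instances and shows one way flows could be spread across independent planes. The 800G figure and the 1,024-XPU ceiling come from the SUE framework; the plane count and the hash-based load-balancing policy are assumptions made up for this example, not part of the spec:

```python
# Back-of-the-envelope sketch of SUE-style network planes. 800G per
# instance and the 1,024-XPU domain limit come from the SUE framework;
# the plane count and hashing policy below are illustrative assumptions.
from dataclasses import dataclass

SUE_INSTANCE_GBPS = 800  # per the SUE framework

@dataclass
class ScaleUpDomain:
    xpus: int    # XPUs in the domain (SUE supports up to 1,024)
    planes: int  # independent network planes per XPU (assumed here)

    def per_xpu_bandwidth_gbps(self) -> int:
        # Planes aggregate: each plane contributes one 800G instance.
        return self.planes * SUE_INSTANCE_GBPS

    def plane_for_flow(self, src: int, dst: int) -> int:
        # Toy load-balancing policy: hash the (src, dst) pair onto a
        # plane. A real system would also steer traffic off a failed
        # plane, which is where fault isolation comes in.
        return hash((src, dst)) % self.planes

domain = ScaleUpDomain(xpus=1024, planes=4)
print(f"{domain.per_xpu_bandwidth_gbps()} Gb/s per XPU "
      f"across {domain.planes} planes")        # 3200 Gb/s
print(f"flow 7 -> 42 uses plane {domain.plane_for_flow(7, 42)}")
```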
Below, Ram Velaga, Senior Vice President and General Manager of the Core Switching Group at Broadcom, sits down with theCUBE to explain Scale-Up Ethernet (SUE).
AI infrastructure shouldn’t require two separate networks — one for intra-node and one for inter-node communication. With SUE, you can unify scale-up and scale-out on a single Ethernet fabric, using the same optical and copper interconnects, the same visibility tools, and the same platform-wide telemetry.
That translates to one network to design, deploy, and operate: shared interconnects, shared tooling, and lower overall cost and complexity.
Unlike proprietary or semi-custom interconnect models, SUE runs on Ethernet — open by design, not just by license. No special chiplets, PHYs, or switch fabrics required.
SUE enables multi-vendor system design without dictating board layout, rack architecture, or thermal envelope. It works with the global Ethernet ecosystem, without needing specialized integration or vendor certification.
AI infrastructure is evolving fast — and the networking needs within an XPU node are now just as critical as those between nodes.
With Scale-Up Ethernet, we’ve made scale-up simple, efficient, and open. We’re excited to see how the industry builds on this foundation.