
Scale-up is simple. Ethernet makes it smarter.

Connectivity

There’s a lingering belief in the industry that scaling up AI systems is difficult: that it means locking into a proprietary interconnect or creating an entirely new interconnect standard, managing tightly constrained topologies, and living with higher costs and a new set of operational tools.

The truth is: Scale-up is simpler than scale-out. And with the right foundation, such as Scale-Up Ethernet (SUE) announced by Broadcom last month, it's far more flexible.

Over the past few years, Ethernet has proven itself as the best technology for scale-out AI networking — connecting XPU nodes within a data center and across data centers. Now it’s time to bring Ethernet’s advantages to scale-up: enabling fast, reliable, and open communication within a node or rack, across tens to hundreds of tightly coupled XPUs.

Why Ethernet for scale-up?

When AI models run at massive scale, performance bottlenecks often emerge inside the system — where XPUs need to share memory, coordinate collectives, or route outputs for Mixture-of-Experts inference.

In large-scale AI deployments, particularly scale-up architectures, multiple computing elements (XPUs and CPUs) operate closely together, often within a single server or tightly integrated system. As models become extremely large — potentially trillions of parameters — the compute and memory requirements surpass what any single processing unit can handle efficiently. This necessitates highly efficient intra-system communication, enabling XPUs to rapidly share data, synchronize computation, and collectively execute operations like all-reduce, broadcast, and gather, which are common in distributed model training and inference.
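Collectives like the all-reduce mentioned above are exactly the traffic a scale-up fabric carries. As a rough illustration (not part of SUE, and with plain Python lists standing in for each XPU's local buffer), a bandwidth-optimal ring all-reduce can be sketched like this:

```python
def ring_all_reduce(buffers):
    """Sum-reduce equal-length buffers across n simulated workers on a ring.

    Runs n-1 reduce-scatter steps followed by n-1 all-gather steps, so each
    worker sends/receives only one chunk per step and every worker ends up
    with the full element-wise sum. Assumes len(buffer) % n == 0.
    """
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "illustrative constraint: chunks must divide evenly"
    step = size // n
    bufs = [list(b) for b in buffers]  # working copy per worker

    def sl(c):
        # Element indices belonging to chunk c.
        return range(c * step, (c + 1) * step)

    # Phase 1: reduce-scatter. After n-1 steps, worker r holds the fully
    # reduced sum of chunk (r + 1) % n.
    for s in range(n - 1):
        for r in range(n):
            c = (r - s) % n        # chunk worker r forwards this step
            q = (r + 1) % n        # ring neighbor
            for k in sl(c):
                bufs[q][k] += bufs[r][k]

    # Phase 2: all-gather. Each fully reduced chunk travels around the
    # ring until every worker holds every chunk.
    for s in range(n - 1):
        for r in range(n):
            c = (r + 1 - s) % n
            q = (r + 1) % n
            for k in sl(c):
                bufs[q][k] = bufs[r][k]

    return bufs
```

In a real deployment each inner loop is a one-sided transfer over the fabric, which is why the per-transfer latency and small-payload efficiency requirements below dominate.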

These scale-up domains demand:

  • High Bandwidth: Modern XPUs can have 40–100 TB/s of HBM bandwidth. Scale-up links need to match that.
  • Low Latency: Round-trip latency under 2 µs is essential for remote memory access and fast synchronization.
  • Reliability: As XPUs access each other’s memory, every transfer must complete reliably — without introducing congestion, backpressure, or retries visible to the application.
  • Efficiency: When moving small payloads — often 64 bytes at a time — the networking overhead must be minimal.
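The efficiency point is easy to quantify. With a 64-byte payload, every byte of framing matters; the header sizes below are illustrative assumptions for comparison, not figures from the SUE specification:

```python
def wire_efficiency(payload_bytes, header_bytes):
    """Fraction of wire bytes that carry payload for one transfer."""
    return payload_bytes / (payload_bytes + header_bytes)

# Standard Ethernet per-frame overhead is roughly 38 bytes
# (8B preamble/SFD + 14B MAC header + 4B FCS + 12B inter-packet gap):
standard = wire_efficiency(64, 38)   # ~0.63 — a third of the wire is overhead

# A latency-optimized header of, say, 16 bytes (hypothetical figure)
# recovers most of that loss for 64-byte transfers:
optimized = wire_efficiency(64, 16)  # 0.80
```

The gap widens further if retries or acknowledgements add per-transfer packets, which is why lossless link-level mechanisms matter at this scale.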

These requirements are pushing interconnect technologies to their limits — and it’s here that Ethernet stands apart.

Introducing the Scale-Up Ethernet (SUE) framework

To bring Ethernet to scale-up domains, Broadcom developed the Scale-Up Ethernet (SUE) framework — a clean, standards-based specification for high-performance XPU interconnect.

We’ve contributed the full SUE specification to the Open Compute Project, making it available for others to build upon.

At a high level, SUE defines:

  • Interface Semantics: A consistent model for command, data, and management traffic between XPUs and the switch.
  • Memory Model: A simple, cache-coherent-like interface that supports put, get, and atomic operations over Ethernet, designed to emulate tightly coupled memory access.
  • Packet Format: Lightweight, latency-optimized headers that minimize per-transfer overhead — crucial for small data units.
  • Congestion Management: Support for link-level retry (LLR), credit-based flow control (CBFC), and priority flow control (PFC) to ensure lossless operation and deterministic latency.
  • Fungible Interfaces: The same XPU port can be used for both scale-up and scale-out, allowing for dynamic partitioning of resources and simplified hardware design.
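To make the memory model concrete, here is a hypothetical sketch of the put/get/atomic semantics described above. The class and method names are illustrative, not the SUE API (the authoritative definition is the specification contributed to OCP); a dict stands in for a remote XPU's HBM:

```python
class RemoteMemory:
    """Models one XPU's memory as seen by a peer over a SUE-like fabric."""

    def __init__(self):
        self._mem = {}  # address -> value; stand-in for remote HBM

    def put(self, addr, value):
        # One-sided write: completes without involving the remote
        # XPU's compute engines.
        self._mem[addr] = value

    def get(self, addr):
        # One-sided read of remote memory; unwritten locations read as 0
        # here purely for illustration.
        return self._mem.get(addr, 0)

    def atomic_add(self, addr, delta):
        # Fetch-and-add executed at the target, returning the prior
        # value — the kind of primitive synchronization and collectives
        # are built from.
        old = self._mem.get(addr, 0)
        self._mem[addr] = old + delta
        return old
```

Usage looks like local memory access, which is the point of the "cache-coherent-like" framing: `peer.put(0x1000, 7)` followed by `peer.atomic_add(0x1000, 3)` returns 7 and leaves 10 at that address.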

SUE also allows flexibility in bandwidth — 800G per SUE instance, with support for multiple network planes for load balancing, fault isolation, and increased aggregate throughput — and supports scale-up systems of up to 1,024 XPUs in a single domain.
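The arithmetic behind the 800G-per-instance figure is straightforward; the plane counts below are examples, since SUE leaves that choice to the system designer:

```python
def scale_up_bandwidth_gbps(planes, gbits_per_instance=800):
    """Aggregate per-XPU scale-up bandwidth in GB/s (8 bits per byte),
    assuming one SUE instance per plane."""
    return planes * gbits_per_instance / 8

# One plane: 100 GB/s per XPU; four independent planes: 400 GB/s,
# with the added benefit of fault isolation between planes.
one_plane = scale_up_bandwidth_gbps(1)    # 100.0
four_planes = scale_up_bandwidth_gbps(4)  # 400.0
```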

Below, Ram Velaga, Senior Vice President and General Manager, Core Switching Group at Broadcom, sits down with theCUBE to explain Scale-Up Ethernet (SUE).

Unifying the fabric

AI infrastructure shouldn’t require two separate networks — one for intra-node and one for inter-node communication. With SUE, you can unify scale-up and scale-out on a single Ethernet fabric, using the same optical and copper interconnects, the same visibility tools, and the same platform-wide telemetry.

That translates to:

  • Lower system complexity
  • Greater agility in system design
  • Reduced TCO
  • Vendor-neutral innovation

Unlike proprietary or semi-custom interconnect models, SUE runs on Ethernet — open by design, not just by license. No special chiplets, PHYs, or switch fabrics required.

SUE enables multi-vendor system design without dictating board layout, rack architecture, or thermal envelope. It works with the global Ethernet ecosystem, without needing specialized integration or vendor certification.

Conclusion

AI infrastructure is evolving fast — and the networking needs within an XPU node are now just as critical as those between nodes.

With Scale-Up Ethernet, we’ve made scale-up simple, efficient, and open. We’re excited to see how the industry builds on this foundation.
