The Future of High Performance Networking: Ultra Ethernet Explained
In this blog
Introduction
The evolution of networking technology has been marked by continual advancements aimed at supporting faster speeds, lower latency and more robust data handling. Ultra Ethernet represents the latest innovation in this lineage, promising to revolutionize high-performance networking for modern data-centric applications like artificial intelligence (AI), machine learning (ML) and large-scale data analytics.
Collaboration on Ultra Ethernet began in 2023 and now includes more than 100 member companies. Version 1.0 of the specification has been published, but the actual development (silicon, OS, application) is ongoing.
Ethernet challenges
Typical enterprise traffic is highly varied with a flexible range of requirements. AI/ML networks, by contrast, have consistent but stringent requirements: powerful endpoints, a small number of large and homogeneous flows, and a deep intolerance of latency and unreliability. And no part of the job is finished until every part of the job is finished.
Where traditional Ethernet has fallen short:
- Load balancing: Ethernet doesn't truly load balance; rather, it performs "statistical load distribution" based on the standard 5-tuple (source and destination IP addresses, ports, and protocol). With so little data to differentiate between extra-large, long-lived flows, Ethernet can (and does) cram too much traffic down one pipe while leaving others unused.
- Ordered delivery: Ethernet makes no guarantee of in-order delivery (something HPC requires) and relies on the transport and application layers to provide it.
- Reliability: Similarly, Ethernet makes no guarantee that a datagram sent will be a datagram delivered. RDMA operations address this with a Go-Back-N approach: when anything is dropped, rewind to the last known good message and retransmit everything from there. As a result, a small number of drops can have an outsized (60%+) impact on job completion time.
- DCQCN: The existing congestion management systems, PFC (Priority Flow Control), ECN (Explicit Congestion Notification) and DCQCN (Data Center Quantized Congestion Notification), are difficult to scale, tune and adapt.
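The 5-tuple hashing problem above can be sketched in a few lines of Python. This is an illustration only: the hash function and flow values are hypothetical stand-ins for a switch's vendor-specific hardware ECMP hash, not anything specified by Ethernet or the UEC.

```python
import hashlib
from collections import Counter

def ecmp_link(flow, num_links):
    """Pick an uplink from a flow's 5-tuple, as per-flow ECMP hashing does.
    (md5 here is a stand-in for a switch's hardware hash function.)"""
    key = "|".join(map(str, flow)).encode()
    return int(hashlib.md5(key).hexdigest(), 16) % num_links

# Eight hypothetical RDMA elephant flows: (src IP, dst IP, src port, dst port, proto)
flows = [("10.0.0.%d" % i, "10.0.1.%d" % i, 49152, 4791, "UDP") for i in range(8)]

# Count how many flows land on each of 4 uplinks
counts = Counter(ecmp_link(f, num_links=4) for f in flows)
print(dict(counts))  # a handful of flows rarely spreads evenly across 4 links
```

With only a few large flows, the law of large numbers never kicks in: it is common for two elephants to hash onto the same uplink while another link sits idle, which is exactly the failure mode described above.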
This is precisely why InfiniBand has been the go-to transport for RDMA over the last 20 years, and why the Ultra Ethernet Consortium (UEC) was established. The goals of the UEC are simple:
- Match/Exceed InfiniBand performance
- Retain the scale and flexibility of Ethernet
- Update RDMA for the modern era
In keeping with this, Ultra Ethernet introduces a refined packet structure optimized for high-performance workloads. Key elements of the Ultra Ethernet specification include:
- Extended header fields: The packet headers feature extended fields for advanced metadata, enabling more precise flow classification and improved routing decisions. These fields also support enhanced quality of service (QoS) and security tagging.
- Traffic profiles: Ultra Ethernet includes three basic traffic profiles for specific traffic types.
- AI Base: Focuses on core "common" AI use cases, including distributed inference and training.
- AI Full: AI Base, plus specific functions like Deferrable Send, exact-matching tagging, and extended atomic operations.
- HPC: Advanced tagging semantics, more atomic operations, and full rendezvous support for HPC workloads beyond AI.
- In-network collectives: Ultra Ethernet packets include markers that facilitate in-network processing, such as reductions or transformations. This is an evolution/application of FPGA (Field Programmable Gate Array) technology and is particularly valuable for AI/ML workloads where intermediate computations can be performed in transit.
- High-precision timestamps: Each packet carries high-precision timestamps, ensuring synchronization across distributed systems and enabling accurate latency measurements.
- Congestion management: Mechanisms for throttling back (or opening up) throughput become more refined. These include features such as:
- Ephemeral connections to go from zero to wire rate with minimal delay
- Optimizations for ultra-low latency "short RTT" environments
- Packet spraying for optimal bandwidth utilization
- Quick & granular recovery: Ultra Ethernet moves some retransmit and recovery functions to the Link Layer, and packet trimming allows for granular/selective retransmits of lost data.
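Packet spraying can be contrasted with per-flow hashing in a short sketch. The link assignments and packet counts below are hypothetical, and real spraying also depends on the receiver tolerating out-of-order arrival, which UET's transport is designed to handle.

```python
from collections import Counter
from itertools import cycle

NUM_LINKS = 4
PACKETS_PER_FLOW = 100
flows = ["flow-A", "flow-B"]  # two hypothetical elephant flows

# Per-flow ECMP: every packet of a flow is pinned to one link.
pin = {"flow-A": 0, "flow-B": 2}  # assume the hash happened to pick links 0 and 2
per_flow = Counter()
for f in flows:
    per_flow[pin[f]] += PACKETS_PER_FLOW

# Packet spraying: each packet independently takes the next link, round-robin.
spray = Counter()
next_link = cycle(range(NUM_LINKS))
for f in flows:
    for _ in range(PACKETS_PER_FLOW):
        spray[next(next_link)] += 1

print("per-flow:", dict(per_flow))  # {0: 100, 2: 100} -- two links sit idle
print("sprayed: ", dict(spray))     # {0: 50, 1: 50, 2: 50, 3: 50}
```

Spraying uses all four links at 50% of the pinned-link load, at the cost of reordering, which is why it only becomes practical once the transport layer no longer assumes in-order delivery.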
UET protocol stack
UET (Ultra Ethernet Transport, the product of the UEC) is still under development, with full-stack supporting hardware shipping by late 2025 or early 2026.
It's important to understand that most hardware today is documented as "supporting Ultra Ethernet" (meaning it will pass the packets as valid), but it does not yet implement the full suite of enhancements that will ultimately be part of the specification.
Overview
Ultra Ethernet primarily consists of updated elements of the Physical, Link, and Transport layers and improved congestion management methods that scale to very large environments.
This graphic represents the UET protocol stack and includes comments on major features at each layer.
Details
The following is a breakdown of the functionality delivered by each layer of the UET protocol stack:
PHY layer
PHY updates are fairly straightforward:
- SerDes are specified to provide 800/1600 Gbps speeds.
- Forward Explicit Congestion updates provide granular traffic data that upper layers leverage for the telemetry needed for performance tuning and efficient retransmission of data.
Link layer
- LLR (Link Level Retry): Ultra Ethernet moves the classic retransmit mechanisms from the Transport Layer to a much lower point in the stack. This vastly improves error recovery time and reduces tail latency. In operation, it is similar to TCP functionality: frames are tracked by sequence numbers with ACK/NACK messages exchanged on either side of the conversation. If a frame is NACKed, that specific frame is resent (vs clumsier "go-back-N" methods).
- PRI (Packet Rate Improvement): While elephant flows are a major concern, their opposite (mouse flows) expose other inefficiencies in the existing Ethernet stack stemming from legacy and redundant features. UET will compress IP and Ethernet headers to reduce their per-packet overhead.
- LLDP (Link Layer Discovery Protocol): While nothing new, LLDP will be updated so endpoints can advertise and negotiate compatible features (e.g., LLR and PRI).
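The difference between Go-Back-N recovery and LLR's selective retry can be made concrete with a toy model. The window size and drop pattern here are arbitrary assumptions for illustration, not values from the specification.

```python
def go_back_n_retransmits(total, drops, window=16):
    """Count frames re-sent when every drop rewinds the sender to the lost frame.
    Simplified model: each loss forces re-sending the lost frame plus every
    frame already in flight behind it (up to the window size)."""
    resent = 0
    for d in sorted(drops):
        resent += min(window, total - d)  # lost frame + frames in flight
    return resent

def llr_retransmits(drops):
    """LLR-style selective retry: a NACK triggers re-sending only the lost frame."""
    return len(drops)

drops = {10, 60, 61}  # three hypothetical lost frames out of 1000
print(go_back_n_retransmits(1000, drops))  # 48 frames re-sent
print(llr_retransmits(drops))              # 3 frames re-sent
```

Even in this tiny example, three drops cost Go-Back-N sixteen times the retransmitted data, and the gap widens with larger windows and higher line rates, which is why moving selective retry down to the link layer pays off in tail latency.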
Transport layer
The transport layer has been almost completely redesigned to improve on legacy RDMA shortcomings. At a high level, these updates include:
- Low-latency transport enhancements: Ultra Ethernet's RDMA implementation minimizes transport-level overhead, enabling ultra-low-latency memory-to-memory transfers. This is critical for distributed databases and high-frequency trading applications.
- Dynamic buffer allocation: Advanced RDMA in Ultra Ethernet features dynamic buffer allocation, which optimizes memory usage and reduces congestion in high-traffic scenarios.
- Scalability: Ultra Ethernet supports very large fabrics (1,000,000+ endpoints) with enhanced flow control mechanisms. These improvements ensure seamless scalability without performance degradation.
- Security: Built-in encryption and authentication.
- Simplified API: The updated API includes UET extensions for Libfabric 2.0 and replaces the more awkward InfiniBand Verbs interface.
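As one way to picture receiver-driven flow control at fabric scale, the sketch below models a generic credit scheme: the receiver grants buffer credits, and the sender pauses when credits run out. This is a textbook illustration of the general technique, not UET's actual flow-control mechanism.

```python
class CreditFlowControl:
    """Toy credit-based flow control. The receiver advertises buffer slots as
    credits; the sender may transmit only while it holds credits, so the
    receiver's buffer can never be overrun."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots
        self.delivered = []

    def try_send(self, packet):
        if self.credits == 0:
            return False  # sender must pause: no receiver buffer available
        self.credits -= 1
        self.delivered.append(packet)
        return True

    def consume(self):
        """Receiver drains one packet and returns a credit to the sender."""
        if self.delivered:
            self.delivered.pop(0)
            self.credits += 1

fc = CreditFlowControl(buffer_slots=2)
print(fc.try_send("p1"), fc.try_send("p2"), fc.try_send("p3"))  # True True False
fc.consume()                # receiver frees a slot, returning a credit
print(fc.try_send("p3"))    # True
```

The appeal of credit schemes is that buffer pressure is signaled before loss occurs rather than after, which is the general direction Ultra Ethernet's congestion management takes compared with drop-and-recover approaches.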
Reference table
Conclusion
Ultra Ethernet represents a significant leap forward, offering the speed, scalability, granularity and reliability needed for modern AI/ML workloads. While challenges remain, the potential benefits far outweigh the hurdles, making Ultra Ethernet a cornerstone of next-generation high-performance infrastructure.
How WWT can help
Ultra Ethernet is an excellent example of "science that needs to be verified". While the advancements of Ultra Ethernet are extensions of existing technologies, their real-world impact on AI/ML workloads has yet to be gauged. Similarly, the hardware on which it will run (even where it already exists) has not yet been integrated with the full UET stack. In short, best practices have yet to be established.
World Wide Technology has over 10 years of experience in the design and implementation of Big Data and AI/ML solutions. WWT's AI Proving Grounds (AIPG, established in 2023) provides an ecosystem of best-of-breed hardware, software and architecture where customers can answer pressing questions in AI infrastructure and design.
Sources
Ultra Ethernet Consortium: https://ultraethernet.org
Open Compute Project UEC Overview: https://www.youtube.com/watch?v=y9b4ztb4C-4