Nvidia's Spectrum-X is Ethernet on steroids for cloud AI

Nvidia has unveiled Spectrum-X, a new Ethernet networking platform designed for hyperscale AI workloads in shared environments and built on components including a 51Tb/sec Ethernet switch.

The product aims to hit a sweet spot: the demanding networking requirements of generative AI running in multi-tenant clouds, for which InfiniBand (an interconnect technology used in high-performance computing, or HPC), powerful but expensive and restrictive, may not be suitable.

Nvidia said it is fully standards-based Ethernet, supporting open Ethernet stacks (SONiC, Linux Switch) at cloud scale, but tuned for GPT and BERT LLMs, distributed training and parallel processing, natural language processing, computer vision, high-performance simulations, data analytics and inference applications. It is available now.

The Nvidia Spectrum-X release was one of a flurry of announcements from the company ahead of Taiwan’s COMPUTEX conference, during which CEO Jensen Huang took to the stage for a two-hour presentation.

The announcements also included Nvidia's new MGX server specification, which aims to provide system manufacturers with a “modular reference architecture to… cost-effectively build more than 100 server variations to suit a wide range of AI, high performance computing and Omniverse applications.”

“ASRock Rack, ASUS, GIGABYTE, Pegatron, QCT and Supermicro will adopt MGX, which can slash development costs by up to three-quarters and reduce development time by two-thirds to just six months,” Nvidia said.
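Taken at face value, Nvidia's MGX figures imply a baseline: if development time falls by two-thirds to six months, the pre-MGX timeline works out to 18 months, and a cost cut of "up to three-quarters" leaves at least a quarter of the original cost. The short Python check below reproduces that arithmetic. It uses exact fractions to avoid floating-point rounding, and the helper `baseline_from_reduction` is a hypothetical name for illustration, not anything Nvidia publishes.

```python
from fractions import Fraction


def baseline_from_reduction(new_value: Fraction, reduction: Fraction) -> Fraction:
    """Recover the original value from the value left after a fractional cut.

    new_value = baseline * (1 - reduction), so baseline = new_value / (1 - reduction).
    (Hypothetical helper, for illustration only.)
    """
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return new_value / (1 - reduction)


if __name__ == "__main__":
    # Development time: cut by two-thirds to six months.
    months_before = baseline_from_reduction(Fraction(6), Fraction(2, 3))
    print(f"Implied pre-MGX development time: {months_before} months")  # 18 months
    # Cost: cut by up to three-quarters, so at least this share remains.
    print(f"Minimum remaining cost share: {1 - Fraction(3, 4)}")  # 1/4
```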

Nvidia is now worth more in terms of market capitalisation (just under $1 trillion, as The Stack writes) than AMD, Broadcom, Intel and Qualcomm combined, after a dizzying surge in valuation triggered by a sales forecast for this quarter that was $4 billion higher than analysts expected.

Huang attributed the surge in sales to an ongoing, wholesale shift in data centres away from CPU-centric workloads served by “dumb” network interface cards (NICs) and towards accelerated computing, in which critical workloads, from AI application data processing through to data centre security and networking, are offloaded from CPUs onto a range of accelerators including GPUs, SmartNICs and other platforms.

So what is Nvidia Spectrum-X?

Nvidia Spectrum-X is built on two key components. The first is Spectrum-4, “the world’s first 51Tb/sec Ethernet switch built specifically for AI networks”, with advanced RoCE (RDMA over Converged Ethernet, which lets servers move data directly between each other's memory over Ethernet without involving the remote CPU) extensions, such as fine-grained load balancing, that work in concert across Spectrum-4 switches. The second is BlueField-3 DPUs (accelerators that can offload and isolate software-defined networking, storage, security and management functions). Together they create an “end-to-end 400GbE network that is optimized for AI clouds.”

Spectrum-X will enable “unprecedented scale of 256 200Gb/s ports connected by a single switch, or 16,000 ports in a two-tier leaf-spine topology to support the growth and expansion of AI clouds while maintaining high levels of performance and minimizing network latency.”
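The single-switch figure follows directly from the switch's aggregate bandwidth: 256 ports at 200Gb/s add up to 51,200Gb/s. The minimal Python sketch below assumes a capacity of exactly 51.2Tb/s (the value the 256 × 200Gb/s configuration implies; the article rounds it to 51Tb/sec) and divides it by the port speed. It does not attempt to reproduce the 16,000-port leaf-spine figure, which depends on topology choices the announcement does not detail.

```python
# Minimal sketch: how many full-rate ports of a given speed fit within a
# switch's aggregate bandwidth. Assumption: 51.2 Tb/s of capacity, i.e. the
# figure implied by 256 x 200 Gb/s.
SWITCH_CAPACITY_GBPS = 51_200  # assumed 51.2 Tb/s, expressed in Gb/s


def ports_at_speed(capacity_gbps: int, port_speed_gbps: int) -> int:
    """Return how many full-rate ports of port_speed_gbps the capacity supports."""
    if port_speed_gbps <= 0:
        raise ValueError("port speed must be positive")
    return capacity_gbps // port_speed_gbps


if __name__ == "__main__":
    for speed in (200, 400):
        count = ports_at_speed(SWITCH_CAPACITY_GBPS, speed)
        print(f"{speed}GbE: {count} ports")  # 200GbE: 256 ports, 400GbE: 128 ports
```

Under the same assumption, a switch run entirely at 400GbE, the speed Nvidia cites for the end-to-end network, would expose 128 full-rate ports.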

Partners offering Spectrum-X include Dell, Lenovo and Supermicro.

Nvidia said it will build a “hyperscale generative AI supercomputer to be deployed in its Israeli data center on Dell PowerEdge XE9680 servers” as a testbed and showcase for the networking technology, which is designed to move data for AI applications rapidly into, out of and around racks of servers. That matters because emerging AI applications are highly demanding: “Image in, video out, video in, text out, image in, proteins out, text in, 3D out, video in, in the future, 3D graphics out,” as Huang put it to analysts.