Partner POV | 5 Reasons to Choose HPE Cray XD670
This article was created and contributed by our partner, HPE.
HPE Cray XD670 is specifically designed and optimized for heavily parallelized AI workloads that require GPU acceleration for optimum performance. Examples include AI training and tuning, natural language processing (NLP), and large language model (LLM) and multimodal training. If you are an AI service provider, or if your organization is already past the initial AI piloting phase and is building and training models or extending AI across the enterprise, you will benefit greatly from the scale and power of this system.
Here are five reasons to choose HPE Cray XD670:
Breakthrough Performance
MLCommons™, a leading AI engineering consortium built on a philosophy of open collaboration to improve AI systems, has recently published results from two benchmark suites. The MLPerf™ Inference v4.0 benchmark suite (1) delivers industry‑standard machine learning (ML) system performance benchmarking in an architecture‑neutral, representative, and reproducible manner. In this benchmark, HPE Cray XD670, powered by NVIDIA® H100 GPUs, achieved the #1 spot for NLP in the BERT 99.0 Offline scenario and was a top performer in every category in which it participated, including generative AI (GenAI), computer vision (CV), and LLMs. The MLPerf Training v4.0 benchmark suite measures how fast systems can train models to a target quality metric.(2) In this benchmark, HPE Cray XD670 was the second-fastest single-node system among servers featuring eight NVIDIA H100 GPUs, in both NLP (BERT model) training and LLM fine-tuning (Llama 2 model with LoRA). These leading results across a range of AI benchmarks demonstrate the strong performance of HPE Cray XD670 in AI environments.
Leading-edge GPU Acceleration
Powered by eight NVIDIA H200 Tensor Core GPUs, HPE Cray XD670 boosts AI performance with massive GPU acceleration. Each NVIDIA H200 delivers 141 GB of HBM3e memory at 4.8 TB/s of memory bandwidth. That is almost double the capacity of its predecessor, the NVIDIA H100 Tensor Core GPU, with 1.4x more memory bandwidth.(3) This larger, faster memory delivers superior performance within the same power profile as the NVIDIA H100, which means higher performance with better energy efficiency and lower total cost of ownership. Fully integrated with the NVIDIA stack, HPE Cray XD670 is also available with NVIDIA H100 Tensor Core GPUs.
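To make the memory comparison concrete, here is a minimal back-of-the-envelope sketch. The H200 figures come from the paragraph above; the H100 figures (80 GB and 3.35 TB/s, as published for the SXM variant) are assumptions brought in for illustration and are not stated in this article, and the variable names are hypothetical.

```python
# Rough comparison of per-GPU memory figures, plus per-node totals.
# H200 values are quoted in the article.
H200_MEMORY_GB = 141
H200_BANDWIDTH_TBS = 4.8

# ASSUMPTION: H100 SXM values from NVIDIA's public specifications,
# not from this article.
H100_MEMORY_GB = 80
H100_BANDWIDTH_TBS = 3.35

# The article describes an eight-GPU configuration per system.
GPUS_PER_NODE = 8

# How much larger and faster the H200 memory is relative to the H100.
capacity_ratio = H200_MEMORY_GB / H100_MEMORY_GB
bandwidth_ratio = H200_BANDWIDTH_TBS / H100_BANDWIDTH_TBS

# Aggregate GPU memory in one eight-GPU H200 node.
node_memory_gb = GPUS_PER_NODE * H200_MEMORY_GB

print(f"Capacity ratio (H200 / H100):  {capacity_ratio:.2f}x")
print(f"Bandwidth ratio (H200 / H100): {bandwidth_ratio:.2f}x")
print(f"GPU memory per 8-GPU node:     {node_memory_gb} GB")
```

Under these assumptions, the capacity ratio lands a little under 2x and the bandwidth ratio at roughly 1.4x, in line with the "almost double" and "1.4x" claims above.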
Sustainable Cooling Options
HPE Cray XD670 is offered as an air-cooled solution or with plug-and-play direct liquid cooling (DLC). Air cooling relies entirely on heat-to-air transfer, whereas DLC uses facility water to transfer a substantial share of the heat produced to liquid. DLC provides a fully rack-contained, integrated, warm-water-cooled IT system that comes prefilled and ready to run. Each rack is self-contained and is integrated at our liquid-cooled factory. The racks can be filled as you go at a linear cost. With its extremely efficient cooling, liquid cooling supports the latest GPU technologies, allowing more servers per rack so that fewer racks are required, and captures heat more effectively so that less power is needed for cooling. Ultimately, DLC increases energy efficiency, helping you advance your sustainability goals.
Scalability to Meet Growing AI Workloads
With the dramatic increase in compute requirements for large-scale AI workloads, AI service providers and ambitious enterprises need large-scale GPU environments. HPE Cray XD670 offers a complete solution for AI that can seamlessly scale with an extremely dense configuration. Environments can scale out from one to thousands of nodes to support growing AI environments. Leading AI cloud providers in both the U.S. and Europe are deploying HPE Cray XD670–based environments with tens of thousands of GPUs, proof of the platform's vast scalability.
Best-in-class Expertise
HPE Cray XD670 systems are backed by HPE Services, providing decades of industry experience tailored to planning, deploying, and managing complex solutions. We offer professional, advisory, and operational services to guide you in creating a road map to your specific goals and deploying the ideal solution for your initiatives.
The recommended support service for HPE Cray XD670 is HPE Complete Care Service, a modular, AI-powered, edge-to-cloud IT environment service designed to help optimize your entire IT environment and achieve agreed-upon IT outcomes and business goals. It can also help you quantify, report, and optimize your energy consumption and IT environmental impact. Delivered by an assigned team of HPE Services experts, HPE Complete Care Service provides a complete coverage approach, fully personalized engagement, an enhanced incident management experience with priority access, and a digitally enabled customer experience.
Additionally, you can drive greater circularity into your AI environment with financing and asset management solutions from HPE Financial Services that help you extend the use of tech and lower long-term environmental impact.
References
(1) "New MLPerf Inference Benchmark Results Highlight The Rapid Growth of Generative AI Models," MLCommons, March 27, 2024
(2) MLCommons MLPerf Training v4.0 results, June 2024
(3) "NVIDIA H200 Tensor Core GPU," NVIDIA, November 2023