Article written by Ravi Kuppuswamy and provided by AMD

That said, AI is still unquestionably in its early stages, and the path forward isn't always clear. Many enterprises are still drawing up plans for how to leverage the technology. Once an organization has a vision in mind, the next challenge is implementation:

  • What is the right compute environment for your AI use case?
  • What new resources do you need to power your AI tools?
  • How do you integrate those resources into your existing environments?

AI isn't a monolithic tool. Enterprises have different goals, different priorities and different technical considerations. That means their AI workloads will look different and may have widely varying infrastructure requirements. The path is likely to be evolutionary.

The reality is that many enterprises will have to rely on CPUs as well as GPUs. This should come as no surprise: the massive installed base of x86-based CPUs has driven business computing for decades and hosts the vast data stores that businesses will seek to mine and evolve using AI techniques. In many cases, the CPUs themselves will serve the need effectively and efficiently. While large language models such as ChatGPT have a lot to offer, they also require enormous compute power; we believe that many enterprises will get real benefit from smaller, targeted models run on more modest infrastructure.

Where do your organization's workloads fall on this spectrum? "It depends" is never a satisfying answer, but in many cases it's the right one, and it's one that AMD can help you navigate with confidence. With AMD EPYC™ processor-based servers, AMD offers the enterprise a balanced platform built on high-performance, energy-efficient CPUs that can also host leading high-performance GPUs when workload needs demand it. A leading GPU provider's marketing may tell you that GPUs are the best way to support your AI workloads; a CPU company's marketing may suggest that its CPUs are clearly and consistently the best option. Yet to implement AI in a way that makes the most sense for your enterprise, with an evolving mix of AI-augmented and non-AI workloads, you'll need a platform that can handle both options and everything in between: an efficient, effective and adaptable platform such as AMD EPYC CPU-based servers.

Make space for AI

Whatever your AI plans may be, a first step is often making space for them in your data center. These days, data centers are typically running at or near capacity, in terms of available power, available space, or both. If this is true in your data center, one of the more effective solutions is to consolidate your existing workloads.

Once you have the space (and energy) to continue, AMD can help you choose the right compute options to invest in. For mixed workloads (for example, AI-augmented engineering simulation tools or AI-enhanced collaboration systems), small-to-medium models and classic machine learning, AMD EPYC CPUs offer impressive performance. They're also effective for batch and small-scale real-time inference uses where the cost and performance of adding GPUs is not warranted or efficient. Even if you're building a large language model specifically tailored to your business, with perhaps a few billion parameters as opposed to the 175 billion parameters of OpenAI's GPT-3, CPUs can offer strong performance and efficiency at a reasonable cost.
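To see why model scale matters so much for infrastructure choice, it helps to do the back-of-envelope memory arithmetic. The sketch below is illustrative only: the 7-billion-parameter figure stands in for "a few billion parameters" and is an assumption, as are the byte-per-weight precisions; only the 175-billion figure comes from the article.

```python
# Rough memory-footprint arithmetic for holding model weights in memory.
# The 7B parameter count and the precisions are illustrative assumptions,
# not measurements of any specific model or deployment.

def weight_memory_gb(params_billions: float, bytes_per_param: int) -> float:
    """Approximate gigabytes needed just to store the model weights."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# A few-billion-parameter model (assumed 7B) at 16-bit precision:
small_fp16 = weight_memory_gb(7, 2)    # 14 GB -- fits in server RAM
# The same model quantized to 8-bit precision:
small_int8 = weight_memory_gb(7, 1)    # 7 GB
# GPT-3-class scale (175B) at 16-bit precision:
large_fp16 = weight_memory_gb(175, 2)  # 350 GB -- a different class of problem

print(f"7B @ fp16:   ~{small_fp16:.0f} GB")
print(f"7B @ int8:   ~{small_int8:.0f} GB")
print(f"175B @ fp16: ~{large_fp16:.0f} GB")
```

The gap between roughly 14 GB and roughly 350 GB of weights alone (before activations, KV caches or batching) is one concrete reason a smaller, targeted model can live comfortably on a CPU server's memory while a GPT-3-class model cannot.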

AMD EPYC servers are also a compelling platform for jobs that require the power of GPUs, such as medium-to-large models and large-scale real-time inference. The AMD Instinct™ and AMD Radeon™ GPU families are increasingly proving to be strong solutions to enable outstanding AI performance, but there are certainly other options. A growing number of EPYC processor-based servers are certified to run an assortment of NVIDIA GPUs, allowing you to simply plug in your NVIDIA accelerators with known, trusted server brands to get the performance scalability you need. With AMD EPYC processor-based servers, no matter the accelerators added, you won't just get the performance you need, but also the memory capacity, bandwidth and robust security capabilities you want.

There is no one set journey to AI enablement. Many enterprises will follow different paths based on their specific goals, business and technical priorities and other considerations. But no matter where your enterprise is headed in the AI era, AMD EPYC processor-powered servers offer the platform to take you there as your needs evolve.

Learn more about AMD EPYC processor-based servers
Connect with an expert

End note:

SP5TCO-055: This scenario contains many assumptions and estimates and, while based on AMD internal research and best approximations, should be considered an example for information purposes only, and not used as a basis for decision-making over actual testing. The Bare Metal Server Greenhouse Gas Emissions TCO (total cost of ownership) Estimator Tool – v9.37 Pro Refresh, compares the selected AMD EPYC™ and Intel® Xeon® CPU based server solutions required to deliver a TOTAL_PERFORMANCE of 80,000 units of integer performance based on the published scores for these specific Intel Xeon and AMD EPYC CPU based servers as of June 1, 2023. This estimation reflects a 3-year time frame with a PUE of 1.7 and a US power cost of $0.128 / kWh. This analysis compares a 2P AMD 32 core EPYC 9334 CPU powered server with a SPECrate®2017_int_base score of 725, https://spec.org/cpu2017/results/res2023q1/cpu2017-20230102-33282.pdf ; to a 2P Intel Xeon 16 core Gold_6143 based server with a SPECrate®2017_int_base score of 197, https://spec.org/cpu2017/results/res2017q4/cpu2017-20171114-00863.pdf.

Due to the wide variation in costs for real estate and administration, this TCO analysis does not include those costs. New AMD-powered server OpEx consists of power only. The OpEx for the legacy installed base of servers with Intel CPUs consists of power plus extended warranty costs. The cost to extend server warranty support is calculated at 20% annually of the initial purchase price, which is calculated using 2023 costs. On this basis, combined with the power costs, the AMD solution's 3-year TCO is more than $2.5 million (62%) lower, with a $1.2 million (93%) lower annual OpEx.

For more detail see https://www.amd.com/en/claims/epyc4#SP5TCO-055  
