Written by: Matt Halcomb, WWT, and Jim Greene, AMD

Generative AI (GenAI), and the underlying infrastructure it relies on, have undergone a transformation over the past 18 months. While many believe graphics processing units (GPUs) are the default choice for GenAI, other options are worth evaluating. A scalable solution that leverages both central processing unit (CPU) and GPU technologies often delivers the right balance of cost and energy efficiency at any scale.

Today, organizations are finding that the secret to successful enterprise GenAI isn't just raw GPU power: it's thoughtfully balancing GPU and CPU capabilities in a dynamic, power-efficient solution that adapts to tomorrow's needs. Partners like WWT and AMD take this approach to heart, with the mission of helping customers right-size their environments and scale them accordingly.

The evolution and current state of enterprise GenAI

The brief history of enterprise GenAI encompasses a rapid evolution in both technology and approach. Only 18 months ago, organizations were focused on large foundational models requiring substantial GPU investments, but experience and a refined understanding of practical enterprise use cases have revealed a better way. This period demonstrated that success in GenAI isn't about maximizing GPU usage; it's about finding the right balance of computing resources to support the data, model and database a client chooses for the GenAI solution that fits the business need.

The shift from homegrown large foundational models to pre-trained solutions marked a new stage, and the first step toward more efficient implementations. Organizations discovered that building on existing foundations often proved more effective than training large models from scratch. 

This approach evolved further with the emergence of Retrieval-Augmented Generation (RAG), which combines smaller language models with vector databases to maintain efficiency while ensuring accuracy.

Today, things have changed again, with many companies investing in dedicated teams focused on developing, implementing and governing GenAI, creating what's often called an AI Center of Excellence. These centers vary significantly based on infrastructure capabilities, future scalability requirements, model sizes and deployment strategies. Organizations must carefully consider these elements to avoid operational and possibly legal challenges related to power, cooling and computing resources, as well as alignment with existing processes and compliance and regulatory requirements.

CPU and GPU in modern AI

At the heart of any GenAI implementation are two critical infrastructure components: CPUs and GPUs. Each plays a distinct and complementary role in processing AI workloads. The vast majority of existing enterprise software and business data sets are already well tuned for the tried-and-true x86 CPU architecture.

In contrast, GPUs contain thousands of simpler cores designed for parallel processing and often contain dedicated on-board memory. Originally developed for graphics rendering, these processors have proven invaluable for AI computations that can be broken down into numerous similar, relatively simple operations executed simultaneously at scale. 

This architectural distinction underlies their different performance, energy consumption and cost characteristics, and in turn their optimal use cases in AI implementations. Performance varies significantly between these technologies and the workloads they are tasked to support.

CPUs handle diverse workloads, making them particularly valuable for general computing tasks and certain AI inference operations—from small and medium model inferencing to decision trees and expert systems. 

GPUs, with their parallel processing architecture, deliver superior performance for tasks that benefit from massive parallel computation, such as training large language models and processing complex visual data.
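To make that distinction concrete, here is a toy Python sketch (the matrix sizes and loop are arbitrary illustrations, not benchmarks) contrasting a massively parallel operation of the kind GPUs excel at with a sequentially dependent one better suited to CPUs:

```python
import numpy as np

# GPU-friendly workload: one large matrix multiply. The millions of
# multiply-accumulate operations are independent of one another, so they
# map naturally onto thousands of parallel GPU cores.
weights = np.random.rand(2048, 2048).astype(np.float32)
batch = np.random.rand(2048, 2048).astype(np.float32)
activations = batch @ weights

# CPU-friendly workload: a branchy, sequentially dependent loop. Each
# iteration needs the previous result, so a few fast cores with large
# caches outperform many slow ones.
x = 1.0
for _ in range(100_000):
    x = x * 1.0000001 if x < 2.0 else x - 1.0

print(activations.shape, round(x, 4))
```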

From large to small language models

As noted, organizations initially approached GenAI by implementing large foundational models, often requiring significant infrastructure investments for training and tuning on their data. However, this approach has fallen by the wayside for many organizations as more efficient strategies gain popularity.

Pre-trained models emerged as a step toward efficiency, allowing organizations to build on existing foundations rather than starting from scratch. This shift enabled companies to focus their resources on fine-tuning models for specific use cases rather than training large models from the ground up.

The adoption of RAG marked another significant advancement. By combining language models with vector databases, organizations could integrate AI directly with their existing data infrastructure and business processes. While RAG systems require new technical infrastructure, they enable faster, more accurate responses by leveraging specific datasets. The technology continues to evolve, with agentic RAG emerging as a more sophisticated approach that can autonomously interact with multiple data sources and business systems.
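As a rough illustration of the RAG pattern described above, the minimal Python sketch below uses a toy hash-based embedding and an in-memory array in place of a real embedding model and vector database; every document, function name and query in it is hypothetical:

```python
import numpy as np

# Toy corpus standing in for an enterprise knowledge base.
documents = [
    "Our returns policy allows refunds within 30 days of purchase.",
    "Support hours are 9am to 5pm Eastern, Monday through Friday.",
    "Enterprise plans include dedicated onboarding and SSO integration.",
]

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hash each token into a fixed-size vector.
    A production RAG system would use a trained embedding model."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# "Vector database": precomputed embeddings held in memory.
index = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    scores = index @ embed(query)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

query = "How long do customers have to return a product?"
context = "\n".join(retrieve(query))

# The retrieved context is prepended to the prompt so a language model can
# answer from enterprise data it was never trained on.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```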

Small language models now represent the latest evolution in this journey. While GPUs remain optimal for conventional demanding AI workloads, organizations have discovered that in many use cases, AI-enabled applications can run efficiently on high-performance CPUs without the addition of GPU acceleration. This approach is more attainable and sustainable, working within existing data center power (and budget) constraints while delivering excellent results for specific use cases.
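As a simple illustration of CPU-only inference, the sketch below uses the open-source Hugging Face transformers library. The model name is just an example of a small model, not a recommendation; a real deployment would choose a model and runtime suited to the use case.

```python
# Minimal CPU-only text generation with a small open model.
# Requires: pip install transformers torch
from transformers import pipeline

# device=-1 pins the pipeline to the CPU; no GPU is required.
# "distilgpt2" is an illustrative small model, not a recommendation.
generator = pipeline("text-generation", model="distilgpt2", device=-1)

result = generator(
    "Summarize the benefits of right-sizing AI infrastructure:",
    max_new_tokens=60,
    do_sample=False,  # deterministic output for repeatable tests
)
print(result[0]["generated_text"])
```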

This practical shift toward CPU-based deployment deserves further elaboration: a significant portion of training and tuning can be efficiently handled by CPUs within existing data center environments. This capability to leverage current CPU infrastructure represents a practical and cost-effective approach for organizations looking to advance their AI initiatives without completely overhauling their infrastructure.

Strategic planning and implementation considerations

GenAI requires more planning than traditional IT projects. So, what does it cost to build your AI Center of Excellence to support your organization's AI initiatives? 

  • Cost factors depend on current infrastructure and capabilities, prioritized use cases, geographic location and deployment strategy, model and dataset sizes, and scale-out plans
  • Right-sizing AI projects is critical to avoid operational challenges related to power, cooling, computing resources and existing compliance and regulatory requirements
  • A comprehensive needs assessment should span data scientists, data engineers, end users, line-of-business leaders and support functions (e.g., HR, legal, security)
  • Technical evaluation factors include model size requirements (100K to 100B+ parameters), training approach (self-training vs. pre-trained), processing scale and concurrent user loads, latency requirements, infrastructure availability (local vs. cloud), data source and destination considerations, and model sharing requirements (a rough sizing sketch follows this list)
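The back-of-the-envelope sizing sketch promised above follows. It assumes model weights dominate inference memory and uses an assumed 1.2x overhead factor for activations, KV cache and runtime buffers; treat the results as rough orders of magnitude, not capacity planning:

```python
def inference_memory_gb(params_billion: float,
                        bytes_per_param: float = 2.0,  # FP16/BF16 weights
                        overhead: float = 1.2) -> float:
    """Rough rule of thumb: weights dominate inference memory.
    The overhead factor (an assumption) covers activations, KV cache
    and runtime buffers."""
    return params_billion * 1e9 * bytes_per_param * overhead / 1e9

for size in (7, 13, 70):
    fp16 = inference_memory_gb(size)
    int8 = inference_memory_gb(size, bytes_per_param=1.0)
    print(f"{size}B params: ~{fp16:.0f} GB at FP16, ~{int8:.0f} GB at INT8")
```

Under these assumptions, a 13-billion-parameter model at FP16 needs on the order of 30GB, which fits comfortably in the memory of a modern two-socket CPU server and is consistent with the EPYC inference guidance later in this article.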

Governance and risk management

Successful GenAI implementation also requires a thorough approach to governance that encompasses people, processes, silicon, software, data, security, locality and facilities. Similar to enterprise resource planning systems, GenAI touches numerous aspects of an operation and often involves regulatory considerations. Organizations must carefully balance innovation with risk management to ensure their GenAI initiatives don't create unintended consequences. 

Critical considerations include:

  • Data protection - Privacy safeguards, intellectual property protection and usage controls
  • Regulatory compliance - Hallucination mitigation, bias prevention and anti-discrimination measures
  • The human element - Bridging data science and technology teams, balancing distinct needs and objectives, engaging stakeholders early, ensuring infrastructure reliability and performance, and establishing comprehensive governance frameworks

Infrastructure design and optimization

Building an open, scalable system requires careful attention to infrastructure design. The AMD product portfolio and roster of trusted ecosystem partners help organizations right-size their environments by providing technology options ranging from CPUs and GPUs to data processing units (DPUs) and field-programmable gate arrays (FPGAs) that can efficiently tackle AI and related networking and data processing workloads. This flexibility allows organizations to adapt their infrastructure as needs evolve.

As noted earlier, while GPUs excel at certain workloads, including large-scale training, CPUs continue to play a vital role in the AI infrastructure landscape. 

The AMD portfolio for enterprise GenAI

The comprehensive AMD technology portfolio directly addresses the varying needs of GenAI implementations.

The AMD EPYC™ processor family, with up to 192 cores and 6TB DDR5 memory per socket, delivers the balanced architecture needed for AI workloads. While efficiently handling inference for models up to 13 billion parameters, it also supports RAG implementations, machine learning applications and diverse non-GenAI applications. The infrastructure leverages Kubernetes with distributed control planes across CPU and GPU nodes, creating a flexible environment where teams can optimize workload placement based on performance requirements. This cloud-native approach enables seamless integration of AI processing with existing enterprise applications.
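As a sketch of how that workload placement might be expressed, the Python snippet below builds illustrative Kubernetes pod specs as plain dicts. The amd.com/gpu resource name assumes AMD's Kubernetes GPU device plugin is installed, and the workload=cpu-inference node label is a hypothetical convention; adjust both to your cluster:

```python
# Sketch: steering inference pods to CPU or GPU nodes in Kubernetes.
# Pod specs are built as plain Python dicts for illustration.

def inference_pod(name: str, image: str, use_gpu: bool) -> dict:
    spec = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "model-server",
                "image": image,
                "resources": {"limits": {"cpu": "16", "memory": "64Gi"}},
            }],
        },
    }
    if use_gpu:
        # Large-model pods land on GPU nodes by requesting the
        # device-plugin resource (assumes AMD's GPU device plugin).
        spec["spec"]["containers"][0]["resources"]["limits"]["amd.com/gpu"] = "1"
    else:
        # Small-model pods are steered to CPU-only nodes via a node
        # label ("workload=cpu-inference" is a hypothetical convention).
        spec["spec"]["nodeSelector"] = {"workload": "cpu-inference"}
    return spec

# These dicts can be serialized to YAML or JSON and applied with kubectl
# or the official Kubernetes Python client.
print(inference_pod("slm-server", "example.registry/slm:latest", use_gpu=False))
```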

For more demanding applications, the AMD Instinct™ MI300 Series marks a breakthrough in GPU design. Its integrated CPU-GPU architecture with unified memory fundamentally changes how language models are processed. With up to 192GB of unified third-generation high-bandwidth memory (HBM3) and optimized matrix multiplication engines, the MI300 Series excels at training models of any size as well as inference for medium to large models, whether LLMs for text creation, generative adversarial networks (GANs) for image, video and music creation, or mixed models capable of both text and image creation. Support for leading AI frameworks ensures seamless integration with existing development ecosystems.

Sustainability and resource optimization

Power consumption presents a growing concern for AI infrastructure. Organizations must carefully evaluate their infrastructure capabilities and adopt balanced solutions to optimize resource usage. 

Partner ecosystem and support

Success in GenAI implementation often depends on working with experienced partners who can reduce learning curves and accelerate time to value. Organizations should select partners who understand the GenAI journey and have experience with similar architectural transitions. These partnerships help ensure success while reducing your implementation burden. Opt for vendors committed to innovation, like AMD, which supports open standards and partner enablement for flexible AI infrastructure development.

WWT and AMD help organizations build a future-ready infrastructure

Together, WWT and AMD develop integrated designs for AI workloads, leveraging CPU and GPU capabilities. WWT's AI Proving Ground is essential for testing and validating architectural decisions, optimizing power use, and analyzing cost-performance tradeoffs. It ensures AI implementations meet technical and business goals, maximizing infrastructure investment value.

Looking ahead

As GenAI continues to evolve, the flexibility to leverage both CPU and GPU resources efficiently becomes increasingly valuable. 

Industry trends suggest organizations will implement hundreds of small language models for every large language model deployment, underscoring the importance of having diverse processing capabilities and infrastructure flexibility.

The comprehensive portfolio of AMD technologies, combined with WWT's implementation expertise, enables organizations to build scalable, efficient infrastructure that meets both current and future requirements. 

By understanding this evolution and implementing a balanced, efficient infrastructure, you can leverage GenAI effectively while maintaining control over costs, power consumption and computational resources.

The future of GenAI lies not in maximizing any single type of computing resource, but in building flexible, efficient infrastructure that can adapt to evolving requirements. Organizations that embrace this balanced approach will be best positioned to leverage GenAI's full potential while maintaining operational efficiency and cost-effectiveness.

Contact WWT and begin this important discussion about your GenAI journey
Connect with an expert today
