Imagine a busy commercial kitchen. Every chef needs access to the same equipment: ovens are running at capacity, prep stations are packed, and limited storage space is in high demand. When these shared resources become overwhelmed, the entire kitchen's efficiency grinds to a halt.

The same challenge exists in AI environments. AI applications fundamentally differ from traditional software in how they consume resources. During model training, they process "elephant flows"—massive volumes of data that must move efficiently through systems. For real-time AI interactions, applications need consistent low-latency responses, especially when retrieving additional information. And the list goes on.

As organizations build AI factories—specialized environments to develop and deploy AI capabilities—they face this resource crunch on a larger scale. This is particularly challenging when multiple teams or customers need to share the same infrastructure, creating multi-tenancy issues. 

Without specialized solutions that can secure and optimize these shared resources, organizations end up dedicating separate systems to each project or customer—a tremendously inefficient approach that wastes expensive computing resources.

The AI Factory Framework

AI factories are being built to develop, train, deploy, and operate AI at scale. Unlike heritage, or traditional, data centers, AI factories are purpose-built to handle the unique workload characteristics of AI applications. When implemented effectively, they offer a powerful framework for AI infrastructure.

AI workloads fundamentally differ from traditional applications in their resource consumption patterns. As you make strategic investment decisions, understand that successful AI factories require five architectural elements:

1. Training Pods

These GPU-powered compute environments process massive datasets during model development:

  • High-density GPU clusters deliver the raw computing power needed for complex models
  • Specialized cooling and power infrastructure supports these compute-intensive environments
  • Investment here directly impacts how quickly your teams can develop and improve AI models

2. Inference Pods

These environments deliver AI capabilities to your users and customers:

  • Optimized for consistent, low-latency responses rather than raw processing power
  • Directly impacts user experience and customer satisfaction
  • Often requires different hardware configurations than training environments

3. Storage Infrastructure

AI's massive data requirements demand purpose-built storage solutions:

  • High-throughput systems capable of feeding data to compute resources without bottlenecks
  • Tiered approaches balancing performance and cost-effectiveness
  • Often the most overlooked component, yet frequently the first bottleneck in AI initiatives 

4. Network Fabric

The connectivity layer must handle AI's unique data movement patterns:

  • Supports massive elephant flows (gigantic data volumes that must be efficiently handled to maintain model training performance) during training without disrupting other business traffic
  • Enables efficient scaling of both training and inference capabilities
  • Legacy network infrastructures typically cannot meet these specialized requirements

5. Data Pipelines 

Data pipelines are the connective tissue that makes an AI factory operate as a cohesive system rather than disconnected components.

  • Orchestrates the flow of data from acquisition through processing to model training and inference
  • Ensures data quality, transformation, and feature engineering at scale
  • Manages versioning of both data and models to maintain reproducibility
  • Automates the cycle from development to deployment, creating a continuous improvement loop
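
To make this connective-tissue role concrete, the short Python sketch below chains a few toy pipeline stages and derives a content-based version tag for the processed data. The stage names, the toy transformation, and the storage URI are illustrative assumptions; production pipelines typically run on a dedicated orchestration platform rather than as plain functions.

```python
import hashlib
import json

def ingest(source: str) -> list[dict]:
    # Stand-in acquisition step: a real pipeline would pull from object
    # storage, a feature store, or a streaming source.
    return [{"id": i, "text": f"Record {i} from {source}"} for i in range(3)]

def transform(records: list[dict]) -> list[dict]:
    # Toy feature-engineering step: lowercase the text and add a length feature.
    return [{**r, "text": r["text"].lower(), "length": len(r["text"])} for r in records]

def version_tag(records: list[dict]) -> str:
    # Content-addressed version of the processed dataset, so a training run
    # can be traced back to the exact data it consumed.
    payload = json.dumps(records, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

if __name__ == "__main__":
    data = transform(ingest("s3://example-bucket/raw"))  # illustrative URI
    print(f"Dataset version {version_tag(data)}: {len(data)} records ready for training")
```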

Four Key Technical Requirements for AI Factories

Organizations building AI factories face four overarching technical requirements:

1. Multi-tenancy and tenant isolation: supporting multiple teams, applications, or customers on shared infrastructure while maintaining security and performance isolation.

Many AI factories are being built to host GPU-as-a-service or AI-as-a-service offerings. Large enterprises want to host multiple internal users on the same AI infrastructure. In practice, some organizations have resorted to dedicating individual clusters to each customer or team, an approach that significantly reduces infrastructure efficiency and increases costs.

The ability to support multi-tenancy while ensuring tenant isolation has become a requirement for cost-effective AI infrastructure. Without proper isolation, organizations are forced to over-provision resources, resulting in uneven utilization (some clusters sitting idle while others are oversubscribed), duplicated management overhead, and significantly increased costs. This inefficiency is particularly acute with expensive GPU resources, where dedicated clusters can lead to utilization rates far below optimal levels.

Shared infrastructure offers compelling benefits beyond just cost savings, including higher resource utilization rates, more flexible allocation of GPU resources across projects, and reduced time-to-access for teams needing computational resources. However, these benefits cannot be realized without robust security mechanisms that provide tenant isolation, API security, and comprehensive observability. Yet many organizations still focus on where their data resides rather than on securing the APIs that access it in these environments.
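
To make tenant isolation a bit more concrete, here is a minimal sketch using the official Kubernetes Python client: each tenant gets its own namespace and a quota on GPU requests. The tenant names, labels, and quota values are illustrative assumptions, and this baseline Kubernetes isolation is the layer that tenant-aware traffic management and security build on; it is not the F5/NVIDIA solution itself.

```python
from kubernetes import client, config

def create_tenant(name: str, gpu_limit: int) -> None:
    core = client.CoreV1Api()

    # One namespace per tenant keeps each tenant's workloads and objects separate.
    core.create_namespace(
        client.V1Namespace(metadata=client.V1ObjectMeta(name=name, labels={"tenant": name}))
    )

    # A ResourceQuota caps how many GPUs the tenant can request, so no single
    # tenant can starve the others on shared infrastructure.
    core.create_namespaced_resource_quota(
        namespace=name,
        body=client.V1ResourceQuota(
            metadata=client.V1ObjectMeta(name=f"{name}-gpu-quota"),
            spec=client.V1ResourceQuotaSpec(hard={"requests.nvidia.com/gpu": str(gpu_limit)}),
        ),
    )

if __name__ == "__main__":
    config.load_kube_config()  # assumes local kubeconfig access to the cluster
    create_tenant("tenant-a", gpu_limit=8)
    create_tenant("tenant-b", gpu_limit=4)
```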

2. Observability: when running multiple customers on shared infrastructure, comprehensive monitoring becomes essential. Organizations need robust observability capabilities that provide both tenant-specific insights and cross-tenant visibility.

Key observability requirements for AI factories include:

  • Real-time monitoring of GPU utilization, memory consumption, and I/O patterns across tenants
  • End-to-end visibility into data pipelines that span storage, network, and compute resources
  • The ability to collect metrics, maintain proper logging, and provide effective troubleshooting and debugging capabilities
  • Custom dashboards that present both tenant-specific views and cross-tenant comparisons

The complexity increases when supporting multiple tenants, as each tenant's performance needs to be monitored independently while maintaining a holistic view of the entire infrastructure.
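
As a simplified illustration of the first requirement, the sketch below samples per-GPU utilization and memory through NVIDIA's NVML Python bindings (pynvml) and attributes each GPU to a tenant. The tenant-to-GPU mapping is a made-up assumption; in a real AI factory that mapping would come from the scheduler or device-plugin allocations, and the samples would be exported to a monitoring stack rather than printed.

```python
import pynvml

TENANT_BY_GPU = {0: "tenant-a", 1: "tenant-b"}  # illustrative mapping

def sample_gpu_metrics() -> list[dict]:
    pynvml.nvmlInit()
    try:
        samples = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percent busy
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes
            samples.append({
                "gpu": index,
                "tenant": TENANT_BY_GPU.get(index, "unassigned"),
                "gpu_util_pct": util.gpu,
                "mem_used_gib": round(mem.used / 2**30, 1),
                "mem_total_gib": round(mem.total / 2**30, 1),
            })
        return samples
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    for sample in sample_gpu_metrics():
        print(sample)
```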

3. API Security: the increased use of APIs in AI applications creates substantial new attack surfaces requiring protection. When discussing application security in the AI context, we must include API security as a fundamental component, as most modern applications communicate with each other and retrieve Retrieval-Augmented Generation (RAG) data through APIs. These interfaces have become the new front line of security attacks in AI environments.

A particular challenge emerges when organizations integrate legacy systems into modern AI workflows. Many enterprises are attempting to extract value from data trapped in organizational silos, but these heritage applications typically lack the security features found in newer systems. This creates significant vulnerabilities when these systems are connected to AI environments without proper security controls and API protection—an essential consideration as organizations transition from proof-of-concept (POC) to production AI deployments.
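
To illustrate the kind of controls involved, here is a deliberately simplified sketch of two gateway-style checks placed in front of a RAG data service: API-key authentication and a per-tenant rate limit. The key table, limits, and function names are assumptions for illustration; in production these checks belong in a dedicated API gateway or application delivery layer, not in application code.

```python
import time
from collections import defaultdict, deque

API_KEYS = {"key-tenant-a": "tenant-a", "key-tenant-b": "tenant-b"}  # illustrative
RATE_LIMIT = 5          # requests allowed
WINDOW_SECONDS = 1.0    # per rolling window
_request_log: dict[str, deque] = defaultdict(deque)

def authorize(api_key: str) -> str:
    # Reject unknown or revoked keys before any data is touched.
    tenant = API_KEYS.get(api_key)
    if tenant is None:
        raise PermissionError("unknown or revoked API key")
    return tenant

def check_rate_limit(tenant: str) -> None:
    now = time.monotonic()
    log = _request_log[tenant]
    # Drop timestamps that have aged out of the rolling window.
    while log and now - log[0] > WINDOW_SECONDS:
        log.popleft()
    if len(log) >= RATE_LIMIT:
        raise RuntimeError(f"rate limit exceeded for {tenant}")
    log.append(now)

def handle_rag_request(api_key: str, query: str) -> str:
    tenant = authorize(api_key)
    check_rate_limit(tenant)
    # Placeholder for the actual retrieval call to a vector store or legacy system.
    return f"[{tenant}] retrieved context for: {query}"

if __name__ == "__main__":
    print(handle_rag_request("key-tenant-a", "quarterly revenue by region"))
```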

4. Resource Optimization: AI applications fundamentally differ from traditional workloads in their resource consumption patterns and data movement requirements. During model training, AI systems process elephant flows—massive data volumes that must be efficiently handled to maintain model training performance. For inference operations, consistently low latency becomes critical for the user experience, especially when applications need to retrieve additional data for RAG.

The infrastructure supporting these workloads demands specialized optimization techniques, with DPU offloading emerging as a critical strategy. Without DPUs, or without using their full capabilities, networking and security functions consume valuable host CPU resources.

This DPU offloading approach addresses several core challenges in AI factory environments by optimizing workload distribution across specialized processors, improving resource utilization, and maximizing return on infrastructure investment.

The cost savings for customers building larger AI factories are significant because they can dedicate their CPUs to revenue-generating applications instead of overhead functions. The technology builds on proven platforms already operating in demanding environments, making it a reliable solution for organizations deploying the thousands of GPUs needed for modern AI workloads, particularly in sovereign AI deployments where efficient resource utilization is paramount.

A Multi-faceted Solution: F5, NVIDIA, and WWT

Addressing the infrastructure challenges of AI factories requires a multi-faceted solution combining hardware innovation, specialized software, and deep implementation expertise. This is where the partnership between F5, NVIDIA, and WWT creates transformational value.

The solution combines:

F5: Application Delivery and Security Expertise

F5 brings decades of application delivery and security experience to the AI domain. The BIG-IP Next for Kubernetes solution delivers high-performance traffic management and security specifically designed for AI workloads, including:

  • Traffic management optimized for AI data flows
  • Multi-tenancy support for hosting multiple customers on shared infrastructure
  • Comprehensive security controls to protect APIs and applications
  • Tenant isolation to ensure separate customers are secured and their data is protected

F5 is bringing its heritage of application delivery and security to this new AI infrastructure paradigm. Most modern applications rely heavily on APIs for communication with other systems and for pulling data for RAG purposes, making API security a critical component of any comprehensive AI security strategy.

BIG-IP Next for Kubernetes demonstrates proven product maturity through its established deployment history:

  • Already deployed in major telecommunications networks since 2022.
  • Currently running in a tier-one North American 5G network.
  • Serving nearly 50 million subscribers in production environments.

This production heritage provides confidence that the solution can meet mission-critical requirements for securing AI infrastructure.

NVIDIA: Revolutionary Hardware Acceleration

NVIDIA contributes its NVIDIA BlueField®-3 Data Processing Units (DPUs), which fundamentally transform infrastructure efficiency through hardware acceleration and CPU offloading.

The DPU architecture represents a fundamental shift in how infrastructure resources are allocated, moving appropriate functions to specialized hardware to optimize overall system performance.

A pivotal solution to the resource contention challenge leverages F5's BIG-IP Next for Kubernetes running on NVIDIA BlueField-3 DPUs. This innovative approach offloads networking and security functions from host CPUs, which accomplishes two critical objectives simultaneously: it frees up valuable computing resources for AI workloads while enabling secure multi-tenancy. This architecture allows organizations to maximize their AI investments without compromising on security or performance.

WWT: Implementation Expertise and Integration

WWT completes the solution with comprehensive consulting and implementation expertise, helping organizations navigate the complex journey from AI experimentation to production-grade infrastructure.

WWT has been working closely with F5 and NVIDIA for over a decade, with deep knowledge of the technology and operational challenges involved in transitioning from POC to production environments.

WWT serves a broad range of clients, including hyperscalers, large state and local governments, large service providers, and almost all of the Fortune 100, representing a large swath of the biggest enterprises in the world.

The Technical Side: Powering AI Factories with Proven Infrastructure

Scalable Application Delivery and Security

BIG-IP Next for Kubernetes provides the scalable foundation needed to support the unique demands of AI workloads, which involve moving enormous amounts of data during model training phases while simultaneously securing the APIs that form the backbone of modern AI applications. Most AI systems rely heavily on these APIs for system communication and for retrieving data for RAG purposes, making API security a critical component of any comprehensive AI security strategy.

BIG-IP Next's traffic management capabilities—a core strength of F5 since the late 1990s—are particularly well-suited for AI factories, where managing high-speed, high-volume workloads efficiently is essential. As organizations move from POC to production, this scalable foundation becomes increasingly important, enabling the transition from experimental AI projects to enterprise-grade AI factories capable of supporting multiple business units or customers with the performance, reliability, and security needed for mission-critical deployments.

Native Kubernetes Integration for Multi-Tenant AI

The solution's native integration with Kubernetes addresses one of the most pressing challenges for organizations building AI factories: multi-tenancy and tenant isolation. Many organizations have attempted to solve this by dedicating individual clusters per customer, which creates significant inefficiencies. BIG-IP Next enables organizations to securely host multiple customers or business units on shared AI infrastructure while maintaining strict isolation between tenants. This capability is critical for both large enterprises supporting various internal business units and service providers offering GPU-as-a-Service or AI-as-a-Service.
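
As a rough sketch of the baseline isolation this builds on, the example below (assuming the namespace-per-tenant approach outlined earlier) uses the Kubernetes Python client to apply a NetworkPolicy that only admits traffic originating from the tenant's own namespace. The label names are illustrative; BIG-IP Next layers tenant-aware traffic management and security on top of controls like this rather than replacing them.

```python
from kubernetes import client, config

def isolate_tenant_namespace(name: str) -> None:
    # Default-deny ingress for every pod in the namespace, except traffic
    # coming from namespaces labeled as belonging to the same tenant.
    policy = client.V1NetworkPolicy(
        metadata=client.V1ObjectMeta(name=f"{name}-isolation", namespace=name),
        spec=client.V1NetworkPolicySpec(
            pod_selector=client.V1LabelSelector(),  # selects all pods in the namespace
            policy_types=["Ingress"],
            ingress=[
                client.V1NetworkPolicyIngressRule(
                    _from=[client.V1NetworkPolicyPeer(
                        namespace_selector=client.V1LabelSelector(match_labels={"tenant": name})
                    )]
                )
            ],
        ),
    )
    client.NetworkingV1Api().create_namespaced_network_policy(namespace=name, body=policy)

if __name__ == "__main__":
    config.load_kube_config()  # assumes local kubeconfig access to the cluster
    isolate_tenant_namespace("tenant-a")
```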

Key Technical Benefits

The combination of BIG-IP Next running on NVIDIA BlueField-3 DPUs delivers several significant benefits for AI infrastructure:

CPU Offload: By running networking and security functions on the DPU's ARM cores rather than host CPUs, the solution frees up valuable compute resources. This offloading allows CPUs to focus on revenue-generating applications, translating to direct cost savings for organizations building large AI factories.

Power Consumption Reduction: The DPU architecture enables more efficient power utilization across the infrastructure stack, an increasingly important consideration as AI workloads consume significant energy resources.

Increased Workload Density: The CPU offload enables higher densities of AI workloads per server, maximizing the utilization of expensive GPU resources and improving overall infrastructure efficiency.

ROI Maximization: For organizations making substantial investments in AI infrastructure, these efficiency gains translate directly to improved return on investment, allowing more AI workloads to run on the same physical infrastructure.

Enhanced Observability for Complex AI Environments: Another critical innovation is the solution's observability capabilities. When running multiple customers on shared infrastructure, collecting metrics, maintaining proper logging, and providing troubleshooting capabilities become essential functions. This visibility is crucial when managing complex, multi-tenant AI environments where performance issues can directly impact business outcomes.

By bringing together production-proven technology, native Kubernetes integration, and advanced technical capabilities, F5's BIG-IP Next for Kubernetes enables organizations to build secure, efficient AI factories that can support the next generation of AI workloads.

For service providers building sovereign AI platforms (particularly in Asia Pacific and Europe), these multi-tenancy capabilities are essential for delivering secure, compliant AI services at scale. Similarly, large enterprises building internal AI Factories need these capabilities to support multiple business units on shared infrastructure.

Supporting the Full AI Application Lifecycle

While much attention in AI infrastructure focuses on model training, organizations increasingly recognize that successful AI implementation requires supporting the entire application lifecycle. The F5-NVIDIA-WWT solution addresses this broader perspective by optimizing each phase:

1. Data Ingestion: AI models require massive datasets for training and operation. BIG-IP Next for Kubernetes ensures efficient movement of data into AI environments, managing traffic flows to prevent bottlenecks and ensure consistent performance.

2. Model Training: During the computationally intensive training phase, the solution maximizes infrastructure utilization by offloading network functions to DPUs, leaving more GPU and CPU resources available for the training process itself.

3. Inference: When models are deployed to production, consistent low-latency responses become critical. The solution provides traffic management capabilities that ensure optimal performance for inference operations.

4. RAG: Many modern AI applications supplement model outputs with information retrieved from external sources. BIG-IP Next for Kubernetes secures and optimizes the API calls needed for these operations.

This comprehensive approach ensures that organizations can build AI infrastructure that supports their needs across the entire application lifecycle, not just during the initial development and training phases.
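
To ground the RAG step above, here is a toy sketch of retrieval followed by prompt assembly. The in-memory document list and word-overlap scoring stand in for a real vector database and embedding model; the API calls that would normally carry this retrieval traffic are what the solution secures and optimizes.

```python
# Toy corpus standing in for an external knowledge source.
DOCUMENTS = [
    "BlueField-3 DPUs offload networking and security from host CPUs.",
    "Elephant flows move massive training datasets across the network fabric.",
    "Inference pods are optimized for consistent low-latency responses.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    # Rank documents by naive word overlap with the query (a stand-in for
    # embedding similarity against a vector database).
    q_words = set(query.lower().split())
    scored = sorted(DOCUMENTS, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:top_k]

def build_prompt(query: str) -> str:
    # Assemble the retrieved context into the prompt sent to the model.
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    print(build_prompt("How do DPUs help host CPUs?"))
```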

Multi-Tenancy and Security

BIG-IP Next for Kubernetes enables organizations to securely support multiple users or customers on shared AI infrastructure with several key benefits:

Shared infrastructure benefits:

  • Cost efficiency: Reduced capital and operational expenses through consolidated resources
  • Resource allocation: Intelligent distribution of infrastructure resources based on tenant needs
  • Tenant isolation: Complete separation of workloads, data, and traffic between tenants
  • Administrative visibility: Comprehensive monitoring and management across all tenants

Security capabilities:

  • API threat protection: Detection and prevention of malicious activity targeting tenant APIs
  • Web vulnerability defense: Protection for tenant-specific web applications and services
  • Firewall and DDoS mitigation: Tenant-aware traffic filtering and attack prevention
  • Efficient encryption: Secure communications for each tenant without performance degradation

This multi-tenant architecture eliminates the inefficient practice of dedicating separate clusters to each customer or business unit, maximizing the return on AI infrastructure investments while maintaining strict security boundaries.

WWT's Strategic Role: From POC to Production

WWT's ability to bridge technological capabilities with operational realities helps clients navigate the challenging middle ground between promising POCs and successful production deployment. WWT's strategic role in AI implementations focuses mainly on:

Proof-to-Production Transition

  • Providing operational expertise that many organizations lack when moving beyond experimentation.
  • Supporting clients during the critical transition phase where many AI initiatives typically falter.
  • Leveraging decades of infrastructure implementation experience to navigate complex production deployments.
  • Delivering specialized infrastructure optimization, security implementation, and performance monitoring capabilities essential for moving AI from controlled test environments to complex production landscapes.
  • Providing comprehensive risk assessment, change management procedures, and specialized training for IT teams managing production AI systems

The WWT AI Proving Ground

  • Offering pre-deployment validation capabilities to ensure solutions work as expected.
  • Creating cost-effective testing environments before full production investment.
  • Facilitating multi-vendor integration to build comprehensive solutions

A centerpiece of WWT's approach is its AI Proving Ground, a specialized environment where customers can validate and integration-test their AI infrastructure designs before committing to production deployment.

This validation capability is particularly important as multiple vendors develop solutions for NVIDIA DPUs. There's now a race for DPU real estate, with many vendors developing solutions and creating overlapping use cases. Organizations need guidance to determine which security elements to deploy on their DPUs for maximum benefit.

Take the Next Step in Your AI Infrastructure Journey

Ready to address the resource contention challenges in your AI infrastructure? Connect with WWT to explore how BIG-IP Next for Kubernetes can transform your AI factory. Our experts will help you evaluate your current environment, identify optimization opportunities, and build a roadmap for secure, efficient AI operations.

Visit the WWT AI Proving Ground to experience the solution firsthand and validate it against your specific requirements. Contact us today to begin optimizing your AI infrastructure for the challenges of tomorrow.


About the Authors

Jason Cook, Chief Information Security Officer in Residence at WWT

Todd Hathaway, Global Head of AI Security & Cyber Innovation at WWT

Rich Lopez, Senior Strategic Architect at F5
