This article was co-authored by Victor Avelar, Chief Research Analyst, Energy Management Research Center at Schneider Electric.

As GenAI continues to evolve and demand more from data centers, it's important to consider the impact on infrastructure. IT leaders need to rethink infrastructure and embrace advanced power distribution technologies to ensure optimal performance, reliability, and energy efficiency for AI deployments.

Most Data Centers are Built for Traditional Loads, Not GenAI     

Most data centers are built to handle traditional loads, which require far less power. The addition of GenAI drives significant increases in power density, from 10-12 kW to upwards of 100 kW per rack, necessitating new cooling and power solutions beyond traditional methods. Furthermore, as liquid cooling becomes essential for handling AI training cluster workloads, the need for higher-voltage power distribution becomes critical.

Organizations deploying AI workloads in their on-premises data centers often overlook the additional power and cooling requirements, primarily due to silos. The worlds of IT and facilities are typically built with different business goals in mind. IT structures are naturally developed to manage information flow, while facilities are part of the physical real estate. This siloed structure often results in limited visibility and communication between the two departments and can lead to production outages and downtime.

However, if you think through your facilities infrastructure needs early in your AI strategy, you can avoid these setbacks as you deploy AI workloads.

When deploying AI workloads on-premises without considering the necessary facilities, power, and cooling requirements, organizations can face significant challenges. IT infrastructure, which typically consumes between 2 and 5 percent of an organization's gross revenue, is often built with the assumption that support services such as power, cooling, cabling, and space are adequately prepared. This assumption can lead to serious setbacks, including an inability to deploy AI projects immediately as well as increased risk of service disruptions and potential outages, very possibly undermining the competitive advantage of GenAI.


Data Centers are Struggling to Pack More Power and Remove Additional Heat

Data centers are increasingly facing difficulties in accommodating additional power and heat demand as they scale up operations. 

Our enterprise clients are encountering significant challenges in securing sufficient power due to limitations from utility companies, which often do not have enough capacity available. This issue is exacerbated by the substantial lead times required for infrastructure upgrades, which can range from one to three years. The rapid ramp-up of AI deployment in enterprise organizations further strains existing capacities, leading to unpredictable environments where downtime is costly. In sum, power and heat constraints are delaying or limiting deployments and creating operational challenges in data centers.

Increased Power Needs for GenAI Workloads

The power requirements for AI workloads have escalated dramatically, signaling a pivotal shift in the industry's energy needs. 

Two decades ago, loads connected to a typical Rack Power Distribution Unit (PDU) demanded between 2 and 4 kW, which rose to 10 to 12 kW in recent years. Now, with the advent of GenAI, this figure has surged to 20 to 100 kW per rack—a stark contrast to the gradual increases of the last 20 years. For example, NVIDIA's latest reference design calls for a staggering 130 kW per rack. This surge necessitates not only taller, wider, and heavier racks but also more power strips—up to six per rack.

To enhance efficiency, higher voltages such as 415 volts (3-phase) are now delivered to racks, with some setups potentially moving to 480 volts, which requires new power supplies and adaptations by OEMs. The largest 415 V "off the shelf" power strip currently supports 60 amps at 34 kW, with some OEMs offering 100-amp models capable of delivering 57 kW. Typically, these configurations are deployed in sets of two to support two servers per rack, but with increased power demands, even more are needed. Furthermore, this shift toward higher voltages necessitates additional changes within the data center, such as installing transformers to step down the voltage and adopting infrastructure management software to maintain efficiency amid soaring power needs.
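The arithmetic behind these ratings is straightforward to sketch. A minimal Python illustration follows; the 80 percent continuous-load derating is a typical North American electrical practice assumed here, not a figure stated above, and the redundant A/B feed assumption is likewise illustrative:

```python
import math

def three_phase_kw(volts_ll: float, amps: float, derate: float = 0.8) -> float:
    """Usable kW from a 3-phase feed: sqrt(3) * V(line-to-line) * I,
    derated to 80% of breaker rating for continuous loads (assumed)."""
    return math.sqrt(3) * volts_ll * amps * derate / 1000.0

def strips_per_rack(rack_kw: float, strip_kw: float, redundant: bool = True) -> int:
    """Power strips needed for a rack; doubled for redundant A/B feeds."""
    n = math.ceil(rack_kw / strip_kw)
    return 2 * n if redundant else n

strip_60a = three_phase_kw(415, 60)    # ~34.5 kW, in line with the ~34 kW figure
strip_100a = three_phase_kw(415, 100)  # ~57.5 kW, in line with the ~57 kW figure
print(strips_per_rack(100, strip_60a))  # 100 kW rack on 60 A strips with A/B feeds
```

Run with those inputs, a 100 kW rack needs three 60-amp strips per feed, or six with A/B redundancy, which is consistent with the "up to six per rack" figure above.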

The Cooling Demands

The cooling demands within the industry are rapidly evolving as well, necessitating a flexible and precise approach that combines traditional air cooling with more advanced fluid cooling techniques. Standard solutions such as dry coolers, condensers, and packaged chillers are being utilized to effectively expel heat outdoors. 

Despite the shift towards liquid cooling for AI workloads, 10 to 20 percent of the load will still require air cooling. Room cooling systems can handle 15 to 20 kW per rack, more predictable in-row setups can manage 30 to 50 kW, and rear-door cooling systems range from 30 to 80 kW. For racks requiring 30 to 40 kW, a combination of raised floors with in-row air conditioners and rear-door technologies can meet demand. However, for the more intensive 100 to 130 kW racks that companies such as NVIDIA are developing, a different approach is essential. Liquid cooling, which circulates cold liquid directly through equipment to remove heat efficiently without the spacing required by in-row coolers, becomes a necessary upgrade to meet these higher thermal loads.
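As a rough decision aid, the per-rack ranges above can be expressed as a simple lookup. This is a sketch only; the thresholds approximate the article's figures and are no substitute for a facilities assessment:

```python
def cooling_options(rack_kw: float) -> list[str]:
    """Cooling technologies plausibly able to handle a given rack density,
    using approximate per-rack ranges (illustrative thresholds only)."""
    options = []
    if rack_kw <= 20:
        options.append("room air cooling (up to ~15-20 kW)")
    if rack_kw <= 50:
        options.append("in-row air cooling (~30-50 kW)")
    if rack_kw <= 80:
        options.append("rear-door heat exchanger (~30-80 kW)")
    if rack_kw > 80 or not options:
        options.append("direct liquid cooling")
    return options

print(cooling_options(40))   # in-row and rear-door both plausible
print(cooling_options(130))  # only liquid cooling remains
```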


Five Essential Tips for AI Planning and Integration

When embarking on the integration of AI into your operations, it's important to approach the process with a strategic mindset. These quick tips are designed to guide you through the complexities of AI deployment, ensuring that your technological advancements align with both immediate and long-term objectives. From engaging IT and facilities to considering energy demands and sustainability, these insights will help you navigate the evolving landscape of AI technology efficiently.

Quick Tip #1: Be aware of your long-term goals as well as your immediate goals.

Always keep an eye on the horizon. When planning, it's imperative to address not only your immediate needs but also potential future requirements. As you consider your growth strategy, remember that future needs might alter your trajectory. Ensuring you have a long runway for growth will help you adapt to changes as they arise.

Quick Tip #2: Engage IT first and facilities afterward.

When contemplating the integration of AI into your operations, start by clarifying your goals and desired outcomes with IT leaders, asking questions such as:

  • What specific functions do you need AI to perform?
  • What objectives are you aiming to achieve through AI implementation?

Next, evaluate where your AI operations should physically take place:

  • Consider using colocation centers to rent compute resources for training large language models (LLMs), which might be more cost-effective than on-site operations.
  • Alternatively, you might rent space and utilize AI tools provided by the host or purchase a pre-trained model and fine-tune it with your data—note that training models from scratch is typically viable only for large enterprise organizations.

These decisions will determine the amount and density of compute resources required for on-site initiatives. For example:

  • Take the NVIDIA Grace Hopper™ Superchip as an example: at roughly 10 kW per server, you could fit up to six servers in a rack, amounting to 60 kW. This is a significant increase from the typical 10-14 kW per rack, and future capacities are expected to rise even further.
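The rack-level arithmetic in that example generalizes to a quick budgeting check. A minimal sketch, in which the 60 kW budget and 10 kW per server come from the example above and the overhead reserve is a hypothetical parameter for switches, cooling fans, and the like:

```python
def servers_per_rack(rack_budget_kw: float, server_kw: float,
                     overhead_kw: float = 0.0) -> int:
    """Whole servers that fit within a rack's power budget, after
    reserving overhead_kw (hypothetical) for non-server equipment."""
    return int((rack_budget_kw - overhead_kw) // server_kw)

print(servers_per_rack(60, 10))  # six 10 kW servers in a 60 kW rack
```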

Once you have a solid plan, your facilities management can perform its critical role:

  • Facilities workshops provide an educational walkthrough of different power and cooling technologies, outlining their advantages and limitations to help you make informed decisions.
  • AI assessments of existing sites can optimize your current resources to fully leverage your facilities in line with your AI requirements.

Facilities engagements are important for designing and implementing the necessary infrastructure modifications to support your future AI needs effectively.

Quick Tip #3: Stay informed of impending changes.

Many OEMs and vendors have new technologies and tools coming out that can significantly change the dynamics of AI deployment. Stay close to the sources of truth and be aware of the implications of new technologies. Partners such as Schneider Electric and WWT, industry blogs, thought leadership articles, and AI-focused events are invaluable resources that will keep you abreast not only of what's on the horizon but also of what's working now.

Quick Tip #4: Consider energy conservation and sustainability.

With the increased focus on sustainable computing and energy efficiency in recent years, businesses have made great strides. Yet AI is projected to drive energy demands to an all-time high. Be sure to consider measures such as expanded use of infrastructure management software to drive efficiency, as AI will need more power than previously imagined.

Quick Tip #5: Work with trusted partners with vast resources and world-class expertise and toolsets.

Schneider Electric, a WWT facilities infrastructure (FIT) partner, helps ensure our clients' data centers are ready to handle AI workloads. Put another way, they make sure facilities have the necessary power and cooling to support your AI workloads. 

WWT's value-add includes a dedicated team of AI experts with extensive industry knowledge and hands-on experience in design and deployment, focusing on the physical layer that underpins the IT stack. 

Our people hold the highest levels of certifications and specializations in both AI and facilities technologies. They are equipped to help clients plan for a high-density dynamic environment, ensuring your UPS and cooling infrastructure are robust for years to come. 

WWT also provides expert configuration and installation services for enterprise organizations. Within our Advanced Technology Center (ATC), WWT offers numerous facilities environments that demonstrate the value of our solutions and explore cutting-edge technology potentials. 

What's more, WWT's AI Proving Ground allows you to test AI technologies before you buy, enabling comparisons and evaluations of different power and cooling options. Additionally, we facilitate the integration of your technology at the WWT North American Integration Center (NAIC), saving you time, effort, and money.

Explore and Engage with WWT's Workshops and Labs for AI Excellence

Discover the full potential of AI at the WWT AI Proving Ground. Accelerate your AI strategy by learning from others' experiences and witnessing real-world applications in action. Two good first steps are signing up for a facilities workshop or an AI assessment.

Learn more about WWT and Schneider Electric
Connect with our experts
