Partner POV | 3 Reasons Why the Right Infrastructure Support is Essential for AI
Article written by Jamie Thomas, General Manager for Strategy & Development, IBM.
Adopting Generative AI (gen AI) is no longer a matter of future speculation. With the vast potential it offers, companies are already maximizing its use to streamline operations, boost productivity, and pass these benefits on to their clients.
This transformation comes with new challenges. As clients begin implementing AI on premises, the first step is to evaluate whether their data centers are ready: upgrading the IT infrastructure involves providing adequate power and cooling, preparing the network to handle large data volumes, optimizing and expanding infrastructure capacity, and implementing protection measures while enabling scalability. According to a report by the IBM Institute for Business Value (IBM IBV), in collaboration with Oxford Economics, which surveyed 2,500 leaders across 34 countries and 26 industries, 43% of C-level technology executives say their concerns about their technology infrastructure have increased over the past six months because of gen AI, and they are now focused on upgrading it to scale the technology.
Organizations must have an implementation strategy that helps ensure efficient operations, minimal downtime and prompt responses to IT requirements, while addressing regulatory compliance, ethical considerations and security threats. Having a key partner with in-house AI expertise and the ability to manage the full lifecycle of this underlying infrastructure is important to realizing the benefits of such a technological evolution.
IBM Technology Lifecycle Services (TLS) offers a comprehensive suite of solutions for infrastructure support and services from deployment to decommissioning, helping organizations optimize their IT infrastructure for availability and resiliency. IBM TLS assists in the upgrade of data centers to be AI-ready, using a global supply chain and logistics framework to meet the demands of high-intensity AI workloads for IBM products and various Original Equipment Manufacturers (OEMs), at scale. Here are some of the main challenges data centers can face when running AI workloads, along with ways IBM TLS addresses them:
1. Managing a complex AI infrastructure stack with multiple vendor technologies
Today's data centers have become more complex due to the adoption of AI and reliance on technologies from multiple vendors. According to the report "Navigating the Evolving AI Infrastructure Landscape" from TechTarget Enterprise Strategy Group, 30% of organizations expect to deploy AI in hybrid cloud environments, which underscores the need to have a modernized infrastructure and effective connectivity.
Maintaining operational resiliency demands up-to-date infrastructure and proactive risk management, but overseeing various contracts and troubleshooting issues can be difficult and costly for internal IT staff. IBM TLS enhances clients' existing capabilities not only by deploying and supporting IBM products (IBM Z, Power and Storage), but also by integrating new, AI-compatible multi-vendor technologies.
Large language models require significant resources and multiple computers operating in parallel within large network cluster configurations. As the backbone of the infrastructure, this network must support high-bandwidth, low-latency and scalable architectures, with specific optimizations for GPU communication, storage access and distributed AI tasks. The IDC "2023 AI View" report notes that the network was the largest infrastructure spending item for gen AI training, accounting for 44%. By offering an integrated, holistic approach focused on resiliency and availability, with specialized teams across the globe and strategic partnerships, IBM TLS acts as a one-stop shop for clients and as an advisor to procure, plan, deploy, support, optimize and refresh data centers' infrastructure (servers, network, storage and software), facilitating a smooth transition to AI-ready environments.
If AI brings increasingly complex hurdles to data centers, addressing these issues might also benefit from the use of AI itself. At the forefront of this shift, IBM TLS integrates AI into tools and processes to empower agents and enhance the customer experience. For a more detailed look at how IBM TLS uses AI, read what Bina Hallman, Vice President at TLS Support Services for IBM Infrastructure, has to say.
2. Improving resiliency and protecting data
Gen AI systems, which rely on complex components like GPUs, network and storage, can face higher failure rates due to intense workloads, and the vast amounts of data being processed and shared might also increase vulnerability. Unplanned downtime and potential data breaches are costly for businesses, but proactive support speeds up problem resolution and anticipates issues before they happen.
The IBM IBV survey "The CEO's guide to generative AI: Platforms, data, and governance" reveals that most respondents say concerns about data lineage and provenance (61%) and data security (57%) will be a barrier to adopting gen AI. To tackle these challenges, IBM TLS offers solutions like IBM Support Insights, which manages an inventory of over 3,000 clients and 3.5 million IT assets, identifying and alerting on over 1.5 million active security vulnerabilities with recommendations for resolution. This approach helps to maintain AI infrastructure integrity and to mitigate outages and support issues from expired contracts. IBM TLS also assists clients with erasing data from legacy assets and provides media destruction services, helping ensure the sanitization complies with the U.S. National Institute of Standards and Technology (NIST) Guidelines for Media Sanitization.
IBM TLS offers premium support tiers in Expert Care for IBM products and Multivendor Enterprise Care for some non-IBM products, which feature quick repair times for critical issues and provide clients with a dedicated Technical Account Manager (TAM). The TAM is a Subject Matter Expert (SME) who reviews the entire IT environment, serves as a single point of contact and focuses on proactive measures and problem resolution to enhance operational efficiency for the business.
3. Advising on power consumption and carbon emissions
The growing energy demands of data centers, resulting from increased AI integration, might lead to higher operational expenses from power consumption and carbon emissions, hampering sustainability goals. As reported by the International Energy Agency (IEA) in January, global data center electricity consumption could rise to over 1,000 TWh in 2026, up from an estimated 460 TWh in 2022. The adoption of AI must not overlook sustainability targets, and the IBM TLS portfolio helps clients make informed decisions by evaluating workload demands and infrastructure utilization, as well as monitoring power consumption and carbon footprint. IBM IT Sustainability Optimization Assessment uses IBM Turbonomic software, which runs selected "what if" planning scenarios to understand data center optimization possibilities and impacts. Following the assessment, clients receive a detailed report with recommended actions, estimated cost reductions, projected energy consumption and improvements in carbon footprint, helping them align their AI initiatives with sustainability objectives.
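To put the IEA figures quoted above in perspective, the short Python sketch below works out the growth they imply. It assumes smooth compound growth between 2022 and 2026 and treats 1,000 TWh as the 2026 value, although the projection is "over 1,000 TWh", so the results are lower bounds. The variable names are illustrative only.

```python
# Figures quoted in the article (IEA estimate for 2022 and projection for 2026).
consumption_2022_twh = 460.0
consumption_2026_twh = 1000.0  # article says "over 1,000 TWh", so this is a lower bound
years = 2026 - 2022

# Overall growth factor across the four-year span.
growth_factor = consumption_2026_twh / consumption_2022_twh

# Average annual growth rate, assuming steady compound growth (an assumption,
# not something the IEA figures themselves state).
implied_annual_growth = growth_factor ** (1 / years) - 1

print(f"Total growth factor: {growth_factor:.2f}x")
print(f"Implied average annual growth: {implied_annual_growth:.1%}")
```

Under these assumptions, consumption would slightly more than double over four years, which corresponds to average growth of roughly 21% per year.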
As new obstacles arise, being well-prepared, anticipating potential issues and working with a trusted and experienced IT support and services partner can make a difference to the success of AI adoption and ongoing maintenance. For decades, IBM has followed core principles that support a complete AI solution stack with multiple vendor technologies. No matter where clients are on their journey, IBM is positioned to harness its expertise to help organizations with infrastructure for AI opportunities, customized product offerings, extensive consulting, technology lifecycle services and collaboration with our expansive partner ecosystem.