WWT had a very significant presence at the annual GTC conference in San Jose this year, with a booth near the main stage and over one hundred employees attending. During the keynote address, high-performance storage was specifically mentioned as one of the new and upcoming capabilities to help usher in the next generation of AI infrastructure. Below is a summary of the announcements made at GTC by the leading vendors providing high-performance storage solutions for AI and HPC use cases.

Pure Storage – FlashBlade//EXA

Prior to the conference, Pure Storage unveiled the FlashBlade//EXA, a significant upgrade to the AI-focused capabilities of its FlashBlade//S platform. The new disaggregated architecture is projected to deliver over 10 terabytes per second of read performance in a single namespace. The FlashBlade//EXA's parallel design allows data and metadata to scale independently, utilizing off-the-shelf data nodes and standard networking protocols to simplify deployment and management while maximizing performance for the most demanding AI environments. The read-and-write throughput figures Pure has published put the //EXA at the front of the pack among high-performance storage providers.

Pure Storage has long delivered industry-leading flash performance, but matching the scale of parallel file system competitors for larger multi-SU AI deployments was previously a gap. The FlashBlade//EXA addresses this with a purpose-built design for modern AI workloads, which process diverse data types like text, images, and video simultaneously. The //EXA solution will be able to provide the throughput needed for large-scale AI deployments in a much smaller footprint, reducing both power and cooling requirements and management complexity.

VAST Data – InsightEngine

VAST Data unveiled its new InsightEngine offering, a turnkey solution that combines infrastructure components, real-time data ingestion, and orchestration into a unified platform. By integrating a reference design architecture and utilizing the AI-Q NVIDIA Blueprint along with NVIDIA NeMo Retriever and NVIDIA NIM™ microservices, VAST's InsightEngine allows businesses to rapidly extract AI-driven insights from their data. The integrated approach streamlines AI adoption, making it easier for organizations to scale their AI initiatives while minimizing the complexities typically associated with such deployments.

InsightEngine's key benefits include real-time data processing, scalable AI performance, and high-speed data access. Powered by VAST's Disaggregated Shared-Everything (DASE) architecture, the solution is designed to simplify deployment and empower enterprises to leverage their data for advanced AI reasoning workloads.

WEKA – Augmented Memory Grid

WEKA's Augmented Memory Grid (AMG) announcement focuses on streamlining inference for AI use cases. The surge in AI inference and agentic systems demands more efficient token management. The traditional approach requires repeatedly recomputing tokens, which wastes GPU cycles and memory and increases the overall cost of inference. WEKA's Augmented Memory Grid addresses these issues by creating a "token warehouse": a persistent, NVMe-backed storage system that extends GPU memory and enables near-memory-speed token retrieval. This approach transforms tokenized data from transient artifacts into durable, reusable assets, which in turn drastically reduces latency and improves resource utilization.

WEKA's storage solution allows for persistent token storage and retrieval at microsecond latencies, facilitating real-time access for both inference and training and eliminating the need for redundant computations. By rearchitecting AI infrastructure for token-level performance and reuse, WEKA's AMG enables organizations to operate ultra-efficient token warehouses, crucial for scaling AI applications and achieving a competitive advantage in the rapidly evolving landscape of AI economics.
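The compute-once, reuse-many pattern behind a token warehouse can be illustrated with a short sketch. Everything here is hypothetical and greatly simplified, not WEKA's API: the dictionary stands in for an NVMe-backed tier, and `prefill` stands in for the GPU-heavy computation that builds a prompt's KV cache.

```python
import hashlib

class TokenWarehouse:
    """Toy persistent cache keyed by prompt prefix (illustrative only)."""

    def __init__(self):
        self._store = {}  # in a real system this tier would be NVMe-backed

    @staticmethod
    def _key(prompt_prefix: str) -> str:
        return hashlib.sha256(prompt_prefix.encode()).hexdigest()

    def get(self, prompt_prefix: str):
        return self._store.get(self._key(prompt_prefix))

    def put(self, prompt_prefix: str, kv_state) -> None:
        self._store[self._key(prompt_prefix)] = kv_state


def prefill(prompt_prefix: str) -> dict:
    # Stand-in for the expensive GPU prefill that produces KV-cache state.
    return {"tokens": prompt_prefix.split(), "layers": 32}


warehouse = TokenWarehouse()
recomputes = 0

def serve(prompt_prefix: str) -> dict:
    """Serve a request, paying the prefill cost only on a cache miss."""
    global recomputes
    kv = warehouse.get(prompt_prefix)
    if kv is None:
        kv = prefill(prompt_prefix)   # miss: recompute once
        recomputes += 1
        warehouse.put(prompt_prefix, kv)
    return kv                          # hit: near-memory-speed retrieval

serve("You are a helpful assistant.")
serve("You are a helpful assistant.")  # second call reuses the cached state
```

The point of the sketch is the economics: repeated requests that share a prefix pay the prefill cost once instead of every time, which is the source of the GPU-cycle savings described above.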

IBM – Content-Aware Storage

IBM announced a new type of real-time Retrieval-Augmented Generation (RAG) driven by the storage platform. Traditional RAG deployments must be updated on a regular basis to keep their content relevant, which means AI-driven results are out of date and unable to provide insight in real time. IBM's Content-Aware Storage (CAS) directly addresses the challenge of AI chatbots delivering incomplete responses when processing unstructured enterprise data. By integrating advanced natural language processing within IBM's Storage Scale platform, the system actively ingests diverse data formats like documents, presentations, and multimedia as they are written to the data store. IBM CAS goes beyond simple keyword matching, empowering AI tools with a deeper understanding of context and significantly improving the accuracy and relevance of their responses. This capability effectively bridges the gap between raw data and actionable AI insights.

Content-aware storage revolutionizes AI data pipelines by automating and accelerating Retrieval-Augmented Generation (RAG) processes. Leveraging innovations from IBM Research and the NVIDIA AI Blueprint for RAG with NeMo™ Retriever, Storage Scale enables real-time updates to AI models as data evolves. This streamlined approach minimizes data movement and latency, leading to faster insights, reduced costs, and enhanced performance. By embedding compute, data pipelines and vector database capabilities directly within the CAS solution, businesses can ensure AI assistants deliver precise, up-to-date information and data-driven actions across their organization.
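The ingestion-time indexing idea can be sketched in a few lines. This is a conceptual illustration under stated assumptions, not IBM's implementation: the class names are invented, and the toy hash-based `embed` function stands in for a real embedding model such as one served via NeMo Retriever.

```python
def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model (illustrative only).
    return [sum(ord(c) for c in text) % 997 / 997.0]

class ContentAwareStore:
    """Hypothetical store that builds its vector index at write time."""

    def __init__(self, chunk_size: int = 64):
        self.chunk_size = chunk_size
        self.objects = {}       # raw data store
        self.vector_index = []  # (doc_id, chunk, vector) entries

    def write(self, doc_id: str, text: str) -> None:
        """Writing data triggers chunking and embedding inline, so the
        index is current the moment the write completes."""
        self.objects[doc_id] = text
        for i in range(0, len(text), self.chunk_size):
            chunk = text[i:i + self.chunk_size]
            self.vector_index.append((doc_id, chunk, embed(chunk)))

store = ContentAwareStore()
store.write("faq.txt", "Our support line is open 24/7. " * 5)
# Index entries exist immediately -- no scheduled re-indexing batch job.
```

The contrast with a traditional pipeline is that embedding happens as a side effect of the write path itself, which is what removes the staleness window between data landing in storage and data becoming queryable by RAG.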

Summary

That's a wrap on GTC 2025. Storage vendors focused on AI are working hard to innovate and stand out in an increasingly crowded AI space. It is very exciting to see high-performance storage being more prominently included in AI architecture discussions. WWT is working to build out demos for all of these capabilities in our AI Proving Ground in the coming months. Stay tuned!
