Accelerate your AI Journey with NVIDIA NIM
From the introduction of ChatGPT in 2022 to the rapid development of curated and domain-specific AI models today, more and more enterprises are looking for ways to leverage generative AI, whether to enhance existing applications or to bring new capabilities to market. Demand for generative AI spans industries and verticals; no domain has been untouched by its impact on the technology market, and new use cases are discovered by the day.
However, many organizations struggle with the persistent challenge of putting these models into production. NVIDIA aims to address that challenge with the introduction of NVIDIA Inference Microservices (NIM).
AI deployment options before NIM
While the excitement around AI and large language models (LLMs) is palpable, putting them to use in practical applications is not so simple. Before diving into NIM itself, it is worth taking a step back and reflecting on the past. Prior to the advent of NIM, enterprises had two paths for putting AI to work in their applications.
Managed AI services
A quick and easy way to get started is to leverage a managed service that hosts various AI capabilities. In this model, a service provider hosts the AI models and exposes them through an API; you simply integrate the API calls into your application, as in the sketch below.
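As an illustration, here is a minimal sketch of what that integration typically looks like. The endpoint URL, model name, and payload shape are hypothetical; every provider documents its own API, though many follow this general pattern.

```python
# Minimal sketch of calling a managed AI service over HTTP.
# The endpoint URL, model name, and payload shape are hypothetical;
# each provider documents its own API.
import os
import requests

API_URL = "https://api.example-provider.com/v1/chat/completions"  # hypothetical endpoint
API_KEY = os.environ["PROVIDER_API_KEY"]  # credential issued by the service provider

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "general-purpose-llm",  # hypothetical model name
        "messages": [{"role": "user", "content": "Summarize our Q3 sales report."}],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```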
While this is an easy way to get off the ground quickly, it offers little of the control many enterprises need. Because it is a managed service, you have no influence over the infrastructure the AI workloads run on. Similarly, privacy can become a concern, as you must send your prompts and associated data to an external service. With this path, you are entirely at the mercy of the service provider.
Open-source deployment
The other end of the deployment spectrum is the do-it-yourself approach: leveraging open-source technology to deploy your own AI infrastructure. The key benefits are that you can deploy it anywhere, and you gain much greater control over your data since the solution is self-hosted.
While you gain more control and privacy with this deployment model, it comes at great engineering cost: you must build most of the infrastructure from scratch and handle model development and training yourself. You must also consider the overhead of ongoing testing, validation, updates, and other maintenance. These tasks require expertise that many organizations lack or cannot afford to procure, and building it up often falls outside the enterprise's core competencies.
What is NVIDIA NIM and how does it help?
At its core, NIM is a pre-built Docker container and Helm chart that comes with everything that is needed to quickly and easily deploy generative AI models at scale while obtaining the best possible performance on GPU-enabled infrastructure. NIM is rigorously benchmarked and validated across different NVIDIA hardware platforms, cloud service providers, and Kubernetes distributions, ensuring it is ready for enterprise production environments. Additionally, NIM provides the peace of mind that there is no need to share prompts or other data with external third-party services.
Essentially, NIM is designed to bridge the gap between the complex world of AI development and the operational needs of enterprise environments. The result for businesses is that they get the best aspects from both of the deployment approaches discussed previously, plus the benefits listed below.
Decreased time-to-market
Businesses around the world want to rapidly integrate AI offerings into their products and services but often lack the in-house expertise to do so. NIM removes the burden of AI model development and containerization by providing an efficient, cost-effective package that is easy to deploy and scale on your infrastructure. Additionally, the software stack and models are tuned and optimized for the best possible performance, which translates to a lower cost of ownership and a better user experience thanks to low-latency responses to prompts. This ease of use saves a tremendous amount of time, enabling enterprises to shorten their time-to-market and time-to-value.
Deploy anywhere
Because NIM uses a containerized architecture, it can be deployed and run anywhere you can run a container: from a laptop running Docker to an enterprise-grade Kubernetes cluster in your on-premises data center or in the public cloud. That portability also enables model development on high-performance infrastructure, including NVIDIA DGX™, NVIDIA DGX™ Cloud, NVIDIA-Certified Systems™, and NVIDIA RTX™ workstations. This flexibility means NIM can be used at every step of your software pipeline, from development to testing to production.
Leverage the power of Kubernetes
Many organizations' application modernization efforts include adopting a cloud-native platform built on the foundation of Kubernetes. Unlike a simple Docker deployment on a workstation, Kubernetes provides enterprise-grade container orchestration and scaling, along with a consistent experience across the data center and public cloud.
Deploying NIM on your modern application platform allows you to utilize your existing tooling for monitoring, logging, cost optimization, and security. Developers and operations personnel alike can continue using their existing processes and workflows, ultimately lowering overall deployment time.
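In practice, NIM ships with a Helm chart for exactly this purpose. Purely to illustrate what the orchestration layer manages for you, here is a minimal sketch using the official Kubernetes Python client to create a scaled Deployment; the image tag and resource requests are hypothetical.

```python
# Illustrative sketch: creating a Deployment for a NIM-style container
# with the official Kubernetes Python client. In practice NVIDIA provides
# a Helm chart; the image tag and resource limits here are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # uses your current kubeconfig context

container = client.V1Container(
    name="nim-llm",
    image="nvcr.io/nim/example/llm:latest",  # hypothetical image tag
    ports=[client.V1ContainerPort(container_port=8000)],
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"},  # schedule onto a GPU node
    ),
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="nim-llm"),
    spec=client.V1DeploymentSpec(
        replicas=2,  # Kubernetes handles scaling and self-healing
        selector=client.V1LabelSelector(match_labels={"app": "nim-llm"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "nim-llm"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```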
Industry-standard APIs
We've seen that NIM is flexible and easy to deploy on a variety of infrastructure footprints, but how do we actually integrate it into our applications? Ease of deployment doesn't count for much if development teams must climb a steep learning curve of proprietary APIs and methods to use the solution.
Fortunately, this is not the case with NIM. Developers are given an OpenAI-compatible API, along with custom NVIDIA extensions for additional functionality. This industry-standard API lets developers integrate NIM into existing applications and infrastructure without extensive customization or specialized expertise, making the solution very accessible.
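Because the API is OpenAI-compatible, existing client libraries work largely unchanged. Here is a minimal sketch using the standard OpenAI Python client pointed at a self-hosted NIM endpoint; the base URL and model identifier are illustrative and depend on which NIM you deploy and where.

```python
# Sketch: pointing the standard OpenAI Python client at a self-hosted
# NIM endpoint. The base_url and model name are illustrative and depend
# on where and which NIM you deployed.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # your NIM endpoint
    api_key="not-needed",  # placeholder; a local deployment may not require a key
)

completion = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # example model identifier
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain what an inference microservice is."},
    ],
)
print(completion.choices[0].message.content)
```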
Model support for various use cases
AI architects and developers should keep in mind that there are many AI models available because models tend to be trained for particular use cases. Most practitioners are already familiar with popular foundation models such as GPT-4 or Llama 2, which are ideal for generalized large language model use cases. However, some models are built for domain-specific use cases, such as computer vision, drug discovery, medical imaging, and more.
Regardless of the use case, NVIDIA NIM offers wide support for available models, including community models, NVIDIA AI Foundation models, and custom AI models provided by NVIDIA partners. Beyond that, NIM addresses the need for domain-specific performance optimizations through several key features. For example, NIM packages domain-specific CUDA libraries and specialized code for domains such as language, speech, video processing, healthcare, and more, ensuring customers have the tools necessary for their specific use case.
Getting started
Most organizations will want their developers and AI practitioners to evaluate the NIM offerings relevant to their use case on a more constrained footprint, while others may want to skip straight to an enterprise-class deployment. Regardless of where your business is in its AI journey, you can begin taking advantage of these powerful microservices.
For testing, you can use the NVIDIA API Catalog to give your application access to the model of your choice via an NVIDIA-hosted NIM endpoint. This is a quick way to gain the benefits of NIM without hosting the endpoint yourself; developers and engineers who wish to experiment can do so free of charge with a simple signup that provides API endpoint access. If you would rather download and run the container yourself, simply sign up for the NVIDIA Developer Program. Once downloaded, the container of your choice can be run on your local machine using Docker.
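As a sketch of that local workflow, the following uses the Docker SDK for Python, the programmatic equivalent of a docker run command. The image tag and environment variables are illustrative; the actual values come from the NVIDIA registry and your account.

```python
# Sketch: launching a downloaded NIM container locally with the Docker
# SDK for Python (equivalent to `docker run` on the command line).
# The image tag and environment variables shown are illustrative.
import os
import docker

client = docker.from_env()

container = client.containers.run(
    "nvcr.io/nim/example/llm:latest",      # hypothetical image from the NVIDIA registry
    detach=True,
    ports={"8000/tcp": 8000},              # expose the inference API locally
    environment={"NGC_API_KEY": os.environ["NGC_API_KEY"]},
    device_requests=[                      # pass all available GPUs through
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    ],
    shm_size="16g",                        # shared memory for the model runtime
)
print(f"NIM container started: {container.short_id}")
```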
When ready for production, organizations can gain access to the entire suite of production-grade microservices, including enterprise support, via NVIDIA AI Enterprise. Through your subscription, the NIM of your choice can be downloaded via the NVIDIA Container Registry for hosting on your public or private cloud infrastructure. Once your NIM endpoint is deployed and scaled to your needs, it can be easily integrated into your application via the industry-standard OpenAI API.
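In production, the same OpenAI-style integration applies; for interactive applications, you will often want to stream responses token by token. A brief sketch, again with illustrative endpoint and model names:

```python
# Sketch: streaming tokens from a production NIM endpoint through the
# standard OpenAI client. The endpoint URL and model are illustrative.
from openai import OpenAI

client = OpenAI(base_url="https://nim.internal.example.com/v1", api_key="YOUR_KEY")

stream = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # example model identifier
    messages=[{"role": "user", "content": "Draft a product announcement."}],
    stream=True,  # receive the response incrementally for low-latency UX
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```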
Conclusion
NVIDIA is at the forefront of building state-of-the-art software and tools that help enterprises not only put generative AI models into production but also build retrieval-augmented generation (RAG) pipelines. NVIDIA NIM provides the fastest path to AI inference and is a powerful addition to the NVIDIA AI Enterprise portfolio. Learn more about how WWT can guide you through your AI journey.