The first release of NVIDIA NIM Operator simplified the deployment and lifecycle management of inference pipelines for NVIDIA NIM microservices, reducing the workload for MLOps engineers, LLMOps engineers, and Kubernetes admins. It enabled easy and fast deployment, autoscaling, and upgrading of NIM on Kubernetes clusters. Learn more about the first release.
Our customers and partners have been using the NIM Operator to efficiently manage inference pipelines for applications such as chatbots, agentic RAG, and virtual drug discovery. Our partners on the Cisco Compute Solutions team are using the NIM Operator to deploy the NVIDIA AI Blueprint for RAG as part of the Cisco Validated Design.
“We strategically integrate the NVIDIA NIM Operator with Cisco Validated Design (CVD) into our AI-ready infrastructure, enhancing enterprise-grade retrieval-augmented generation pipelines. The NIM Operator significantly streamlines the deployment, autoscaling, and rollout processes for NVIDIA NIM. The NIM Operator’s highly efficient model caching greatly improves AI application performance, and the NIMPipeline custom resource unifies management of multiple NIM services through a single, declarative configuration file. The combination of streamlined operations and efficient resource management significantly boosts overall operational efficiency when deploying and managing NIM on Cisco infrastructure.” — Paniraja Koppa, technical marketing engineering leader, Cisco Systems
With the release of NVIDIA NIM Operator 2.0, we added the ability to deploy and manage the lifecycle of NVIDIA NeMo microservices. NeMo microservices are a collection of tools to build AI workflows, such as an AI data flywheel, on your Kubernetes cluster, whether on-premises or in the cloud.
NVIDIA is introducing new Kubernetes custom resource definitions (CRDs) to deploy three core NeMo microservices (a sample manifest is sketched after the list):
NeMo Customizer: Facilitates the fine-tuning of large language models (LLMs) using supervised and parameter-efficient fine-tuning techniques.
NeMo Evaluator: Provides comprehensive evaluation capabilities for LLMs, supporting academic benchmarks, custom automated evaluations, and LLM-as-a-Judge approaches.
NeMo Guardrails: Adds safety checks and content moderation to LLM endpoints, protecting against hallucinations, harmful content, and security vulnerabilities.
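As a concrete sketch of what these CRDs look like in practice, the manifest below deploys a NemoGuardrails instance. The apiVersion, image repository, tag, and field names are illustrative assumptions; consult the NIM Operator documentation for the exact schema.

```yaml
# Minimal NemoGuardrails manifest (illustrative sketch; field names and the
# image repository/tag are assumptions, not the confirmed schema).
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoGuardrails
metadata:
  name: nemoguardrails-sample
  namespace: nemo
spec:
  image:
    repository: nvcr.io/nvidia/nemo-microservices/guardrails  # assumed path
    tag: "25.04"                                              # assumed tag
    pullSecrets:
      - ngc-secret
  expose:
    service:
      port: 8000
  replicas: 1
```

NemoCustomizer and NemoEvaluator follow the same declarative pattern, differing mainly in the dependencies they wire in.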
Figure 1. NIM Operator architecture
Core capabilities and benefits
This release adds several new and updated features, described below.
Easy and fast deployments
The NIM Operator simplifies deploying NIM and NeMo microservices for AI workflows in just a few steps, and supports two types of deployment:
Quick start: Provides curated dependencies, such as databases and OpenTelemetry (OTEL) servers, to quickly run your AI workflows. Learn how to get started.
Figure 2. NIM Operator 2.0 deployment
Custom configuration: Supports customizing the NeMo microservices CRDs to use your production-grade dependencies and to pick and choose which microservices to deploy; see the sketch below. Get started with our documentation.
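To illustrate the custom-configuration path, the sketch below wires a NemoCustomizer to an existing PostgreSQL database instead of the quick-start one. The databaseConfig block and its field names are assumptions for illustration; verify them against the published CRD schema.

```yaml
# Illustrative sketch: pointing NemoCustomizer at production-grade
# dependencies. Field names (databaseConfig, credentials) are assumptions.
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoCustomizer
metadata:
  name: nemocustomizer-sample
  namespace: nemo
spec:
  image:
    repository: nvcr.io/nvidia/nemo-microservices/customizer  # assumed path
    tag: "25.04"
  databaseConfig:
    host: postgres.prod.svc.cluster.local  # your existing database
    port: 5432
    databaseName: customizer
    credentials:
      user: ncsuser
      secretName: customizer-pg-secret     # password read from this Secret
```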
Simplified Day 2 operations
The NIM Operator makes it easy to manage Day 2 operations. It supports configuring rolling upgrades, ingress, and autoscaling. This includes:
Simplified upgrades: Support for rolling upgrades of NeMo microservices with a customizable rolling strategy. Change the version number of the NeMo microservices CRDs, and the NIM Operator updates the deployments in the cluster, managing any database schema changes.
Configurable ingress rules: Kubernetes ingress rules for NeMo microservices, enabling custom host/path access to APIs.
Autoscaling: Supports autoscaling the NeMo microservices deployment and its ReplicaSet using the Kubernetes Horizontal Pod Autoscaler (HPA). The NemoCustomizer, NemoEvaluator, and NemoGuardrails CRDs work with all the familiar HPA metrics and scaling behaviors, as sketched after Figure 3.
Figure 3. NIM Operator Day 2 operations
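Because autoscaling reuses the standard HPA machinery, enabling it amounts to embedding an HPA-style spec in the CRD. A minimal sketch, assuming the CRD exposes a scale block of this shape (verify the field names in the documentation):

```yaml
# Illustrative autoscaling fragment for a NeMo microservice CRD; assumes a
# spec.scale block that embeds a standard HorizontalPodAutoscaler spec.
spec:
  scale:
    enabled: true
    hpa:
      minReplicas: 1
      maxReplicas: 4
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 75
```

Custom and external metrics would work the same way they do with a hand-written HPA, since scaling decisions are delegated to the HPA controller.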
Simplified AI workflow management: The NIM Operator can simplify deployment of entire AI workflows. For example, to deploy a trusted LLM chatbot, users can manage a single guardrails NIM pipeline that deploys all the necessary components: an LLM NIM and the NeMo Guardrails NIM microservices for content safety, jailbreak detection, and topic control, as sketched below.
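A hedged sketch of such a pipeline follows; the service names, image paths, and exact NIMPipeline fields are illustrative assumptions, not the confirmed schema.

```yaml
# Illustrative NIMPipeline for a guardrailed chatbot: an LLM NIM plus a
# content-safety NIM managed as one unit. Names and fields are assumptions.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMPipeline
metadata:
  name: guardrails-pipeline
spec:
  services:
    - name: llm-nim
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/meta/llama-3.1-8b-instruct  # assumed path
          tag: "1.3"
    - name: content-safety-nim
      enabled: true
      spec:
        image:
          repository: nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-content-safety  # assumed path
          tag: "1.0"
```

In a sketch like this, disabling one service becomes a one-line change (enabled: false) rather than deleting a separately managed deployment.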
Extended support matrix: Supports NVIDIA NIM microservices across many domains, such as reasoning, retrieval, speech, and biology. We test on a wide variety of Kubernetes platforms and have added platform-specific security settings or documented resource constraints where needed.
We are continuously expanding the list of supported NVIDIA NIM and NVIDIA NeMo microservices. For more information about the full list of supported NIM and NeMo microservices, see Platform Support.
Get started
By automating the deployment, scaling, and lifecycle management of both NVIDIA NIM and NVIDIA NeMo microservices, the NIM Operator makes it easier for enterprise teams to adopt AI workflows. This effort aligns with our commitment to making AI workflows easy to deploy with NVIDIA AI Blueprints and quick to move to production. The NIM Operator is part of NVIDIA AI Enterprise, providing enterprise support, API stability, and proactive security patching.
Get started through NGC or from the GitHub repo. For technical questions on installation, usage, or issues, please file an issue on the GitHub repo.
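For reference, a typical Helm-based install from NGC looks roughly like the following; the repository URL, chart name, and namespace are assumptions based on common NVIDIA Helm conventions, so follow the official installation guide for exact values.

```bash
# Illustrative install sketch; repo URL, chart name, and namespace are assumptions.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install nim-operator nvidia/k8s-nim-operator \
  --namespace nim-operator --create-namespace
```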