Much of the world's data remains untapped, and companies are looking to build next-generation generative AI applications to extract value from it. According to the NVIDIA Tech Blog, the retrieval-augmented generation (RAG) pipeline is a key part of this effort, allowing users to interact with massive amounts of data and turn documents into conversational AI applications.
Challenges of implementing the RAG pipeline
Enterprises face several challenges when implementing RAG pipelines: handling both structured and unstructured data is complex, processing and searching that data is computationally intensive, and the pipelines must also address privacy and security requirements.
To address these challenges, NVIDIA and Oracle have collaborated to demonstrate how different segments of the RAG pipeline can leverage the NVIDIA accelerated computing platform on Oracle Cloud Infrastructure (OCI). This integration aims to help enterprises make better use of their data and improve the quality and reliability of their generative AI output.
Generating Embeddings Using NVIDIA GPUs and Oracle Autonomous Database
In data-rich enterprise environments, leveraging massive amounts of text data for AI is critical to driving efficiency and productivity. NVIDIA and Oracle have demonstrated how customers can access NVIDIA GPUs via Oracle Machine Learning (OML) Notebooks in Autonomous Database. This allows users to load data directly from Oracle Database tables into an OCI NVIDIA GPU-accelerated virtual machine (VM) instance, generate vector embeddings using GPUs, and store those vectors in Oracle Database for efficient searching using AI vector search.
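As a rough illustration of that flow, the sketch below shows what bulk embedding generation can look like from a Python notebook: it assumes a hypothetical DOCS table with ID, TEXT, and EMBEDDING (VECTOR) columns, uses the open source sentence-transformers library to generate embeddings on the GPU, and writes the vectors back with python-oracledb, which maps array.array('f', ...) values to Oracle Database 23ai's VECTOR type in recent versions. The table, column, and connection details are illustrative, not part of the demo.

```python
import array

import oracledb
from sentence_transformers import SentenceTransformer

# Return CLOB columns as plain Python strings.
oracledb.defaults.fetch_lobs = False

# Hypothetical connection details -- replace with your Autonomous Database credentials.
conn = oracledb.connect(user="admin", password="...", dsn="mydb_high")

# Load an open source embedding model onto the GPU.
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

with conn.cursor() as cur:
    # Read document text from the illustrative DOCS table.
    cur.execute("SELECT id, text FROM docs")
    rows = cur.fetchall()

    # Generate embeddings in bulk on the GPU.
    embeddings = model.encode([text for _, text in rows], batch_size=64)

    # Store each vector in the VECTOR column for use with AI Vector Search.
    data = [
        (array.array("f", emb), doc_id)
        for (doc_id, _), emb in zip(rows, embeddings)
    ]
    cur.executemany("UPDATE docs SET embedding = :1 WHERE id = :2", data)
    conn.commit()
```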
Accelerated Vector Search Index and Oracle Database 23ai
NVIDIA cuVS is an open source library for GPU-accelerated vector search and clustering. A key feature of cuVS is its ability to significantly improve index build times, a critical component of vector search. NVIDIA and Oracle have demonstrated a proof of concept that accelerates vector index builds for the Hierarchical Navigable Small World (HNSW) algorithm. This approach combines GPUs and CPUs to make index creation faster and improve the performance of AI workloads.
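A minimal sketch of this GPU-assisted build pattern, using the cuVS Python API, is shown below: it builds a CAGRA graph index on the GPU, converts it to an HNSW index, and then searches that index on the CPU. The random dataset is a stand-in for real embeddings, and exact signatures may vary across cuVS releases.

```python
import numpy as np
from cuvs.neighbors import cagra, hnsw

# Random float32 vectors standing in for real embeddings (illustrative only).
dataset = np.random.random_sample((100_000, 768)).astype(np.float32)
queries = np.random.random_sample((10, 768)).astype(np.float32)

# Build the graph index on the GPU with CAGRA ...
cagra_index = cagra.build(cagra.IndexParams(graph_degree=32), dataset)

# ... then convert it to an HNSW index for CPU-side search.
# (Signature may differ slightly between cuVS versions.)
hnsw_index = hnsw.from_cagra(hnsw.IndexParams(), cagra_index)

# Search the HNSW index on the CPU for the 10 nearest neighbors per query.
distances, neighbors = hnsw.search(hnsw.SearchParams(), hnsw_index, queries, k=10)
```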
High-Performance LLM Inference Using NIM in OCI
NVIDIA NIM provides containers for self-hosting GPU-accelerated inference microservices for pretrained and custom AI models across a variety of environments. NIM microservices are designed for NVIDIA accelerated infrastructure, allowing seamless integration with existing tools and applications. Developers can quickly deploy LLMs with minimal code on premises or in Kubernetes-managed cloud environments.
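For illustration, a NIM container is typically started with Docker and then queried through its OpenAI-compatible endpoint. The sketch below assumes the Llama 3 8B Instruct NIM running locally on its default port 8000; adjust the base URL and model ID for your deployment.

```python
from openai import OpenAI

# NIM exposes an OpenAI-compatible API; point the client at the local microservice.
# Base URL and model name assume the Llama 3 8B Instruct NIM on port 8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=[
        {"role": "user", "content": "Summarize retrieval-augmented generation in one sentence."}
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```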
Deploying NVIDIA NIM on OCI offers several benefits, including improved total cost of ownership (TCO) with low-latency, high-throughput inference, faster time to market with pre-built microservices, and greater security and control of applications and data.
In an Oracle CloudWorld demo, NVIDIA and Oracle showed how NIM for LLMs can achieve higher throughput than existing open source alternatives, especially for text generation and translation use cases.
Get started
NVIDIA, in collaboration with OCI and the Oracle Database team, demonstrated how to use NVIDIA GPUs and software to accelerate bulk generation of vector embeddings, HNSW index creation, and LLM inference. This approach helps organizations tap the performance gains of NVIDIA accelerated computing platforms to apply AI to the massive amounts of data stored in Oracle databases.
Learn more about cuVS. To try NVIDIA NIM, visit ai.nvidia.com and join the NVIDIA Developer Program for immediate access to microservices. You can also use NVIDIA GPU-enabled notebooks with Autonomous Database, and try Oracle Database 23ai AI Vector Search with Oracle Database 23ai Free.