The advent of large language models (LLMs) has brought significant benefits to the AI industry, providing versatile tools that can generate human-like text and handle a wide range of tasks. However, while LLMs demonstrate impressive general knowledge, their out-of-the-box performance in specialized fields such as veterinary medicine is limited. To enhance their utility in specific domains, the industry typically adopts two main strategies: fine-tuning and retrieval-augmented generation (RAG).
Fine-Tuning vs. RAG
Fine-tuning involves training models on carefully curated and structured datasets, requiring significant hardware resources and the involvement of domain experts, a process that is often time-consuming and expensive. Unfortunately, in many fields, accessing domain experts in a way that is compatible with business constraints is prohibitively difficult.
In contrast, RAG involves building a comprehensive corpus of domain literature along with an effective retrieval system that extracts the text chunks most relevant to a user query. By adding this retrieved information to the user query, the LLM can generate better answers. This approach still requires subject matter experts to curate the best sources for the dataset, but it is easier to manage and more business-friendly than fine-tuning. In addition, since it does not require extensive training of the model, it is less computationally intensive and more cost-effective.
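The RAG flow described above can be sketched with a toy example. Here `retrieve_chunks` is a deliberately naive word-overlap retriever and the prompt template is illustrative; a real system would use embeddings and a tuned prompt.

```python
import re

def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def retrieve_chunks(query: str, corpus: list[str], top_k: int = 3) -> list[str]:
    """Toy retriever: rank corpus chunks by word overlap with the query."""
    qw = words(query)
    return sorted(corpus, key=lambda c: len(qw & words(c)), reverse=True)[:top_k]

def build_augmented_prompt(query: str, corpus: list[str]) -> str:
    """Prepend the retrieved context to the user query before calling the LLM."""
    context = "\n".join(f"- {c}" for c in retrieve_chunks(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Hyperthyroidism in cats may cause weight loss with increased appetite.",
    "Vaccination schedules vary by region.",
    "Weight loss in cats can indicate gastrointestinal disease.",
]
print(build_augmented_prompt("Why is my cat losing weight?", corpus))
```

The augmented prompt is then sent to the LLM in place of the bare query.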
NVIDIA NIM and NLP Pipelines
NVIDIA NIM simplifies the design of NLP pipelines built on LLMs. These microservices streamline the deployment of generative AI models across platforms, allowing teams to self-host LLMs while providing a standard API for building applications.
NIM abstracts away model inference internals, such as the execution engine and runtime operations, and ensures optimal performance using backends such as TensorRT-LLM and vLLM. Key features include:
- Scalable deployment
- Support for various LLM architectures through optimized engines
- Flexible integration into existing workflows
- Enterprise-grade security with safetensors support and continuous CVE monitoring
Developers can run NIM microservices with Docker and perform inference using the API. They can also modify the container command to use trained model weights specialized for specific tasks, such as document parsing.
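NIM microservices expose an OpenAI-compatible HTTP API once the container is running. The sketch below only builds the request body for a chat-completion call; the model name, host, and port are assumptions for illustration.

```python
# Build the JSON body for an OpenAI-compatible /v1/chat/completions call
# against a self-hosted NIM container. The model name and endpoint below
# are illustrative assumptions.
import json

def build_chat_request(model: str, user_message: str) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 256,
    }

body = build_chat_request("meta/llama-3.1-70b-instruct",
                          "Summarize this veterinary lab report.")
# With a NIM container listening on localhost:8000, this body could be
# POSTed to http://localhost:8000/v1/chat/completions.
print(json.dumps(body, indent=2))
```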
Reimagining veterinary care using AI
AITEM is part of the NVIDIA Inception program for startups, and its collaboration with NVIDIA has focused on AI-based solutions across a range of industries and the life sciences. In the veterinary field, AITEM is developing LAIKA, an innovative AI copilot designed to support veterinarians by processing patient data and providing diagnostic suggestions, guidance, and explanations.
LAIKA integrates multiple LLMs and a RAG pipeline. The RAG component retrieves relevant information from a curated dataset of veterinary resources. During the preparation phase, each resource is divided into chunks, and an embedding is computed for each chunk and stored in the RAG database. During the inference phase, a query is preprocessed, its embedding is computed, and that embedding is compared to the embeddings in the RAG database using a geometric distance metric. The closest matches are selected as the most relevant and used to generate a response.
Because the RAG database may contain redundant content, multiple retrieved chunks can carry the same information, which limits the diversity of concepts provided to the answering system. To address this, LAIKA uses the Maximal Marginal Relevance (MMR) algorithm to minimize chunk redundancy and ensure a wider range of relevant information.
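A minimal MMR sketch: each step picks the chunk that balances relevance to the query against similarity to the chunks already selected, so duplicates are penalized. The word-overlap `sim` function is a toy stand-in for embedding similarity, and the lambda value is illustrative.

```python
def sim(a: str, b: str) -> float:
    """Toy Jaccard word-overlap similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def mmr(query: str, chunks: list[str], k: int = 2, lam: float = 0.6) -> list[str]:
    """Greedily select k chunks trading off relevance vs. redundancy."""
    selected: list[str] = []
    candidates = list(chunks)
    while candidates and len(selected) < k:
        def score(c: str) -> float:
            redundancy = max((sim(c, s) for s in selected), default=0.0)
            return lam * sim(query, c) - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

chunks = [
    "weight loss in cats causes",
    "weight loss in cats causes",   # duplicate information
    "hyperthyroidism increases appetite in cats",
]
# The duplicate is skipped in favor of the novel chunk:
print(mmr("cats weight loss", chunks))
```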
NVIDIA NeMo Retriever Re-Ranking NIM Microservices
The NVIDIA API Catalog includes the NeMo Retriever NIM microservices, which enable organizations to seamlessly connect custom models to a variety of business data and provide highly accurate responses. The NVIDIA Retrieval QA Mistral 4B Rerank NIM microservice is designed to assess the probability that a given text passage contains information relevant to answering a user query. Integrating this model into a RAG pipeline lets you filter out retrieved chunks that fail the reranking model's evaluation, ensuring that only the most relevant and accurate information is used.
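The filtering step can be sketched as follows. The chunk texts, logit scores, and threshold here are illustrative, not actual reranker output.

```python
def filter_by_rerank(scored_chunks, top_k=5, min_logit=0.0):
    """Keep at most top_k chunks whose rerank logit clears min_logit,
    highest score first."""
    ranked = sorted(scored_chunks, key=lambda cs: cs[1], reverse=True)
    return [(c, s) for c, s in ranked if s >= min_logit][:top_k]

scored = [
    ("gastrointestinal causes of weight loss", 3.31),
    ("pancreatitis differential in cats", 2.32),
    ("breed counts in the study cohort", -10.33),
    ("owner-reported signs overview", -5.01),
]
# Only the two positively scored chunks survive:
print(filter_by_rerank(scored, top_k=5, min_logit=0.0))
```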
To evaluate the impact of this step on the RAG pipeline, AITEM designed the following experiment:
1. Extract a dataset of approximately 100 anonymized questions from LAIKA users.
2. Run the current RAG pipeline to retrieve chunks for each question.
3. Sort the retrieved chunks by the probabilities produced by the reranking model.
4. Evaluate each chunk for relevance to the query.
5. Analyze the probability distribution of the reranking model against the relevance labels from step 4.
6. Compare the chunk rankings from step 3 with the relevance labels from step 4.
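The sorting-and-comparison part of the experiment can be sketched on toy data: sort chunks by rerank logit, then check where the relevant and irrelevant ones land in the ranking. The logits and labels below are synthetic.

```python
def rank_positions(logits, labels):
    """After sorting by descending logit, return the 1-based rank positions
    of relevant (True) and irrelevant (False) chunks."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    good = [r + 1 for r, i in enumerate(order) if labels[i]]
    bad = [r + 1 for r, i in enumerate(order) if not labels[i]]
    return good, bad

logits = [3.3, 2.3, -5.0, 2.2, -7.4, -10.3]
labels = [True, True, False, True, False, False]  # relevance judgments
good, bad = rank_positions(logits, labels)
print(good, bad)  # relevant chunks occupy the top ranks
```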
LAIKA user questions vary considerably in form. Some include a detailed description of a clinical situation without asking a specific question; others pose precise questions; and others seek guidance or a differential diagnosis based on clinical cases or analysis documents.
Because of the large number of chunks per question, AITEM used the Llama 3.1 70B Instruct NIM microservice, which is also available in the NVIDIA API Catalog, for evaluation.
To better understand the performance of the reranking model, we examined specific queries and model responses in detail. Table 1 highlights the top- and bottom-ranked chunks for a sample query related to the differential diagnosis of a cat with weight loss.
| Text | Rerank logit |
| --- | --- |
| Causes of weight loss that can be particularly difficult to diagnose include gastrointestinal conditions that don't cause vomiting, intestinal conditions that don't cause vomiting or diarrhea, and liver disease. | 3.3125 |
| Differential diagnosis for nonspecific signs such as anorexia, weight loss, vomiting, and diarrhea… Acute pancreatitis is rare in cats,… and signs are nonspecific and poorly defined (anorexia, lethargy, weight loss). | 2.3222 |
| Severe weight loss (with or without increased appetite) may be seen in cancerous cachexia and maldigestion/malabsorption. Some conditions, such as hyperthyroidism in cats, may cause increased appetite. However, a normal appetite does not exclude the presence of a serious condition. | 2.2265 |
| Overall, weight loss was the most common symptom, with little difference between groups. | -5.0078 |
| Other owner complaints include lethargy, loss of appetite, weight loss, and vomiting. | -7.3672 |
| There were six British Shorthairs, four European Shorthairs, and one Bengal cat. The clinical signs reported by the owners were: decreased appetite or anorexia… | -10.3281 |
Figure 4 compares the reranking model's output distribution (in logits) for relevant (good) versus irrelevant (bad) chunks. Good chunks score higher than bad chunks, and a t-test confirms that the difference is statistically significant, with a p-value below 3e-72.
Figure 5 shows the distribution of rank positions produced by the reranking. Good chunks mostly land in the top positions, while bad chunks land in the bottom ones. A Mann-Whitney test confirms that this difference is statistically significant, with a p-value below 9e-31.
Figure 6 shows the rank distribution and helps define an effective cutoff point. Most chunks in the top 5 positions are good, while most chunks in positions 11 to 15 are bad. Keeping only the top 5 retrieved chunks, or another chosen number, is therefore an effective way to exclude most bad chunks.
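Choosing such a cutoff can be sketched by bucketing ranked chunks by position and measuring the fraction labelled relevant in each bucket. The labels below are synthetic, mimicking the pattern reported above (top positions mostly good, positions 11 to 15 mostly bad).

```python
def good_fraction_by_bucket(labels_in_rank_order, bucket_size=5):
    """labels_in_rank_order[i] is True if the chunk at rank i+1 is relevant.
    Returns the fraction of relevant chunks in each consecutive bucket."""
    fractions = []
    for start in range(0, len(labels_in_rank_order), bucket_size):
        bucket = labels_in_rank_order[start:start + bucket_size]
        fractions.append(sum(bucket) / len(bucket))
    return fractions

# Synthetic relevance labels for 15 ranked chunks:
labels = [True] * 4 + [False] + [True] * 2 + [False] * 3 + [False] * 5
print(good_fraction_by_bucket(labels))  # [0.8, 0.4, 0.0]
```

A cutoff after the first bucket keeps mostly good chunks while dropping the bad tail.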
By pairing a lightweight embedding model with the NVIDIA reranking NIM microservice, retrieval accuracy can be improved while optimizing the retrieval pipeline and minimizing ingestion costs. Execution time improves by up to 1.75x (Figure 7).
Better Answers with NVIDIA Reranking NIM Microservices
The results show that adding the NVIDIA reranking NIM microservice to the LAIKA RAG pipeline has a positive impact on the relevance of the retrieved chunks. By delivering more accurate and specialized information to the downstream answering LLM, it provides the model with the knowledge needed for highly specialized fields such as veterinary medicine.
The NVIDIA reranking NIM microservice, available in the NVIDIA API Catalog, simplifies adoption: models are easy to pull, run, and query through APIs. It comes pre-quantized and optimized with NVIDIA TensorRT for a wide range of platforms, removing the burden of setup and manual optimization.
For more information and latest updates on LAIKA and other AITEM projects, visit AITEM Solutions and follow LAIKA and AITEM on LinkedIn.