
NVIDIA NIM Powers RAG Applications for Veterinary AI

By Crypto Flexs · August 28, 2024 · 7 Mins Read

Iris Coleman
27 Aug 2024 19:56

NVIDIA NIM improves retrieval-augmented generation (RAG) applications, streamlining AI solutions for specialized fields like veterinary medicine.

The advent of large language models (LLMs) has brought significant benefits to the AI industry, providing versatile tools that can generate human-like text and handle a wide range of tasks. However, while LLMs demonstrate impressive general knowledge, their performance in specialized fields such as veterinary medicine is limited when used out of the box. To enhance their utility in specific fields, the industry typically adopts two main strategies: fine-tuning and retrieval-augmented generation (RAG).

Fine-Tuning vs. RAG

Fine-tuning involves training models on carefully curated and structured datasets, requiring significant hardware resources and the involvement of domain experts, a process that is often time-consuming and expensive. Unfortunately, in many fields, accessing domain experts in a way that is compatible with business constraints is prohibitively difficult.

In contrast, RAG involves building a comprehensive corpus of knowledge literature along with an effective retrieval system that extracts relevant text chunks to process user queries. By appending this retrieved information to user queries, the LLM can generate better answers. This approach still requires subject matter experts to curate the best sources for the dataset, but it is easier to manage and more business-friendly than fine-tuning. In addition, because it does not require extensive training of the model, it is less computationally intensive and more cost-effective.

NVIDIA NIM and NLP Pipeline

NVIDIA NIM simplifies the design of NLP pipelines that use LLMs. These microservices streamline the deployment of generative AI models across platforms, allowing teams to self-host LLMs while providing a standard API for building applications.

NIM abstracts model inference internals, such as the execution engine and runtime operations, ensuring optimal performance with TensorRT-LLM, vLLM, and other backends. Key features include:

  • Scalable deployment
  • Support for various LLM architectures through optimized engines
  • Flexible integration into existing workflows
  • Enterprise-grade security with SafeTensor and continuous CVE monitoring

Developers can run NIM microservices with Docker and perform inference using the API. They can also modify the container command to use trained model weights specialized for specific tasks, such as document parsing.
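As a concrete sketch, a NIM container started with Docker typically exposes an OpenAI-compatible HTTP API. The endpoint URL, port, and model name below are illustrative assumptions rather than values from this article:

```python
import json

def build_chat_request(model, prompt, temperature=0.2, max_tokens=256):
    """Build an OpenAI-compatible chat completion payload for a
    self-hosted NIM endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

# A NIM container typically serves this API at
# http://localhost:8000/v1/chat/completions once Docker is up.
payload = build_chat_request(
    "meta/llama-3.1-70b-instruct",  # model name is illustrative
    "List three differential diagnoses for weight loss in cats.",
)
print(json.dumps(payload, indent=2))

# To actually send the request (requires a running NIM container):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same payload shape works against any OpenAI-compatible endpoint, which is what makes swapping between hosted and self-hosted deployments straightforward.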

Reimagining veterinary care using AI

AITEM is part of the NVIDIA Inception Program for startups, and its collaboration with NVIDIA has focused on AI-based solutions across a range of industries and life sciences. In the veterinary field, AITEM is developing LAIKA, an innovative AI copilot designed to support veterinarians by processing patient data and providing diagnostic suggestions, guidance, and explanations.

LAIKA integrates multiple LLMs and a RAG pipeline. The RAG component retrieves relevant information from a curated dataset of veterinary resources. During the preparation phase, each resource is divided into chunks, and embeddings are computed and stored in the RAG database. During the inference phase, queries are preprocessed, their embeddings are computed and compared to the embeddings in the RAG database using a geometric distance metric. The closest matches are selected as the most relevant and used to generate a response.
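The retrieval step described above can be sketched as follows. The bag-of-words "embedding" is a toy stand-in for a real embedding model, and cosine distance stands in for whatever geometric metric LAIKA actually uses:

```python
from math import sqrt

def embed(text, vocab):
    # Toy bag-of-words vector standing in for a real embedding model.
    return [text.lower().count(w) for w in vocab]

def cosine_distance(a, b):
    # 1 - cosine similarity: smaller means geometrically closer.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb + 1e-9)

def retrieve(query_vec, chunk_vecs, k=3):
    # Rank chunks by distance to the query; return the closest indices.
    order = sorted(range(len(chunk_vecs)),
                   key=lambda i: cosine_distance(query_vec, chunk_vecs[i]))
    return order[:k]
```

For example, with `vocab = ["weight", "loss", "cat"]` and chunks about weight loss, vomiting, and grooming, the query "cat weight loss" retrieves the weight-loss chunk first.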

Due to the potential redundancy in the RAG database, multiple chunks retrieved may contain the same information, which limits the diversity of concepts provided to the answering system. To address this, LAIKA uses the Maximum Marginal Relevance (MMR) algorithm to minimize chunk redundancy and ensure a wider range of relevant information.
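The MMR step can be sketched as follows, assuming cosine similarity and a tunable trade-off parameter `lam`; the actual metric and parameters LAIKA uses are not published here:

```python
from math import sqrt, inf

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)) + 1e-9)

def mmr(query_vec, chunk_vecs, k=5, lam=0.3):
    # Maximal Marginal Relevance: balance relevance to the query against
    # redundancy with chunks already selected. lam=1 is pure relevance.
    candidates = list(range(len(chunk_vecs)))
    selected = []
    while candidates and len(selected) < k:
        best, best_score = None, -inf
        for i in candidates:
            relevance = cosine(query_vec, chunk_vecs[i])
            redundancy = max((cosine(chunk_vecs[i], chunk_vecs[j]) for j in selected),
                             default=0.0)
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected
```

Given a near-duplicate of the top chunk, pure relevance (`lam=1`) selects the duplicate, while a lower `lam` skips it in favor of a more diverse chunk.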

NVIDIA NeMo Retriever Re-Ranking NIM Microservices

The NVIDIA API Catalog includes the NeMo Retriever NIM microservice, which enables organizations to seamlessly connect custom models to a variety of business data and provide highly accurate responses. The NVIDIA Retrieval QA Mistral 4B Rerank NIM microservice is designed to assess the probability that a given text passage contains relevant information to answer a user query. Integrating this model into a RAG pipeline allows you to filter out searches that fail the reranking model’s evaluation, ensuring that only the most relevant and accurate information is used.
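A minimal sketch of this filtering step; the chunks, logits, and threshold below are illustrative, and in the real pipeline the logits would come from the reranking NIM microservice:

```python
def filter_by_rerank(chunks, logits, threshold=0.0):
    # Keep only chunks whose reranker logit clears the threshold,
    # ordered most-relevant first. The threshold is an illustrative
    # choice, not a published value.
    scored = sorted(zip(chunks, logits), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, logit in scored if logit >= threshold]
```

With logits like those in Table 1 below, a zero threshold cleanly separates the relevant chunks (positive logits) from the irrelevant ones (strongly negative logits).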

To evaluate the impact of this step on the RAG pipeline, AITEM designed the following experiment:

  1. Extract a dataset of approximately 100 anonymized questions from LAIKA users.
  2. Run the current RAG pipeline to retrieve chunks for each question.
  3. Sort the retrieved chunks by the probabilities provided by the re-ranking model.
  4. Evaluate each chunk for relevance to the query.
  5. Analyze the probability distribution of the re-ranking model against the relevance determined in step 4.
  6. Compare the chunk rankings from step 3 with the relevance judgments from step 4.
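The steps above can be sketched as a single evaluation loop; `retrieve`, `rerank`, and `judge` are placeholders for the RAG retriever, the reranking model, and the LLM used as a relevance judge:

```python
def evaluate_reranker(questions, retrieve, rerank, judge):
    # For each question: retrieve chunks, sort them by reranker logit,
    # then record each chunk's rank, logit, and relevance judgment.
    records = []
    for question in questions:
        chunks = retrieve(question)
        ranked = sorted(chunks, key=lambda c: rerank(question, c), reverse=True)
        for rank, chunk in enumerate(ranked, start=1):
            records.append({
                "question": question,
                "rank": rank,
                "logit": rerank(question, chunk),
                "relevant": judge(question, chunk),
            })
    return records
```

The resulting records are exactly what steps 5 and 6 need: logits grouped by relevance, and ranks grouped by relevance.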

LAIKA user questions vary considerably in form. Some include a detailed description of the situation but do not ask a specific question. Others contain precise questions, while still others seek guidance or a differential diagnosis based on clinical cases or analysis documents.

Because of the large number of chunks per question, AITEM used the Llama 3.1 70B Instruct NIM microservice, which is also available in the NVIDIA API Catalog, for evaluation.

To better understand the performance of the reranking model, we examined specific queries and model responses in detail. Table 1 highlights the top- and bottom-ranked chunks for a sample query related to the differential diagnosis of a cat with weight loss.

Text | Reranking logit
Causes of weight loss that can be particularly difficult to diagnose include gastrointestinal conditions that don’t cause vomiting, intestinal conditions that don’t cause vomiting or diarrhea, and liver disease. | 3.3125
Differential diagnosis for nonspecific signs such as anorexia, weight loss, vomiting, and diarrhea… Acute pancreatitis is rare in cats,… and signs are nonspecific and poorly defined (anorexia, lethargy, weight loss). | 2.3222
Severe weight loss (with or without increased appetite) may be seen in cancerous cachexia, maldigestion/malabsorption. Some conditions, such as hyperthyroidism in cats, may cause increased appetite. However, a normal appetite does not exclude the presence of a serious condition. | 2.2265
Overall, weight loss was the most common symptom, with little difference between groups. | -5.0078
Other customer complaints include lethargy, loss of appetite, weight loss, and vomiting. | -7.3672
There were six British Shorthairs, four European Shorthairs, and one Bengal cat. The clinical signs reported by the owners were: decreased or anorexia… | -10.3281
Table 1. The three highest-ranked and the three lowest-ranked text chunks

Figure 4 compares the distribution of the re-ranking model's output (in logits) between relevant (good) and irrelevant (bad) chunks. Good chunks score higher than bad chunks, and a t-test confirms that this difference is statistically significant, with a p-value below 3e-72.

Figure 4. Distribution of re-ranking model output logits for good and bad chunks

Figure 5 shows the difference in the distribution of sort positions induced by the re-ranking. Good chunks mostly occupy the upper positions, while bad chunks fall in the lower positions. A Mann-Whitney test confirms that this difference is statistically significant, with a p-value below 9e-31.

Figure 5. Distribution of sort positions induced by the re-ranking model among retrieved chunks
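In practice these significance tests would be run with a statistics library (for example scipy.stats.ttest_ind and scipy.stats.mannwhitneyu); a dependency-free sketch of the underlying statistics, on illustrative logit values rather than the full experimental data:

```python
from math import sqrt
from statistics import mean, stdev

def welch_t(a, b):
    # Welch's t statistic for two independent samples with unequal variances.
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))

def mann_whitney_u(a, b):
    # Mann-Whitney U: counts how often a value from `a` outranks one from
    # `b` (ties count half). U == len(a)*len(b) means complete separation.
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Illustrative logits echoing Table 1, not the full experimental data.
good = [3.3125, 2.3222, 2.2265]
bad = [-5.0078, -7.3672, -10.3281]
```

A strongly positive t statistic and a U at its maximum value correspond to the clean separation between good and bad chunks reported above; the tiny p-values come from converting these statistics to probabilities under the null hypothesis.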

Figure 6 shows the rank distribution and helps define an effective cutoff point. Most chunks in the top 5 positions are good, while most chunks in positions 11 to 15 are bad. Keeping only the top 5 retrieved chunks (or some other chosen number) is therefore an effective way to exclude most bad chunks.

Figure 6. Balance between good and bad chunks by position in the sort induced by the re-ranking model.

By pairing lightweight embedding models with the NVIDIA reranking NIM microservice, teams can improve retrieval accuracy while streamlining the pipeline and minimizing ingestion costs. Execution time can improve by 1.75x (Figure 7).

Figure 7. Comparison of NVIDIA reranking NIM microservices

Better Answers with NVIDIA Reranking NIM Microservices

The results show that adding the NVIDIA Rerank NIM microservice to the LAIKA RAG pipeline has a positive impact on the relevance of the retrieved chunks. By delivering more accurate and specialized information to the downstream response LLM, it equips the model with the knowledge needed for highly specialized fields such as veterinary medicine.

The NVIDIA Rerank NIM microservice, available in the NVIDIA API Catalog, simplifies adoption by making it easy to pull models, run them, and perform inference via APIs. It is pre-quantized and optimized with NVIDIA TensorRT for virtually every platform, eliminating the stress of setup and manual optimization.

For more information and latest updates on LAIKA and other AITEM projects, visit AITEM Solutions and follow LAIKA and AITEM on LinkedIn.

Image source: Shutterstock

