According to the NVIDIA Technical Blog, NVIDIA has unveiled NIM microservices for speech and translation, part of the NVIDIA AI Enterprise product line. These microservices allow developers to self-host GPU-accelerated inference for both pre-trained and custom AI models in the cloud, in the data center, and on their workstations.
Advanced voice and translation features
The new microservices leverage NVIDIA Riva to provide automatic speech recognition (ASR), neural machine translation (NMT), and text-to-speech (TTS) capabilities. By bringing multilingual voice capabilities into applications, they aim to improve global user experience and accessibility.
Developers can use these microservices to build customer service bots, conversational voice assistants, and multilingual content platforms, and to run high-performance AI inference at scale with minimal development effort.
Interactive browser interface
Users can perform basic inference tasks such as transcribing speech, translating text, and generating synthetic speech directly through the browser using a conversational interface available in the NVIDIA API Catalog. This provides a convenient starting point for exploring the speech and translation NIM microservices.
These tools are flexible enough to be deployed in a range of environments, from local workstations to cloud and data center infrastructure, and can scale to meet varied deployment requirements.
Running microservices with the NVIDIA Riva Python client
The NVIDIA Technical Blog details how to clone the nvidia-riva/python-clients GitHub repository and use the provided scripts to run simple inference jobs against the Riva endpoint on the NVIDIA API Catalog. An NVIDIA API key is required to run these commands.
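As a rough sketch of what that connection looks like, assuming the pip-installable nvidia-riva-client package that backs those scripts: the gRPC address below is the API Catalog endpoint the scripts target, and the function-id value is a placeholder supplied by each microservice's API Catalog page.

```python
# Sketch: authenticating the Riva Python client against the hosted
# NVIDIA API Catalog endpoint (pip install nvidia-riva-client).
# The function-id below is a placeholder; the real value is listed on
# each microservice's page in the API Catalog.
import os

import riva.client

auth = riva.client.Auth(
    use_ssl=True,
    uri="grpc.nvcf.nvidia.com:443",
    metadata_args=[
        ["function-id", "<function-id-from-api-catalog>"],
        ["authorization", f"Bearer {os.environ['NVIDIA_API_KEY']}"],
    ],
)
```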
Examples provided include transcribing audio files in streaming mode, translating text from English to German, and generating synthetic speech. These tasks demonstrate practical, real-world uses of the microservices.
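Under the same assumptions, the three example tasks map onto the client's service classes roughly as follows. For brevity this sketch uses offline rather than streaming recognition and reuses a single auth object, whereas against the hosted endpoint each service has its own function-id; the model and voice names are illustrative placeholders.

```python
# Sketch: the three example tasks via the Riva Python client, reusing
# the `auth` object from the previous snippet. Model and voice names
# are placeholders; the available ones depend on the deployed NIMs.
import riva.client

# 1. Offline transcription of a WAV file (the blog also shows streaming mode).
asr = riva.client.ASRService(auth)
with open("sample.wav", "rb") as f:
    audio_bytes = f.read()
asr_config = riva.client.RecognitionConfig(
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=True,
)
response = asr.offline_recognize(audio_bytes, asr_config)
print(response.results[0].alternatives[0].transcript)

# 2. English-to-German text translation.
nmt = riva.client.NeuralMachineTranslationClient(auth)
translation = nmt.translate(
    texts=["NIM microservices run anywhere."],
    model="<nmt-model-name>",  # placeholder: depends on the deployed NMT NIM
    source_language="en",
    target_language="de",
)
print(translation.translations[0].text)

# 3. Synthetic speech generation, written out as raw PCM audio.
tts = riva.client.SpeechSynthesisService(auth)
result = tts.synthesize(
    text="Hello from the speech NIM.",
    voice_name="English-US.Female-1",  # placeholder voice name
    language_code="en-US",
)
with open("output.raw", "wb") as out:
    out.write(result.audio)
```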
Local deployment with Docker
If you have a high-end NVIDIA data center GPU, you can run the microservices locally using Docker. Detailed instructions are provided on how to set up the ASR, NMT, and TTS services. You will need an NGC API key to pull the NIM microservices from NVIDIA’s container registry and run them on your local system.
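Once the containers are running, a quick way to confirm a local endpoint responds is to point the same Python client at it. A minimal sketch, assuming the default Riva gRPC port (50051) was mapped in your docker run command:

```python
# Sketch: verify a locally deployed ASR NIM answers over plain gRPC.
# 50051 is Riva's conventional gRPC port; adjust to whatever port your
# docker run -p mapping exposes.
import riva.client

local_auth = riva.client.Auth(uri="localhost:50051", use_ssl=False)
asr = riva.client.ASRService(local_auth)

with open("sample.wav", "rb") as f:
    config = riva.client.RecognitionConfig(language_code="en-US")
    response = asr.offline_recognize(f.read(), config)
print(response.results[0].alternatives[0].transcript)
```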
Integration with RAG pipeline
The blog also covers how to connect the ASR and TTS NIM microservices to a basic retrieval-augmented generation (RAG) pipeline. This setup allows users to upload articles to a knowledge base, ask questions verbally, and receive answers in synthesized speech.
The instructions include setting up the environment, starting the ASR and TTS NIMs, and configuring the RAG web app to query large language models with text or speech. This integration demonstrates the potential of combining speech microservices with advanced AI pipelines for richer user interactions.
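The voice round trip described above can be sketched in a few lines. Here the /generate endpoint and its JSON payload are hypothetical stand-ins for whatever interface the RAG web app actually exposes, and the two NIM ports depend on your container mappings.

```python
# Sketch: speech question in, RAG answer out as speech, using locally
# running ASR and TTS NIMs. The RAG endpoint URL and JSON shape are
# hypothetical; substitute your RAG web app's actual interface.
import requests
import riva.client

# Separate containers typically expose separate gRPC ports.
asr = riva.client.ASRService(riva.client.Auth(uri="localhost:50051"))
tts = riva.client.SpeechSynthesisService(riva.client.Auth(uri="localhost:50052"))

# 1. Transcribe the spoken question.
with open("question.wav", "rb") as f:
    config = riva.client.RecognitionConfig(
        language_code="en-US", enable_automatic_punctuation=True
    )
    result = asr.offline_recognize(f.read(), config)
question = result.results[0].alternatives[0].transcript

# 2. Ask the RAG app (hypothetical endpoint and payload).
answer = requests.post(
    "http://localhost:8081/generate", json={"query": question}, timeout=60
).json()["answer"]

# 3. Speak the answer back, saving raw PCM audio.
speech = tts.synthesize(text=answer, language_code="en-US")
with open("answer.raw", "wb") as out:
    out.write(speech.audio)
```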
Get started
Developers looking to add multilingual voice AI to their applications can start by exploring the speech and translation NIM microservices. These tools provide a straightforward way to integrate ASR, NMT, and TTS across multiple platforms, delivering scalable, real-time voice services to global audiences.
For more information, visit the NVIDIA Technical Blog.