Lawrence Jenga
January 25, 2025 09:06
Together AI is introducing the Together Audio API, which uses Cartesia’s low-latency, multilingual Sonic voice model to help developers build advanced voice applications across a range of industries.
Together AI has announced the launch of the Together Audio API, built on Cartesia Sonic, a state-of-the-art, low-latency, hyper-realistic speech model. The collaboration gives developers direct access to Sonic models through the Together API, with support for a range of voices and languages. According to Together AI, the launch expands the platform’s capabilities, letting developers create multimodal applications that combine chat, images, audio, and more on a single platform.
Key features and compliance
Powered by Cartesia Sonic, the Together Audio API offers low-latency, highly realistic speech generation. Developers can build enterprise voice applications on the Together platform, which complies with HIPAA and SOC 2 standards. Together AI also provides a cookbook to help developers get started, including an example that creates NotebookLM-style podcasts using agent workflows.
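As a rough illustration of what a text-to-speech call might look like, the sketch below builds an HTTP request using only the Python standard library. The endpoint URL, the model identifier, the JSON field names, and the TOGETHER_API_KEY environment variable are assumptions that do not come from the announcement, and the voice name is a placeholder; check the Together AI documentation for the actual values before using it.

```python
import json
import os
import urllib.request

# Assumed values, not taken from the announcement: verify against the docs.
API_URL = "https://api.together.xyz/v1/audio/generations"  # assumed endpoint
MODEL_ID = "cartesia/sonic"  # assumed model identifier


def build_tts_request(text, voice, api_key):
    """Build (but do not send) a POST request asking for synthesized speech."""
    # Field names "model", "input", and "voice" are assumptions.
    payload = {"model": MODEL_ID, "input": text, "voice": voice}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, voice, out_path):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_tts_request(text, voice, os.environ["TOGETHER_API_KEY"])
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Keeping request construction separate from the network call makes the payload easy to inspect and test without an API key.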
Building multimodal applications
The introduction of audio capabilities is a significant milestone for Together AI, which aims to help developers build and orchestrate multimodal applications. These applications can combine multiple AI models, covering chat, images, audio, and code, through the Together API platform. Developers can chain speech-to-text, large language model, and text-to-speech models on one platform, which reduces latency and removes the need for multiple API providers.
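The speech-to-text, language model, text-to-speech chain described above can be sketched as a small pipeline. This is only an illustration of the orchestration pattern: the VoicePipeline class and its three stage callables are hypothetical names, and the stub stages stand in for real model calls so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains three stages; each callable is a hypothetical stand-in for a model call."""

    transcribe: Callable[[bytes], str]  # speech-to-text stage
    respond: Callable[[str], str]       # language model stage
    synthesize: Callable[[str], bytes]  # text-to-speech stage

    def run(self, audio_in: bytes) -> bytes:
        text = self.transcribe(audio_in)
        reply = self.respond(text)
        return self.synthesize(reply)


# Stub stages: they only move bytes and strings around, no models involved.
pipeline = VoicePipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    respond=lambda text: f"You said: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)
result = pipeline.run(b"hello")
```

In a real application each stub would be replaced by a call to the corresponding model, all served from the same provider.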
Voice AI use cases
Voice AI is transforming industries, with 85% of companies expecting widespread deployment within the next five years. Developers can use voice capabilities for AI-powered customer support, content creation, and custom voice assistants. For example, pairing an LLM with Sonic’s natural-sounding speech can improve responses to customer inquiries, while AI can automate the creation of audio content for podcasts and media.
Why choose Cartesia Sonic?
Cartesia Sonic outperforms other voice models in blind human preference tests, offering ultra-low latency and strong handling of complex content. With a streaming latency of just 90 ms, Sonic enables the fastest end-to-end voice applications. Cartesia’s state space model architecture allows it to excel at handling complex inputs, and it offers a variety of voice options in 15 languages.
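The article does not say how the 90 ms figure was measured. One common client-side proxy for streaming latency is the time until the first non-empty audio chunk arrives, which the hypothetical helper below computes for any iterable of byte chunks. A measurement taken this way includes network round-trip time, so it will not necessarily match a vendor-reported number.

```python
import time


def first_chunk_latency_ms(chunks, clock=time.perf_counter):
    """Return milliseconds from call time until the first non-empty chunk.

    `chunks` is any iterable of bytes, for example a streaming HTTP response
    body. Returns None if the stream ends without yielding any data.
    """
    start = clock()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            return (clock() - start) * 1000.0
    return None
```

The clock parameter exists so the timing logic can be checked with a fake clock instead of real waiting.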
Getting started
Developers interested in building with voice AI can join the Together AI developer community on Discord to share projects and ideas. The Together Audio API and Cartesia Sonic give developers a way to build advanced voice applications that improve user experience across a range of sectors.