AssemblyAI has launched its latest speech recognition model, Universal-1, which sets a new benchmark for automatic speech recognition (ASR) accuracy. The model is designed to achieve near-human transcription accuracy even in challenging audio environments with accents, background noise, and complex phrases. According to AssemblyAI, the Universal-1 model is now accessible through the same web API as the previous ASR model.
New pricing tiers for Universal-1
With the launch of Universal-1, AssemblyAI has unveiled two new pricing tiers: Best and Nano. The Best tier is optimized for maximum accuracy, while the Nano tier provides a cost-effective option that supports transcription in 99 languages. This flexibility lets developers choose the balance of accuracy and cost that fits their needs.
Getting started with the AssemblyAI Python SDK
To facilitate the transcription process, AssemblyAI provides an official Python SDK. Developers can easily install the SDK using the following command:
pip install --upgrade assemblyai
After installation, users must register for an AssemblyAI account to obtain an API key needed to authorize API calls from Python scripts.
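Rather than pasting the key directly into a script, one option is to read it from an environment variable. The sketch below assumes a variable named ASSEMBLYAI_API_KEY; that name and the load_api_key helper are conventions chosen for this example, not something AssemblyAI prescribes.

```python
import os


def load_api_key(env_var: str = "ASSEMBLYAI_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The default variable name is a hypothetical convention for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


# Typical use once the SDK is installed:
#   import assemblyai as aai
#   aai.settings.api_key = load_api_key()
```

Keeping the key out of the source file makes it less likely to end up in version control by accident.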
Transcribe audio files using Universal-1
Once set up, developers can write a Python script to transcribe audio files. By default, the SDK uses the Best tier for the highest accuracy. The script imports the SDK, configures the API client with an API key, and specifies an audio file URL or local path.
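import assemblyai as aai
# Authorize API calls with the key from your AssemblyAI account
aai.settings.api_key = "YOUR_API_KEY"
# With no config passed, the transcriber uses the default (Best) tier
transcriber = aai.Transcriber()
# The audio source can be a URL or a local file path
audio_file = "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3"
transcript = transcriber.transcribe(audio_file)
# Print the error if transcription failed, otherwise the transcript text
if transcript.error:
    print(transcript.error)
else:
    print(transcript.text)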
Running the script prints the transcript text to the terminal, or the error message if transcription fails.
Switching to the Nano tier
If you want a more economical option, switching to the Nano tier is simple. Developers create a TranscriptionConfig object that selects the Nano model by setting the speech_model parameter to "nano".
config = aai.TranscriptionConfig(speech_model="nano")
transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe(audio_file)
This flexibility allows you to use your resources efficiently while benefiting from AssemblyAI’s powerful transcription capabilities.
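When the tier needs to be chosen at run time, for example from a command-line flag, the choice can be isolated in a small helper. The sketch below is illustrative only: config_kwargs_for_tier and the tier mapping are hypothetical names, and the only SDK detail it relies on is the speech_model="nano" setting shown above.

```python
# Tier names from the article mapped to speech_model values.
# "nano" comes from the example above; None means "pass nothing and
# keep the SDK default", which the article describes as the Best tier.
TIER_TO_SPEECH_MODEL = {"best": None, "nano": "nano"}


def config_kwargs_for_tier(tier: str) -> dict:
    """Return keyword arguments for aai.TranscriptionConfig for a tier name."""
    normalized = tier.strip().lower()
    if normalized not in TIER_TO_SPEECH_MODEL:
        raise ValueError(f"Unknown tier {tier!r}; expected 'best' or 'nano'")
    model = TIER_TO_SPEECH_MODEL[normalized]
    # An empty dict leaves the SDK default untouched.
    return {} if model is None else {"speech_model": model}


# Typical use once the SDK is installed:
#   import assemblyai as aai
#   config = aai.TranscriptionConfig(**config_kwargs_for_tier("nano"))
#   transcriber = aai.Transcriber(config=config)
```

Rejecting unknown tier names up front avoids silently falling back to a more expensive tier because of a typo.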
Beyond transcription: additional features
AssemblyAI’s services extend beyond basic transcription. The platform provides advanced features such as entity detection, content moderation, PII redaction, and applying Large Language Models (LLMs) to audio data. These features enhance the usability of transcription services, making them suitable for a wide range of applications.
Developers interested in taking advantage of these capabilities can explore AssemblyAI’s documentation and research resources to gain additional insight into building advanced voice AI solutions.