A recent report from AssemblyAI found that, in a comprehensive analysis of leading Speech-to-Text models, AssemblyAI’s Universal-2 outperformed OpenAI’s Whisper variants. The evaluation focused on real-world use cases, assessing the models on tasks essential to producing accurate transcriptions, including proper noun recognition, alphanumeric transcription, and text formatting.
Model Comparison
The analysis compared Universal-2 and its predecessor, Universal-1, with OpenAI’s Whisper Large-v3 and Whisper Turbo models. Each model was evaluated on Word Error Rate (WER), Proper Noun Error Rate (PNER), and other metrics important for Speech-to-Text tasks.
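To make the headline metric concrete, here is a minimal sketch of how WER is typically computed: the edit distance between the reference and hypothesis word sequences, divided by the number of reference words. PNER follows the same idea but scores only proper-noun tokens. This is an illustrative sketch, not the report’s exact scoring pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance over words, normalized by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,              # deletion
                dp[i][j - 1] + 1,              # insertion
                dp[i - 1][j - 1] + sub_cost,   # substitution or match
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution over four reference words -> 0.25 (25% WER)
print(word_error_rate("the quick brown fox", "the quick brown box"))
```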
Performance Indicators
Universal-2 achieved the lowest WER at 6.68%, a 3% improvement over Universal-1. The Whisper models were competitive but posted slightly higher error rates, with Large-v3 at 7.88% and Turbo at 7.75%.
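The 3% figure reads as a relative reduction rather than an absolute one. Assuming that interpretation (the report’s exact Universal-1 WER is not quoted here), the back-of-the-envelope arithmetic looks roughly like this:

```python
# Hedged back-of-the-envelope check; the "relative reduction" reading of the
# 3% improvement is an assumption, not a figure taken from the report.
universal_2_wer = 6.68                    # reported Universal-2 WER (%)
relative_improvement = 0.03               # assumed relative reduction vs Universal-1
implied_universal_1_wer = universal_2_wer / (1 - relative_improvement)
print(f"Implied Universal-1 WER: {implied_universal_1_wer:.2f}%")   # ~6.89%

# Relative gap to Whisper Large-v3 (7.88%), computed the same way:
whisper_large_v3_wer = 7.88
gap = (whisper_large_v3_wer - universal_2_wer) / whisper_large_v3_wer
print(f"Universal-2's WER is {gap:.1%} lower than Whisper Large-v3's")  # ~15.2%
```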
In proper noun recognition, Universal-2 led with a PNER of 13.87%, outperforming both Whisper Large-v3 and Turbo. The model also excelled in text formatting, achieving a U-WER of 10.04%, indicating better handling of punctuation and capitalization.
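Formatted evaluation penalizes punctuation and capitalization mistakes that a plain, normalized WER would ignore. Here is a minimal sketch of the difference, assuming a simple lowercase-and-strip-punctuation normalizer; the report’s actual normalization rules may differ.

```python
import re

def normalize(text: str) -> str:
    # Strip punctuation and lowercase, so only word identity is scored.
    return re.sub(r"[^\w\s]", "", text).lower()

reference  = "Dr. Smith arrived at 9 AM."
hypothesis = "dr smith arrived at 9 am"

# Formatted comparison: casing and punctuation mismatches count as errors.
# (This toy example compares position by position; real scoring would use
# an edit-distance alignment like the WER sketch above.)
formatted_errors = sum(r != h for r, h in zip(reference.split(), hypothesis.split()))

# Unformatted comparison: the same words match once normalization is applied.
normalized_errors = sum(
    r != h for r, h in zip(normalize(reference).split(), normalize(hypothesis).split())
)

print(formatted_errors, normalized_errors)  # 3 formatted errors vs 0 normalized errors
```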
Alphanumeric and Hallucination Rate
Whisper Large-v3 showed the lowest error rate for alphanumeric transcription at 3.84%, slightly ahead of Universal-2’s 4.00%. However, Universal-2’s hallucination rate was roughly 30% lower than Whisper’s, an important advantage for reliable use in real-world applications.
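Hallucinations are insertions of text with no counterpart in the audio. One simple text-only proxy, sketched below, is to count insertion errors from a word-level alignment and flag outputs where insertions dominate; this is an illustrative heuristic, not the methodology used in the report.

```python
import difflib

def insertion_rate(reference: str, hypothesis: str) -> float:
    """Fraction of hypothesis words aligned to nothing in the reference.
    A high value (e.g. on silent or noisy audio) is one rough symptom of
    hallucinated output; illustrative proxy only."""
    ref, hyp = reference.split(), hypothesis.split()
    inserted = 0
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, ref, hyp).get_opcodes():
        if op == "insert":
            inserted += j2 - j1
    return inserted / max(len(hyp), 1)

# A hypothesis that appends fabricated words shows a high insertion rate (~0.57).
print(insertion_rate("thanks for watching", "thanks for watching please like and subscribe"))
```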
Conclusion
The advancements of Universal-2 over Universal-1 are evident in its improvements in accuracy, proper noun handling, and formatting. Despite Whisper’s strengths in certain areas, its susceptibility to hallucinations makes consistent performance harder to guarantee.
For detailed metrics and additional insights, see AssemblyAI’s official report for the full evaluation.