Anthropic’s recently released Claude 3.5 Sonnet sets a new industry benchmark across a variety of LLM tasks. The model excels at complex coding and nuanced literary analysis, and demonstrates exceptional contextual awareness and creativity.
According to AssemblyAI, users can now leverage Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku with audio or video files in Python.
Some use cases for this pipeline include:
- Summarize a long podcast or YouTube video
- Ask questions about audio content
- Generate action items from a meeting recording
How does it work?
Since language models primarily work with text, audio data first needs to be transcribed. Multimodal models that accept audio directly could solve this, but they are still in the early stages of development.
To bridge this gap, AssemblyAI’s LeMUR framework is used. LeMUR simplifies the process by letting you combine industry-leading Speech AI models with LLMs in just a few lines of code.
SDK Setup
To get started, install the AssemblyAI Python SDK, which includes all of LeMUR’s features.
pip install assemblyai
Then import the package and set your API key, which you can get for free from AssemblyAI.
import assemblyai as aai
aai.settings.api_key = "YOUR_API_KEY"
Transcribe audio or video files
Next, set up the audio or video file you want to transcribe. Create a Transcriber object and call its transcribe() function. You can pass a local file path or a publicly accessible URL. For example, you could use an episode of Lenny’s Podcast featuring Dalton Caldwell of Y Combinator:
audio_url = "https://storage.googleapis.com/aai-web-samples/lennyspodcast-daltoncaldwell-ycstartups.m4a"
transcriber = aai.Transcriber()
transcript = transcriber.transcribe(audio_url)
print(transcript.text)
Using Claude 3.5 Sonnet with audio data
Claude 3.5 Sonnet is Anthropic’s most advanced model to date, outperforming Claude 3 Opus in a number of evaluations while also being more cost-effective.
To use Claude 3.5 Sonnet, call transcript.lemur.task(), a flexible endpoint that lets you specify any prompt. It automatically adds the transcript as additional context for the model. Specify aai.LemurModel.claude3_5_sonnet as the final_model parameter when calling the LLM. Here’s an example with a simple summarization prompt:
prompt = "Provide a brief summary of the transcript."
result = transcript.lemur.task(
prompt, final_model=aai.LemurModel.claude3_5_sonnet
)
print(result.response)
Using Claude 3 Opus with audio data
Claude 3 Opus is adept at handling complex analyses, long-term tasks with multiple steps, and high-level mathematical and coding tasks.
To use Opus, specify aai.LemurModel.claude3_opus as the final_model parameter when calling the LLM. Here’s an example of a prompt that extracts specific information from the transcript:
prompt = "Extract all advice Dalton gives in this podcast episode. Use bullet points."
result = transcript.lemur.task(
prompt, final_model=aai.LemurModel.claude3_opus
)
print(result.response)
Using Claude 3 Haiku with audio data
Claude 3 Haiku is Anthropic’s fastest and most cost-effective model, ideal for lighter workloads.
To use Haiku, specify aai.LemurModel.claude3_haiku as the final_model parameter when calling the LLM. Here’s an example of a simple question prompt:
prompt = "What are tar pit ideas?"
result = transcript.lemur.task(
prompt, final_model=aai.LemurModel.claude3_haiku
)
print(result.response)
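The transcribe-then-prompt steps above can be wrapped in a single helper. This is a minimal sketch, not part of the AssemblyAI SDK: the function name and the injectable transcriber parameter are illustrative conveniences, and it assumes aai.settings.api_key has already been set.

```python
def llm_over_audio(source, prompt, final_model=None, transcriber=None):
    """Transcribe an audio/video source, then run an LLM prompt over it.

    `transcriber` is injectable so the flow can be exercised without network
    access; by default the real AssemblyAI Transcriber is used.
    """
    if transcriber is None:
        import assemblyai as aai  # assumes the API key is already configured
        transcriber = aai.Transcriber()
        if final_model is None:
            final_model = aai.LemurModel.claude3_5_sonnet
    transcript = transcriber.transcribe(source)
    # LeMUR automatically adds the transcript as context for the chosen LLM
    result = transcript.lemur.task(prompt, final_model=final_model)
    return result.response
```

For example, llm_over_audio(audio_url, "Provide a brief summary of the transcript.") reproduces the Sonnet example above in one call.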
Learn more about Prompt Engineering
Applying the Claude 3 models to audio data is straightforward with AssemblyAI and the LeMUR framework. To get the most out of LeMUR and the Claude 3 models, refer to the additional prompt engineering resources provided by AssemblyAI.
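One common prompt engineering pattern is to assemble prompts from explicit parts: the task, any extra context, and the desired answer format. The helper below is a hypothetical sketch of that pattern, not part of the AssemblyAI SDK, and the example wording is invented.

```python
def build_prompt(task, context="", answer_format=""):
    """Assemble a structured prompt from a task plus optional context
    and answer-format instructions."""
    parts = [task]
    if context:
        parts.append(f"Context: {context}")
    if answer_format:
        parts.append(f"Answer format: {answer_format}")
    return "\n".join(parts)

prompt = build_prompt(
    "Extract the key advice given in this episode.",
    context="The guest advises early-stage startup founders.",
    answer_format="Bullet points, one sentence each.",
)
```

The resulting string can then be passed to transcript.lemur.task() exactly as in the examples above.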