AssemblyAI has announced significant upgrades to its PII Redaction and Entity Detection capabilities, aimed at strengthening data security and extracting key insights from audio transcripts. According to AssemblyAI, the latest update adds PII Text Redaction support for 47 languages and 16 new entity types to its Entity Detection model, bringing the total to 44.
Enhanced PII redaction capabilities
The updated PII Text Redaction model now supports 47 languages, providing comprehensive protection for personally identifiable information (PII) across a wide range of regions. The upgrade allows users to identify and remove sensitive data, such as addresses, phone numbers, and credit card information, from their transcripts. Users can generate transcripts with PII redacted or have the model “bleep out” sensitive information in the audio files themselves.
AssemblyAI provides an example of how to redact PII from a transcript using its Python SDK.
import assemblyai as aai

aai.settings.api_key = "YOUR API KEY"

audio_url = "https://github.com/AssemblyAI-Community/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

# Redact person names, organizations, and occupations, replacing each with a hash.
config = aai.TranscriptionConfig(speaker_labels=True).set_redact_pii(
    policies=[
        aai.PIIRedactionPolicy.person_name,
        aai.PIIRedactionPolicy.organization,
        aai.PIIRedactionPolicy.occupation,
    ],
    substitution=aai.PIISubstitutionPolicy.hash,
)

transcript = aai.Transcriber().transcribe(audio_url, config)

# Print each speaker's redacted utterance, followed by the full redacted transcript.
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")

print(transcript.text)
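For the audio “bleep out” option mentioned above, redaction can also be applied to the audio file itself. The following is a minimal sketch, assuming the SDK’s redact_pii_audio configuration flag and the v2 redacted-audio REST endpoint; the exact parameter names and response fields should be confirmed against AssemblyAI’s API reference.

import assemblyai as aai
import requests

aai.settings.api_key = "YOUR API KEY"

audio_url = "https://github.com/AssemblyAI-Community/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

# Assumption: redact_pii_audio asks the API to produce a "bleeped" copy of the
# audio in addition to the redacted transcript text.
config = aai.TranscriptionConfig(
    redact_pii=True,
    redact_pii_audio=True,
    redact_pii_policies=[
        aai.PIIRedactionPolicy.person_name,
        aai.PIIRedactionPolicy.phone_number,
    ],
)

transcript = aai.Transcriber().transcribe(audio_url, config)

# Assumption: the redacted audio is retrieved from the v2 redacted-audio endpoint,
# which returns a temporary download URL once processing has finished.
response = requests.get(
    f"https://api.assemblyai.com/v2/transcript/{transcript.id}/redacted-audio",
    headers={"authorization": aai.settings.api_key},
)
print(response.json().get("redacted_audio_url"))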
Users can refer to AssemblyAI’s documentation for more detailed examples and in-depth coverage of these updates.
Extended entity detection
The Entity Detection model has been upgraded with 16 new entity types that automatically identify and classify important information in transcripts. This brings the total number of supported entity types to 44, including names, organizations, addresses, and more. AssemblyAI reports 99% accuracy for major languages, making the model a powerful tool for extracting valuable insights from audio data.
An example of how to use the API for entity detection is also provided.
import assemblyai as aai

aai.settings.api_key = "YOUR API KEY"

audio_url = "https://github.com/AssemblyAI-Community/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

# Enable entity detection for the transcription.
config = aai.TranscriptionConfig(entity_detection=True)
transcript = aai.Transcriber().transcribe(audio_url, config)

# Print each detected entity, its type, and its millisecond timestamps.
for entity in transcript.entities:
    print(entity.text)
    print(entity.entity_type)
    print(f"Timestamp: {entity.start} - {entity.end}\n")
Additional materials
AssemblyAI also shared several new blog posts and tutorials to help users get the most out of its products. Topics include using Claude 3.5 Sonnet with audio data, understanding Microsoft’s Florence-2 image model, and creating a real-time language translation service using AssemblyAI and DeepL in JavaScript.
To learn more about these updates or explore additional resources, visit the official AssemblyAI blog.