AssemblyAI has announced significant upgrades to its PII Redaction and Entity Detection capabilities, aimed at strengthening data security and extracting key insights from audio transcripts. According to AssemblyAI, the latest update adds PII Text Redaction support for 47 languages and 16 new entity types to its Entity Detection model, bringing the total to 44.
Enhanced PII redaction capabilities
The updated PII Text Redaction model now supports 47 languages, providing comprehensive protection for personally identifiable information (PII) across a wide range of regions. The upgrade allows users to identify and remove sensitive data, such as addresses, phone numbers, and credit card information, from their transcripts. Users can generate transcripts with PII redacted or have the model “bleep out” sensitive information in the audio files themselves.
AssemblyAI provides an example of how to redact PII from a transcript using its Python SDK.
import assemblyai as aai

aai.settings.api_key = "YOUR API KEY"

audio_url = "https://github.com/AssemblyAI-Community/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

# Redact person names, organizations, and occupations, replacing each with a hash.
config = aai.TranscriptionConfig(speaker_labels=True).set_redact_pii(
    policies=[
        aai.PIIRedactionPolicy.person_name,
        aai.PIIRedactionPolicy.organization,
        aai.PIIRedactionPolicy.occupation,
    ],
    substitution=aai.PIISubstitutionPolicy.hash,
)

transcript = aai.Transcriber().transcribe(audio_url, config)

# Print each speaker's redacted utterance, followed by the full redacted transcript.
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")

print(transcript.text)
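For the audio “bleep out” option mentioned above, redaction can also be applied to the audio file itself. The following is a minimal sketch, assuming the SDK’s redact_pii_audio configuration flag and the v2 redacted-audio REST endpoint; the exact parameter names and response fields should be confirmed against AssemblyAI’s API reference.

import assemblyai as aai
import requests

aai.settings.api_key = "YOUR API KEY"

audio_url = "https://github.com/AssemblyAI-Community/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

# Assumption: redact_pii_audio asks the API to produce a "bleeped" copy of the
# audio in addition to the redacted transcript text.
config = aai.TranscriptionConfig(
    redact_pii=True,
    redact_pii_audio=True,
    redact_pii_policies=[
        aai.PIIRedactionPolicy.person_name,
        aai.PIIRedactionPolicy.phone_number,
    ],
)

transcript = aai.Transcriber().transcribe(audio_url, config)

# Assumption: the redacted audio is retrieved from the v2 redacted-audio endpoint,
# which returns a temporary download URL once processing has finished.
response = requests.get(
    f"https://api.assemblyai.com/v2/transcript/{transcript.id}/redacted-audio",
    headers={"authorization": aai.settings.api_key},
)
print(response.json().get("redacted_audio_url"))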
Users can refer to AssemblyAI’s documentation for more detailed examples and in-depth coverage of these updates.
Extended entity detection
The Entity Detection model has been upgraded with 16 new entity types that automatically identify and classify important information in transcripts. This brings the total number of supported entity types to 44, including names, organizations, addresses, and more. AssemblyAI reports 99% accuracy for major languages, making the model a powerful tool for extracting valuable insights from audio data.
An example of how to use the API for entity detection is also provided.
import assemblyai as aai

aai.settings.api_key = "YOUR API KEY"

audio_url = "https://github.com/AssemblyAI-Community/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

# Enable entity detection for the transcription.
config = aai.TranscriptionConfig(entity_detection=True)
transcript = aai.Transcriber().transcribe(audio_url, config)

# Print each detected entity, its type, and its millisecond timestamps.
for entity in transcript.entities:
    print(entity.text)
    print(entity.entity_type)
    print(f"Timestamp: {entity.start} - {entity.end}\n")
Additional materials
AssemblyAI also shared several new blog posts and tutorials to help users get the most out of its products. Topics include using Claude 3.5 Sonnet with audio data, understanding Microsoft’s Florence-2 image model, and creating a real-time language translation service using AssemblyAI and DeepL in JavaScript.
To learn more about these updates or explore additional resources, visit the official AssemblyAI blog.