AssemblyAI has published a comprehensive tutorial on building a real-time language translation service in JavaScript. The tutorial uses AssemblyAI for real-time speech-to-text conversion and DeepL to translate the transcribed text into multiple languages.
Introducing Real-Time Translation
Translation plays a vital role in communication and accessibility across languages. For example, tourists may struggle to communicate if they do not understand the local language. In this setup, AssemblyAI's Streaming Speech-to-Text service transcribes speech in real time, and DeepL then translates the transcript, enabling seamless communication.
Project Setup
The tutorial starts by setting up a Node.js project. The required dependencies are installed, including Express.js to create a simple server, dotenv to manage environment variables, and the official libraries for AssemblyAI and DeepL.
mkdir real-time-translation
cd real-time-translation
npm init -y
npm install express dotenv assemblyai deepl-node
The API keys for AssemblyAI and DeepL are stored in the .env file to keep them secure and prevent them from being exposed to the frontend.
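As a minimal sketch of how the server might read those keys, the helper below fails fast when a variable is missing. The variable names ASSEMBLYAI_API_KEY and DEEPL_API_KEY and the helper requireEnv are assumptions for illustration; the article does not name them.

```javascript
// Hypothetical helper: read a required variable from an environment object
// (process.env by default) and throw a clear error if it is absent or empty.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage with a stand-in object. In the real server, dotenv would populate
// process.env from the .env file before these calls run.
const exampleEnv = {
  ASSEMBLYAI_API_KEY: "aai-example", // assumed variable name
  DEEPL_API_KEY: "deepl-example", // assumed variable name
};
const assemblyKey = requireEnv("ASSEMBLYAI_API_KEY", exampleEnv);
const deeplKey = requireEnv("DEEPL_API_KEY", exampleEnv);
console.log(`Loaded ${[assemblyKey, deeplKey].length} keys`);
```

Checking at startup means a misconfigured .env file surfaces as one readable error instead of a failed API call later on.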
Creating the Backend
The backend keeps the API keys off the client and generates temporary tokens so the browser can communicate securely with the AssemblyAI API. Routes are defined to serve the frontend, issue tokens, and translate text through DeepL.
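const express = require("express");
const deepl = require("deepl-node");
const { AssemblyAI } = require("assemblyai");
require("dotenv").config();

// Assumed variable names; use whatever keys your .env file defines.
const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });
const translator = new deepl.Translator(process.env.DEEPL_API_KEY);

const app = express();
const port = 3000;

app.use(express.static("public")); // serve index.html and main.js
app.use(express.json()); // parse JSON request bodies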
// Serve the frontend page.
app.get("/", (req, res) => {
  res.sendFile(__dirname + "/public/index.html");
});
// Issue a short-lived AssemblyAI token (300 seconds) for the browser.
app.get("/token", async (req, res) => {
  const token = await client.realtime.createTemporaryToken({ expires_in: 300 });
  res.json({ token });
});
// Translate English text into the requested target language with DeepL.
app.post("/translate", async (req, res) => {
  const { text, target_lang } = req.body;
  const translation = await translator.translateText(text, "en", target_lang);
  res.json({ translation });
});
app.listen(port, () => {
  console.log(`Listening on port ${port}`);
});
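The /translate route above passes the request body straight to DeepL. One possible hardening step, sketched here under assumptions, is to validate the body first. The function validateTranslateBody and the SUPPORTED_TARGETS allow-list are hypothetical names that mirror the language options offered by the page; they are not part of the tutorial.

```javascript
// Hypothetical allow-list mirroring the options in the page's language selector.
const SUPPORTED_TARGETS = new Set(["es", "fr", "de", "zh"]);

// Returns { ok: true, text, target_lang } for a usable body,
// or { ok: false, error } describing the first problem found.
function validateTranslateBody(body) {
  if (!body || typeof body !== "object") {
    return { ok: false, error: "Request body must be a JSON object" };
  }
  const { text, target_lang } = body;
  if (typeof text !== "string" || text.trim() === "") {
    return { ok: false, error: "text must be a non-empty string" };
  }
  if (
    typeof target_lang !== "string" ||
    !SUPPORTED_TARGETS.has(target_lang.toLowerCase())
  ) {
    return { ok: false, error: "target_lang is not supported" };
  }
  return { ok: true, text: text.trim(), target_lang: target_lang.toLowerCase() };
}

console.log(validateTranslateBody({ text: "Hello", target_lang: "fr" }));
console.log(validateTranslateBody({ text: "", target_lang: "fr" }));
```

Inside the route, a failed check could respond with res.status(400).json({ error }) before any call to the translator, so malformed requests never consume DeepL quota.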
Frontend Development
The frontend is a single HTML page with text areas for the transcript and the translation, a drop-down for the target language, and a button to start and stop recording. The AssemblyAI SDK and the RecordRTC library handle real-time audio recording and transcription.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Voice Recorder with Transcription</title>
<script src="https://cdn.tailwindcss.com"></script>
</head>
<body>
<div class="min-h-screen flex flex-col items-center justify-center bg-gray-100 p-4">
<div class="w-full max-w-6xl bg-white shadow-md rounded-lg p-4 flex flex-col md:flex-row space-y-4 md:space-y-0 md:space-x-4">
<div class="flex-1">
<label for="transcript" class="block text-sm font-medium text-gray-700">Transcript</label>
<textarea id="transcript" rows="20" class="mt-1 block w-full p-2 border border-gray-300 rounded-md shadow-sm"></textarea>
</div>
<div class="flex-1">
<label for="translation" class="block text-sm font-medium text-gray-700">Translation</label>
<select id="translation-language" class="mt-1 block w-full p-2 border border-gray-300 rounded-md shadow-sm">
<option value="es">Spanish</option>
<option value="fr">French</option>
<option value="de">German</option>
<option value="zh">Chinese</option>
</select>
<textarea id="translation" rows="18" class="mt-1 block w-full p-2 border border-gray-300 rounded-md shadow-sm"></textarea>
</div>
</div>
<button id="record-button" class="mt-4 px-6 py-2 bg-blue-500 text-white rounded-md shadow">Record</button>
</div>
<script src="https://www.unpkg.com/assemblyai@latest/dist/assemblyai.umd.min.js"></script>
<script src="https://www.WebRTC-Experiment.com/RecordRTC.js"></script>
<script src="main.js"></script>
</body>
</html>
Real-Time Transcription and Translation
The main.js file handles audio recording, transcription, and translation. The AssemblyAI real-time transcription service processes the audio, and the DeepL API translates the final transcription into the language of your choice.
const recordBtn = document.getElementById("record-button");
const transcript = document.getElementById("transcript");
const translationLanguage = document.getElementById("translation-language");
const translation = document.getElementById("translation");
let isRecording = false;
let recorder;
let rt;
const run = async () => {
  if (isRecording) {
    // Stop: close the realtime session and release the recorder.
    if (rt) {
      await rt.close(false);
      rt = null;
    }
    if (recorder) {
      recorder.stopRecording();
      recorder = null;
    }
    recordBtn.innerText = "Record";
    transcript.innerText = "";
    translation.innerText = "";
  } else {
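    // Start: fetch a temporary token and open the realtime session.
    recordBtn.innerText = "Loading...";
    const response = await fetch("/token");
    const data = await response.json();
    rt = new assemblyai.RealtimeService({ token: data.token });
    const texts = {}; // transcript text keyed by audio_start
    let translatedText = "";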
    rt.on("transcript", async (message) => {
      // Rebuild the full transcript in order of audio start time.
      let msg = "";
      texts[message.audio_start] = message.text;
      const keys = Object.keys(texts);
      keys.sort((a, b) => a - b);
      for (const key of keys) {
        if (texts[key]) {
          msg += ` ${texts[key]}`;
        }
      }
      transcript.innerText = msg;
      // Translate only finalized segments.
      if (message.message_type === "FinalTranscript") {
        const response = await fetch("/translate", {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
          },
          body: JSON.stringify({
            text: message.text,
            target_lang: translationLanguage.value,
          }),
        });
        const data = await response.json();
        translatedText += ` ${data.translation.text}`;
        translation.innerText = translatedText;
      }
    });
    rt.on("error", async (error) => {
      console.error(error);
      await rt.close();
    });
    rt.on("close", (event) => {
      console.log(event);
      rt = null;
    });
    await rt.connect();
    // Capture microphone audio as 16 kHz mono PCM and stream it in 250 ms slices.
    navigator.mediaDevices
      .getUserMedia({ audio: true })
      .then((stream) => {
        recorder = new RecordRTC(stream, {
          type: "audio",
          mimeType: "audio/webm;codecs=pcm",
          recorderType: StereoAudioRecorder,
          timeSlice: 250,
          desiredSampRate: 16000,
          numberOfAudioChannels: 1,
          bufferSize: 16384,
          audioBitsPerSecond: 128000,
          ondataavailable: async (blob) => {
            if (rt) {
              rt.sendAudio(await blob.arrayBuffer());
            }
          },
        });
        recorder.startRecording();
        recordBtn.innerText = "Stop Recording";
      })
      .catch((err) => console.error(err));
}
isRecording = !isRecording;
};
recordBtn.addEventListener("click", () => {
  run();
});
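The transcript handler in main.js stores each message under its audio_start value, so a later message for the same segment overwrites the earlier partial text. The sketch below isolates that idea so it can be tried outside the browser. The name createTranscriptBuffer is hypothetical, and the sample messages are made up for illustration, using only the audio_start and text fields that appear in the code above.

```javascript
// Collects transcript messages keyed by audio_start and rebuilds the full text.
function createTranscriptBuffer() {
  const texts = {};
  return {
    // Store or overwrite the text for the segment starting at audio_start.
    add(message) {
      texts[message.audio_start] = message.text;
    },
    // Join all non-empty segments in ascending order of start time.
    render() {
      return Object.keys(texts)
        .sort((a, b) => a - b)
        .map((key) => texts[key])
        .filter(Boolean)
        .join(" ");
    },
  };
}

const buffer = createTranscriptBuffer();
buffer.add({ audio_start: 1200, text: "how are" });
buffer.add({ audio_start: 0, text: "Hello there" });
buffer.add({ audio_start: 1200, text: "how are you?" }); // replaces the partial
console.log(buffer.render()); // prints "Hello there how are you?"
```

Unlike the handler in main.js, which prefixes every segment with a space, this version joins segments with single spaces and leaves no leading space. That is a small stylistic difference, not a change in the approach.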
Conclusion
This tutorial shows how to build a real-time language translation service using AssemblyAI and DeepL in JavaScript. These tools can significantly improve user communication and accessibility in a variety of linguistic contexts. For more detailed instructions, visit the original AssemblyAI tutorial.