Discord, a popular instant messaging and social platform, is widely used by online communities, streamers, and gamers. One of its most valuable features is voice channels, which let members talk over voice and video. Discord is also highly customizable: developers can build bots that add new functionality. This tutorial, based on a guide from AssemblyAI, walks through building a Discord bot that joins a voice channel, transcribes audio, generates responses with ChatGPT, and converts those responses back into speech.
Setting up a bot
To build the bot, you will use Node.js along with three third-party services: AssemblyAI for speech-to-text, OpenAI for generating responses, and ElevenLabs for text-to-speech. The tutorial assumes familiarity with JavaScript and Node.js, including setting up a project, installing dependencies, and writing basic asynchronous code.
First, make sure you have Node.js (version 18 or higher) installed and administrator access to a Discord server. Then create a project directory and initialize a Node.js project:
mkdir discord-voice-bot && cd discord-voice-bot
npm init -y
Install the required dependencies.
npm install discord.js libsodium-wrappers ffmpeg-static @discordjs/opus @discordjs/voice dotenv assemblyai elevenlabs-node openai
For security, store your API keys in a .env file:
OPENAI_API_KEY=
ASSEMBLYAI_API_KEY=
ELEVENLABS_API_KEY=
DISCORD_TOKEN=
Set up a Discord developer account, create an application, enable the necessary permissions, and save your bot token in the .env file. Then add the bot to your server using the generated URL.
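A missing key tends to surface later as an opaque authentication error, so it can help to check the environment once at startup. The following is a minimal sketch rather than part of the original tutorial: the helper names findMissingEnvVars and assertEnvConfigured are hypothetical, while the four variable names are the ones listed in the .env file.

```javascript
// Hypothetical startup check (not from the original tutorial).
// The variable names match the entries in the .env file.
const REQUIRED_ENV_VARS = [
  "OPENAI_API_KEY",
  "ASSEMBLYAI_API_KEY",
  "ELEVENLABS_API_KEY",
  "DISCORD_TOKEN",
];

// Returns the names of required variables that are unset or blank.
function findMissingEnvVars(env, required = REQUIRED_ENV_VARS) {
  return required.filter((name) => {
    const value = env[name];
    return typeof value !== "string" || value.trim() === "";
  });
}

// Throws a descriptive error if any required variable is missing.
function assertEnvConfigured(env = process.env) {
  const missing = findMissingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

In index.js, assertEnvConfigured() could be called right after require("dotenv").config(), so the bot stops immediately with a readable message instead of failing on its first API call.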
Developing the Discord voice bot's features
The bot joins a voice channel, records audio, transcribes it using AssemblyAI, generates responses using ChatGPT, and then converts those responses into speech using ElevenLabs.
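The flow just described can be sketched as one async function that chains the three services. This is an illustrative sketch only: runVoicePipeline and its generateReply, synthesizeSpeech, and play parameters are hypothetical stand-ins, injected as arguments so the flow can be tried without API keys.

```javascript
// Illustrative pipeline sketch. The steps are passed in as functions, so the
// names below are hypothetical stand-ins for the real service calls.
async function runVoicePipeline(transcript, { generateReply, synthesizeSpeech, play }) {
  const text = transcript.trim();
  if (text === "") {
    return null; // Nothing was said, so there is nothing to answer.
  }
  const reply = await generateReply(text); // e.g. ChatGPT
  const audioPath = await synthesizeSpeech(reply); // e.g. ElevenLabs
  if (audioPath) {
    await play(audioPath); // e.g. playback on the voice connection
  }
  return { reply, audioPath };
}

// Example run with stub steps in place of the real services.
async function demoPipeline() {
  const played = [];
  const result = await runVoicePipeline("  hello bot ", {
    generateReply: async (text) => `You said: ${text}`,
    synthesizeSpeech: async (reply) => `${reply.length}.mp3`,
    play: async (path) => played.push(path),
  });
  return { result, played };
}
```

Keeping the steps behind plain function parameters is a design choice made for this sketch; it lets each service be swapped or stubbed independently.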
Join the voice channel
To make the bot respond to the !join command and enter the voice channel, update the index.js file:
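// Assumes `client` (a discord.js Client) and `Events` are already set up earlier in index.js.
const { joinVoiceChannel, VoiceConnectionStatus } = require("@discordjs/voice");

client.on(Events.MessageCreate, async (message) => {
  if (message.content.toLowerCase() === "!join") {
    // The voice channel the message author is currently in, if any.
    const channel = message.member?.voice.channel;
    if (channel) {
      const connection = joinVoiceChannel({
        channelId: channel.id,
        guildId: message.guild.id,
        adapterCreator: message.guild.voiceAdapterCreator,
        selfDeaf: false, // The bot must not be deafened, or it receives no audio.
      });
      // Start listening once the voice connection is ready.
      connection.on(VoiceConnectionStatus.Ready, () => {
        message.reply(`Joined voice channel: ${channel.name}!`);
        listenAndRespond(connection, message);
      });
    } else {
      message.reply("You need to join a voice channel first!");
    }
  }
});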
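Comparing the whole message text to a literal works for a single command but becomes awkward once commands take arguments or more commands are added. A small parser, sketched here under the assumption that every command starts with a "!" prefix, is one way to handle that; parseCommand is a hypothetical helper and not part of discord.js.

```javascript
// Hypothetical helper: splits "!play lo-fi beats" into a command name and
// its arguments. Returns null when the text is not a command.
function parseCommand(content, prefix = "!") {
  const trimmed = content.trim();
  if (!trimmed.startsWith(prefix)) {
    return null;
  }
  const [name, ...args] = trimmed.slice(prefix.length).split(/\s+/);
  if (!name) {
    return null; // Covers a bare "!" and "! join".
  }
  return { name: name.toLowerCase(), args };
}
```

Inside the message handler, the check could then read: const command = parseCommand(message.content); if (command?.name === "join") { ... }.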
Audio recording and transcription
Capture audio streams from voice channels and transcribe them using AssemblyAI.
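const { AssemblyAI } = require("assemblyai");
const { EndBehaviorType } = require("@discordjs/voice");
const prism = require("prism-media");

const assemblyAI = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

async function listenAndRespond(connection, message) {
  let transcription = "";
  // A fresh transcriber per utterance, since it is closed when the speaker stops.
  const transcriber = assemblyAI.realtime.transcriber({ sampleRate: 48000 });
  // Keep only final transcripts; partial ones are revised as more audio arrives.
  transcriber.on("transcript", (transcript) => {
    if (transcript.message_type === "FinalTranscript") {
      transcription += transcript.text + " ";
    }
  });
  // Open the real-time session before sending any audio.
  await transcriber.connect();

  // Subscribe to the speaker's Opus stream and end it after one second of silence.
  const audioStream = connection.receiver.subscribe(message.author.id, {
    end: { behavior: EndBehaviorType.AfterSilence, duration: 1000 },
  });
  // Decode Opus packets into 48 kHz mono PCM for the transcriber.
  const opusDecoder = new prism.opus.Decoder({ rate: 48000, channels: 1, frameSize: 960 });
  audioStream.pipe(opusDecoder).on("data", (chunk) => {
    transcriber.sendAudio(chunk);
  });

  // When the speaker stops, generate a reply and play it back.
  audioStream.on("end", async () => {
    await transcriber.close();
    const chatGPTResponse = await getChatGPTResponse(transcription);
    const audioPath = await convertTextToSpeech(chatGPTResponse);
    playAudio(connection, audioPath);
  });
}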
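The transcript handler keeps only final transcripts, because a real-time service also emits partial results that are revised as more audio arrives. The sketch below isolates that filtering logic, using Node's built-in EventEmitter as a stand-in for the transcriber so it runs without an API key. The "transcript" event name and the "FinalTranscript" value mirror the code in this section; collectFinalTranscripts and the sample events are illustrative assumptions.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical helper: subscribes to "transcript" events and accumulates only
// the final ones. Returns a function that reads the text collected so far.
function collectFinalTranscripts(transcriber) {
  const parts = [];
  transcriber.on("transcript", (transcript) => {
    if (transcript.message_type === "FinalTranscript" && transcript.text) {
      parts.push(transcript.text);
    }
  });
  return () => parts.join(" ");
}

// Stand-in for the real-time transcriber (assumption: same event shape).
const fakeTranscriber = new EventEmitter();
const getTranscription = collectFinalTranscripts(fakeTranscriber);

fakeTranscriber.emit("transcript", { message_type: "PartialTranscript", text: "what is" });
fakeTranscriber.emit("transcript", { message_type: "FinalTranscript", text: "What is the weather?" });
fakeTranscriber.emit("transcript", { message_type: "FinalTranscript", text: "" });
fakeTranscriber.emit("transcript", { message_type: "FinalTranscript", text: "And tomorrow?" });

console.log(getTranscription()); // prints "What is the weather? And tomorrow?"
```

Collecting the parts in an array and joining them at the end also avoids the trailing space that repeated string concatenation leaves behind.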
Generating responses with ChatGPT
Generate intelligent responses using OpenAI’s GPT-3.5 Turbo model.
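const OpenAI = require("openai");
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function getChatGPTResponse(text) {
  // gpt-3.5-turbo is a chat model, so it is called through the chat completions endpoint.
  const response = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: text }],
    max_tokens: 100,
  });
  // The reply text is in the first choice's message.
  return response.choices[0].message.content.trim();
}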
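When gpt-3.5-turbo is called through the Chat Completions endpoint, the request carries a list of role-tagged messages and the reply is read from choices[0].message.content. The sketch below builds such a payload and extracts the reply from a response-shaped object without calling the API. The helper names, the system prompt wording, and the sample response are illustrative assumptions.

```javascript
// Hypothetical helper; the system prompt wording is an assumption.
function buildChatRequest(userText, maxTokens = 100) {
  return {
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: "You are a helpful voice assistant. Keep answers brief." },
      { role: "user", content: userText },
    ],
    max_tokens: maxTokens,
  };
}

// Hypothetical helper: reads the assistant's text from a chat completion
// response, or returns a fallback when there is no usable choice.
function extractReply(response, fallback = "Sorry, I could not come up with a reply.") {
  const content = response?.choices?.[0]?.message?.content;
  return typeof content === "string" && content.trim() !== "" ? content.trim() : fallback;
}

// Sample object shaped like a chat completion response (not real API output).
const sampleResponse = {
  choices: [{ message: { role: "assistant", content: "  It is sunny today.  " } }],
};
console.log(extractReply(sampleResponse)); // prints "It is sunny today."
```

A short system prompt is worth considering for a voice bot, because long answers take longer to synthesize and to listen to; the fallback keeps the bot from passing an empty string to the text-to-speech step.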
Convert text to speech with ElevenLabs
Convert ChatGPT responses to speech using ElevenLabs:
const ElevenLabs = require("elevenlabs-node");
const voice = new ElevenLabs({ apiKey: process.env.ELEVENLABS_API_KEY });

async function convertTextToSpeech(text) {
  // Name the output file after the current timestamp.
  const fileName = `${Date.now()}.mp3`;
  const response = await voice.textToSpeech({ fileName, textInput: text });
  // Return the file name on success, or null on failure.
  return response.status === "ok" ? fileName : null;
}
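Naming the output file after Date.now() can collide if two replies are synthesized within the same millisecond, and the files accumulate in the working directory. One alternative, sketched here with Node's standard library only, is to write to the system temp directory under a random name and delete the file after playback. The function names are hypothetical and are not part of the ElevenLabs library.

```javascript
const { randomUUID } = require("node:crypto");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: returns a unique .mp3 path in the system temp directory.
function makeSpeechFilePath(dir = os.tmpdir()) {
  return path.join(dir, `reply-${randomUUID()}.mp3`);
}

// Hypothetical helper: deletes the file if it exists and reports whether it did.
function removeSpeechFile(filePath) {
  if (!fs.existsSync(filePath)) {
    return false;
  }
  fs.unlinkSync(filePath);
  return true;
}
```

The path from makeSpeechFilePath() could be passed as fileName to the text-to-speech call, with removeSpeechFile() run once playback has finished.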
Conclusion
This tutorial showed how to build a sophisticated Discord voice bot that uses AssemblyAI to transcribe speech, OpenAI’s GPT-3.5 Turbo model to provide intelligent responses, and ElevenLabs to synthesize speech. This project demonstrates the potential of modern AI and voice technologies to create conversational, accessible, and engaging applications.