How to build a voice-to-text Discord bot with Gladia real-time transcription API

Published on Sep 21, 2023

Discord, the leading communication platform for gamers and communities, is built for seamless communication between users, whether through text channels, DMs, one-on-one calls, or group voice channels.

Following multiple requests from our Discord community members, we’ve built a custom JavaScript bot that uses Gladia’s live transcription API to transcribe speech in real time, directly on a Discord server.

What can you do with this Discord bot?

First, you can transcribe speech in real time directly in Discord voice channels. For example, you might be streaming a game on Discord and want to revisit the tips and lessons shared during the session. Or your group might meet on the platform, and you want to review the talking points afterwards, just as you would on any other virtual meeting platform.

Beyond that, a bot like this could be used for real-time moderation, flagging hate speech and banning users. Combined with additional tools like ChatGPT, it could also produce command-based notes that summarize meetings and help you catch up on the ones you missed.
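As a rough illustration of the moderation idea, here is a minimal sketch of a naive word-list check run on each transcript segment. The `flagTranscript` function and the blocklist are hypothetical, and a word list is a far cruder technique than a trained hate-speech classifier, so treat this only as a starting point.

```javascript
// Naive word-list filter over transcript text. The function name, blocklist,
// and result shape are hypothetical illustrations, not part of the bot.
function flagTranscript(text, blocklist) {
  // Lowercase, then split on anything that is not a letter, digit, or apostrophe
  const words = text.toLowerCase().split(/[^\p{L}\p{N}']+/u).filter(Boolean);
  const blocked = new Set(blocklist.map((w) => w.toLowerCase()));
  // Deduplicate matches so each offending word is reported once
  const hits = [...new Set(words.filter((w) => blocked.has(w)))];
  return { flagged: hits.length > 0, hits };
}

// Usage: check each final transcript segment as it arrives
const result = flagTranscript("That was a BADWORD move", ["badword"]);
console.log(result.flagged, result.hits); // → true [ 'badword' ]
```

A flagged segment could then be forwarded to human moderators rather than triggering an automatic ban, since transcription errors can produce false matches.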

Screenshot of a transcription bot on Discord

How to implement the Discord.js v14 bot + Gladia real-time transcription

Step 1: Register your bot

Create the Discord bot you’d like to use for transcription. If you’ve never built one before, here’s a useful resource to help.

First, install all the required packages by running:


npm install

Then, set up the index.js script with your Discord keys, your guild ID (server ID), and the voice channel ID.
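To copy these IDs, enable Developer Mode in Discord’s settings, then right-click the server or channel and choose Copy Server ID or Copy Channel ID. The sketch below shows one way to gather and sanity-check the values before the bot logs in. It assumes they come from environment variables named `DISCORD_TOKEN`, `GUILD_ID`, and `VOICE_CHANNEL_ID`; these names and the `loadDiscordConfig` helper are hypothetical, and the repository’s index.js may simply hard-code the values instead.

```javascript
// Hypothetical config loader: variable names are illustrative only.
// Discord IDs ("snowflakes") are numeric strings.
const NUMERIC_ID = /^\d+$/;

function loadDiscordConfig(env) {
  const config = {
    token: env.DISCORD_TOKEN,
    guildId: env.GUILD_ID,
    voiceChannelId: env.VOICE_CHANNEL_ID,
  };
  if (!config.token) throw new Error("DISCORD_TOKEN is missing");
  // Fail fast on IDs that cannot be valid Discord snowflakes
  for (const key of ["guildId", "voiceChannelId"]) {
    if (!NUMERIC_ID.test(config[key] ?? "")) {
      throw new Error(`${key} must be a numeric Discord ID, got "${config[key]}"`);
    }
  }
  return config;
}
```

In index.js you would call `loadDiscordConfig(process.env)` once at startup, so a mistyped ID fails immediately instead of surfacing later as a channel the bot cannot find.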

Step 2: Retrieve API key

Sign up for our speech-to-text API at app.gladia.io and obtain your API key. Documentation for Gladia live transcription can be found here.

Step 3: Code integration

Once everything is set up properly, simply run:


npm run start YOUR_GLADIA_TOKEN
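npm forwards positional arguments given after the script name to that script. Assuming package.json defines `"start": "node index.js"` (an assumption; check the repository), the token arrives as the third entry of `process.argv`, after the Node binary and script paths. A minimal sketch with a hypothetical `getGladiaToken` helper:

```javascript
// Assumes "start": "node index.js", so `npm run start YOUR_GLADIA_TOKEN`
// runs `node index.js YOUR_GLADIA_TOKEN`.
// process.argv is [nodePath, scriptPath, ...userArgs].
function getGladiaToken(argv) {
  const token = (argv[2] ?? "").trim();
  if (token === "") {
    throw new Error("Usage: npm run start YOUR_GLADIA_TOKEN");
  }
  return token;
}

// In index.js: const gladiaToken = getGladiaToken(process.argv);
```

Passing a secret on the command line leaves it in your shell history and visible in process listings, so reading it from an environment variable is a reasonable alternative if you adapt the bot.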

Your bot should then join the channel corresponding to the channel ID you configured in the index.js file.
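Once the bot is in the channel, it receives each speaker’s audio as Opus packets, and @discordjs/voice’s `joinVoiceChannel` self-deafens by default, so a listening bot must pass `selfDeaf: false`. Decoding those packets in Discord’s format (48 kHz, 2 channels), for example with prism-media’s Opus decoder, yields 48 kHz stereo 16-bit PCM. If your transcription stream is configured for 16 kHz mono (an assumption; check the Gladia live documentation for the sample rates and encodings it accepts), the sketch below shows a crude conversion. The `downsample48kStereoTo16kMono` name is hypothetical, and averaging groups of three frames is only a rough low-pass filter compared with a proper resampler.

```javascript
// Convert 48 kHz stereo, signed 16-bit little-endian PCM into 16 kHz mono.
// Each output sample is the mean of 3 consecutive stereo frames (6 samples),
// which downmixes to mono and applies a crude low-pass filter before the
// 3:1 decimation. Trailing frames that do not fill a group of 3 are dropped;
// a streaming implementation would carry them over to the next chunk.
function downsample48kStereoTo16kMono(pcm) {
  const BYTES_PER_FRAME = 4; // 2 channels x 2 bytes per sample
  const frameCount = Math.floor(pcm.length / BYTES_PER_FRAME);
  const outCount = Math.floor(frameCount / 3);
  const out = Buffer.alloc(outCount * 2); // 2 bytes per mono sample

  for (let i = 0; i < outCount; i++) {
    let sum = 0;
    for (let k = 0; k < 3; k++) {
      const offset = (i * 3 + k) * BYTES_PER_FRAME;
      sum += pcm.readInt16LE(offset) + pcm.readInt16LE(offset + 2);
    }
    // The mean of in-range int16 values is itself in range, so no clamping
    out.writeInt16LE(Math.round(sum / 6), i * 2);
  }
  return out;
}

// One 20 ms Discord frame (960 stereo samples, 3840 bytes) becomes 640 bytes
const silentFrame = Buffer.alloc(960 * 4);
console.log(downsample48kStereoTo16kMono(silentFrame).length); // 640
```

Because 960 is divisible by 3, whole 20 ms frames convert without leftovers. If the endpoint expects base64 text rather than binary frames, `Buffer#toString("base64")` produces it; which of the two Gladia’s live API wants is something to confirm in its documentation.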

Step 4: Configure Discord permissions

  • Make sure your bot is invited to the server;
  • Give the bot the required voice permissions.
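Permissions are granted when you invite the bot, encoded as a single integer in its OAuth2 invite URL. The sketch below builds that URL from Discord’s documented permission bit flags. Which flags you need is an assumption here: View Channel and Connect let the bot join and listen, Speak is the other voice permission, and Send Messages matters only if the bot posts transcripts to a text channel. The `buildInviteUrl` helper and the placeholder application ID are hypothetical.

```javascript
// Discord permission bit flags, as listed in the Discord API documentation.
// BigInt is used because some Discord flags lie beyond 32 bits, where
// JavaScript's bitwise operators on plain numbers would overflow.
const Permissions = {
  VIEW_CHANNEL: 1n << 10n,
  SEND_MESSAGES: 1n << 11n,
  CONNECT: 1n << 20n,
  SPEAK: 1n << 21n,
};

function buildInviteUrl(clientId, flags) {
  // OR the selected flags together into the integer Discord expects
  const permissions = flags.reduce((acc, flag) => acc | flag, 0n);
  const url = new URL("https://discord.com/oauth2/authorize");
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("permissions", permissions.toString());
  url.searchParams.set("scope", "bot");
  return url.toString();
}

// Usage with a placeholder application ID
console.log(
  buildInviteUrl("123456789012345678", [
    Permissions.VIEW_CHANNEL,
    Permissions.CONNECT,
    Permissions.SEND_MESSAGES,
  ])
); // ...&permissions=1051648&scope=bot
```

Opening the printed URL lets a server admin add the bot with exactly those permissions; channel-level overrides on the voice channel can still deny them, so check those too if the bot fails to join.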

Bear in mind that the current v1 implementation of the bot is not fully optimized, so you might experience inaccuracies when speakers switch languages and in individual word recognition.

And you’re good to go!

🔗 Source GitHub repository is available here.

We hope you enjoyed this short tutorial. So much audio data still goes to waste, and we’re always curious to explore the many ways transcription technology can help remedy that. Let us know if you went on to build a bot or used our API for other apps on Discord or beyond; we’d love to hear from you.

About Gladia

At Gladia, we built an optimized version of Whisper in the form of an API, adapted to real-life use cases and distinguished by exceptional accuracy, speed, extended multilingual capabilities and state-of-the-art features, including speaker diarization and word-level timestamps.
