Introducing Solaria, the first truly universal speech-to-text model

Published on April 2, 2025

Voice is the most natural way we communicate. As AI continues to redefine the way businesses interact with customers, the ability to accurately and instantly transcribe speech across languages is no longer a luxury—it’s a necessity. Enter Solaria, the breakthrough speech-to-text model designed to power the next era of global AI-driven conversations.

Whether you’re developing industry-specific voice agents or delivering high-performance customer experiences, Solaria provides the foundation you need: unmatched language coverage without compromising on quality or speed.

Solaria sets a new standard for AI-driven voice interactions: best-in-class real-time transcription at an industry-leading 94% Word Accuracy Rate (WAR) in English and other common languages, support for 100+ languages, 42 of which are unique to Gladia, and an ultra-low average latency of 270 ms.

Leading accuracy in common languages

In industries where every word carries weight—whether it’s customer support, financial transactions, or legal transcriptions—precision is key. Solaria is engineered to capture even the most nuanced speech patterns with human-level accuracy, making it the go-to model for enterprises that demand nothing less than excellence.

We deliver industry-leading transcription accuracy across English, Spanish, and other widely spoken languages, and we outperform competitors in complex scenarios such as noisy environments, accented speech, and domain-specific terminology.

To ensure results reflect real-world performance, we benchmarked Solaria using public datasets like Mozilla’s Common Voice and Google’s FLEURS—both designed to challenge STT models with diverse accents, dialects, and audio conditions.

While many providers test only on Common Voice version 16, we evaluated across multiple versions to avoid tuning our model to a single release. We also ran tests using anonymized enterprise datasets from contact center environments, measuring Solaria against each competitor’s most advanced model. In those tests, our multilingual model beat Deepgram’s English-only Nova-3.
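For readers who want to reproduce an accuracy figure on their own audio, here is a minimal sketch of how a Word Accuracy Rate can be computed. It assumes WAR is defined as 1 minus the word error rate (WER), which is the common convention; the exact formula and text normalization used in our benchmarks are not described here, and the function names are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercasing and whitespace splitting are the only normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy_rate(reference: str, hypothesis: str) -> float:
    """Assumed definition: WAR = 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    reference = "please transfer fifty dollars to my savings account"
    hypothesis = "please transfer fifteen dollars to my savings account"
    # One substituted word out of eight reference words.
    print(word_accuracy_rate(reference, hypothesis))
```

Under this definition, a single wrong word in an eight-word utterance gives a WAR of 87.5%, which shows how quickly small errors add up on short utterances.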

*Key performance metrics in accuracy and latency when compared to competition*

Low latency for seamless communication

In voice AI, latency doesn’t just impact performance—it defines the experience. Whether you're building real-time voice assistants or multilingual support bots, responsiveness is what makes interactions feel natural. Solaria delivers ultra-low latency across the board, enabling smooth, human-like conversations.

To understand how we measure latency and ensure our API delivers the best performance in real-life, enterprise environments, it's useful to make a distinction between two key metrics: latency on interrupt and latency on final, which we break down below.

Latency on Interrupt (aka 'Time to First Byte')

When a user starts speaking, how long does it take for the AI to begin responding? This is known as latency on interrupt, or Time to First Byte (TTFB)—and it’s one of the most important benchmarks in voice AI. When companies claim their voice tech is “more responsive” or “more natural,” this is the metric they’re pointing to.

Imagine speaking to a voice assistant like Siri and interrupting mid-sentence. The delay before the system reacts? That’s interruption latency. The faster the response, the more human the interaction feels.

With an average response time of 270 milliseconds, Solaria is among the most responsive speech-to-text models available today. It delivers fluid, real-time interactions that feel intuitive and immediate, while balancing latency against transcription accuracy across languages.

Latency on Final

Equally important is latency on final—how quickly the system delivers a complete transcript once the user finishes speaking. This determines when your downstream AI (like an LLM) can start processing the request.

Solaria delivers final transcripts in roughly 698 ms, outperforming competitors by more than half a second. That speed can significantly accelerate AI response times, making your overall system feel faster, smarter, and more responsive.
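To make the two metrics concrete, here is a small sketch of how they could be computed from timestamps you log in your own application. The `UtteranceTiming` class, its field names, and the sample values are hypothetical, and the choice of reference points (speech start to first partial transcript, speech end to final transcript) is one reasonable reading of the definitions above, not a description of our internal benchmark.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class UtteranceTiming:
    """Timestamps in seconds for one utterance (hypothetical log format)."""
    speech_start: float      # user starts speaking
    first_partial: float     # first partial transcript arrives
    speech_end: float        # user stops speaking
    final_transcript: float  # final transcript arrives

    @property
    def latency_on_interrupt_ms(self) -> float:
        # Assumed definition: speech start -> first partial transcript.
        return (self.first_partial - self.speech_start) * 1000

    @property
    def latency_on_final_ms(self) -> float:
        # Assumed definition: speech end -> final transcript.
        return (self.final_transcript - self.speech_end) * 1000


def summarize(timings: list[UtteranceTiming]) -> dict[str, float]:
    """Average both latency metrics over a set of utterances."""
    return {
        "avg_latency_on_interrupt_ms": mean(t.latency_on_interrupt_ms for t in timings),
        "avg_latency_on_final_ms": mean(t.latency_on_final_ms for t in timings),
    }


if __name__ == "__main__":
    # Made-up timings for illustration only.
    sample = [
        UtteranceTiming(0.0, 0.25, 2.0, 2.75),
        UtteranceTiming(10.0, 10.5, 13.0, 13.625),
    ]
    print(summarize(sample))
```

Tracking the two numbers separately is useful because they affect different parts of the experience: the first governs how quickly an agent can react to an interruption, the second how soon a downstream LLM can start working.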

We speak the languages they don’t

Expanding globally means more than just supporting a handful of dominant languages. It requires an AI model that understands the full spectrum of linguistic diversity and can deliver native-level recognition no matter where it’s deployed.

Solaria is the only speech AI model offering native-level accuracy across 100+ languages, including 42 that are completely unsupported by competitors. This includes widely spoken but underserved languages such as:

High-population markets: Bengali, Punjabi, Tamil, Urdu, Persian, Marathi.
Critical business regions: Hebrew, Pashto, Kazakh, Georgian, Mongolian.
Emerging voice AI frontiers: Haitian Creole, Maori, Javanese, Malagasy.

Being truly multilingual means more than just transcription. Solaria enables real-time code-switching, allowing users to shift naturally between languages—a must for global customer interactions. It also supports real-time translation across all supported languages, helping teams eliminate communication barriers and connect with users anywhere.

Enterprise-grade adaptability & customization

Precision in speech AI isn’t just about general accuracy. It’s about being accurate where it matters most for your business. Solaria delivers best-in-class custom vocabulary and named entity recognition (NER) for real-time applications, with capabilities that include:

Custom vocabulary training
From medical diagnoses to financial terms, Solaria can be trained to recognize industry-specific jargon, ensuring specialized language is captured with precision.

Brand and product name adaptation
Unlike generic speech models, Solaria-1 recognizes and accurately transcribes company names, product terms, and acronyms, reducing transcription errors in brand-sensitive environments.

Key data extraction
The model can identify and extract phone numbers, email addresses, and postal addresses, allowing businesses to automate and streamline workflows in a way that traditional speech recognition tools cannot.

Fine-tuned language sensitivity
Solaria minimizes false positives and misinterpretations by adjusting sensitivity settings per language, ensuring that technical terms, slang, and colloquialisms are properly understood.

Whether handling legal transcriptions, technical support calls, or AI-driven sales conversations, Solaria ensures every interaction is smarter, faster, and more precise.
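To illustrate what key data extraction feeds into downstream, here is a rough sketch that pulls email addresses and phone numbers out of transcript text with regular expressions. This is not how Solaria performs entity recognition; it is a deliberately simplified stand-in, and it assumes the transcript already renders these items in written form (for example "jane.doe@example.com" rather than "jane dot doe at example dot com"). The patterns and function name are illustrative.

```python
import re

# Simplified patterns for illustration; real-world formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def extract_contact_details(transcript: str) -> dict[str, list[str]]:
    """Return email addresses and phone-like number sequences found in the text."""
    return {
        "emails": EMAIL_RE.findall(transcript),
        "phones": [match.strip() for match in PHONE_RE.findall(transcript)],
    }


if __name__ == "__main__":
    text = "You can reach me at jane.doe@example.com or call +1 415 555 0123 after five."
    print(extract_contact_details(text))
```

A workflow such as ticket creation or CRM enrichment could consume a structure like this directly, which is where extraction at the transcription layer saves integration work.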

Designed for global scalability

Large-scale voice AI applications require infrastructure that’s not just powerful, but scalable and reliable. Solaria is built to handle enterprise-level deployments with ease, offering:

Robust multi-region infrastructure: Built for stability, with dedicated deployment options in both US & EU-based regions to meet diverse operational needs.
Future-proof AI: Constantly evolving, Solaria is designed to integrate seamlessly with next-generation voice technologies, keeping businesses ahead of the curve.
Enterprise-grade data security: In time for the release, we're now fully compliant with GDPR, HIPAA, and SOC 2.

New partnerships

As part of the Solaria launch, we’re excited to announce partnerships with two leading developer frameworks in the voice agent space: LiveKit and Daily (the team behind Pipecat).

LiveKit is now using our live translation capabilities in AI-driven applications, while Daily has built a demo chatbot powered by Gladia that can switch languages on the fly—try it out for fun! Our API is now natively integrated with both libraries, all powered by Solaria.

Unlock the next frontier of AI communication

Voice AI is transforming industries, redefining customer interactions, and creating new possibilities for automation. Whether you're building multilingual voice agents or embedding speech understanding into your product, Solaria gives you a competitive edge with unmatched speed, accuracy, and language reach.

For developers, Solaria offers plug-and-play APIs, real-time performance, and support for advanced customization—so you can build fast, scale effortlessly, and ship smarter.

For decision-makers, Solaria delivers enterprise-grade accuracy, global reach, and the infrastructure to support mission-critical applications—now and in the future.

Ready to build the future of voice? Start building with Solaria now or book a demo to learn more.

Bonus feature: The story behind Solaria—or the sci-fi roots of our mission

Gladia is a character from Isaac Asimov’s novel The Naked Sun, a story set on the planet Solaria—a world defined by solitude, advanced technology, and the tension between human and robotic interaction. Gladia, the character, stands out as someone striving to bridge emotional and social gaps, breaking away from the norms of her society to foster deeper human connections.

These qualities that distinguish Gladia—empathy, inclusivity, and ethical consideration—are at the core of what we, as a company, aim to bring into our products and culture.

With the recent release of Solaria, a model designed to bridge linguistic divides across global markets, our story has come full circle. The sci-fi inspiration that helped shape our visual identity continues to influence our mission: building tools that connect people across languages, borders, and perhaps one day even beyond.

The team at Gladia is proud to carry this vision forward, and we hope you will join us on this cosmic journey to push the boundaries of innovation together.

*Gladia Solaria, catching voice across the universe*
