Introducing Solaria, the first truly universal speech-to-text model
Published on April 2, 2025
Voice is the most natural way we communicate. As AI continues to redefine the way businesses interact with customers, the ability to accurately and instantly transcribe speech across languages is no longer a luxury—it’s a necessity. Enter Solaria, the breakthrough speech-to-text model designed to power the next era of global AI-driven conversations.
Whether you’re developing industry-specific voice agents or delivering high-performance customer experiences, Solaria provides the foundation you need, with unmatched language coverage and without compromising on quality or speed.
With best-in-class real-time transcription at an industry-leading 94% WAR (Word Accuracy Rate) in English and other common languages, exclusive support of 100+ languages, 42 of which are unique to Gladia, and ultra-low 270 ms latency, Solaria is setting the new standard for AI-driven voice interactions.
Solaria at a glance
Leading accuracy in common languages
In industries where every word carries weight—whether it’s customer support, financial transactions, or legal transcriptions—precision is key. Solaria is engineered to capture even the most nuanced speech patterns with human-level accuracy, making it the go-to model for enterprises that demand nothing less than excellence.
We deliver industry-leading accuracy of transcription across English, Spanish, and other widely spoken languages while outperforming competitors in complex scenarios such as noisy environments, accented speech, and domain-specific terminology.
To ensure results reflect real-world performance, we benchmarked Solaria using public datasets like Mozilla’s Common Voice and Google’s FLEURS—both designed to challenge STT models with diverse accents, dialects, and audio conditions.
While many providers test only on Common Voice version 16, we evaluated across multiple versions to avoid tuning our model to a single release. We also ran tests using anonymized enterprise datasets from contact center environments, measuring Solaria against each competitor’s most advanced model—beating Deepgram’s English-only Nova-3 with our multilingual model.
Key performance metrics in accuracy and latency when compared to competition
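For readers who want to run a comparable accuracy check on their own audio, here is a minimal sketch of how a Word Accuracy Rate could be computed. It assumes WAR is taken as 1 minus the word error rate (word-level edit distance divided by reference length), which is a common convention but is not spelled out in this post. The function names and the simple lowercase-and-split normalization are illustrative assumptions, not Gladia's benchmarking pipeline.

```python
def word_edit_distance(reference, hypothesis):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy_rate(reference: str, hypothesis: str) -> float:
    """Assumed definition: WAR = 1 - WER, with WER = edit distance / reference length."""
    # Naive normalization (assumption): lowercase and split on whitespace.
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must not be empty")
    wer = word_edit_distance(ref_words, hyp_words) / len(ref_words)
    return 1.0 - wer


if __name__ == "__main__":
    score = word_accuracy_rate("the quick brown fox", "the quick brown dog")
    print(f"WAR: {score:.2%}")
```

Under this definition, a hypothesis with many inserted words can push the score below zero, so real benchmarks usually also normalize punctuation and numerals before scoring.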
Low latency for seamless communication
In voice AI, latency doesn’t just impact performance—it defines the experience. Whether you're building real-time voice assistants or multilingual support bots, responsiveness is what makes interactions feel natural. Solaria delivers ultra-low latency across the board, enabling smooth, human-like conversations.
To understand how we measure latency and ensure our API delivers the best performance in real-life, enterprise environments, it's useful to make a distinction between two key metrics: latency on interrupt and latency on final, which we break down below.
Latency on Interrupt (aka 'Time to First Byte')
When a user starts speaking, how long does it take for the AI to begin responding? This is known as latency on interrupt, or Time to First Byte (TTFB)—and it’s one of the most important benchmarks in voice AI. When companies claim their voice tech is “more responsive” or “more natural,” this is the metric they’re pointing to.
Imagine speaking to a voice assistant like Siri and interrupting mid-sentence. The delay before the system reacts? That’s interruption latency. The faster the response, the more human the interaction feels.
With an average response time of 270 milliseconds, Solaria positions itself among the most responsive speech-to-text models available today—delivering fluid, real-time interactions that feel intuitive and immediate, while making sure to strike the right balance between latency and accuracy of transcription across languages.
Latency on Final
Equally important is latency on final—how quickly the system delivers a complete transcript once the user finishes speaking. This determines when your downstream AI (like an LLM) can start processing the request.
Latency on Final among leading STT providers (lower is better)
Solaria delivers final transcripts in just ~698 ms, outperforming competitors by more than half a second. That speed can significantly accelerate AI response times, making your overall system feel faster, smarter, and more responsive.
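The two latency metrics described above can be sketched in a few lines of code. The example below is a hedged illustration, not Gladia's measurement harness: the `UtteranceTimings` structure and its field names are hypothetical, and it assumes you have already logged four timestamps per utterance (in seconds) from your own client.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class UtteranceTimings:
    """Hypothetical per-utterance timestamps, in seconds, logged by the client."""
    speech_start: float      # user begins speaking
    first_partial: float     # first partial result received
    speech_end: float        # user stops speaking
    final_transcript: float  # final transcript received


def latency_on_interrupt_ms(t: UtteranceTimings) -> float:
    """Time from start of speech to the first response, in milliseconds."""
    return (t.first_partial - t.speech_start) * 1000.0


def latency_on_final_ms(t: UtteranceTimings) -> float:
    """Time from end of speech to the complete transcript, in milliseconds."""
    return (t.final_transcript - t.speech_end) * 1000.0


def summarize(timings: list[UtteranceTimings]) -> dict[str, float]:
    """Average both metrics over a batch of utterances."""
    return {
        "latency_on_interrupt_ms": mean(latency_on_interrupt_ms(t) for t in timings),
        "latency_on_final_ms": mean(latency_on_final_ms(t) for t in timings),
    }


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    sample = [
        UtteranceTimings(10.0, 10.27, 12.0, 12.70),
        UtteranceTimings(20.0, 20.25, 23.0, 23.68),
    ]
    for name, value in summarize(sample).items():
        print(f"{name}: {value:.0f}")
```

Averages hide tail behavior, so when comparing providers it is usually worth reporting percentiles (for example p95) alongside the mean.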
We speak the languages they don’t
Expanding globally means more than just supporting a handful of dominant languages. It requires an AI model that understands the full spectrum of linguistic diversity and can deliver native-level recognition no matter where it’s deployed.
Solaria is the only speech AI model offering native-level accuracy across 100+ languages, including 42 that are completely unsupported by competitors. This includes widely spoken but underserved languages such as:
Critical business regions: Hebrew, Pashto, Kazakh, Georgian, Mongolian.
Emerging voice AI frontiers: Haitian Creole, Maori, Javanese, Malagasy.
Being truly multilingual means more than just transcription. Solaria enables real-time code-switching, allowing users to shift naturally between languages—a must for global customer interactions. It also supports real-time translation across all supported languages, helping teams eliminate communication barriers and connect with users anywhere.
Enterprise-grade adaptability & customization
Precision in speech AI isn’t just about general accuracy; it’s about being accurate where it matters most for your business. Solaria delivers best-in-class custom vocabulary and named entity recognition (NER) for real-time applications, giving platforms:
Custom vocabulary training: From medical diagnoses to financial terms, Solaria can be trained to recognize industry-specific jargon, ensuring specialized language is captured with precision.
Brand and product name adaptation: Unlike generic speech models, Solaria-1 recognizes and accurately transcribes company names, product terms, and acronyms, reducing transcription errors in brand-sensitive environments.
Key data extraction: The model can identify and extract phone numbers, email addresses, and postal addresses, allowing businesses to automate and streamline workflows in a way that traditional speech recognition tools cannot.
Fine-tuned language sensitivity: Solaria minimizes false positives and misinterpretations by adjusting sensitivity settings per language, ensuring that technical terms, slang, and colloquialisms are properly understood.
Whether handling legal transcriptions, technical support calls, or AI-driven sales conversations, Solaria ensures every interaction is smarter, faster, and more precise.
Designed for global scalability
Large-scale voice AI applications require infrastructure that’s not just powerful, but scalable and reliable. Solaria is built to handle enterprise-level deployments with ease, offering:
Robust multi-region infrastructure: Built for stability, with dedicated deployment options in both US & EU-based regions to meet diverse operational needs.
Future-proof AI: Constantly evolving, Solaria is designed to integrate seamlessly with next-generation voice technologies, keeping businesses ahead of the curve.
Enterprise-grade data security: In time for the release, we're now fully compliant with GDPR, HIPAA, and SOC 2.
Unlock the next frontier of AI communication
Voice AI is transforming industries, redefining customer interactions, and creating new possibilities for automation. Whether you're building multilingual voice agents or embedding speech understanding into your product, Solaria gives you a competitive edge with unmatched speed, accuracy, and language reach.
For developers, Solaria offers plug-and-play APIs, real-time performance, and support for advanced customization—so you can build fast, scale effortlessly, and ship smarter.
For decision-makers, Solaria delivers enterprise-grade accuracy, global reach, and the infrastructure to support mission-critical applications—now and in the future.
Bonus feature: The story behind Solaria—or the sci-fi roots of our mission
Gladia is a character from Isaac Asimov’s novel The Naked Sun, a story set on the planet Solaria—a world defined by solitude, advanced technology, and the tension between human and robotic interaction. Gladia, the character, stands out as someone striving to bridge emotional and social gaps, breaking away from the norms of her society to foster deeper human connections.
These qualities that distinguish Gladia—empathy, inclusivity, and ethical consideration—are at the core of what we, as a company, aim to bring into our products and culture.
With the recent release of Solaria, a model designed to bridge linguistic divides across global markets, our story has come full circle. The sci-fi inspiration that helped shape our visual identity continues to influence our mission: building tools that connect people across languages and borders and, perhaps one day, even beyond.
The team at Gladia remains proud to carry this vision forward, and we hope you will join us on this cosmic journey to push the boundaries of innovation together.
Gladia Solaria, catching voice across the universe