Humyn Labs, based in Bengaluru, put out a detailed benchmarking report on May 11, 2026. They call it BRIDGE. The team tested 15 commercial speech-to-text models in 22 non-English languages from the Global South. They didn’t just look at OpenAI and Google. Indian providers like Sarvam AI were in the mix too. The results? Big accuracy gaps, especially when these systems try to handle regional dialects or mixed-language speech.
Key Takeaways
- BRIDGE evaluates 15 AI models across 22 languages and 22 Indian states, using seven different metrics.
- A lot of popular AI tools miss about one in three words when they process Indian language audio. That’s a pretty big gap.
- ElevenLabs Scribe v2 leads overall with the lowest word error rate, 10.6%.
- Sarvam AI, an Indian provider, beat Google and Microsoft on word error rate for Indic languages.
- Standard industry metrics often miss errors in Hinglish or code-switched speech. That’s a problem if you mix languages a lot, which, let’s be honest, most people do.

BRIDGE stands for Benchmark of Regional & International Data for Global Evaluation. The report moves away from lab tests and focuses on real-world use. Humyn Labs used audio from real two-person conversations, not just scripts or internet samples. This approach makes sense, since about 5.5 billion people in the Global South often mix their native languages with English when they talk.
One big takeaway: Word Error Rate (WER) just isn’t enough. WER was built for English, so it misses a lot when it comes to Indian speech. BRIDGE uses seven metrics instead, including Semantic Similarity and Code-Switch F1. That last one matters for India. It checks if an AI can handle switching languages in the middle of a sentence.
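The problem with WER is easy to see in the metric itself. Here's a minimal sketch of a standard word-level WER calculation (plain Levenshtein distance over words, not the BRIDGE implementation, and the Hinglish sentence is just an illustrative example):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming Levenshtein distance over word sequences.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1] / len(ref)

# A valid transliteration spelling variant counts as a full substitution:
print(wer("mera phone kal aayega", "mera phone kal ayega"))  # 0.25
```

Because the metric only compares surface strings, a romanized Hindi word spelled two perfectly readable ways scores the same penalty as a completely wrong word. That's the kind of blind spot metrics like Semantic Similarity and Code-Switch F1 are meant to cover.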
The numbers show a clear gap. Deepgram Nova-3 did well in code-switching, scoring 0.906. Amazon Transcribe lagged far behind at 0.199. OpenAI models didn’t do much better, staying below 0.4. Sometimes they dropped English words or got the spelling wrong.
Indian providers held their own. Sarvam AI’s saaras v3 came in third for word error rate at 20.2%. That’s better than Google, Microsoft, or Amazon. Still, its code-switching isn’t perfect. Even the best models struggle when there’s a lot of English mixed in.
Manish Agarwal, co-founder at Humyn Labs, pointed out that companies spend a lot based on accuracy scores that don’t match how people really talk. The data backs this up. A model that’s great for Spanish might not work for Tamil or Marathi. If you want to dig in, the full dataset is up on Hugging Face for researchers.
FAQs
Q1. What is the BRIDGE report by Humyn Labs?
A1. BRIDGE is an independent benchmarking study that evaluates how accurately AI voice models transcribe 22 different languages, primarily from India and the Global South.
Q2. Which AI models were tested in this benchmarking study?
A2. The study tested 15 models, including OpenAI GPT-4o, Gemini 2.5 Flash, ElevenLabs Scribe v2, Deepgram Nova-3, and Indian models like Sarvam saaras v3 and Gnani vachana v3.
Q3. Why do standard AI metrics fail for Indian languages?
A3. Standard metrics like Word Error Rate (WER) were built for English. They often fail to catch errors in code-switching, where speakers mix Indian languages with English, or they miss the preservation of meaning.
Q4. Which model had the best overall accuracy for Indian languages?
A4. ElevenLabs Scribe v2 recorded the lowest overall word error rate at 10.6%, while Deepgram Nova-3 performed best in handling code-switched speech.
Q5. Where can researchers access the BRIDGE data?
A5. The complete dataset is hosted on Hugging Face, and the full report is accessible at the Humyn Labs website.


