Async Launches Open Benchmark Revealing Critical Text-to-Speech Accuracy Gap in Production Voice Agents

Press Release · via Async · April 27, 2026

Async, the AI-powered content creation platform formerly known as Podcastle, has released an open benchmark that measures how accurately commercial streaming TTS (text-to-speech) systems pronounce non-standard text, such as dates, currencies, and phone numbers, under real production conditions.

The voice AI market crossed $22 billion globally in 2026, with the voice AI agents segment alone projected to reach $47.5 billion by 2034. As real-time voice agents move from experimentation into production across industries, pronunciation errors become increasingly visible: a mispronounced payment confirmation or callback number erodes trust and creates compliance risk. When tested through their streaming interfaces without preprocessing, leading providers showed significant failure rates.

“Normalization failures in streaming TTS produce no signal in standard monitoring – latency is fine, the audio is valid, nothing errors out,” said Arto Yeritsyan, CEO of Async. “A model update that improves voice quality but breaks phone number handling looks like a win on every dashboard. Teams need to measure this explicitly before it reaches users.”

The evaluation covers 1,000+ sentences with 2,200+ non-standard words (e.g., “03/15/2024” or “$4.99”) across 31 categories. Every audio sample was generated through each provider’s streaming API with no text preprocessing applied, under the same conditions used in production voice agents.

Judging was performed by Gemini 2.5 Pro using category-specific rubrics and was validated against human annotators, with over 90% agreement.

Async Flash v1.0 achieved the highest unit-level and sentence-level accuracy among the evaluated models.
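The release does not define the two metrics. A minimal Python sketch, assuming unit-level accuracy is the share of individual non-standard words judged correct and sentence-level accuracy is the share of sentences in which every non-standard word is judged correct, shows why the sentence-level figure is the stricter of the two. The `SentenceResult` type and the sample verdicts are hypothetical and are not data from the benchmark.

```python
from dataclasses import dataclass


@dataclass
class SentenceResult:
    """Hypothetical judge verdicts for one sentence: one bool per non-standard word."""
    unit_correct: list[bool]


def unit_accuracy(results: list[SentenceResult]) -> float:
    # Assumption: pool every non-standard word across all sentences and count the correct ones.
    units = [ok for r in results for ok in r.unit_correct]
    return sum(units) / len(units)


def sentence_accuracy(results: list[SentenceResult]) -> float:
    # Assumption: a sentence counts only if all of its non-standard words are correct.
    return sum(all(r.unit_correct) for r in results) / len(results)


# Illustrative verdicts only, not benchmark results.
results = [
    SentenceResult([True, True]),        # e.g. "03/15/2024" and "$4.99" both read correctly
    SentenceResult([True, False]),       # one of two units mispronounced
    SentenceResult([True]),
    SentenceResult([False, True, True]),
]

print(f"unit-level: {unit_accuracy(results):.2f}")          # 6 of 8 units -> 0.75
print(f"sentence-level: {sentence_accuracy(results):.2f}")  # 2 of 4 sentences -> 0.50
```

Under these assumptions, a single mispronounced date or amount fails the whole sentence, so sentence-level accuracy can never exceed unit-level accuracy.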

The full benchmark dataset, audio samples, and evaluation methodology are publicly available on Hugging Face. Async invites the research community and TTS providers to examine the data and submit additional models for evaluation as the benchmark expands to new categories and languages.

About Async

Async (formerly Podcastle) empowers creators and teams by radically simplifying the end-to-end content creation process. Its streamlined suite of AI-powered tools enables users to record, edit, transcribe, and publish audio & video content with unmatched simplicity. Backed by Tier-1 investors, including Andrew Ng’s AI Fund, Mosaic Ventures, RTP Global, Point Nine, and Sierra Ventures, the company has raised a total of $23.5M in funding to date. In 2024, the company closed a $13.5M Series A to scale its AI content creation platform.

This is a press release which we link to from Podnews, our daily newsletter about podcasting and on-demand. We may make small edits for editorial reasons.