Persian has been added to Speechmatics. Speechmatics is an automatic speech recognition software company based in Cambridge, England.

"The key to understanding spoken Persian is variety. We want a mix of clean audio, such as audiobooks, and messier audio, like someone shouting next to a loud washing machine. The speech needs to include a range of vocabulary too, including technical language, informal vernacular, and regional-specific words. We try to create a bank of diverse voices that reflect how Persian is heard in the real world -- different contexts, quality of recordings, and accents."

"Capturing such a wide range of voices is significantly helped by our self-supervised approach. When building our bank of speech, we're not only looking for labeled data (i.e. audio recordings that come accompanied by a human-written transcript) but also unlabeled audio, of which there is much more. This opens the pool of audio to be learned from since we're not restricted to perfect datasets of recorded and transcribed Persian -- we can potentially use any spoken Persian."

110 million more voices are now understood

#solidstatelife #ai #speechtotext
