#speechtotext

waynerad@diasp.org

Persian added to Speechmatics. Speechmatics is an automatic speech recognition software company in Cambridge, England.

"The key to understanding spoken Persian is variety. We want a mix of clean audio, such as audiobooks, and messier audio, like someone shouting next to a loud washing machine. The speech needs to include a range of vocabulary too, including technical language, informal vernacular, and regional-specific words. We try to create a bank of diverse voices that reflect how Persian is heard in the real world -- different contexts, quality of recordings, and accents."

"Capturing such a wide range of voices is significantly helped by our self-supervised approach. When building our bank of speech, we're not only looking for labeled data (i.e. audio recordings that come accompanied by a human-written transcript) but also unlabeled audio, of which there is much more. This opens the pool of audio to be learned from since we're not restricted to perfect datasets of recorded and transcribed Persian -- we can potentially use any spoken Persian."

110 million more voices are now understood

#solidstatelife #ai #speechtotext

waynerad@diasp.org

Dubbing and subtitling still require humans, says Tom Scott. He says he tried several AI translation tools, as well as just asking a large language model to translate his videos and subtitles, and as of 2023, capturing the nuance and meaning of his speech definitely still requires humans. The video has numerous examples of tricky translations, and even the ads are part of it. The video is subtitled (by humans, not auto-translated by YouTube) in English, French, Hindi, Japanese, Brazilian Portuguese, and (Latin American) Spanish.

Why don't subtitles match dubbing? - Tom Scott

#solidstatelife #ai #speechtotext #machinetranslation

philetmon@diaspora-fr.org

I've been waiting 15 years for this: I was finally able to test speech-to-text on Linux. I heard about Vosk at JDLL, at the OO stand. I followed the instructions on this site: https://www.suramya.com/blog/2022/01/nerd-dictation-a-fantastic-open-source-speech-to-text-software-for-linux/

Be aware that it only works under X11, and on my machine it isn't very fast. But it's a great step forward. Does anyone here use it on a regular basis?

#JDLL #SpeechtoText #libre