STT Data Sets
Lionbridge AI, Appen, Clickworker, Samasource, and DefinedCrowd are among the companies that sell speech-to-text (STT), text-to-speech (TTS), or speech recognition data sets in French and German. The findings below have also been entered in rows 1 to 3, columns B to F of the attached spreadsheet.
- Lionbridge AI supplies leading companies with audio training data for multilingual text-to-speech solutions in over 300 languages, including French and German.
- Appen provides licensable and fully transcribed speech and language data sets in over 180 languages, including French and German, for text-to-speech and voice recognition systems.
- Clickworker provides data sets for machine learning and artificial intelligence training, including human-generated audio training data in over 30 languages for speech recognition systems.
- Samasource provides training data and validation for AI technologies, including natural language processing, to 25 percent of the Fortune 50. To deliver high-quality training sets for speech recognition and other NLP models, Samasource uses an industry-leading annotation platform called SamaHub.