Data Set Analysis

STT Data Sets

Lionbridge AI, Appen, Clickworker, Samasource, and DefinedCrowd are among the companies that sell speech-to-text (STT), text-to-speech (TTS), or speech recognition data sets recorded by native French and German speakers. The findings below have also been entered in rows 1 to 3, columns B to F of the attached spreadsheet.

Lionbridge AI

Appen

Clickworker

  • Clickworker provides data sets for machine learning and artificial intelligence training, including human-generated audio training data in over 30 languages for speech recognition systems.

Samasource

  • Samasource provides training data and validation for AI technologies, including natural language processing, to 25 percent of the Fortune 50. To deliver high-quality training sets for speech recognition and other NLP models, Samasource uses SamaHub, an annotation platform it describes as industry-leading.

DefinedCrowd

  • DefinedCrowd provides training data for speech technologies in 50 languages and 79 dialects. The company claims to deliver speech data five to ten times faster than its competition.
Sources