STT/ASR Competitive Analysis

Part 01 of four

STT/ASR Competitive Analysis: Google & IBM

Google's Cloud Speech-to-Text API is available in 120 languages, both as a cloud service and as an application, while IBM's Watson STT is a cloud service available in 17 languages. The details of each product were added to the attached spreadsheet, and a summary of our findings is below.

Google

  • Google's Cloud Speech-to-Text API is available in 120 languages and variants. The first 60 minutes are free; above 60 minutes, speech-to-text recognition costs between $0.004 and $0.009 per 15 seconds.
  • While the product was created as a cloud service, it is also available for desktops, laptops, phones, or tablets.
  • Additionally, it allows users to customize recognition with up to 5,000 words and phrases, including business-specific terms such as years, addresses, currencies, and conversions.
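
The pricing described above can be turned into a rough cost estimate. The following is a minimal sketch, not Google's billing logic: it assumes that partial 15-second increments are rounded up and that the 60 free minutes are deducted once from the total, neither of which is confirmed by our research. The function name `google_stt_cost` is hypothetical.

```python
import math

FREE_MINUTES = 60               # free allowance cited in the findings
BILLING_INCREMENT_SECONDS = 15  # prices are quoted per 15 seconds


def google_stt_cost(total_minutes, rate_per_increment):
    """Estimate the cost in US dollars for total_minutes of audio.

    Assumption: a partial 15-second increment is billed as a full one.
    """
    billable_seconds = max(0.0, total_minutes - FREE_MINUTES) * 60
    increments = math.ceil(billable_seconds / BILLING_INCREMENT_SECONDS)
    return increments * rate_per_increment


# 600 minutes of audio at the lowest and highest quoted rates
low = google_stt_cost(600, 0.004)
high = google_stt_cost(600, 0.009)
print(f"600 minutes: ${low:.2f} to ${high:.2f}")
```

At 240 increments per hour, the quoted range works out to roughly $0.96 to $2.16 per hour of audio beyond the free allowance, which makes it easier to compare with vendors that price per hour.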

IBM

Part 02 of four

STT/ASR Competitive Analysis: Microsoft & Amazon Web Services

Microsoft offers Cognitive, an STT service available on devices and in the cloud in 40 languages, priced from $1 per hour to $2.10 per hour. Amazon's Transcribe provides cloud STT and ASR services in 31 languages, priced from $0.006 per 10 seconds to $6.75 for 90 minutes. The details of our research were added to the attached spreadsheet, and a summary of our findings is below.

Microsoft

  • Microsoft Azure's STT service, Cognitive, provides multiple speech services, including STT in 40 languages and variations on the cloud and on devices.
  • Users get 5 hours of STT for free every month on the standard and custom models.
  • After the 5 free hours, the price is $1 per audio hour on the standard model, $1.40 per hour on the custom model plus $0.0538 per hour for each additional customization, and $2.10 per hour for conversation or multichannel audio transcription.
  • The service allows users to customize models and languages, as shown in the attached example video.
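
The monthly pricing above can be sketched as a small calculator. This is an illustration under stated assumptions, not Microsoft's billing logic: it covers only the standard and custom models (our research does not say whether the free hours apply to multichannel transcription), and it assumes the $0.0538 surcharge is added to the hourly rate once per additional customization. The function name `azure_stt_monthly_cost` is hypothetical.

```python
FREE_HOURS_PER_MONTH = 5
RATES_PER_HOUR = {"standard": 1.00, "custom": 1.40}  # US dollars per audio hour
CUSTOMIZATION_SURCHARGE_PER_HOUR = 0.0538            # per additional customization


def azure_stt_monthly_cost(hours, model="standard", extra_customizations=0):
    """Estimate one month's cost in US dollars for the given audio hours."""
    billable_hours = max(0.0, hours - FREE_HOURS_PER_MONTH)
    rate = RATES_PER_HOUR[model]
    if model == "custom":
        # Assumption: each additional customization raises the hourly rate.
        rate += CUSTOMIZATION_SURCHARGE_PER_HOUR * extra_customizations
    return billable_hours * rate


# 105 audio hours in one month
print(f"standard: ${azure_stt_monthly_cost(105):.2f}")
print(f"custom with 2 extras: ${azure_stt_monthly_cost(105, 'custom', 2):.2f}")
```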

Amazon

Part 03 of four

STT/ASR Competitive Analysis

Two additional companies that offer Speech-to-Text (STT) or Automatic Speech Recognition (ASR) products are Huawei and Speechmatics. Both companies were added to the attached spreadsheet in rows 6 and 7. An overview of the products is below.

1. Huawei

2. Speechmatics

  • Speechmatics offers different speech recognition services, including ASR and STT, for audio and video.
  • Its coverage is global, with a wide list of supported languages and transcription results delivered in minutes.
  • Its error rate is the lowest when compared with competitors like Google and Kaldi.

Part 04 of four

STT/ASR Competitive Analysis: Rows 6 and 7

Huawei Cloud is an ASR service available in four languages, priced from US$549.17 per 20 million calls to US$2,288.19 per 100 million calls. Speechmatics provides STT and ASR services in 74 languages, for $2.40 to $3.60 per hour of audio. The details of our research were added to the attached spreadsheet, and a summary of our findings is below.
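
Because the two vendors quote prices in different units, it can help to reduce each to a unit price. The sketch below does only that arithmetic on the figures above; the helper names are hypothetical, and the two results are not directly comparable because our research does not state how much audio a single Huawei call covers.

```python
def price_per_million_calls(package_price_usd, calls_in_package):
    """Unit price of a call package, in US dollars per one million calls."""
    return package_price_usd / calls_in_package * 1_000_000


def price_per_minute(price_per_hour_usd):
    """Convert an hourly audio price to a per-minute price."""
    return price_per_hour_usd / 60


# Huawei Cloud packages cited above
small = price_per_million_calls(549.17, 20_000_000)
large = price_per_million_calls(2288.19, 100_000_000)
print(f"Huawei: ${small:.2f} to ${large:.2f} per million calls")

# Speechmatics hourly range cited above
print(f"Speechmatics: ${price_per_minute(2.40):.2f} to "
      f"${price_per_minute(3.60):.2f} per minute of audio")
```

The larger Huawei package is cheaper per call (about $22.88 versus $27.46 per million calls), and the Speechmatics range corresponds to about $0.04 to $0.06 per minute of audio.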

Huawei

Speechmatics

Sources