Google has finally opened its Cloud Speech API to the public, allowing third-party developers to convert audio to text within their own applications. Since March 2016, the web giant had provided access to a limited preview version through its developer website. Although still in beta, Google's speech recognition technology promises developers a better experience than current speech recognition providers offer.
Here are some key features of the new technology that you can use to power your own applications.
Accurate, automatic voice-to-text conversion
Google's voice recognition tool is powered by deep neural networks, the same machine learning technology behind Google Voice search in the Google app and voice typing in Google Keyboard.
If you've used Google Voice search, you're already familiar with its accuracy and speed. With the Cloud Speech API, developers can take advantage of the same technology that powers Google's own products to convert spoken words into text automatically, and integrate support for voice-based commands.
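As a rough illustration of what such an integration looks like, the sketch below builds the JSON body for a synchronous recognition call to the v1 REST endpoint (`POST https://speech.googleapis.com/v1/speech:recognize`). The field names follow the v1 REST schema; the helper function and its defaults are illustrative assumptions, not part of any official client library.

```python
import base64
import json

def build_recognize_request(audio_bytes, language_code="en-US", sample_rate=16000):
    """Build a v1-style synchronous recognize request body (illustrative helper)."""
    return {
        "config": {
            "encoding": "LINEAR16",          # raw 16-bit PCM audio
            "sampleRateHertz": sample_rate,  # must match the recording
            "languageCode": language_code,
        },
        # Short clips can be sent inline as base64; longer audio would
        # reference a Cloud Storage URI instead.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }

request_body = build_recognize_request(b"\x00\x01" * 8000)
print(json.dumps(request_body["config"], indent=2))
```

The body would then be POSTed to the endpoint with an API key or OAuth credentials; the response contains the best-matching transcript alternatives.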
Interestingly, the tool becomes more accurate over time as new speech samples are collected and the number of users increases. With larger data sets, new terms are learned and added to the underlying models.
Cloud Speech API can handle more than 80 languages, with some regional variants baked in. Because it covers more languages than competing services, it's a vital tool when building speech-enabled apps for a global audience. Furthermore, for some languages the API lets developers enable a filter that masks profanity in the returned text, so results stay suitable for all audiences.
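To make the language support and content filtering concrete, here is a hedged sketch of per-language request configs. `languageCode` takes BCP-47 tags with regional variants, and `profanityFilter` is a boolean flag in the v1 request config; the `speech_config` helper and the specific language tags chosen are illustrative assumptions.

```python
def speech_config(language_code, filter_profanity=True):
    """Illustrative v1-style config for one language variant."""
    return {
        "encoding": "FLAC",
        "sampleRateHertz": 44100,
        "languageCode": language_code,        # BCP-47 tag, e.g. "en-GB" vs "en-US"
        "profanityFilter": filter_profanity,  # masks filtered words in results
    }

# One config per target market of a hypothetical global app.
configs = [speech_config(tag) for tag in ("en-US", "en-GB", "es-MX", "hi-IN")]
print(configs[1]["languageCode"])  # → en-GB
```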
The Cloud Speech API can be used even in loud environments, without additional sophisticated signal processing or noise cancellation. With the service, users can effectively transcribe speech into text regardless of the surroundings.
Real-time and batch speech recognition
The speech-to-text API accepts audio captured from an app's microphone or supplied as pre-recorded audio files. For batch processing, audio can be embedded in the request itself or referenced from Google's cloud storage, and the best-matching text is returned. Several audio encodings are supported, including LINEAR16, AMR, FLAC, and PCMU (μ-law).
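A batch request for a file already sitting in Cloud Storage might look like the sketch below, targeting the asynchronous v1 endpoint (`speech:longrunningrecognize`). The encoding names mirror the v1 enum (PCMU audio is carried as `MULAW` there); the extension-to-encoding mapping and bucket path are simplifying assumptions for illustration.

```python
# Hypothetical mapping from file extension to a v1 encoding name.
ENCODING_BY_EXTENSION = {
    ".raw": "LINEAR16",
    ".amr": "AMR",
    ".flac": "FLAC",
    ".ulaw": "MULAW",  # PCMU / μ-law
}

def build_batch_request(gcs_uri, sample_rate=16000, language_code="en-US"):
    """Build a v1-style async recognize body referencing a gs:// URI."""
    extension = gcs_uri[gcs_uri.rfind("."):]
    return {
        "config": {
            "encoding": ENCODING_BY_EXTENSION[extension],
            "sampleRateHertz": sample_rate,
            "languageCode": language_code,
        },
        # Referencing a gs:// URI avoids uploading large files inline.
        "audio": {"uri": gcs_uri},
    }

batch_request = build_batch_request("gs://my-bucket/interviews/episode-01.flac")
```

The async endpoint returns an operation name that the app polls until the transcript is ready, which suits long recordings better than an inline upload.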
For real-time streaming, the user's audio input is directed to Google's servers, where it is converted into text and streamed back to the application as it is recognized, with no waiting. As a result, transcription can begin even while the user is still speaking.
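The streaming flow can be sketched as a sequence of messages: the first carries only the configuration, and every later one carries a chunk of audio, as in the documented gRPC streaming pattern. The message shapes below are illustrative dicts, not the actual protobuf types.

```python
def streaming_messages(audio_chunks, language_code="en-US"):
    """Yield the message sequence for a streaming session (illustrative)."""
    # First message: configuration only, no audio.
    yield {
        "streamingConfig": {
            "config": {
                "encoding": "LINEAR16",
                "sampleRateHertz": 16000,
                "languageCode": language_code,
            },
            # Ask for partial transcripts while the user is still speaking.
            "interimResults": True,
        }
    }
    # Subsequent messages: raw audio chunks as they arrive from the mic.
    for chunk in audio_chunks:
        yield {"audioContent": chunk}

messages = list(streaming_messages([b"\x00" * 3200, b"\x01" * 3200]))
```

In a real app the chunks would be read off the microphone buffer and sent as they are captured, while transcripts arrive on the response stream concurrently.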
Google's speech-to-text API can be tailored to the specific needs of an app's users. Developers can supply a set of custom words and phrases that users are likely to say, improving recognition accuracy; these word hints can be included with every API call to provide context-specific recognition.
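Word hints are attached through the `speechContexts` field of the v1 request config, which accepts lists of phrases. The helper and the food-ordering phrase list below are hypothetical examples, not part of the API itself.

```python
def config_with_hints(phrases, language_code="en-US"):
    """Illustrative v1-style config carrying phrase hints."""
    return {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": language_code,
        # Hints bias recognition toward domain vocabulary the model
        # might otherwise mis-hear.
        "speechContexts": [{"phrases": phrases}],
    }

# Hypothetical vocabulary for a food-ordering app.
hinted = config_with_hints(["sriracha", "gluten-free", "add to order"])
```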
Compatible with any device
Once integrated into an application, the Cloud Speech API works across any device that can send a REST or gRPC request, including mobile phones, computers, tablets, and IoT products such as cars, appliances, and speakers.
Google offers competitive, low-cost pricing for its speech recognition service. At launch, the API was free, a strategy intended to attract developers. Since August 2016, however, only the first 60 minutes of each month are free; monthly usage beyond that is priced at $0.006 per 15 seconds of processed audio.
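As a worked example of that pricing, the sketch below bills usage beyond the 60 free monthly minutes at $0.006 per 15-second increment. Rounding partial increments up is an assumption here, not something stated above.

```python
import math

def monthly_cost(total_seconds, free_seconds=60 * 60, rate=0.006, increment=15):
    """Estimated monthly bill: free tier, then $0.006 per 15-second increment."""
    billable = max(0, total_seconds - free_seconds)
    # Assumption: partial increments are rounded up to a full increment.
    return math.ceil(billable / increment) * rate

# e.g. 5 hours of audio in a month: 4 billable hours = 960 increments.
print(f"${monthly_cost(5 * 3600):.2f}")  # → $5.76
```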
By opening its voice-to-text API to developers, Google has disrupted the industry in a major way. Given its feature set, competing speech recognition services pale in comparison, and Google's Cloud Speech API may well dominate the market in the coming years.