A Brief History of Voice Recognition

If you are too busy to make that call to book your salon or restaurant, Google’s newly updated virtual assistant just might be able to help you out. This sophisticated technology can make complicated calls, complete with natural hesitations such as ‘mmhmm’ and ‘err’.

Google made the announcement at Google I/O, an annual event held since 2008 where the company shares new tools and strategies with developers who build products for Google software and hardware.

Looking into the history of voice recognition actually takes us back to 1952, when Bell Laboratories created Audrey, a system that could understand digits, but only from a single voice. In 1962, IBM came out with Shoebox, which could decipher 16 spoken words, including the digits 0 through 9.

In the 1970s, Carnegie Mellon University created the Harpy speech-understanding system. With roughly the vocabulary of an average three-year-old, Harpy could understand around 1,011 words. It was also designed with a better search approach, one that evaluated entire sentences rather than single words, an early acknowledgment of the importance of context.

The 1980s saw some advances, but the poor processing power of computers still held back any major breakthroughs. Bell, for example, increased the number of voices that could be recognized, but that was about it: only words spoken slowly and one at a time were understood.

In the 1990s, Dragon Systems came out with DragonDictate, which carried a steep price of $9,000. It used ‘discrete speech’, requiring the user to pause between each spoken word. An improved version, Dragon NaturallySpeaking, was introduced seven years later. At 100 words per minute, it recognized words faster, but it still needed some training, and the cost, now $695, remained considerable.

We can thank the 2000s, because that is when Google put serious research into voice recognition. When a Stanford Research Institute spin-off, led by Dag Kittlaus, Adam Cheyer, and Tom Gruber, began to show results, Apple bought it in 2010, and Siri was born. As the voice recognition built into Apple products, Siri has propelled our generation into voice-based web search.

Dag Kittlaus, Adam Cheyer, and Tom Gruber, in their turn, founded Viv Labs and designed a product called ‘the Global Brain’, which is working towards something similar to what Google has now delivered.

According to Grand View Research, a US consultancy, the global market for voice recognition will reach $127.58 billion by 2024. As the tech giants Google, Apple, and Microsoft continue introducing voice-based products, keyboards, switches, and buttons might soon disappear, making interaction with machines seamless. Apart from Google Home’s Assistant, which helps control appliances, Amazon Echo has Alexa, who helps users shop from home.

Listening, processing, and decoding language are some of the highest functions performed by the human brain, and getting AI to do all three has been a long-drawn challenge as well as a learning process. Every language comes with dialects, accents, and pronunciations; background noise, context, and multiple voices complicate things further.

Though voice recognition technology is advancing into spaces such as smart devices, cars, and homes, its use is not yet widespread. For example, only a third of smartphone users rely on speech to get things done. However, with the advent of Google’s new product, this could change, especially if voice becomes a biometric measure for securing online identification, alongside passwords. At the same time, with user data breaches still in the news, voice recognition software raises questions about user privacy, big data, and misinterpretation.

On the home front, India, with its myriad languages and dialects, might be much slower on the uptake when it comes to adopting voice recognition software into daily life. However, efforts are being made towards this end. Four years ago, IITians Subodh Kumar, Kishore Mundra, and Sanjeev Kumar started a speech-recognition software company called Liv.ai. The company supports 10 regional languages and is working on adding more. Its speech-recognition application program interface (API) is being used by 500 business-to-business (B2B) and business-to-consumer (B2C) developers.

Navanwita Bora Sachdev

Navanwita is the editor of The Tech Panda and frequently publishes stories in news outlets such as The Indian Express, Entrepreneur India, and The Business Standard.
