A Brief History of Voice Recognition

If you are too busy to make that call to book your salon or restaurant, Google’s newly updated virtual assistant just might be able to help you out. This sophisticated technology can make complicated calls, complete with natural hesitations such as ‘mmhmm’ and ‘err’.

Google made the announcement at Google I/O, an annual event held since 2008 where the company shares new tools and strategies with developers who build products for Google software and hardware.

Looking into the history of voice recognition actually takes us back to 1952, when Bell Laboratories created Audrey, a system that could understand digits, but only from a single voice. In 1962, IBM came out with Shoebox, which could decipher 16 spoken words, including the digits 0 through 9.

In the 1970s, Carnegie Mellon University created the Harpy speech-understanding system. With roughly the vocabulary of an average three-year-old, Harpy could understand around 1,011 words. It was also designed with a better search approach, one that evaluated entire sentences rather than single words, an early acknowledgment of the importance of context.

The 1980s saw some advances, but the poor processing power of computers still held back any major breakthroughs. Bell, for example, increased the number of voices that could be recognized, but that was about it: only words spoken slowly and one at a time were understood.

In the 1990s, Dragon Systems came out with DragonDictate, which carried a steep price of $9,000. It used ‘discrete speech’, requiring the user to pause between each spoken word. An improved version, Dragon NaturallySpeaking, was introduced seven years later. At 100 words per minute, it recognized words faster, but it still needed some training, and the cost, now $695, remained considerable.

We can thank the 2000s, because that is when Google put serious research into voice recognition. When a Stanford Research Institute spin-off, led by Dag Kittlaus, Adam Cheyer, and Tom Gruber, began to show results, Apple bought it in 2010, and Siri was born. As the voice recognition built into Apple products, Siri has propelled our generation into voice-based web search.

Dag Kittlaus, Adam Cheyer, and Tom Gruber, in their turn, founded Viv Labs and designed a product called ‘the Global Brain’, which is working towards something similar to what Google has now delivered.

According to Grand View Research, a US consultancy, the global market for voice recognition will reach $127.58 billion by 2024. As the tech giants Google, Apple, and Microsoft continue introducing voice-based products, keyboards, switches, and buttons might soon disappear, making interaction with machines seamless. Apart from Google Home’s Assistant, which helps control appliances, Amazon Echo has Alexa, who helps users shop from home.

Listening, processing, and decoding language are some of the highest functions performed by the human brain, and getting AI to do all three has been a long-drawn challenge as well as a learning process. Every language comes with dialects, accents, and pronunciations; background noise, context, and multiple voices complicate things further.

Though voice recognition technology is advancing into spaces such as smart devices, cars, and homes, its use is not yet widespread. For example, only a third of smartphone users rely on speech to get things done. However, with the advent of Google’s new product, this could change, especially if voice becomes a biometric measure for securing online identification, alongside passwords. At the same time, with user data breaches still in the news, voice recognition software raises questions about user privacy, big data, and misinterpretation.

On the home front, India, with its myriad languages and dialects, might be much slower on the uptake when it comes to adopting voice recognition software into daily life. However, efforts are being made towards this end. Four years ago, IITians Subodh Kumar, Kishore Mundra, and Sanjeev Kumar started a speech-recognition software company called Liv.ai. The company supports 10 regional languages and is working on adding more. Its speech-recognition application program interface (API) is being used by 500 business-to-business (B2B) and business-to-consumer (B2C) developers.

Navanwita Bora Sachdev

Navanwita is the editor of The Tech Panda and frequently publishes stories in news outlets such as The Indian Express, Entrepreneur India, and The Business Standard.
