Baidu wants to build a speech recognition engine that’s 99 percent accurate, a threshold that Andrew Ng, chief scientist at Baidu and founder of Google’s "Google Brain" deep learning project, believes will fundamentally change how humans interact with computers.
Baidu, which opened its Silicon Valley AI Lab in 2014, is hoping to carve out a space for itself as a leader in speech recognition. So far, it’s making impressive headway. The company’s latest speech recognition engine, dubbed Deep Speech 2, uses deep learning to recognise words spoken in English and Mandarin, at times outperforming humans in the latter, according to Baidu.
"We can train this giant neural network that eventually learns to recognise speech on its own as well as a human can, and not spend so much of our time thinking about how words are structured," says Adam. "Instead, [we] can just ask the computer system to learn those things on its own."
The short answer to how Baidu plans to conquer speech recognition is data, and lots of it. Coates says Deep Speech 2 was trained on tens of thousands of hours of audio recordings. Some of it comes from public datasets, while another portion comes from crowdsourcing services such as Mechanical Turk, Amazon's marketplace for odd jobs that require human intelligence.

Deep Speech 2 is an example of supervised learning, a type of machine learning that uses labelled training data, such as transcribed audio, to teach a system new skills, like recognising handwritten numbers. Without labelled training data, the neural network would have no way of telling a correct answer from an incorrect one.
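To make the idea concrete, here is a minimal sketch of supervised learning on the handwritten-digit task mentioned above, standing in for the audio case. The scikit-learn library, its bundled digits dataset, and the model settings are illustrative assumptions, not details from Baidu's system.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()                        # 8x8 images, each labelled with the digit 0-9 it shows
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

# The labels (y_train) are what make this "supervised": they tell the
# network when its guess is wrong during training.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
clf.fit(X_train, y_train)
print(f"accuracy on held-out digits: {clf.score(X_test, y_test):.2f}")
```

The same recipe scales up to Deep Speech 2's setting: swap the digit images for audio recordings, the digit labels for transcriptions, and the small network for a much larger one.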
This is an excerpt from an article published on TechInAsia. You can read the full story here