How Baidu's AI Lab plans to solve speech recognition - with lots of data

Deep Speech 2 uses deep learning to recognise words in English and Mandarin, reports Tech in Asia

Eva Xiao | Tech in Asia 

Baidu's Xiaodu, an artificially intelligent robot, can respond to voice commands. Photo: Reuters

Baidu wants to build a speech recognition engine that’s 99 percent accurate, a threshold that Andrew Ng, chief scientist at Baidu and founder of Google’s "Google Brain" deep learning project, believes will fundamentally change how humans interact with computers.

Baidu, which opened its Silicon Valley AI Lab in 2014, is hoping to carve out a space for itself as a leader in speech recognition. So far, it’s making impressive headway. The company’s latest speech recognition engine, dubbed Deep Speech 2, uses deep learning to recognise words spoken in English and Mandarin, at times outperforming humans in the latter, according to Adam Coates, who directs the lab.

"We can train this giant neural network that eventually learns to recognise speech on its own as well as a human can, and not spend so much of our time thinking about how words are structured," says Adam. "Instead, [we] can just ask the computer system to learn those things on its own."

The short answer to Baidu’s plan to conquer speech recognition is data — lots of it. Adam says Deep Speech 2 was trained on tens of thousands of hours of audio recordings. Some of it comes from public data, while another portion comes from crowdsourcing services such as Mechanical Turk, Amazon’s marketplace for odd jobs that require human intelligence.
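
Under the hood, Deep Speech 2's published recipe pairs audio recordings with their text transcripts and trains a neural network end to end using connectionist temporal classification (CTC) loss. The sketch below illustrates that idea in PyTorch; it is not Baidu's code, and the model size, alphabet, and dummy data are assumptions chosen for brevity.

import torch
import torch.nn as nn

# A tiny end-to-end model: spectrogram frames in, character scores out.
class TinySpeechNet(nn.Module):
    def __init__(self, n_mels=80, hidden=256, n_chars=29):
        super().__init__()
        # A small bidirectional GRU stands in for Deep Speech 2's larger RNN stack.
        self.rnn = nn.GRU(n_mels, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_chars)  # per-frame character scores

    def forward(self, feats):                 # feats: (batch, time, n_mels)
        out, _ = self.rnn(feats)
        return self.fc(out)                   # (batch, time, n_chars)

model = TinySpeechNet()
ctc = nn.CTCLoss(blank=0)                     # CTC aligns audio frames to text

# Dummy batch standing in for labelled recordings: 4 clips of 100 frames each,
# each paired with a 12-character transcript (random here, real in training).
feats = torch.randn(4, 100, 80)
targets = torch.randint(1, 29, (4, 12))       # 0 is reserved for the CTC blank
log_probs = model(feats).log_softmax(-1).transpose(0, 1)  # (time, batch, chars)

loss = ctc(log_probs, targets,
           input_lengths=torch.full((4,), 100, dtype=torch.long),
           target_lengths=torch.full((4,), 12, dtype=torch.long))
loss.backward()                 # the network learns directly from (audio, text) pairs

Deep Speech 2 scales this same recipe up with far larger networks and the tens of thousands of hours of labelled audio described above.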

Deep Speech 2 is an example of supervised learning, a type of machine learning that uses labelled training data – such as transcribed audio – to teach a system new skills, like recognising handwritten numbers. Without labelled training data, the neural network wouldn’t be able to differentiate right from wrong.
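
To make that concrete, here is a minimal supervised learning example in the spirit of the handwritten-digit task mentioned above. It uses scikit-learn's bundled digits dataset and a simple classifier, not anything from Baidu; the point is that every labelled image tells the model what the right answer looks like.

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()                        # 8x8 images, each labelled 0-9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=5000)       # any classifier works here
clf.fit(X_train, y_train)                     # the labels supervise the training
print("test accuracy:", clf.score(X_test, y_test))  # roughly 0.96 on this split
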
This is an excerpt from an article published on Tech in Asia. You can read the full story here.
