Fooling Speech Recognition With Hidden Voice Commands

It’s 2018, and while true hoverboards still elude humanity, some future predictions have come true. It’s now possible to talk to computers, and most of the time they might even understand you. Speech recognition is usually achieved through the use of neural networks to process audio, in a way that some suggest mimics the operation of the human brain. However, as it turns out, they can be easily fooled.

The attack begins with an audio sample, generally of a simple spoken phrase, though music can also be used. The desired text that the computer should hear instead is then fed into an algorithm along with the audio sample. This function returns a low value when the output of the speech recognition system matches the desired attack phrase. The input audio file is gradually modified using the mathematics of gradient descent, creating a result that to a human sounds like one thing, and to a machine, something else entirely.

The audio files are available on the site for your own experimental purposes. In a noisy environment with poor audio coupling between speakers and a Google Pixel, results were poor – OK Google only heard the human phrase, not the encoded attack phrase. Given that the sound quality was poor, and the files were generated with a different speech model, this is not entirely surprising. We’d love to hear the results of your experiments in the comments.

It’s all a part of [Nicholas]’s PhD studies around the strengths and pitfalls of neural networks. It highlights the fact that neural networks don’t always work in the way we think they do. Google’s Inception is susceptible to similar attacks with images, as we’ve seen recently.

[Thanks to Wolfgang for the tip!]

4 thoughts on “Fooling Speech Recognition With Hidden Voice Commands

  1. We play a game with the GPS in the car: trying to fool it into obeying nonsense phrases that sound like commands. For example: “oyster man” and “horse demand” are accepted as “voice command.” It’s fun to tell it what to do in complete gibberish.

  2. Go to a bar, find the biggest, drunkest guy and get up in his face and tell him “Go vuck yourself!”

    After you regain consciousness laugh at him because you hacked his neural network. He misheard “vuck” as something else.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s