Amazon has thousands of employees listening to Echo audio clips as part of improving Alexa’s machine learning so that the personal assistant can better respond to voice commands.
These people are listening to what some Alexa owners tell the assistant, reviewing, transcribing and annotating audio recordings to help train Alexa’s machine learning model.
Bloomberg has the story:
The work is mostly mundane. One worker in Boston said he mined accumulated voice data for specific utterances such as ‘Taylor Swift’ and annotated them to indicate the searcher meant the musical artist.
So far so good, but…
Occasionally the listeners pick up things Echo owners likely would rather stay private: a woman singing badly off key in the shower, say, or a child screaming for help. The teams use internal chat rooms to share files when they need help parsing a muddled word—or come across an amusing recording.
I get sharing a customer’s audio recording with a fellow worker for the purposes of getting the job done. But sharing an audio clip with a colleague just because the user might have happened to say something funny or stupid feels totally wrong and unprofessional to me.
Sometimes they hear recordings they find upsetting, or possibly criminal. Two of the workers said they picked up what they believe was a sexual assault. When something like that happens, they may share the experience in the internal chat room as a way of relieving stress.
While Amazon has a process in place for its workers to follow whenever they hear something distressing, some employees were rebuffed in no uncertain terms with the explanation that it wasn’t Amazon’s job to interfere.
For those concerned about privacy, the report claims that people on this team are listening to only some of the voice recordings that were captured in Echo owners’ homes and offices.
An Amazon spokesperson commented:
We take the security and privacy of our customers’ personal information seriously. We only annotate an extremely small sample of Alexa voice recordings in order to improve the customer experience.
For example, this information helps us train our speech recognition and natural language understanding systems, so Alexa can better understand your requests, and ensure the service works well for everyone.
Audio files are stripped of identifiable information like a user’s full name and address. That being said, Amazon could’ve been more transparent about its data collection:
The Alexa voice review process, described by seven people who have worked on the program, highlights the often-overlooked human role in training software algorithms. In marketing materials Amazon says Alexa ‘lives in the cloud and is always getting smarter.’ But like many software tools built to learn from experience, humans are doing some of the teaching.
Users can adjust settings to stop Amazon from using their voice recordings to improve Alexa.
The online retail giant acknowledges that Alexa requests are being used “to train our speech recognition and natural language understanding systems,” but this is buried in a list of frequently asked questions on its website. No matter how you look at it, contextual voice recognition is a tough nut to crack, but machine learning promises to be the right solution. The problem is, machine learning models must be trained.
For instance, Apple has trained Face ID with more than a billion photographs of people’s faces. As for speech recognition, achieving high accuracy does require large amounts of labeled data.
That’s why launching Siri in a new language isn’t possible without having enough data to train the acoustic models, and that data has to come from real people performing real voice queries. The only difference between Amazon and Apple is that the former has humans listening to some of those recordings while the latter, presumably, does not.
Now that you know that Amazon has a global team listening to Alexa audio clips, are you more or less likely to continue using Echo products?
Let us know by leaving a comment below.