Some Amazon employees who listen to Alexa requests have access to users' home addresses

Bloomberg recently reported that Amazon has some of its employees listen to a portion of the audio recordings Alexa speakers capture when users interact with them. It now appears that those workers have access to customers’ home addresses, too.

A new Bloomberg report from this morning cites five anonymous employees on Amazon’s Alexa Data Services team as confirming that they can easily find a customer’s home address by typing that customer’s latitude and longitude coordinates into mapping software such as Google Maps.

While there’s no indication Amazon employees with access to the data have attempted to track down individual users, two members of the Alexa team expressed concern to Bloomberg that Amazon was granting unnecessarily broad access to customer data that would make it easy to identify a device’s owner.

Well, that certainly doesn’t sound promising.

Some of the workers charged with analyzing recordings of Alexa customers use an Amazon tool that displays audio clips alongside data about the device that captured the recording. Much of the information stored by the software, including a device ID and customer identification number, can’t be easily linked back to a user.

That’s how it should be.

However, Amazon also collects location data so Alexa can more accurately answer requests, for example suggesting a local restaurant or giving the weather in nearby Ashland, Oregon, instead of distant Ashland, Michigan.

There would be nothing wrong with Alexa collecting location data to improve its service were it not for the fact that some Amazon employees are allowed to see that data in the first place.

In a demonstration seen by Bloomberg, an Amazon team member pasted a user’s coordinates, stored in the system as latitude and longitude, into Google Maps. In less than a minute, the employee had jumped from a recording of a person’s Alexa command to what appeared to be an image of their house and corresponding address.

That sounds alarming, but there’s nothing special about the lookup itself: anyone who has a set of latitude and longitude coordinates can find the corresponding location in Google Maps. What is disturbing is that user coordinates are available to some Amazon employees at all.

Another internal tool these employees use stores even more personal data.

After punching in a customer ID number, those workers—called annotators and verifiers—can see the home and work addresses and phone numbers customers entered into the Alexa app when they set up the device, the employee said. If a user has chosen to share their contacts with Alexa, their names, numbers and email addresses also appear in the dashboard.

Yeah, but why show all that info in the dashboard?

That data is in the system so that if a customer says ‘Send a message to Laura,’ human reviewers can make sure transcribers wrote the name correctly so that the software learns to pair that request with the Laura in the contact list.

It’s unclear from the report how many employees and contractors might have access to these features. Two Amazon employees said they believed the vast majority of workers in the Alexa Data Services group were able to use that software about a year ago.

Another employee said that, “until recently,” the system displayed customers’ full phone numbers, which now appear with some digits obscured. “Until recently” as in after Bloomberg reported on this, right?

Amazon further limited access to data after Bloomberg’s April 10 report, two of the employees said. Some data associates, who transcribe, annotate and verify audio recordings, arrived for work to find that they no longer had access to software tools they had previously used in their jobs, these people said.

As of press time, their access had not been restored.

Worryingly, the original report included a statement attributed to an Amazon spokesperson denying that the people listening to Alexa recordings have access to information that could identify a user.

Employees do not have direct access to information that can identify the person or account as part of this workflow. All information is treated with high confidentiality and we use multi-factor authentication to restrict access, service encryption and audits of our control environment to protect it.

In a new statement responding to the story, however, Amazon has changed its position and is now calling access to internal tools “highly controlled”:

Access to internal tools is highly controlled and is only granted to a limited number of employees who require these tools to train and improve the service by processing an extremely small sample of interactions.
Our policies strictly prohibit employee access to or use of customer data for any other reason and we have a zero tolerance policy for abuse of our systems. We regularly audit employee access to internal tools and limit access whenever and wherever possible.

Before jumping to conclusions, it bears repeating that this isn’t some kind of secret project: Amazon doesn’t hide the fact that it pays people to listen in on customer conversations. Like Apple and other tech firms, Amazon keeps anonymized snippets of audio recordings on its servers for a period of time to help improve its voice recognition and AI.

“We use your requests to Alexa to train our speech recognition and natural language understanding systems,” the company says in a list of frequently asked questions.

It’s a fact of life in the technology world that the artificial intelligence systems helping machines understand spoken language must be regularly trained on real recordings, which is the only reliable way to improve their accuracy.

As an Amazon spokesperson explained to Bloomberg:

We only annotate an extremely small sample of Alexa voice recordings in order to improve the customer experience. This information helps us train our speech recognition and natural language understanding systems so Alexa can better understand your requests and ensure the service works well for everyone.

Apple does exactly the same thing: Siri has human helpers who listen to some of the clips, which are tied to a random identifier and stored on Apple’s servers for six months.

After that, the random identifier is removed, but the data is retained for longer periods to improve Siri’s voice recognition. Google, too, has reviewers tasked with listening to some of Assistant’s snippets to help train and improve the service. Google’s recordings are stripped of any personally identifiable information, and the audio itself is distorted.

As Florian Schaub, a professor at the University of Michigan who has researched privacy issues related to smart speakers, nicely put it:

You don’t necessarily think of another human listening to what you’re telling your smart speaker in the intimacy of your home. I think we’ve been conditioned to the assumption that these machines are just doing magic machine learning.
But the fact is there is still manual processing involved.
Whether that’s a privacy concern or not depends on how cautious Amazon and other companies are in what type of information they have manually annotated, and how they present that information to someone.

For context, here’s an excerpt from the original story:

Some Alexa reviewers are tasked with transcribing users’ commands, comparing the recordings to Alexa’s automated transcript, say, or annotating the interaction between user and machine. What did the person ask? Did Alexa provide an effective response?
Others note everything the speaker picks up, including background conversations—even when children are speaking. Sometimes listeners hear users discussing private details such as names or bank details; in such cases, they’re supposed to tick a dialog box denoting ‘critical data.’ They then move on to the next audio file.
According to Amazon’s website, no audio is stored unless Echo detects the wake word or is activated by pressing a button. But sometimes Alexa appears to begin recording without any prompt at all, and the audio files start with a blaring television or unintelligible noise. Whether or not the activation is mistaken, the reviewers are required to transcribe it. One of the people said the auditors each transcribe as many as 100 recordings a day when Alexa receives no wake command or is triggered by accident.

Bloomberg’s original reporting claimed that “the thousands of employees” working on this project sometimes must also review clips that can be quite distressing, and even some that are potentially criminal. In one reported case, two Amazon employees heard what they believed was a sexual assault but were told by their employer not to report the incident because it “wasn’t Amazon’s job to interfere.”

In such extreme cases, it’s very much Amazon’s job to interfere.

Why wouldn’t the company permit the employees who listen to Alexa recordings to report these incidents to law enforcement and potentially save lives? If someone breaks into my house and holds me at gunpoint, I want whoever might be listening to be allowed to see my home address and even alert the police, if necessary.

Thoughts?