Just how exactly does Siri learn a new language? In an interview with Reuters published today, the head of Apple’s speech team, Alex Acero, offered a behind-the-scenes look at how Siri is taught new languages, a process that involves script-writing, capturing voices in multiple accents and dialects, and using machine learning and artificial intelligence to build and evolve language models over time. The effort requires a team of people who read aloud scripted passages that are then transcribed by hand.
Before actually updating Siri, Apple first rolls out Dictation support for a new language.
Siri currently speaks 21 languages in 36 countries. By comparison, Microsoft’s Cortana supports eight languages tailored for thirteen countries, Google Assistant speaks four languages while Amazon’s Alexa works only in English and German.
Teaching Siri a new language entails the following steps:
- People read custom passages of text in a range of accents and dialects
- Recordings are transcribed by hand so Siri knows exactly what it’s supposed to be learning
- They also capture sounds in a range of voices
- A new language model is built that tries to predict word sequences
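The last step above — a statistical model that predicts word sequences — can be sketched with a toy bigram model. To be clear, this is a generic textbook illustration of the idea, not Apple’s actual system:

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count, for each word, which words follow it across a corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Return the most likely next word and its conditional probability."""
    following = counts[prev]
    total = sum(following.values())
    word, n = following.most_common(1)[0]
    return word, n / total

# A tiny made-up corpus of the kind of phrases an assistant might hear
corpus = [
    "set a timer for ten minutes",
    "set an alarm for seven",
    "set a reminder for tomorrow",
]
model = train_bigram_model(corpus)
print(predict_next(model, "set"))  # "a" follows "set" in 2 of the 3 sentences
```

Real systems use far richer models, but the principle is the same: given what the user has said so far, rank the words most likely to come next.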
Apple rolls out Dictation support for the new language before it’s added to Siri, which is why the Dictation feature supports more languages than Siri does (the upcoming macOS Sierra 10.12.4 software update, for instance, will enable Dictation support for Shanghainese, a Chinese dialect spoken only in and around Shanghai).
Dictation lets Apple capture anonymized audio recordings, complete with background noise and mumbled words. The audio is manually transcribed by humans, a process that Acero claims helps cut the speech recognition error rate in half.
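Acero’s claim about halving the error rate refers to speech recognition accuracy, conventionally measured as word error rate (WER): the word-level edit distance between what the recognizer produced and the human transcript, divided by the transcript’s length. Here is a minimal sketch of the standard WER computation (the formula is generic, not something Apple disclosed):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed via word-level edit (Levenshtein) distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of six: WER = 1/6
print(word_error_rate("set a timer for ten minutes",
                      "set the timer for ten minutes"))
```

Human-corrected transcripts give the system ground truth to train against, which is how the manual pass can drive a metric like this down.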
It’s only after enough data has been gathered that Apple commissions an actor to record voices for Siri. When a new Siri language is ready for prime time, it’s released with answers to what Apple estimates will be the most common questions.
Apple’s been investing large sums into artificial intelligence and machine learning to help the language models evolve over time as Siri learns more about what users ask.
Apple updates Siri every two weeks with incremental tweaks, Acero said.
One possible problem with Apple’s approach, according to Charles Jolley, creator of an intelligent assistant named Ozlo, is that you can’t hire enough writers to script the responses you’d need in every language.
Other personal assistants, including Google Now and Microsoft Cortana, mitigate the scaling issue by synthesizing the answers, something Siri is not very good at.
Viv, a startup founded by Siri’s original creators that Samsung acquired last year, is working on just that. “The only way to leapfrog today’s limited functionality versions is to open the system up and let the world teach them.”
Although Siri now speaks more languages than her rivals (Google and Amazon said they plan to bring more languages to their respective assistants), the user is still left with a sub-par experience because, as mentioned above, other assistants are better at understanding context and providing more conversational responses.
The Cupertino company’s $200 million acquisition of Australia-based machine learning startup Turi in August 2016 should help improve Siri’s language and knowledge models.
Apple is expected to show off enhanced Siri capabilities (which may or may not be exclusive to iPhone 8) at its annual pilgrimage for developers which kicks off with a keynote on June 5. Siri improvements may include multi-language support.
iOS 10 supports typing in two languages without needing to switch keyboards, so perhaps Siri will soon understand multiple languages without requiring you to manually choose one in Settings → Siri → Language?
A multi-language Siri should work great on Apple TV, too.
A November 2015 interview with several Apple TV project managers suggested Apple initially limited Siri on the set-top box to just eight countries due to differences in how the names of actors, films and directors are pronounced across languages and dialects.