For people at risk of losing their ability to speak, such as those diagnosed with amyotrophic lateral sclerosis (ALS) or other conditions that affect speech, a voice replicator can be a lifeline. Personal Voice is Apple's tool for this purpose. It was first introduced in May 2023 and became available with iOS 17 in September of the same year. Compatible with recent iPhone, iPad, and Mac devices, it generates a synthesized voice that lets users communicate through FaceTime, phone calls, assistive communication apps, and in-person conversations.

Creating Your Personalized Voice

The process begins with the user reading aloud a set of randomized text prompts until 150 sentences have been recorded directly on the device. Overnight, while the device is charging, locked, and connected to Wi-Fi, machine learning fine-tunes a voice model on those recordings. By the next day, users can type their messages and have the Live Speech text-to-speech (TTS) feature speak them aloud in a voice that resembles their own. All model training and inference occur entirely on-device, which safeguards user privacy and security while offering uninterrupted access to Personal Voice.
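The conditions described above can be read as a simple checklist. The following is a minimal sketch of that checklist, not Apple's implementation: `DeviceState`, `unmet_training_conditions`, and `can_train` are hypothetical names introduced for illustration, and only the 150-sentence figure and the charging, locked, and Wi-Fi conditions come from the text.

```python
from dataclasses import dataclass

REQUIRED_SENTENCES = 150  # number of recorded sentences cited in the text


@dataclass
class DeviceState:
    """Hypothetical snapshot of the conditions the text lists for training."""
    recorded_sentences: int
    is_charging: bool
    is_locked: bool
    on_wifi: bool


def unmet_training_conditions(state: DeviceState) -> list[str]:
    """Return the conditions still blocking overnight training (empty if none)."""
    unmet = []
    if state.recorded_sentences < REQUIRED_SENTENCES:
        remaining = REQUIRED_SENTENCES - state.recorded_sentences
        unmet.append(f"record {remaining} more sentences")
    if not state.is_charging:
        unmet.append("connect to power")
    if not state.is_locked:
        unmet.append("lock the device")
    if not state.on_wifi:
        unmet.append("connect to Wi-Fi")
    return unmet


def can_train(state: DeviceState) -> bool:
    """Training may start only when every listed condition is met."""
    return not unmet_training_conditions(state)
```

Calling `unmet_training_conditions(DeviceState(120, True, False, True))` would report both the missing recordings and the unlocked device, which is the kind of feedback a setup screen might surface to the user.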

Unveiling the Machine Learning Magic

Behind the scenes, Personal Voice harnesses three machine learning approaches:

1. Personal Voice TTS System: A neural TTS system turns text into speech through three components: text processing, an acoustic model, and a vocoder model. Apple researchers pretrained the acoustic and vocoder models on the Open SLR LibriTTS dataset, which comprises 300 hours of speech from 1,000 speakers with diverse styles and accents. Both models are then fine-tuned on-device to improve voice quality and clarity.

2. Voice Model Pretraining and Fine-Tuning: Fine-tuning the pretrained acoustic model on-device lets Personal Voice closely replicate the target speaker's voice. For the vocoder model, the team considered both a universal vocoder and on-device adaptation, and found that fine-tuning both the acoustic and vocoder models leads to better voice quality, at the cost of additional training time.

3. On-Device Speech Recording Enhancement: Machine learning algorithms running directly on the device improve the quality of the user's voice recordings efficiently and securely, without compromising data privacy. Because the enhancement happens on-device, individuals keep control over their communication tools and their data.
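To make the three-stage structure of the TTS system in item 1 concrete, here is a toy sketch of how text processing, an acoustic model, and a vocoder hand data to one another. Everything in it is an assumption made for illustration: the function names, the constants, and the arithmetic are stand-ins, and the real components are neural networks rather than the simple formulas used here.

```python
import math
import re

FRAMES_PER_SYMBOL = 4    # assumed: fixed duration for every input symbol
FEATURES_PER_FRAME = 8   # assumed: toy stand-in for spectrogram bins
SAMPLES_PER_FRAME = 16   # assumed: waveform samples produced per frame


def process_text(text: str) -> list[str]:
    """Text processing: normalize the input and split it into symbols.

    Characters are used as symbols here; real systems typically use phonemes.
    """
    normalized = re.sub(r"[^a-z ]", "", text.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()
    return list(normalized)


def acoustic_model(symbols: list[str]) -> list[list[float]]:
    """Acoustic model stand-in: map each symbol to several feature frames."""
    frames = []
    for symbol in symbols:
        base = 0.0 if symbol == " " else (ord(symbol) - ord("a") + 1) / 26.0
        frame = [base * (k + 1) / FEATURES_PER_FRAME for k in range(FEATURES_PER_FRAME)]
        frames.extend(list(frame) for _ in range(FRAMES_PER_SYMBOL))
    return frames


def vocoder(frames: list[list[float]]) -> list[float]:
    """Vocoder stand-in: turn each feature frame into waveform samples.

    Each frame becomes one sine cycle scaled by the frame's mean value.
    """
    samples = []
    for frame in frames:
        amplitude = sum(frame) / len(frame)
        for n in range(SAMPLES_PER_FRAME):
            samples.append(amplitude * math.sin(2 * math.pi * n / SAMPLES_PER_FRAME))
    return samples


def synthesize(text: str) -> list[float]:
    """Run the full pipeline: text -> symbols -> feature frames -> waveform."""
    return vocoder(acoustic_model(process_text(text)))
```

The point of the sketch is the data flow, not the audio: each stage has a narrow interface, which is what makes it possible to pretrain the acoustic and vocoder stages on many speakers and later fine-tune them for one speaker without touching text processing.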

In essence, Personal Voice stands as a beacon of hope for individuals facing speech challenges, offering not just a means of communication but a voice that reflects their identity and personality. With its innovative approach to machine learning and commitment to user privacy, Personal Voice paves the way for inclusive and accessible communication solutions in the digital age.

Research: https://machinelearning.apple.com/research/personal-voice