Google explains how the Pixel Recorder app utilizes AI models to assign Speaker Labels

Google recently introduced Speaker Labels in the Pixel Recorder app. The feature recognizes the distinct speakers in a recording and adds a separate label for each of them to the transcript. Users can then replace those labels with the speakers' names. Quite simple, right? But the work and thought behind such a feature are quite complicated.

In a blog post, Google says that Speaker Labels is powered by a new speaker diarization system named Turn-to-Diarize. The system is built on several highly optimized machine learning models and algorithms, which allow it to diarize hours of audio in real time while using limited computational resources on Pixel smartphones.

The system first detects each change of speaker, and an encoder model extracts the voice characteristics from each speaker's turn. A multi-stage clustering algorithm then assigns a speaker label to every turn.
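To make the clustering step more concrete, here is a minimal sketch in Python. It is not Google's algorithm: it assumes each speaker turn has already been reduced to a small, nonzero embedding vector, and it swaps the multi-stage clustering for a simple greedy grouping by cosine similarity. The function name assign_speaker_labels, the threshold value, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_speaker_labels(turn_embeddings, threshold=0.8):
    """Greedy clustering (an illustrative stand-in, not Turn-to-Diarize).

    Each turn joins the most similar existing speaker if the similarity
    reaches the threshold; otherwise it starts a new speaker.
    """
    centroids = []  # one running sum of embeddings per speaker
    labels = []
    for emb in turn_embeddings:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))
            best = len(centroids) - 1
        else:
            centroids[best] = [x + y for x, y in zip(centroids[best], emb)]
        labels.append(f"Speaker {best + 1}")
    return labels


# Hypothetical embeddings for four speaker turns.
turns = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.0, 1.0]]
labels = assign_speaker_labels(turns)
print(labels)
```

With these toy vectors, the first and third turns end up under one label and the second and fourth under another. A real system would work with much higher-dimensional embeddings and a far more careful clustering strategy.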

Google also explained that recordings vary widely in length, from a few seconds to as long as 18 hours. As the model consumes more audio, it becomes more confident in its speaker-label predictions. Occasionally, the system corrects labels that it had previously predicted with low confidence. The Recorder app automatically updates the speaker labels on screen during the recording, so they reflect the latest and most accurate predictions.
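The idea of revising low-confidence labels as more audio arrives can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the LiveLabels class, the 0.6 cutoff, and the confidence scores are hypothetical and are not taken from Google's description.

```python
class LiveLabels:
    """Keeps the label shown for each transcript segment (hypothetical)."""

    def __init__(self, low_confidence=0.6):
        self.low_confidence = low_confidence  # assumed cutoff
        self.segments = {}  # segment index -> (label, confidence)

    def update(self, segment, label, confidence):
        """Store a prediction; return True if a shown label changed."""
        current = self.segments.get(segment)
        if current is None:
            self.segments[segment] = (label, confidence)
            return True
        old_label, old_conf = current
        # Only earlier low-confidence predictions may be revised, and
        # only by a prediction that is more confident than the old one.
        if old_conf < self.low_confidence and confidence > old_conf:
            self.segments[segment] = (label, confidence)
            return label != old_label
        return False


live = LiveLabels()
live.update(0, "Speaker 1", 0.4)   # early, uncertain guess
live.update(1, "Speaker 2", 0.9)   # confident guess
corrected = live.update(0, "Speaker 2", 0.85)  # revises segment 0
kept = live.update(1, "Speaker 1", 0.95)       # segment 1 is left alone
print(live.segments, corrected, kept)
```

In this sketch, a confident label is never overwritten, which keeps the on-screen transcript stable while still letting early, uncertain guesses be fixed.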

Doesn't it seem magical that your smartphone can do all of this?

Google mentioned that it is making changes to the Recorder app so that it uses less power. As of now, the system runs mainly on the CPU block of Google Tensor chips. The company is working on moving more of the computation to the TPU block, which should make the diarization system more power efficient.