Speech Intelligibility

Nuendo 11 integrates more metering options with the Netflix Loudness Meter and the Intelligibility Meter. To ensure consistency of content production, the Netflix Loudness Meter in Nuendo 11 is calibrated to the official Sound Mix Specifications and Best Practices and measures the dialog-gated loudness as required by Netflix. Based on algorithms developed by the Oldenburg branch of the Fraunhofer IDMT in Germany, the new AI-powered Intelligibility Meter indicates in real time how much effort a listener needs to understand speech in the mix.

In a strict sense, speech intelligibility is measured as the proportion of speech items (e.g., words) that can be recognized correctly in a given situation. More broadly, the term “intelligibility” is often used to describe the perceived effort one has to spend to understand speech. This is also relevant for broadcast applications: even if I am technically able to understand every word of a dialog, I may still have to invest a lot of cognitive resources, e.g., when the background sounds are too loud. This broader sense of speech intelligibility is what we measure with Nuendo’s new tool.
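The strict definition above can be illustrated with a few lines of Python. This is only a minimal sketch of a proportion-correct score; the function name `proportion_correct`, the position-by-position comparison, and the sample word lists are assumptions made for illustration and are not part of Nuendo or the Fraunhofer algorithm.

```python
def proportion_correct(presented, recognized):
    """Fraction of presented items that were recognized correctly.

    Items are compared position by position, ignoring case. Items the
    listener did not report at all count as errors (illustrative choice).
    """
    if not presented:
        raise ValueError("presented must contain at least one item")
    hits = sum(1 for p, r in zip(presented, recognized) if p.lower() == r.lower())
    return hits / len(presented)


# Hypothetical listening-test trial: one of five words was misheard.
presented = ["the", "boat", "sailed", "north", "today"]
recognized = ["the", "coat", "sailed", "north", "today"]
score = proportion_correct(presented, recognized)
print(score)  # 0.8
```

A score of 1.0 in this strict sense says nothing about how hard the listener had to work, which is why the broader, effort-based notion is treated separately.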

Speech consists of small building blocks, so-called phonemes. Several phonemes combine into syllables or words. Phonemes are what automatic speech recognition engines detect and convert to meaningful speech. In very clear speech, there is only a single phoneme at a given instant of time. In technical terms, a machine trained to recognize speech detects a high probability for the presence of a specific phoneme and a low probability for all other phonemes. The more disturbed the speech, the less distinct this probability pattern becomes: the machine is less certain which phoneme is present. This is what we use to quantify intelligibility.
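One common way to express how "distinct" such a probability pattern is uses the entropy of the phoneme probabilities. The sketch below is an assumption for illustration only: the text does not say which certainty measure the Intelligibility Meter uses, and the function name `posterior_certainty` and the four-phoneme example values are hypothetical.

```python
import math


def posterior_certainty(probs):
    """Return a certainty value in [0, 1] for one frame of phoneme probabilities.

    Uses 1 minus the normalized Shannon entropy: 1.0 means a single phoneme
    has all the probability, 0.0 means all phonemes are equally likely.
    This measure is an illustrative assumption, not the documented method.
    """
    k = len(probs)
    if k < 2:
        raise ValueError("need at least two phoneme classes")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    p = [x / total for x in probs]  # renormalize defensively
    entropy = -sum(x * math.log(x) for x in p if x > 0)
    return 1.0 - entropy / math.log(k)


# Hypothetical recognizer outputs for one frame over four phoneme classes.
clear_frame = [0.97, 0.01, 0.01, 0.01]      # one phoneme clearly dominates
disturbed_frame = [0.25, 0.25, 0.25, 0.25]  # recognizer cannot decide

print(posterior_certainty(clear_frame) > posterior_certainty(disturbed_frame))  # True
```

Other measures, such as the highest probability in the frame, would capture the same idea; entropy is used here because it takes the whole distribution into account.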

The algorithm has to perform several tasks. First, it must detect whether speech is present or not. This sounds trivial but is challenging when considering how diverse and “speech-like” broadcast background sounds can be. Then we use automatic speech recognition technology and compute how certain the recognizer is when detecting individual phonemes. Finally, we map this certainty to a scale that corresponds to human perception as measured in hundreds of hours of listening experiments. For all this to work robustly, we exploited deep learning with many thousands of hours of training material containing real speech and highly challenging backgrounds.
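The three stages described above can be sketched roughly as follows. Everything in this example is a stand-in: the per-frame input format, the function name `effort_score`, the speech threshold, and in particular the logistic mapping with its `midpoint` and `slope` parameters are assumptions for illustration. The actual meter relies on trained deep-learning models and a mapping fitted to listening-experiment data, neither of which is described in enough detail here to reproduce.

```python
import math


def effort_score(frames, speech_threshold=0.5, midpoint=0.5, slope=10.0):
    """Toy three-stage pipeline producing a listening-effort value in 0..100.

    frames: iterable of (speech_probability, recognizer_certainty) pairs,
            both in [0, 1], assumed to come from upstream models.
    Returns None when no frame is classified as speech.
    """
    # Stage 1: keep only frames in which speech is judged to be present.
    speech_certainties = [c for s, c in frames if s >= speech_threshold]
    if not speech_certainties:
        return None

    # Stage 2: summarize how certain the recognizer was about the phonemes.
    mean_certainty = sum(speech_certainties) / len(speech_certainties)

    # Stage 3: map certainty to an effort scale. The logistic curve is a
    # placeholder for a mapping fitted to listening-test results.
    return 100.0 / (1.0 + math.exp(slope * (mean_certainty - midpoint)))


# Hypothetical frame sequences: (speech probability, recognizer certainty).
clean_dialog = [(0.95, 0.90), (0.90, 0.85), (0.10, 0.20)]
loud_background = [(0.80, 0.30), (0.70, 0.25), (0.20, 0.10)]

print(effort_score(clean_dialog) < effort_score(loud_background))  # True
print(effort_score([(0.1, 0.9)]))  # None
```

Returning `None` for passages without speech keeps music or effects sections from being reported as either easy or hard to understand.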

For more information on speech intelligibility, visit the Fraunhofer website.
