Speech Recognition
What is speech recognition?
Speech recognition, or speech-to-text, is the ability of a machine or program to identify words spoken aloud and convert them into readable text. Rudimentary speech recognition software has a limited vocabulary and may only identify words and phrases when spoken clearly. More sophisticated software can handle natural speech, different accents and various languages.
Speech recognition draws on a broad array of research in computer science, linguistics and computer engineering. Many modern devices and text-focused programs include speech recognition functions to allow easier or hands-free use.
Speech recognition and voice recognition are two different technologies and should not be confused:
Speech recognition is used to identify words in spoken language.
Voice recognition is a biometric technology for identifying an individual's voice.
How does speech recognition work?
Speech recognition systems use computer algorithms to process and interpret spoken words and convert them into text. A software program turns the sound a microphone records into written language that computers and humans can understand, following these four steps:
analyze the audio;
break it into parts;
digitize it into a computer-readable format; and
use an algorithm to match it to the most suitable text representation.
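The four steps above can be illustrated with a deliberately tiny sketch. It is not how production recognizers work: the "recording" is a synthesized sine tone, the only feature is the zero-crossing rate, and the "vocabulary" is two hypothetical labels ("low" and "high"). The sample rate, frame length and all function names are assumptions made for illustration.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this toy example)


def synth_tone(freq_hz, duration_s=0.1):
    """Stand-in for a microphone recording: a pure sine tone in [-1, 1]."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]


def split_frames(samples, frame_len=200):
    """Break the signal into fixed-length, non-overlapping frames."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def digitize(frame, bits=16):
    """Quantize amplitudes in [-1, 1] to signed integers."""
    scale = 2 ** (bits - 1) - 1
    return [int(round(s * scale)) for s in frame]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs where the signal turns negative or stops being negative."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


def extract_feature(samples):
    """Analyze the audio: average zero-crossing rate over all digitized frames."""
    frames = [digitize(f) for f in split_frames(samples)]
    return sum(zero_crossing_rate(f) for f in frames) / len(frames)


# Hypothetical "vocabulary": one reference feature value per label.
TEMPLATES = {
    "low": extract_feature(synth_tone(200)),
    "high": extract_feature(synth_tone(1200)),
}


def recognize(samples):
    """Match the audio to the label whose template feature is closest."""
    feature = extract_feature(samples)
    return min(TEMPLATES, key=lambda label: abs(TEMPLATES[label] - feature))


if __name__ == "__main__":
    print(recognize(synth_tone(220)))
    print(recognize(synth_tone(1100)))
```

Real systems replace the single feature with richer spectral features and the nearest-template match with trained statistical models, but the overall flow of framing, digitizing and matching is the same.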
Speech recognition software must adapt to the highly variable and context-specific nature of human speech. The software algorithms that process and organize audio into text are trained on different speech patterns, speaking styles, languages, dialects, accents and phrasings. The software also separates spoken audio from background noise that often accompanies the signal.
To meet these requirements, speech recognition systems use two types of models:
Acoustic models. These represent the relationship between linguistic units of speech and audio signals.
Language models. Here, sounds are matched with word sequences to distinguish between words that sound similar.
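To see how the two models cooperate, consider a minimal sketch in which the acoustic model cannot tell homophones apart and the language model breaks the tie. All probabilities here are invented for illustration, and the bigram table, the `UNSEEN` fallback and the function names are assumptions rather than part of any real system.

```python
import itertools
import math

# Hypothetical acoustic model output: for each position in the utterance,
# the candidate words and P(audio | word). Homophones score identically.
ACOUSTIC_CANDIDATES = [
    {"turn": 1.0},
    {"right": 0.5, "write": 0.5},
    {"here": 0.5, "hear": 0.5},
]

# Hypothetical bigram language model: P(word | previous word).
BIGRAM_PROB = {
    ("<s>", "turn"): 0.2,
    ("turn", "right"): 0.3,
    ("turn", "write"): 0.001,
    ("right", "here"): 0.2,
    ("right", "hear"): 0.001,
    ("write", "here"): 0.05,
    ("write", "hear"): 0.001,
}
UNSEEN = 1e-6  # fallback probability for word pairs not in the table


def score(words):
    """Sum of log acoustic and log language-model probabilities for one hypothesis."""
    total = 0.0
    previous = "<s>"  # sentence-start marker
    for slot, word in zip(ACOUSTIC_CANDIDATES, words):
        total += math.log(slot[word])
        total += math.log(BIGRAM_PROB.get((previous, word), UNSEEN))
        previous = word
    return total


def decode():
    """Enumerate every word sequence and return the highest-scoring one."""
    hypotheses = itertools.product(*(slot.keys() for slot in ACOUSTIC_CANDIDATES))
    return max(hypotheses, key=score)


if __name__ == "__main__":
    print(" ".join(decode()))
```

Exhaustive enumeration only works for toy inputs; practical decoders search the hypothesis space far more selectively, but the principle of combining acoustic and language scores is the same.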
What are the features of speech recognition systems?
Good speech recognition programs let users customize them to their needs. The features that enable this include:
Language weighting. This feature tells the algorithm to give special attention to certain words, such as those spoken frequently or that are unique to the conversation or subject. For example, the software can be trained to listen for specific product references.
Acoustic training. The software tunes out ambient noise that pollutes spoken audio. Software programs with acoustic training can distinguish speaking style, pace and volume amid the din of many people speaking in an office.
Speaker labeling. This capability enables a program to label individual participants and identify their specific contributions to a conversation.
Profanity filtering. Here, the software filters out undesirable words and language.
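Three of these features can be sketched as simple post-processing steps on a recognizer's output. This is an illustrative sketch, not a real product's API: the function names, the boost factor, the example hypotheses and the placeholder blocklist are all assumptions, and the speaker labeling function assumes an upstream component has already attributed each segment to a speaker ID.

```python
import re


def pick_with_weighting(hypotheses, boosted_terms, boost=2.0):
    """Language weighting: multiply a hypothesis score by `boost` for each boosted term it contains."""
    def weighted(item):
        text, base_score = item
        hits = sum(1 for word in text.lower().split() if word in boosted_terms)
        return base_score * (boost ** hits)
    return max(hypotheses, key=weighted)[0]


def label_speakers(segments):
    """Speaker labeling: map raw speaker IDs to 'Speaker N' in order of first appearance."""
    labels = {}
    lines = []
    for speaker_id, text in segments:
        label = labels.setdefault(speaker_id, f"Speaker {len(labels) + 1}")
        lines.append(f"{label}: {text}")
    return lines


def filter_profanity(text, blocklist):
    """Profanity filtering: mask whole-word, case-insensitive matches with asterisks."""
    if not blocklist:
        return text
    alternatives = "|".join(re.escape(word) for word in sorted(blocklist))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda match: "*" * len(match.group(0)), text)


if __name__ == "__main__":
    # Hypothetical recognizer output: (text, score) pairs for one utterance.
    hypotheses = [
        ("order the new acne widget", 0.45),
        ("order the new acme widget", 0.40),
    ]
    print(pick_with_weighting(hypotheses, {"acme"}))
    print(label_speakers([("a", "Hello."), ("b", "Hi there."), ("a", "Shall we start?")]))
    print(filter_profanity("Well, darn it", {"darn"}))
```

In this example, boosting the product name "acme" lets the lower-scored but correct hypothesis win, which is the effect language weighting is meant to achieve.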