Voice Recognition 101: Understanding the Fundamentals

Voice recognition is a form of computer dictation that has become increasingly popular across many industries. Despite its widespread use, the technology isn't flawless and presents many challenges to the user. The main difficulty lies in the significant differences between human speech and traditional forms of computer input.

Although computer software is generally designed to produce precise results when given the proper input, the way humans enunciate their words is anything but precise. Every person's voice is unique, and even identical words can have different meanings when spoken in different contexts. To overcome these issues, developers have taken various approaches to voice recognition.

Template Matching Voice Recognition

The two most common approaches to voice recognition are template matching and feature analysis. Template matching is the simpler of the two and offers higher accuracy, but it is also more limited. As with any approach, the first step is for the user to speak into a microphone. To interpret the voice input, the computer compares it with stored samples, or templates, each associated with a known meaning. In this sense the technique resembles traditional keyboard commands: each recognized input maps directly to a predefined meaning.
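The matching step can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not how any particular product works: each utterance is assumed to have already been reduced to a fixed-length list of numbers (a feature vector), templates are kept in a plain dictionary, and "closest" means smallest Euclidean distance. The function names and the `max_distance` rejection threshold are hypothetical. Real template matchers usually compare sequences of audio frames and align them in time (for example with dynamic time warping); the fixed-length vector just keeps the sketch short.

```python
import math
from typing import Optional


def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_template(features: list[float],
                   templates: dict[str, list[float]],
                   max_distance: float = 1.0) -> Optional[str]:
    """Return the word whose stored template is closest to `features`.

    Returns None when even the best match is farther than `max_distance`,
    i.e. the input is rejected as unrecognized (hypothetical threshold).
    """
    best_word, best_dist = None, math.inf
    for word, template in templates.items():
        dist = euclidean(features, template)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word if best_dist <= max_distance else None


# Hypothetical template database: word -> stored feature vector.
templates = {"open": [0.9, 0.1, 0.4], "close": [0.2, 0.8, 0.5]}
print(match_template([0.85, 0.15, 0.4], templates))  # open
```

Input that resembles no stored template at all, such as `[5.0, 5.0, 5.0]` here, falls outside the threshold and returns `None`, which mirrors how a template matcher can only recognize what it already has templates for.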

Because everyone's voice is different, a voice recognition program cannot come with a template for every person who might use it. For this reason, the program must be trained on a new user's voice before it can recognize that user. During training, the software typically displays a word or phrase and prompts the user to speak it into the microphone several times. It then computes a statistical average of these samples and stores the result as a template in the program's database. With this approach, the program's vocabulary is limited to the words used during training, and the database is limited to the user who trained the software. This type of voice recognition is known as speaker dependent, and it achieves an accuracy rate of about 98%.
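The training step can be sketched the same way. This assumes, hypothetically, that each spoken repetition has already been converted to a feature vector of the same length, and it uses a simple element-wise arithmetic mean as the "statistical average"; commercial products may combine samples differently. The function names are illustrative only. The resulting dictionary is one user's template database, so any word that user never recorded is simply absent from it.

```python
def train_template(samples: list[list[float]]) -> list[float]:
    """Average several repetitions of one word, element by element,
    into a single template vector."""
    if not samples:
        raise ValueError("need at least one sample")
    length = len(samples[0])
    if any(len(s) != length for s in samples):
        raise ValueError("all samples must have the same length")
    return [sum(s[i] for s in samples) / len(samples) for i in range(length)]


def train_user(recordings: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Build one user's template database from their training recordings.

    Only words the user actually recorded get a template, which is why the
    vocabulary is limited to the training words.
    """
    return {word: train_template(reps) for word, reps in recordings.items()}


# Hypothetical training session: two repetitions of "yes", one of "no".
db = train_user({"yes": [[1.0, 2.0], [3.0, 4.0]], "no": [[0.5, 0.5]]})
print(db)  # {'yes': [2.0, 3.0], 'no': [0.5, 0.5]}
```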

Feature Analysis Voice Recognition

Feature analysis is a technique that typically results in speaker-independent voice recognition. Instead of matching voice input against a stored voice template, this technique first processes the input using linear predictive coding (LPC) or Fourier transforms. It then looks for similarities between the patterns the program expects and those found in the actual voice input. Because these similarities hold across a wide variety of speakers, the program does not need to be trained for each new user. The speaker-independent method can cope with many speech differences that the speaker-dependent technique cannot, including accents, speaking rate, volume, pitch, and inflection.
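As a rough illustration of the first stage of feature analysis, the sketch below computes linear predictive coding coefficients for one short frame of audio, using the autocorrelation method and the Levinson-Durbin recursion. LPC models each sample as a weighted sum of the samples just before it, and the weights summarize the frame's spectral envelope. It assumes the audio has already been split into frames stored as lists of floats; the function names are hypothetical, and a real system would also window each frame and pass these coefficients (or Fourier-based spectral features instead) to a pattern matcher built from many speakers.

```python
def autocorrelation(frame: list[float], max_lag: int) -> list[float]:
    """r[lag] = sum of frame[i] * frame[i + lag], for lag = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame: list[float], order: int) -> list[float]:
    """Linear predictive coding coefficients a[1..order] for one frame.

    Each sample is modelled as frame[n] ~ sum(a[k] * frame[n - k]) for
    k = 1..order; the coefficients are found with the Levinson-Durbin
    recursion on the frame's autocorrelation.
    """
    r = autocorrelation(frame, order)
    if r[0] == 0:
        return [0.0] * order          # silent frame: nothing to predict
    a = [0.0] * (order + 1)           # a[0] is unused
    error = r[0]                      # prediction error of the order-0 model
    for i in range(1, order + 1):
        if error <= 0:
            break                     # frame already predicted perfectly
        # Reflection coefficient for step i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1 - k * k
    return a[1:]


# A decaying signal x[n] = 0.9**n is predicted almost exactly by a[1] = 0.9,
# so the second coefficient comes out close to zero.
frame = [0.9 ** n for n in range(200)]
print(lpc(frame, 2))
```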

Speaker-independent voice recognition has proven more difficult to implement. Its greatest weakness is handling the wide range of inflections and accents used by speakers of different nationalities. Accuracy for this technique is significantly lower, at around 90 to 95%. For this reason, many of today's most widely used programs employ the speaker-dependent method.
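To put these percentages in perspective, the sketch below assumes that "accuracy" means word accuracy: one minus the word error rate, where errors are the minimum number of word substitutions, insertions, and deletions (a word-level edit distance) needed to turn the correct transcript into the recognizer's output. How the figures above were actually measured is not stated, so treat this definition, and the function names, as assumptions.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum word substitutions + insertions + deletions needed to turn
    the reference transcript into the hypothesis (word-level edit distance)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))      # distances for an empty reference
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,            # delete reference word r
                         cur[j - 1] + 1,         # insert hypothesis word h
                         prev[j - 1] + (r != h)) # substitute, or free if equal
        prev = cur
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """1 - word error rate; can fall below 0 if many words are inserted."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference transcript is empty")
    return 1 - word_errors(reference, hypothesis) / n


def expected_errors(word_count: int, accuracy: float) -> int:
    """Roughly how many words a dictation of `word_count` words gets wrong."""
    return round(word_count * (1 - accuracy))


print(word_accuracy("the cat sat on the mat", "the cat sat on a mat"))  # 0.8333...
for acc in (0.98, 0.95, 0.90):
    print(acc, expected_errors(500, acc))   # 10, 25 and 50 errors
```

Under this reading, a 500-word letter would contain about 10 errors at the speaker-dependent rate of 98%, but roughly 25 to 50 at the speaker-independent rate of 90 to 95%.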
