Ben Rudolph

Presentation #1

QMCS 425.01

02/24/00

Speech Recognition

With new technologies becoming available so rapidly, people are looking for, and finding, ways to complete daily tasks more quickly and easily. Examples range from electronic mail to on-line shopping, both of which require the use of a computer. However, even computers can sometimes be too slow and cumbersome to use. One way that computers are being partially bypassed is through the use of speech recognition. In this paper I will discuss what speech recognition is, how it works, how it is used, and the problems associated with this new technology.

 

Speech recognition is the use of computers to hear and understand spoken words. It takes two main forms. In the first, a person speaks directly to a computer while sitting in front of the monitor. By speaking certain commands, the user can direct the computer to perform specific functions, such as navigating to and selecting desired destinations, much like the manual "point-and-click" method. This form also includes dictation, where the computer types what the user says in an application such as a word processor. The second, lesser-known form lets a user navigate through a computer system over the telephone. In this case, the user does not have to be anywhere near the computer.

*Note: Speech recognition should not be confused with DTMF (dual-tone multi-frequency, or touch-tone) applications, in which the user selects from a set of options by pressing a number or letter key on the telephone keypad.

 

1960s – First basic research in automatic translation and speech synthesis

1970s – Lab development of speech recognition and dictation methods

1980s – First in-field trials of speech recognition and dictation

1990s – Systems deployed and speech recognition software available to the public

 

Speech recognition, when used for dictation, works as follows. The words spoken by the user are captured by the microphone and processed by the sound card in the computer. The dictation software then analyzes the sound and makes certain distinctions, such as between the lower-frequency sounds of vowels and the higher-frequency sounds of consonants. The results are compared with phonemes, the basic sounds of the language, which are stored as part of the software. The matched sounds are then put together and checked against the dictionary to find the word that was most likely spoken by the user. Finally, the computer types that word into the application being used.
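The last steps described above, putting the matched sounds together and checking them against the dictionary, can be illustrated with a minimal sketch. It assumes the earlier stages have already turned the audio into a list of phoneme labels. The tiny dictionary, the phoneme labels, and the function names are all hypothetical, and edit distance is only one simple way to score "most likely"; real dictation products use far more elaborate statistical methods.

```python
from typing import Dict, List

# Hypothetical pronunciation dictionary: word -> phoneme sequence.
# The labels are illustrative only and are not taken from any real product.
DICTIONARY: Dict[str, List[str]] = {
    "cat": ["K", "AE", "T"],
    "cut": ["K", "AH", "T"],
    "cap": ["K", "AE", "P"],
    "bat": ["B", "AE", "T"],
}


def edit_distance(a: List[str], b: List[str]) -> int:
    """Count the insertions, deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y
        prev = curr
    return prev[-1]


def most_likely_word(heard: List[str],
                     dictionary: Dict[str, List[str]] = DICTIONARY) -> str:
    """Return the dictionary word whose phonemes are closest to what was heard."""
    return min(dictionary, key=lambda w: edit_distance(heard, dictionary[w]))


if __name__ == "__main__":
    # The final consonant was misheard, but "bat" is still the closest entry.
    print(most_likely_word(["B", "AE", "D"]))
```

Because the match is only the closest entry rather than an exact one, a misheard sound can still produce the right word, or it can produce a wrong word that happens to sound similar.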

Dictation software usually requires a training session in which the user reads a set of selected passages. The session is used to build the dictionary for the software and to account for particular patterns in the user's speech. If the software fails to recognize a word during use, that word can be added manually. In most packages this is done simply by saying the word and spelling it once.
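Adding a word by saying it and spelling it once can be pictured as storing a new dictionary entry that ties the spoken sounds to the spelled letters. The sketch below is only an illustration under that assumption: the `UserDictionary` class, its methods, and the phoneme labels are hypothetical and do not describe how any particular package stores its data.

```python
from typing import Dict, List, Optional, Tuple


class UserDictionary:
    """Toy dictionary mapping phoneme sequences to written words (hypothetical)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, ...], str] = {}

    def add_word(self, spoken: List[str], spelled: List[str]) -> str:
        """Store a word from its spoken phonemes and its letter-by-letter spelling."""
        word = "".join(spelled).lower()
        self._entries[tuple(spoken)] = word
        return word

    def lookup(self, spoken: List[str]) -> Optional[str]:
        """Return the stored word for these phonemes, or None if it is unknown."""
        return self._entries.get(tuple(spoken))


if __name__ == "__main__":
    d = UserDictionary()
    sounds = ["R", "UW", "D", "AO", "L", "F"]  # illustrative phoneme labels
    print(d.lookup(sounds))                    # unknown before it is added
    d.add_word(sounds, ["R", "u", "d", "o", "l", "p", "h"])
    print(d.lookup(sounds))                    # recognized afterwards
```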

The form of speech recognition in which the user navigates through a computer system over the phone is more complicated. The main process is the same: sounds are processed and then compared to a dictionary of sounds. However, the dictionary for such a system needs to be far larger, because the system must recognize many different accents, slang terms, and speech patterns. Callers cannot be expected to complete a training session first, so the software has no chance to become accustomed to each individual.
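One way to see why the telephone dictionary grows so quickly is to list several pronunciations for each word and count the resulting entries. The sketch below assumes that simple approach. The words, the variant pronunciations, and the function name are hypothetical examples, not data from any deployed system.

```python
from typing import Dict, List, Tuple

Pronunciation = Tuple[str, ...]

# Hypothetical variants: each word lists the pronunciations (accents, slang)
# that a system serving untrained callers might need to accept.
VARIANTS: Dict[str, List[Pronunciation]] = {
    "tomato": [("T", "AH", "M", "EY", "T", "OW"),
               ("T", "AH", "M", "AA", "T", "OW")],
    "yes": [("Y", "EH", "S"),   # "yes"
            ("Y", "AE"),        # slang "yeah"
            ("Y", "AH", "P")],  # slang "yup"
}


def build_index(variants: Dict[str, List[Pronunciation]]) -> Dict[Pronunciation, str]:
    """Flatten the variants into one lookup table: pronunciation -> word."""
    index: Dict[Pronunciation, str] = {}
    for word, pronunciations in variants.items():
        for p in pronunciations:
            index[p] = word
    return index


if __name__ == "__main__":
    index = build_index(VARIANTS)
    print(len(VARIANTS), "words need", len(index), "dictionary entries")
    print(index[("Y", "AH", "P")])
```

Even in this toy case the number of entries is more than double the number of words, and a real vocabulary with many accents would multiply the size much further.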

 

Speech recognition in the form of dictation can be used in business to compose documents, memos, and letters. Because the user's hands are free, this technology allows the user to perform multiple tasks at once.

Dictation can also help people who have partial or no use of their hands to use computers more quickly and easily.

The navigational form of speech recognition shows the most promise for extensive use in society. Callers can already book hotels, make airline reservations, check the weather, and more, all without talking to another human being. Some predict that as many as 50% of all customer-service phone calls will be handled using this technology in the future.

 

 

Currently, dictation software is estimated to make approximately one to two errors per sentence. Correcting these misidentifications is a lengthy and irritating process. The time needed to complete the training sessions is another source of frustration.

The main problem, and the main complaint, associated with the call-center type of speech recognition is that the software has difficulty recognizing the wide variety of ways in which people speak. Also, people are often aggravated when they are not able to speak to a real person who can answer their questions.

The price of speech recognition software is becoming less and less of a problem, which reflects the falling price of technology as a whole.

 

Although it still has problems, speech recognition is a technology that is already having, and will continue to have, a significant impact on society in the near future.