Spik 1.0 - Voice commands execution in Windows

Spik v1.0
Voice Commands Execution
in a Windows Environment
Dekel Abelson
Eliran Dahan
Instructor: Ari Todtfeld
Objectives
• Analysis and exploration of voice-recognition systems, their
capabilities and their limitations
• Understanding the Windows architecture
and programming concepts
• Development and implementation of a tool that enables
users to execute voice commands in a Windows
environment, including construction of the tool's graphical
user interface (GUI)
• Learning the Microsoft Speech SDK 5.1
(Software Development Kit) and its speech engine
Project skills
• C++ programming skills
• XML (Extensible Markup Language) programming
skills
• Programming in a Windows environment, including
API (Application Programming Interface) commands
Brief history
• 1994 - Release of Dragon Systems' “DragonDictate” for Windows 1.0,
using discrete speech recognition technology
• 1996 - Introduction of IBM’s “MedSpeak”, the first continuous
speech recognition software
• 1997 - Dragon Systems’ “NaturallySpeaking”, the first general-purpose
continuous speech software program.
Two months later IBM releases its “ViaVoice”
• 2005 - Thanks to improvements in PC processing time and in the
algorithms used, several speech recognition programs are now on
the market.
Voice recognition
• Voice recognition follows these steps:
1. Spoken words enter a microphone
2. Audio is processed by the computer's sound card
3. The software discriminates between lower-frequency
vowels and higher-frequency consonants and
compares the results with phonemes, the smallest
building blocks of speech
The software then compares results to groups of
phonemes, and then to actual words, determining the
most likely match
4. The sentence is transferred to a word processing
application
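The "most likely match" step can be illustrated with a toy sketch. It assumes phonemes are plain text labels and picks the vocabulary word whose phoneme spelling has the smallest edit distance to what was heard. This is a stand-in technique chosen for brevity, not how the Microsoft speech engine works; real engines rely on statistical acoustic and language models. The names `Entry`, `editDistance` and `mostLikelyWord` and the phoneme spellings are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// A phoneme is written here as a short text label, e.g. "OW" or "P".
using Phonemes = std::vector<std::string>;

// One vocabulary entry: a word and its (hypothetical) phoneme spelling.
struct Entry {
    std::string word;
    Phonemes phonemes;
};

// Levenshtein distance between two phoneme sequences: the number of
// insertions, deletions and substitutions needed to turn a into b.
std::size_t editDistance(const Phonemes& a, const Phonemes& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Returns the vocabulary word closest to the heard phonemes,
// or an empty string when the vocabulary is empty.
std::string mostLikelyWord(const Phonemes& heard,
                           const std::vector<Entry>& vocab) {
    std::string best;
    std::size_t bestDist = static_cast<std::size_t>(-1);
    for (const Entry& e : vocab) {
        std::size_t d = editDistance(heard, e.phonemes);
        if (d < bestDist) {
            bestDist = d;
            best = e.word;
        }
    }
    return best;
}
```

With a two-word vocabulary of "open" and "equal", a slightly misheard "open" (one phoneme wrong) still resolves to "open", because it is only one substitution away.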
Architecture
1. Voice command by the user
2. SAPI 5.1 (Speech Application Programming Interface)
3. Processing of the recognized commands by C++/XML code
4. Command execution using API functions
GUI
• Executable file - spik.exe
• The GUI - a window that receives the voice commands
from the user. The GUI was built in C++ using the
basic “Window” class.
SAPI 5.1
• SAPI provides a high-level interface between the application
and the speech engine
• The TTS (Text-To-Speech) system synthesizes text strings
and files into spoken audio
• Speech recognizers convert human spoken audio into
readable text strings
Processing
• Main function: contains the infinite loop waiting for messages to
process
• Microsoft Speech Engine
• Main window procedure: handles the messages sent to the
window
• Execution of commands that have been identified by the speech engine
• API functions
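The relationship between the main loop and the window procedure can be modeled with a portable sketch. It replaces the Win32 message queue with a `std::queue` so it runs anywhere. In a real Windows program the loop would be built on GetMessage/DispatchMessage and would block rather than end when the queue is empty. `MsgKind`, `Message`, `windowProc` and `runMessageLoop` are hypothetical names, not taken from the Spik source.

```cpp
#include <queue>
#include <string>
#include <vector>

// Hypothetical message kinds; a real Win32 program would use UINT
// message identifiers such as WM_COMMAND or WM_QUIT instead.
enum class MsgKind { Recognition, Quit };

struct Message {
    MsgKind kind;
    std::string text;  // recognized phrase, used only by Recognition
};

// Stand-in for the main window procedure: handles one message and
// records which commands would have been executed.
void windowProc(const Message& msg, std::vector<std::string>& executed) {
    if (msg.kind == MsgKind::Recognition) {
        executed.push_back(msg.text);
    }
}

// Stand-in for the loop in the main function: pull messages and hand
// them to the window procedure until a Quit message arrives (or, in
// this model only, until the queue runs dry).
std::vector<std::string> runMessageLoop(std::queue<Message> pending) {
    std::vector<std::string> executed;
    while (!pending.empty()) {
        Message msg = pending.front();
        pending.pop();
        if (msg.kind == MsgKind::Quit) break;
        windowProc(msg, executed);
    }
    return executed;
}
```

Messages queued after the Quit message are never handled, which mirrors how a Win32 loop stops dispatching once it has seen WM_QUIT.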
Commands Execution
• Windows API is a set of Application Programming Interfaces available
in the Microsoft Windows operating systems which enable developers
to create software
• The API consists of C functions implemented in dynamically linked
libraries (DLLs), mainly in the core DLLs kernel32.dll, user32.dll and gdi32.dll
• Main API functions we have used:
CreateProcess() – runs executable files
WinExec() – runs Windows procedures
ShellExecute() – opens URLs and files
ShowWindow() – sets the specified window's show state
SendMessage() – sends the specified message to a window or
windows
keybd_event() – synthesizes a keystroke
PostMessage() – places (posts) a message in the message queue
associated with the thread that created the
specified window
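As a rough illustration of how two of these functions might be combined, the sketch below sends URL-like targets to ShellExecute and everything else to CreateProcess. The routing rule and the names `Launcher`, `pickLauncher` and `launch` are assumptions made for this example; the slides do not say how Spik chooses between the calls. The Windows calls are compiled only under `_WIN32` (link with shell32.lib); elsewhere `launch` simply returns false.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>
#endif

// Which API call this sketch would use for a given target.
enum class Launcher { CreateProcessCall, ShellExecuteCall };

// Hypothetical rule: targets that look like URLs go to ShellExecute,
// everything else is treated as a command line for CreateProcess.
Launcher pickLauncher(const std::string& target) {
    bool isUrl = target.rfind("http://", 0) == 0 ||
                 target.rfind("https://", 0) == 0;
    return isUrl ? Launcher::ShellExecuteCall : Launcher::CreateProcessCall;
}

// Starts the target and returns true on success.
// Outside Windows nothing is launched and the result is false.
bool launch(const std::string& target) {
#ifdef _WIN32
    if (pickLauncher(target) == Launcher::ShellExecuteCall) {
        HINSTANCE h = ShellExecuteA(nullptr, "open", target.c_str(),
                                    nullptr, nullptr, SW_SHOWNORMAL);
        // ShellExecute reports success with a value greater than 32.
        return reinterpret_cast<INT_PTR>(h) > 32;
    }
    STARTUPINFOA si = {};
    si.cb = sizeof(si);
    PROCESS_INFORMATION pi = {};
    std::string cmd = target;  // CreateProcess needs a writable buffer
    if (!CreateProcessA(nullptr, &cmd[0], nullptr, nullptr, FALSE, 0,
                        nullptr, nullptr, &si, &pi)) {
        return false;
    }
    // The handles are not needed once the process has started.
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return true;
#else
    (void)target;
    return false;
#endif
}
```

Under this rule, `launch("calc.exe")` would go through CreateProcess and `launch("https://www.google.com")` through ShellExecute.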
The Code
• Text file containing strings used by the program
• Text file in XML format, used by the speech recognition engine
• Compiled file, used by the speech recognition engine
• Header files of the speech recognition engine
• Header files of the program
• Executable program file
• Source code files in C++
Adaptation & Training
• The speech recognition engine adapts itself to the user’s
voice, vocabulary and speech style in order to improve
speech recognition accuracy
• After adaptation, only about ¼ of the recognition errors remain,
so accuracy rises
• As more training is done,
accuracy rises to
around 95%.
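The arithmetic behind the "¼ of the errors" figure can be made concrete with a small helper. The slides give no baseline accuracy, so any input value is purely hypothetical; `accuracyAfterAdaptation` is a name invented for this sketch.

```cpp
// Accuracy after adaptation, assuming adaptation leaves one quarter of
// the original recognition errors. Accuracies are fractions in [0, 1].
double accuracyAfterAdaptation(double baselineAccuracy) {
    double baselineError = 1.0 - baselineAccuracy;
    double adaptedError = baselineError / 4.0;  // only 1/4 of errors remain
    return 1.0 - adaptedError;
}
```

For instance, with an illustrative baseline of 80% (20% errors), a quarter of the errors is 5%, which corresponds to roughly 95% accuracy.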
Voice command example
• Calculator usage:
Say the voice command “Open Calculator”
to run the calc.exe program
Say a simple exercise,
and then say
“Equal” or “Result”
to show the solution
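One way to picture the calculator flow is a translation from recognized words to calculator keys, sketched below. Only “Equal” and “Result” come from the slide; the digit and operator words, and the function name `spokenToKeys`, are assumptions for this example. On Windows, each resulting character could then be delivered to the calculator window with a call such as keybd_event() or PostMessage(); that step is left out so the sketch stays portable.

```cpp
#include <map>
#include <sstream>
#include <string>

// Translates a spoken exercise such as "two plus three equal" into the
// calculator keys "2+3=". Words that are not in the table are skipped.
std::string spokenToKeys(const std::string& phrase) {
    static const std::map<std::string, char> keys = {
        {"zero", '0'},  {"one", '1'},   {"two", '2'},   {"three", '3'},
        {"four", '4'},  {"five", '5'},  {"six", '6'},   {"seven", '7'},
        {"eight", '8'}, {"nine", '9'},
        {"plus", '+'},  {"minus", '-'}, {"times", '*'}, {"divide", '/'},
        {"equal", '='}, {"result", '='},  // both words show the solution
    };
    std::istringstream words(phrase);
    std::string word;
    std::string out;
    while (words >> word) {
        auto it = keys.find(word);
        if (it != keys.end()) out += it->second;
    }
    return out;
}
```

The sketch expects lowercase words separated by spaces, so the phrase "two plus three equal" yields the key sequence `2+3=`.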
Voice command example
• Run programs - notepad, command line
• Internet usage - search google
• Windows navigation - my documents, system properties,
start menu, screen saver
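A command list like this maps naturally onto a lookup table from recognized phrase to action. The sketch below is a minimal version under stated assumptions: the `Command` struct, the function names and the chosen targets are hypothetical, and the Windows navigation commands are left out because the slides do not say how they are executed.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

struct Command {
    std::string category;  // category as listed on the slide
    std::string target;    // assumed program or URL to open
};

// Lowercases a phrase so lookups ignore capitalization.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Looks up a recognized phrase; returns nullptr when it is unknown.
const Command* findCommand(const std::string& phrase) {
    static const std::map<std::string, Command> table = {
        {"open calculator", {"Run programs", "calc.exe"}},
        {"notepad", {"Run programs", "notepad.exe"}},
        {"command line", {"Run programs", "cmd.exe"}},
        {"search google", {"Internet usage", "https://www.google.com"}},
    };
    auto it = table.find(toLower(phrase));
    return it == table.end() ? nullptr : &it->second;
}
```

A phrase that is not in the table returns `nullptr`, so the caller can ignore speech that is not a command instead of acting on a wrong guess.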
Added value of the project
• Advanced versions based on Spik v1.0 will be a
helpful tool for using the computer and the web
for physically challenged users
Future Development
• Advanced OS navigation in order to eliminate
the use of the keyboard
• Adding Speech-to-Text capabilities
• Improved GUI to let users enter their own
voice commands
Q&A