Virtual Keyboard - SEMINAR
ABSTRACT
Input to small devices is becoming an increasingly crucial factor in development for the ever-more powerful embedded market. Speech input promises to become a feasible alternative to tiny keypads, yet its limited reliability, robustness, and flexibility render it unsuitable for certain tasks and/or environments. Various attempts have been made to provide the common keyboard metaphor without the physical keyboard, that is, to build “virtual keyboards”. This promises to leverage our familiarity with the device without incurring the constraints of its bulky physical form.
This paper surveys technologies for alphanumeric input devices and methods with a strong focus on touch-typing. We analyze the characteristics of the keyboard modality and show how they contribute to making it a necessary complement to speech recognition rather than a competitor.
CHAPTER-1
INTRODUCTION
Touch-typing, or machine writing, was invented for mechanical typewriters, which have used the QWERTY key layout since 1874. Although this interface has come of age, it has survived because of its many positive aspects. Yet it is not feasible for the ever-smaller computing devices that house ever-more advanced functionality. New alphanumeric interfaces include numeric keypads augmented with letters, as on cell phones, and the Graffiti handwriting characters. In this paper, we survey the state of the art in alphanumeric input interfaces. After an overview of related work in section 2, we lay out the general space of interfaces for text input in section 3. In the following section we discuss the characteristics and human factors of touch-typing in particular. Section 5 explains the criteria by which we examined the methods and devices. Section 6 introduces and compares various touch-typing input methods and devices. We conclude by stating that keyboards, whether virtual or real, are very well suited to the task of alphanumeric input.
CHAPTER-2
RELATED WORK
A large body of related work exists on physical keyboards and typewriters: their characteristics, usability, efficiency, history, and anthropometric background. As the keyboard has become one of the two largest components of computing devices (next to the display), research on smaller and more mobile text entry methods and devices has made great strides. We are not aware of a survey that focuses on input interfaces that retain the keyboard metaphor while remedying the restrictive physical device. Further related work is cited throughout the paper.
CHAPTER-3
ALPHANUMERIC INPUT
Since language and its manifestation in sentences, words, and letters is the human’s primary means of communication, it has also been researched extensively for human-computer interaction. In this section, we look at three main categories: speech recognition, handwriting recognition, and sign language. For each, we discuss its characteristics and shortcomings. Touch-typing is covered in section 4.
3.1 Speech Recognition
Probably the most hailed UI, speech recognition (SR) is now at a stage where it can be deployed successfully in limited domains, such as call centers for customer service or voice dialing on mobile phones. Under these conditions it provides a highly user-friendly, unobtrusive, flexible and efficient interface. But careful estimates suggest that 100% recognition rates for large vocabularies will not be possible in the near future. Noisy environments and speaker particularities (accents, speech impediments) degrade recognition further. However, one of the strongest arguments against relying solely on SR lies in how our brain processes speech generation. As Shneiderman summarizes in [26], producing the sounds that make up words and sentences occupies parts of the brain that are also used for general problem solving. Body movements are processed differently: physical coordination does not conflict with problem solving. In other words, it is harder to think when the thought also has to be spoken aloud. Overall, even the versatile SR interface has its limitations.
3.2 Handwriting Recognition
Handwriting recognition is another interface that lends itself to HCI because so many people have mastered handwriting. It is as hard a recognition problem as SR, especially for connected cursive handwriting. This is the reason for slightly modified alphabets like Graffiti, and for entirely new pen-based scripts like Cirrin or Quikwriting, that are better suited to recognition by a computer. They achieve good recognition rates and can be implemented on devices with limited capabilities. However, just as with writing on plain paper, the throughput is limited because a single actuator (the pen guided by the fingers) performs the communication task. One unit of information, a character, consists of a combination of straight and/or curved lines and/or dots. The amount of information a human can output with a pen is limited by motor skills rather than by cognitive skills. Furthermore, the improvement that training can bring to motor capabilities levels off much sooner than it does for cognitive capabilities. Stenography remedies this shortcoming of a single-pen interface by coding language into more, and more complex, symbols. Yet stenography requires a significant amount of training, too much for the average user.
3.3 Sign Language
Sign languages (SL) also code language into more symbols than there are letters in the English alphabet, so a higher bandwidth can be achieved with fewer atomic elements. “Speaking” SL naturally does not involve the vocal system, although mouth movements frequently accompany hand and arm gestures. We guess, without proof, that gesturing SL originates in parts of the brain that do not conflict with sentence forming, unlike speech production. It is therefore conceivable that an SL-like modality could be developed that remedies the problems of SR and handwriting interfaces. Yet the drawbacks of SL as an input modality are big. Few people in the population have the skill, extensive movements are required to communicate, and signals are highly context dependent. Last but not least, computer recognition of sign language poses many difficulties.
CHAPTER-4
TYPING AS INPUT MODALITY
As derived in the previous section, no alternative input modality is the silver bullet to the UI problem. In this section, we show that touch-typing, while certainly not “perfect” either, does not have many of the drawbacks of other methods.
4.1 Touch-Typing: Definition
We define touch-typing as any input method that employs discrete sensors, sensed areas, or buttons, each for one or a set of atomic symbols (letters, digits, or characters) of a language. Examples are the common keyboard, the keypad of a mobile phone, and on-screen keyboards on PDAs. This definition explicitly includes “virtual” buttons. These differ from the surrounding surface only in that some technique senses when a finger or pointer touches within their extent. We use the terms keyboard and keyboard metaphor interchangeably for touch-typing interfaces.
4.2 Characteristics and Human Performance
Touch-typing was born with the invention of the mechanical typewriter, and the common QWERTY layout has been with us since 1874. Its greatest benefit is that all ten fingers can be used to operate it in very rapid sequential order, so the interface is not restricted to communicating through a single pen. A seemingly related yet orthogonal benefit derives from the fact that each button generates only one bit of information. In a sense, then, a keyboard is an input device with one degree of freedom (1 DOF). A pen or a mouse has 2 DOF (a 2-dimensional surface), and SL operates in 4 DOF (3-dimensional space plus time). Buttons that can generate more than one character by being pressed repeatedly, for example those of mobile phone keypads, are an exception. We say they have 1.5 DOF, with time as the extra half dimension. These keyboard types severely limit the achievable throughput. Still, a study has shown that both motor and cognitive skills adapt to the common use with only the two thumbs.
(A more recent theoretical approach suggests that raw typing bandwidths of up to 60 wpm are possible.) Another way to increase the amount of information each key can produce is moded or combinatorial operation, as in chording keyboards. Individual key strokes do not generate a character; only combinations of keys pressed simultaneously do. Since n keys map to n + m symbols, chording keyboards also have more than one DOF. With n keys, the maximum addressable space is 2^n - 1 symbols, less one because at least one key has to be pressed to register an action.
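To check the 2^n - 1 bound, the sketch below enumerates every non-empty key combination for a small key set. It is a minimal illustration under stated assumptions. The five key names are hypothetical (one key per finger of one hand). The count is a purely combinatorial upper bound and ignores whether a given chord is comfortable to press.

```python
from itertools import combinations

def enumerate_chords(keys):
    """Return every non-empty subset of `keys`; each subset is one possible chord."""
    chords = []
    for size in range(1, len(keys) + 1):
        for combo in combinations(keys, size):
            chords.append(frozenset(combo))
    return chords

def max_chord_symbols(n):
    """Upper bound on distinct symbols addressable with n chording keys."""
    return 2 ** n - 1

# Hypothetical five-key device: one key per finger of one hand
keys = ["thumb", "index", "middle", "ring", "pinky"]
chords = enumerate_chords(keys)
print(len(chords), max_chord_symbols(len(keys)))  # 31 31
```

Five keys therefore give at most 31 chords. This is consistent with a one-handed device such as the Chording Glove (section 6.4) needing extra mode switches to cover a full character set.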
Techniques similar to word disambiguation can even relax the requirement that keys be pressed in strictly sequential order, by performing context-sensitive reordering. Word processors frequently implement this, for example by correcting “hte” to “the”. Of course, this comes at the expense of greater context sensitivity of the input method.
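The reordering idea can be sketched with a dictionary lookup on each word's sorted letters. This is a minimal sketch, assuming corrections are restricted to permutations of the letters actually typed. A practical corrector would typically also consider insertions, deletions and sentence context. The word list and its frequencies are made up for illustration.

```python
from collections import defaultdict

def build_anagram_index(vocabulary):
    """Group known words by their sorted letters; vocabulary maps word -> frequency."""
    index = defaultdict(list)
    for word, freq in vocabulary.items():
        index["".join(sorted(word))].append((freq, word))
    return index

def reorder_correct(typed, vocabulary, index):
    """Return `typed` if it is a known word; otherwise the most frequent known
    word spelled with exactly the same letters in a different order."""
    if typed in vocabulary:
        return typed
    candidates = index.get("".join(sorted(typed)), [])
    if not candidates:
        return typed  # no permutation of these letters is a known word
    return max(candidates)[1]  # highest frequency wins

# Hypothetical word list with made-up frequencies
VOCAB = {"the": 500, "then": 120, "hat": 30}
INDEX = build_anagram_index(VOCAB)
print(reorder_correct("hte", VOCAB, INDEX))  # the
```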
Experimental observations suggest that the speed of classic ten-finger touch-typing is limited by cognitive skills rather than by the motor skills required to move the ten fingers independently and rapidly. In other words, the bottleneck for touch-typing bandwidth is the brain. For pen-based interfaces, it is the dexterity of the fingers rather than cognitive skills. This is good news: the brain learns to control body parts much faster and better than the body's physique adapts to new requirements.
Table 1. Bandwidth comparison of different UI methods for communicating alphanumeric data. All values are commonly cited quantities, except the one for sign language. We estimated that value as equal to the rate of reading prose, based on the ability to simulcast American Sign Language on live television.
A keyboard discretizes the 1 DOF input gesture into a set of zero-dimensional, binary bits of information. This important capability distinguishes keyboards from one-dimensional devices such as volume sliders. Another advantage of the keyboard metaphor over alternative methods of text entry is that it supports novice and expert users equally well. Keyboards afford both hunt-and-peck typing, which helps when the key locations are not memorized, and ten-finger touch-typing for advanced users.
4.3 The Influence of the Key(board) Layout
The QWERTY key layout (i.e. which key maps to which language symbol) is not optimal in a number of ways. Various fixed-layout keyboards optimize the key arrangement for several goals:

- frequent alternation between the left and right hand;
- short finger travel distances;
- decreasing load from index finger to pinky, to compensate for the decreasing strength of these fingers;

and so forth. The most frequently cited example is the Dvorak keyboard, whose layout is supposedly far superior to QWERTY. On closer inspection, however, the Dvorak layout is only marginally better in terms of the maximum typing speed of experienced users. Its real advantage is that it is easier and quicker to learn. Other approaches are more promising overall. For example, Zhai et al. analytically predicted the performance of various layouts and devised one with a 40% predicted asymptotic performance improvement over QWERTY.
We define a virtual keyboard as a touch-typing device that has no physical manifestation of its sensing areas. That is, a sensing area that acts as a button is not per se a button but is instead programmed to act as one. Such a sensing area could, for example, be realized with photoelectric sensors, active finger tracking methods, or a touchpad. A touchpad differs from a keypad in that it has no a priori designated button areas. Virtual keyboards that employ a discrete sensing area for each symbol (rather than chording methods) inherently allow for the realization of a soft keyboard, i.e. one whose layout is defined in software.
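"Programmed to act as a button" can be made concrete with a hit test. Key regions are plain data, and a sensed contact point is mapped to whichever region contains it. This is a minimal sketch under several assumptions. The contact coordinates are taken to be already reported by some sensor. Rectangular keys and the 19-unit key pitch (roughly the millimetre pitch of a desktop keyboard) are illustrative choices, as is the row layout. None of this describes a specific device in this survey.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SoftKey:
    label: str
    x: float  # left edge, in surface units (e.g. mm)
    y: float  # top edge
    w: float  # width
    h: float  # height

    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

def make_row(labels, y, key_size=19.0, x0=0.0):
    """Lay out one row of square keys starting at horizontal offset x0."""
    return [SoftKey(c, x0 + i * key_size, y, key_size, key_size)
            for i, c in enumerate(labels)]

# Hypothetical three-row letter layout with staggered rows
LAYOUT = (make_row("qwertyuiop", 0.0)
          + make_row("asdfghjkl", 19.0, x0=9.5)
          + make_row("zxcvbnm", 38.0, x0=28.5))

def hit_test(px, py, layout=LAYOUT):
    """Return the label of the sensing area containing the contact, or None."""
    for key in layout:
        if key.contains(px, py):
            return key.label
    return None

print(hit_test(20, 5))  # w
```

Because the layout is just a list, switching to another arrangement (for example Dvorak) means replacing `LAYOUT`, not the hardware. That is what makes such a keyboard "soft".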
CHAPTER-5
METRICS AND CHARACTERISTICS
Method or device:
Since we did not limit the survey to actual devices, we have to distinguish between mere proposals of character-producing methods without implementations and actually operating devices. A third category comprises devices that could potentially function as a virtual keyboard but have not yet been used as such.
Gear:
What devices are employed to realize the VK? Are there alternative means that could shrink the equipment while still presenting the same VK method to the user? We also distinguish two main types of incarnations:
- Sensing areas are on a flat surface, e.g. a table: A regular key layout results in high familiarity with the interface method. Finger impacts with the surface produce tactile feedback. These VKs also allow for simple ways to visualize the keyboard.
- Sensing areas are on the user’s hand or on a worn glove: No surface is needed for this type, and typing can happen in “stealth mode”, e.g. while the hand is in a pants pocket. Tactile feedback is provided implicitly by finger contact. But at least with today’s technology, a glove is essential to the implementation.
Method of key press detection:
What does the device actually register: the touch of a surface, the interruption of a light beam, etc.? This has a big impact on how robust the method is, i.e. whether actions that were not intended as key presses might be recognized as such.
Number of discrete keys:
How many keys does the method and/or device have? Is this number a hard limit, or can it be increased easily? This is important if other or expanded alphabets like Chinese are to be accommodated. We denote a device with characteristics similar to a regular keyboard as having 52 keys, and a plus sign “+” stands for expandability. (The common typewriter has 52 keys, while modern keyboards usually have 101 or more keys.)
Key-to-symbol mapping DOF:
Does each key correspond to exactly one symbol/character (1 degree of freedom, DOF), or is it a one-to-many mapping (1.5 DOF, see section 4.2)? A one-to-many mapping is disambiguated by temporal methods (multiple successive keystrokes), by statistical prediction, or by chording methods (multiple keys pressed simultaneously produce one character).
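Temporal disambiguation is easiest to see in the classic multi-tap scheme of phone keypads. Repeated presses of one key within a time window cycle through its letters. The sketch below is a minimal illustration under stated assumptions: a fixed 1-second timeout (a hypothetical value; phones differ), letters only, and presses already delivered as time-stamped events.

```python
# Letter groups of a standard phone keypad
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

def multitap_decode(presses, timeout=1.0):
    """Decode time-ordered (key, t) presses with multi-tap.

    Repeated presses of the same key within `timeout` seconds cycle through
    that key's letters; pressing another key or pausing commits the letter.
    """
    text = []
    key, count, last_t = None, 0, None

    def commit():
        letters = KEYPAD[key]
        text.append(letters[(count - 1) % len(letters)])

    for k, t in presses:
        if k == key and t - last_t <= timeout:
            count += 1       # same key within the interval: advance to next letter
        else:
            if key is not None:
                commit()     # the previous letter is now final
            key, count = k, 1
        last_t = t
    if key is not None:
        commit()
    return "".join(text)

print(multitap_decode([("3", 0.0), ("6", 0.3), ("6", 0.5), ("6", 0.7), ("4", 1.0)]))  # dog
```

Typing "cab" requires pauses between the presses of key 2, because all three letters live on the same key. This is exactly the time cost that the 1.5 DOF classification refers to.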
Temporal significance interval for key press:
For the device to register a key press, how long does the key have to be pressed? On a physical keyboard, a key does not have to be depressed for a noticeable amount of time, but this might be different for virtual keyboards. Obviously, this has an immediate impact on the potential typing speed with the VK.
Number of discrete operators:
Are all ten fingers used to operate the keys of the virtual keyboard or is it only the index finger, thumb etc. that can press a key? The number of discrete operators influences the parallelism that the human can utilize and therefore the potential bandwidth achievable with this device or method.
Operator-to-key mapping:
Can any operator press any key, or only a subset of them? There are three different cases:

- A one-to-one mapping means there are as many keys as operators, and each operator works with only one key.
- A one-to-many mapping is exemplified by how most people touch-type on a keyboard. One finger is responsible for a set of keys, and that set does not intersect with any other finger’s set, i.e. a given key is always pressed by the same finger.
- A many-to-many mapping allows any operator to press any key.

Note that many devices theoretically have a many-to-many mapping, but individual users tend to operate a given key always with the same operator. This contrasts with the musical piano keyboard, where the keys are pressed by whichever operator is nearest at any moment. (The authors are not aware of any interaction methods that use a many-to-one mapping.)
Operator-key switch time:
This is related to the number of discrete operators that can be employed with the VK, but it is an independent quantity. It captures the human factors aspect of the time between pressing two different keys. A physical keyboard has a very small switch time, especially when two different fingers are used: we can hit two keys almost simultaneously. On a thumb-operated cell phone keypad, on the other hand, switching from one key to the next might take significantly longer. Fitts' law provides a widely accepted method for quantitative analysis. Because this quantity is complex (it varies for different operators, keys, and operator-key sequences), we give only qualitative guidelines.
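For reference, the Shannon formulation of Fitts' law predicts the movement time to a target of width W at distance D as MT = a + b * log2(D/W + 1). Here a and b are constants that must be fitted empirically per device and operator. The sketch below uses placeholder constants, not measurements. Note that the law models one operator moving between targets. It therefore does not capture two fingers hitting keys almost simultaneously.

```python
import math

def index_of_difficulty(distance, width):
    """Fitts' index of difficulty in bits (Shannon formulation)."""
    return math.log2(distance / width + 1)

def movement_time(distance, width, a, b):
    """Predicted time in seconds to reach a target of size `width` at `distance`.

    a (s) and b (s/bit) must be fitted empirically for each device and
    operator; the values used below are placeholders, not measurements.
    """
    return a + b * index_of_difficulty(distance, width)

# Placeholder constants for illustration only
A, B = 0.1, 0.15
near = movement_time(distance=19, width=19, a=A, b=B)  # adjacent key
far = movement_time(distance=57, width=19, a=A, b=B)   # three keys away
print(round(near, 3), round(far, 3))
```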
Feedback:
Does the VK sport a feedback mechanism other than characters appearing on a screen? If so, what human sensors are used? There might be a time delay between the typing action and the character and/or feedback being visible (audible…) by the user – how big is this delay? The usability and “feel” of the VK will strongly depend on good feedback characteristics.
Visual incarnation of a keyboard:
Is a projection or display mechanism available or feasible? With what technology? Visualization of the “keys” is an important consideration for novice users.
Familiarity:
Is the VK an entirely new method to input text, does it have a remote relation to conventional keyboards (same layout but does not support all its affordances), or is it very much in the style of a physical keyboard? This will have a big impact on how easily users can transition to this kind of virtual keyboard, and in our opinion also on the broad acceptance of the VK.
Estimated bandwidth:
This can of course only be a rough guideline to how many characters a human will likely be able to input per unit of time with this kind of device or method. We state the bandwidth in characters per minute (cpm). To convert to words per minute (wpm), divide by 5, in accordance with the mean English word length of 5 characters per word. We report raw keystroke numbers, not the number of characters after word completion or related methods.
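The conversion is simple arithmetic. It is shown here with the figures used elsewhere in this paper: the Chording Glove's 19 wpm (section 6.4) corresponds to 95 cpm. The `raw_cpm` helper is an illustrative addition. It counts every key press, in line with reporting raw keystrokes rather than characters after word completion.

```python
CHARS_PER_WORD = 5  # mean English word length used for the conversion

def cpm_to_wpm(cpm):
    """Characters per minute -> words per minute."""
    return cpm / CHARS_PER_WORD

def wpm_to_cpm(wpm):
    """Words per minute -> characters per minute."""
    return wpm * CHARS_PER_WORD

def raw_cpm(keystrokes, seconds):
    """Raw keystroke rate in cpm, counting every key press (no completion credit)."""
    return keystrokes * 60 / seconds

print(wpm_to_cpm(19))    # 95
print(cpm_to_wpm(300))   # 60.0
print(raw_cpm(150, 90))  # 100.0
```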
Invisibility:
How apparent are the device and its use to other people? Can the VK be used in “stealth” mode, without disturbing others or even without being noticed?
The next three criteria are very important UI characteristics. However, for almost all devices and methods, too little information was available to compare them on these criteria.
Cost:
What are mass production costs of the device? We put this in relation to the cost of a physical keyboard, which ranges in the tens of dollars.
Reliability and robustness:
We are used to perfect accuracy of physical keyboards – no key press is registered unless we actually press a key, and no key press goes unnoticed. VKs might have non-zero false positive and false negative rates, for example caused by non-human disturbances to the recognition process or by borderline key press actions by the human.
Accuracy:
Strongly related to this is the accuracy with which the intended key press is recognized. For this measure, we regard every key as having a unique mapping to a character. A one-to-many key mapping has an additional effect on the final, perceived accuracy.
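The following sketch separates false positives, false negatives and wrong-key recognitions, and derives accuracy from them. It is a minimal illustration, assuming raw sensor events have already been paired with what the user intended. That alignment step is not shown. The trial data is made up.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

@dataclass
class RecognitionStats:
    false_positives: int  # a press was registered although none was intended
    false_negatives: int  # an intended press went unregistered
    substitutions: int    # registered, but as the wrong key
    correct: int

    def accuracy(self):
        """Fraction of intended presses that were recognized as the intended key."""
        intended = self.false_negatives + self.substitutions + self.correct
        return self.correct / intended if intended else 1.0

def score(trials: Sequence[Tuple[Optional[str], Optional[str]]]) -> RecognitionStats:
    """Each trial pairs the intended key (None = no press intended)
    with the detected key (None = nothing registered)."""
    fp = fn = sub = ok = 0
    for intended, detected in trials:
        if intended is None and detected is not None:
            fp += 1
        elif intended is not None and detected is None:
            fn += 1
        elif intended is not None and intended != detected:
            sub += 1
        elif intended is not None:
            ok += 1
        # intended is None and detected is None: nothing happened, nothing to count
    return RecognitionStats(fp, fn, sub, ok)

stats = score([("a", "a"), ("b", "b"), ("c", None), (None, "x"), ("d", "e"), (None, None)])
print(stats, stats.accuracy())
```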
CHAPTER-6
VIRTUAL KEYBOARDS: METHODS AND DEVICES
The following subsections explain the main characteristics of each VK.
6.1 Visual Panel
The Visual Panel consists of a camera and a sheet of paper. Computer vision is used to locate the extended index finger relative to the paper. The primary application is a mouse pointer; clicking is achieved by resting the fingertip in its current position for three seconds. The authors demonstrated text entry by interpreting pointer locations as the keys of a keyboard printed on the sheet of paper. An audible notification signals the recognition of a character after the 3-second significance interval.
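A dwell-based "click" like this one can be sketched as follows. A click fires once the fingertip has stayed near one spot for the significance interval. This is a minimal sketch, not the Visual Panel's actual implementation. It assumes fingertip positions already come from a tracker (the vision part is not shown). The 5-unit jitter tolerance is a hypothetical value; only the 3-second interval comes from the description above.

```python
import math

def detect_dwell_clicks(samples, dwell_s=3.0, radius=5.0):
    """Return (t, x, y) for each dwell 'click' in time-ordered (t, x, y) samples.

    A click fires once the fingertip has stayed within `radius` of the point
    where the dwell began for `dwell_s` seconds. The finger must then move
    away (beyond `radius`) before another click can fire.
    """
    clicks = []
    anchor = None  # (t0, x0, y0) where the current dwell began
    fired = False
    for t, x, y in samples:
        if anchor is None or math.hypot(x - anchor[1], y - anchor[2]) > radius:
            anchor, fired = (t, x, y), False  # finger moved: restart the dwell timer
        elif not fired and t - anchor[0] >= dwell_s:
            clicks.append((t, anchor[1], anchor[2]))
            fired = True
    return clicks

# Rest on one spot for 3.5 s, then on another spot for 3.5 s (sampled every 0.5 s)
samples = ([(i * 0.5, 10.0, 10.0) for i in range(8)]
           + [(4.0 + i * 0.5, 100.0, 100.0) for i in range(8)])
print(detect_dwell_clicks(samples))  # [(3.0, 10.0, 10.0), (7.0, 100.0, 100.0)]
```

Each click position would then be mapped to a printed key. The 3-second wait per character is what makes this method slow for text entry.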
6.2 Finger-Joint Gesture Wearable Keypad
The FJG suggests viewing the phalanges of the fingers (other than the thumb) of one hand as the keys of a phone keypad. The thumb presses these virtual buttons. This is similar to Thumbcode, but FJG relies solely on word disambiguation to produce more than 12 characters. The drawback of this 1.5 DOF key-to-symbol mapping might, however, be mitigated by the familiar layout. Also, less complex hand configurations might be less tiring for the user. Just as with Thumbcode, FJG has no user feedback method beyond skin contact sensations.
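Word disambiguation over a phone-keypad layout works like familiar T9-style predictive text. Each letter takes one press, and a dictionary ranks the words that share the resulting key sequence. This is a minimal sketch of that idea under stated assumptions: standard keypad letter groups, one press per letter, and a made-up vocabulary with frequencies. How FJG assigns phalanges to keypad keys is not modeled.

```python
from collections import defaultdict

# Letter groups of a standard phone keypad
KEY_LETTERS = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
               "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {c: k for k, letters in KEY_LETTERS.items() for c in letters}

def key_sequence(word):
    """Keys pressed to type `word` with one press per letter."""
    return "".join(LETTER_TO_KEY[c] for c in word)

def build_index(vocabulary):
    """Group words (word -> frequency) by key sequence, most frequent first."""
    groups = defaultdict(list)
    for word, freq in vocabulary.items():
        groups[key_sequence(word)].append((freq, word))
    return {seq: [w for _, w in sorted(ws, reverse=True)] for seq, ws in groups.items()}

def disambiguate(keys, index):
    """Candidate words for a key sequence, best guess first."""
    return index.get(keys, [])

# Hypothetical vocabulary; all four words share the key sequence 4663
VOCAB = {"good": 90, "home": 80, "gone": 60, "hood": 10}
INDEX = build_index(VOCAB)
print(disambiguate("4663", INDEX))  # ['good', 'home', 'gone', 'hood']
```

The example shows the weakness the text mentions: one key sequence can stand for several common words, so the user sometimes has to choose among candidates.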
6.3 Thumbcode
The “Thumbcode” method defines a key stroke as the thumb touching one of the phalanges of the other fingers of the same hand. Consequently there are 12 discrete keys (three each on the index finger, middle finger, ring finger and pinky). To produce up to 96 different symbols, the method no longer keeps keys and operators strictly separate. The four fingers can touch each other in eight different ways. Each of these acts as a mode, or modifier key, that changes the mapping of the thumb touch (8 modes × 12 keys = 96 symbols). Tactile user feedback is implicit when the thumb touches another finger. The author tested a glove implementation.
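The 8 × 12 arithmetic can be sketched as flattening a (mode, thumb target) pair into one of 96 symbol slots. The numbering of modes and the order of fingers and phalanges below are hypothetical. The description above gives only the counts, not the actual assignment of symbols.

```python
FINGERS = ("index", "middle", "ring", "pinky")
PHALANGES = ("proximal", "middle", "distal")
NUM_MODES = 8  # distinct finger-to-finger contact configurations

def thumbcode_index(mode, finger, phalanx):
    """Map (mode, touched phalanx) to a symbol slot 0..95 (hypothetical ordering)."""
    if not 0 <= mode < NUM_MODES:
        raise ValueError("mode out of range")
    key = FINGERS.index(finger) * len(PHALANGES) + PHALANGES.index(phalanx)  # 0..11
    return mode * len(FINGERS) * len(PHALANGES) + key

slots = {thumbcode_index(m, f, p)
         for m in range(NUM_MODES) for f in FINGERS for p in PHALANGES}
print(len(slots))  # 96
```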
6.4 Chording Glove
The Chording Glove places a pressure sensor on each finger of a right-hand glove to implement a chording input device. Additional “mode switches”, located along the index finger, are used to produce more than 25 distinct characters. Almost all possible finger combinations are mapped to symbols, which could make them hard to type. Yet user experiments suggest otherwise: rates of up to 19 wpm are achieved after ten training sessions “with no signs of leveling off”.
6.5 FingeRing
FingeRing uses accelerometers on each finger to detect surface impacts. In the wireless version depicted in the figure below, these rings communicate with a wrist-mounted data processing unit. The interaction method is designed for one-handed use, but could be extended to two hands with obvious implications. In the current version, the finger movements needed to produce one character are extensive. Two chording patterns have to be typed within a time interval, each consisting of a combination of fingers hitting the surface. Because of this piano-style typing method, users with prior piano experience fare much better with this device. In fact, the full 2-stroke chord mapping is considered too difficult for novice users.
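The two-stroke scheme can be sketched in two steps. First, near-simultaneous finger impacts are grouped into chords. Then consecutive chords that fall within the allowed interval are paired into one code. This is a minimal sketch under stated assumptions. Impacts are taken to be already detected from the accelerometers. The 50 ms chord window and 0.5 s symbol interval are hypothetical values. The chord-to-character table is not modeled.

```python
def group_chords(impacts, chord_window=0.05):
    """Group time-ordered (t, finger) impacts into (start_time, chord) pairs;
    impacts within `chord_window` s of a chord's first impact are simultaneous."""
    chords = []
    start, fingers = None, set()
    for t, finger in impacts:
        if start is not None and t - start > chord_window:
            chords.append((start, frozenset(fingers)))  # close the current chord
            start, fingers = None, set()
        if start is None:
            start = t
        fingers.add(finger)
    if start is not None:
        chords.append((start, frozenset(fingers)))
    return chords

def pair_strokes(chords, symbol_interval=0.5):
    """Pair consecutive chords into 2-stroke codes when the second one follows
    within `symbol_interval` s; a chord without a timely partner is dropped."""
    codes, i = [], 0
    while i + 1 < len(chords):
        (t1, c1), (t2, c2) = chords[i], chords[i + 1]
        if t2 - t1 <= symbol_interval:
            codes.append((c1, c2))
            i += 2
        else:
            i += 1  # first chord timed out
    return codes

IMPACTS = [(0.00, "index"), (0.02, "middle"), (0.30, "ring"), (2.00, "thumb"),
           (5.00, "index"), (5.01, "pinky"), (5.20, "index")]
print(pair_strokes(group_chords(IMPACTS)))
```

Assuming one ring per finger of one hand, each stroke can be any of 2^5 - 1 = 31 chords, so two strokes address up to 31 × 31 = 961 codes. That is ample capacity, but also a large mapping to memorize.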
6.6 TouchStream
The TouchStream keyboard stretches our definition of a VK, as it has keys printed on the surface. Yet the underlying technology permits software configuration of the sensed areas, equal to the
6.7
DSI Datotech Systems offers one of the few touchpads that report up to ten surface contacts and their pressure forces independently and simultaneously. It has not yet been used this way, but the 20 x 15 cm device could carry the traditional keyboard modality over, one-to-one, to an interactive, software-configurable surface. This device inherently provides the same user feedback as any of the devices employing tabletop units: finger surface impacts.
6.8 VType
VType detects the key stroke of each finger “in the air” with a data glove (fiber optical curvature detection). Different locations of the key strokes are not distinguished, only which finger pressed a key. Instead, disambiguation with standard statistical methods on the word and sentence level solves the 1.5 DOF mapping problem. There is currently no feedback mechanism incorporated into the VType prototype.
6.9 VKey
Virtual Devices Inc. recently announced a combined projection and recognition VK. Little is known about this device, but their press release suggests that visual sensors (cameras) detect the movement of all ten fingers. Just as the VKB device, the VKey also consists of a tabletop unit and feedback is the tactile sensation of hitting a surface.
6.10 VKB Projection
The virtual keyboard technology developed by VKB is a tabletop unit that projects a laser image of a keyboard on any flat surface. Infrared cameras detect key strokes of all ten fingers.
Word disambiguation techniques are employed despite this 1 DOF mapping. Therefore, our guess is that engagement of all distinct key locations is detected, yet with a fairly low accuracy. These two characteristics in combination should result in fairly good recognition rates. Surface impact of the fingers serves as typing feedback.
In the virtual keyboard, four IR transceivers are placed approximately cm apart in the device. The transmitters are mounted on top of a printed circuit board and the receivers underneath it. The “silvery” surface inside the box forms suitable apertures that delimit the spatial angle of each receiver.
The transceivers of the device operate in cycles controlled by the clock frequency of the proximity sensing device. The signal received by one transceiver at a given time is the sum of the signals reflected from a finger. By processing these proximity signals, it is possible to recognize different keystrokes on a certain ‘sensitive’ area in normal usage situations.
6.11 Scurry
Tiny gyroscopes on each finger are the sensing technology in Samsung’s Scurry. The prototype suggests that these finger rings communicate with a wrist-mounted unit where the data is processed. Not much is known about this device, yet our guess is that finger accelerations and relative positions are detected, making it possible to distinguish multiple key targets per finger.
We further guess that a surface impact is required to register a key stroke, which would also provide the primary sensory feedback to the user. Small LEDs on the rings potentially provide additional feedback.
6.12 Senseboard
The Senseboard consists of two rubber pads that slip onto the user’s hands. Muscle movements in the palm are sensed (with unspecified, non-invasive means) and translated into key strokes with pattern recognition methods. All further information (obtained from the company’s web site) can be found in the tabular comparison. The only feedback other than characters appearing on a screen comes from the tactile sensation of hitting the typing surface with the finger.
6.13 Drawbacks
1. A virtual keyboard is hard to get used to. Since it involves typing in thin air, it requires some practice. Only people who are good at typing can use a virtual keyboard efficiently.
2. It is costly, ranging from 150 to 200 dollars.
3. The room in which a projected keyboard is used should not be very bright, so that the keyboard remains properly visible.
CHAPTER-7
FUTURE WORK
We have provided a qualitative analysis of different virtual keyboards. The logical next step is to conduct user studies to obtain quantitative measures of the usability and efficiency of these methods and devices. Based on these experiments, we will try to determine how each of the metrics and characteristics from section 5 influences usability and efficiency, independent of device artifacts.
CHAPTER-8
CONCLUSIONS
We first gave an overview of the range of input devices and methods for alphanumeric data. We then took a closer look at touch-typing as an input method and highlighted its benefits. This was followed by a survey of the state of the art in touch-typing interfaces, or virtual keyboards. We found that the trend is towards retaining the original keyboard metaphor as closely as possible.
Our conclusion is that the keyboard, though often regarded as an antiquated method unsuited to modern computing devices, has characteristics inherent in the way we use it that make it preferable to alternative methods. Keyboard input is and will remain an important user interface modality for computers for decades to come.