Volume VI Number 3, November 1999

When HCI Should Be HHI

G. Alan Creak
Computer Science Department
Auckland University
New Zealand
alan@cs.auckland.ac.nz

ABSTRACT

Human-computer interaction (HCI) is the study of how people communicate with computers, and as such is primarily concerned with the transfer of information between people and computer software. Systematic investigation of this area has proved of great value in understanding the processes involved in communication with computers and has contributed to the design of more effective interfaces. HCI techniques work well when the primary purpose of the interaction is communication with computers, or with other machines through computers. It is less clear that they are equally effective when, as is commonly the case in rehabilitation systems, the computer is used as an intermediary in communication between people. In this case, concentration on the HCI can distract attention from the requirements of human-human interaction (HHI), which must include much more than the verbal content.

THE PROBLEM

Human-computer interaction (HCI) has been defined (SIGCHI, 1992) as "a discipline concerned with the design, evaluation and implementation of interactive computer systems for human use and with the study of major phenomena surrounding them". In practice, most work in the area is directed towards the human-computer interface, with the object of determining how best to exchange information between a person and an inanimate entity, hardware or software, which will henceforth be referred to as the machine.

Driving machines is not always simple, but computing practice over the last few decades, including significant advances in HCI over the later parts of the period, has taught us ways of making it comparatively painless. The result is seen in modern interfaces, which - when well designed - are very effective when the primary purpose of the interaction is to cause the machine to perform some specific task. For example, people interact with machines to make them do work, give information, and provide entertainment. Most machines are used in this way for most of the time.

This is not the only case, though, and the situation is different if the machine is intended to mediate in communication between people, a category which includes machines used for communication in rehabilitation systems. In such a context, the machine is being used to accomplish human-human interaction (HHI). This is a more demanding requirement. It is accepted that communication with a machine has to be conducted at a very low level of comprehension on the machine's part; despite efforts over the past decades, machines (even computers) are still stupid. In contrast, when people communicate with people, they try to convey ideas.

Communicating with other people through a mechanical medium has nothing at all to do specifically with computers; it has, presumably, been of concern ever since writing was developed, and people have tried to set down their ideas in ways which other people could understand. This is far more difficult than direct communication, and even with our millennia of experience we have not found an easy way to use writing effectively and automatically. The problem is in encoding the ideas we wish to communicate in the comparatively narrow channel of words on paper, or other medium, without the expressive qualities of voice and gesture which characterise direct human conversation. The difficulties are magnified when even narrower channels are used; examples are the simple character channel of electronic mail, and the single switch channel of many rehabilitation devices.

An example might help to clarify the distinction between HCI and HHI. The activity of an author writing text can be described in two ways, both of which are correct:

AS WRITING TEXT: The object of using word-processing software through an interface embodying conventional HCI techniques is to transfer characters to a disc file so that they can be transported to some other machine and eventually be printed. To do so, the capabilities of the software are exploited to encode certain instructions to the printer. This communication between human and machine is by far the most common application of HCI methods.

AS COMMUNICATING WITH PEOPLE: The object of writing the text is to convey ideas from the author's head to the reader's. This is not HCI - it is HHI. In communication using an unaided system such as speech or signing, encoding instructions to an intermediary is unnecessary. In each case, certain conventions are used to express feelings, add emphasis, imply comment on the primary message, and so on. In print in a conventional journal one might use devices such as italic text and page layout. In this journal, the _Guidelines for Authors_ (ITD, 1996) is more restrictive: "Information Technology and Disabilities will be available as plain ASCII text only. ..... Underlining or italics should be indicated in the following way: _Information Technology and Disabilities_. Please do NOT use any other text-marking conventions ....". How is the author to convey the essential non-verbal content of the message?

The result is that the author wishes to achieve HHI, but is constrained to make do with HCI, and must grapple with the human-computer interface to drive a machine in such a way that a reader will be led to reconstitute something like the author's original idea. The customary non-verbal conventions of direct communication are not available, so any communication which would normally be expressed non-verbally must be made explicit in some other way, or abandoned.

PEOPLE TALKING TO PEOPLE

The difficulty arises because human communication is much more than a string of words. The primary meaning of the words is affected by context, and the meaning of a phrase can be modified by non-verbal components of the communication to express emotion, humour, disbelief, sarcasm, interrogation, and other attributes which we commonly take for granted. Few communication aids make provision for this extra-verbal communication. Some people who use such aids can, and do, compensate by accompanying their words by gestures, facial expressions, mime, or some other communication channel which bypasses the machine, but these channels might not be available to those with more severe disabilities, or in telecommunication with distant partners. It is therefore important to consider how the non-verbal communication might be conveyed in an HHI communication system. These observations have obvious bearing on the design of such communication systems, for they point to the importance of identifying the required communication channels, and of finding ways to convey the information which they normally carry.

The case is strengthened by considering the complementary needs of the recipient of the message. The non-verbal information must be presented in different ways according to the recipient's abilities; a recipient with impaired hearing might require formatted text, while a recipient with impaired sight might require inflected speech. Without a means of encoding the additional material independent of the medium to be used, a special-purpose system will be required for each case.

There have been some moves towards methods for encoding non-verbal content more explicitly, but these can at best be regarded as rudimentary. For printed text, encoding techniques such as changes in typeface or layout are used routinely, but in practice at the level of HCI rather than HHI; instead of encoding the real message, with all its nuances, and having the computer present it, word processors are designed to make it easy for people to control the printing process directly. Some mark-up systems such as HTML (Raggett, 1997) and, at a slightly higher level, LaTeX (Lamport, 1986), go a step further towards HHI, making it possible for one to describe the structure of a document rather than the minutiae of its presentation, and leaving it to the implementation of a system to present the material as best suits the local facilities. (The _HTML Reference Specification_ (Raggett, 1997) includes the recommendation, "Where the available fonts are restricted or for speech output, alternative means should be used for rendering differences in emphasis".) They are also interesting in that the added information is embedded in the text stream, not imposed on the text by extra-textual operations.
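
As a small illustration of the distinction - not drawn from either specification, but consistent with HTML 3.2 (Raggett, 1997) - the same word can be marked at either level: the <i> element is a presentation-level instruction to use italic type, while <em> declares the author's intention to emphasise and leaves its realisation to the receiving system. Written here as Python strings purely for comparison:

    # Two ways of marking the same word in an HTML text stream.
    presentation_level = "this point is <i>important</i>"    # instructs the renderer: switch to italic type
    structural_level = "this point is <em>important</em>"    # declares the intent: this word carries emphasis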

Some more ambitious approaches have been investigated. In the area of speech, there has been some interesting work on generating emotional speech patterns. For example, Abadjieva, Murray, and Arnott (Abadjieva, Murray, Arnott, 1993) describe a system called HAMLET (Helpful Automatic Machine for Language and Emotional Talk), which retrieves and utters stored phrases given cues to the sort of remark required and the desired tone of voice. Even this interesting device, though, is a special-purpose system catering for only one means of input and output.

In the general area of computer-assisted communications, the few examples of such work are vastly outnumbered by developments in real-time audio and video communications. In this work, computer systems are designed to deal with interactions between people, but attention is rarely directed to the development of computer techniques which help us to encode our ideas. Instead, the common approach has been to sidestep the problem by providing such means of communication as audio and video channels, perhaps controlled and in some ways assisted by computer techniques, but still for the most part conveying information directly from person to person by the familiar channels of speech and vision. In effect, the aim has been to reproduce as far as possible at a distance the traditional conditions of human conversation where we know that by inflexions of voice, by gesture, by facial expression, by drawing or pointing or many other non-verbal means we can communicate effectively.

This solution, if it is a solution, amounts to a withdrawal of the computer from involvement in the communication process. Instead, facilities are provided, in one form or another, for people to manage all details of the communication for themselves, with all the conversational techniques simply copied from the sender's end of the system to the receiver's end. The information carried by the message - gestures, voice, intonation, etc. - is never explicitly encoded in the system, any more than the words of an audio channel are transported as ASCII characters.

This solution is not a solution for rehabilitation communication systems. It relies on the human participants being able to use the conventional non-verbal communication repertoire - which is, by definition, often not the case for people with disabilities. To people with restricted vision, gestures and facial expression are not helpful; to people with restricted hearing, intonation and prosody might not be perceptible; to people with restricted physical abilities, methods depending on physical and vocal signs might be inaccessible. To achieve HHI under these circumstances some means of explicitly encoding more of the information in the communication is essential, as information usually expressed in terms of voice or gesture must be conveyed in some other way if it is to be conveyed at all. For many people with physical disabilities who cannot use either voice or gesture fluently, other means must depend on the abilities which they have, and computer-based communication aids are well established as means for transforming messages from one form of encoding to another. Many such aids are in use, but almost all are restricted to transmitting characters or words. If a system of this sort is to mediate in HHI, it must be designed to deal with far more than character streams, and explicit means of conveying components of HHI usually carried by the other channels previously mentioned must be provided. Though realistically it is necessary to accept that complete success is - for the moment - beyond reach, it is important to investigate how some advance towards HHI, as mediated by computer systems, might be made.

DISTINGUISHING HCI FROM HHI

Some differences between HCI and HHI can be seen as consequences of the fact that in HCI people are communicating with machines, while in HHI people are communicating with people:

1: In HCI, a machine has to understand the message; in HHI, the machine need not understand the message.

2: In HCI, the human participant is talking down to a machine; in HHI, two people are talking at the same level.

3: In HCI, the target language is simple, rigid, and (to people) unfamiliar; in HHI, the target language is complex, flexible, and familiar.

The differences are seen most clearly in a comparison between HCI and HHI in their extreme forms. Possible examples might be the HCI interface used to control a computer's operating system, and HHI using a videotelephone. The language of interaction with the operating system interface is technical, but structurally simple, and every communication is designed to cause some specific action in the machine. In contrast, the language used during a videotelephone conversation is essentially the same as that used in ordinary conversation, and does not affect the actions of the machine at all.

Other systems occupy positions between these two extremes - indeed, even those examples are rarely found in their pure forms; the computer operator will choose filenames comprehensible in human terms, and the videotelephone still relies on some mechanical dialling language. It is interesting that in these compromises the contrast between HCI and HHI is brought out very clearly. The HCI of the operating system interface is composed of signals which are semantically significant to the system, but among them are certain bit strings used as identifiers which at system level have no intrinsic semantic significance except that they are distinct and used consistently. These bit strings, though, are the file names chosen by the operator specifically for their semantic connotations at the HHI level. Conversely, with the videotelephone, it is precisely because the main signals have no HCI semantic content that it is necessary to introduce the dialling operations at the HCI level.

It is suggested that this question of machine-level semantics is at the root of the distinction between HCI and HHI. HCI is, by definition, semantically significant to the machine; if it were not, it would fail. In contrast, there is no reason to expect that HHI will have any more significance at machine level than that of a bit string to be passed on accurately and speedily.

This is satisfactory when human communication is copied without change from input to output, as in the videotelephone. It is no longer sufficient once the machine is required to deal with the HHI stream in ways which depend on the stream's human-level semantics. It is self-evident that to achieve this aim it will be necessary to devise means whereby the machinery can detect the semantic features which determine the required operations.

An important consequence of the difference in semantic level between HHI and HCI can be seen in this context. In HCI, what a message is supposed to do is defined in very precise terms; in order to cause a machine to act in a desired way, the message must be equivalent to some well defined binary signal. It might begin in a different form and pass through several intermediate representations, but all these must be equivalent to, and transformed into, the necessary final binary signal. Information theory offers useful techniques for analysing and describing the properties and behaviour of such processes.

But if the sentence "Mary _had_ a little lamb" is spoken, then the five bits of information which correspond to the fact that the words "Mary", "a", "little", and "lamb" were not emphasised while "had" was emphasised are buried deep in a welter of other sonic signals. While the five bits (or perhaps a few more if the intensity of emphasis is important) are undoubtedly there somewhere, information theory can show how all the bits can safely be conveyed, how the message can be compressed while retaining all the information, and other such very useful information, but it cannot easily determine what those bits are, and much less provide the real information: either that Mary once owned a lamb, or that Mary really did own a lamb, depending on the broader context. This human-level information is very elusive - which is why automatic translation is very hard to achieve - and it has no natural machine-level representation.

VOCABULARIES

It is therefore necessary to invent arbitrary representations which work for people. These will be called _vocabularies_, though the term will be stretched beyond its conventional meaning. Vocabularies are presented as an effective basis for analysing the transmission and transformation of human communication, and it is suggested that, in much the same way as information theory provides a means of discussing information processing at the HCI level, so vocabularies can provide a means of discussing information processing at the level of HHI.

As a simple example, consider how such an approach might be used to analyse the problem of Mary and her lamb. First observe that information is conveyed by both words and emphasis, so the vocabulary must provide ways of expressing both. In practice, it seems that the two components are almost independent - if that were not so, the practice of representing emphasis by printing ordinary words in italic type would not work - so it is not unreasonable to think of two independent information channels, one for each of the components. The vocabulary must now include information about both channels, so the sentence given as an example earlier would be encoded as the sequence:

{ ("Mary", no) ("had", yes) ("a", no) ("little", no) ("lamb", no) }.

Provided that both channels can be conveyed somehow, the sense of the message is in principle preserved. The channel carrying the words is essentially a conventional system, and there are many different means of encoding the emphasis signal which could be used if the original vocal emphasis is impossible. The extended vocabulary can be implemented directly if a second physical communication channel is available, with each component carried in its own channel. If this is not possible, both word and emphasis signals must be encoded into a single channel, usually primarily intended to transmit characters; compare HTML and LaTeX. This might be an electronic channel, but the description also applies to the more traditional methods such as italic type, or even inflected speech.
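
A small sketch may make the two-channel vocabulary concrete. The fragment below is illustrative only: the function names are invented, and the underscore marking is simply borrowed from this journal's own convention for italics. It holds the message as a sequence of (word, emphasis) pairs and folds both channels into a single character stream, from which the original pairs can be recovered:

    # The two-channel vocabulary: one entry per word, carrying its emphasis flag.
    message = [("Mary", False), ("had", True), ("a", False),
               ("little", False), ("lamb", False)]

    def encode(message):
        """Fold the word and emphasis channels into one character stream,
        marking emphasised words with surrounding underscores."""
        return " ".join("_" + word + "_" if emphasised else word
                        for word, emphasised in message)

    def decode(text):
        """Recover the (word, emphasis) pairs from the character stream."""
        pairs = []
        for token in text.split():
            if len(token) > 2 and token.startswith("_") and token.endswith("_"):
                pairs.append((token[1:-1], True))
            else:
                pairs.append((token, False))
        return pairs

    encoded = encode(message)           # "Mary _had_ a little lamb"
    assert decode(encoded) == message   # both channels survive the single-channel journey

Underscores are only one possible single-channel encoding; HTML's <em> element or an inflection of synthetic speech would carry the same emphasis channel, the essential point being that the channel is represented explicitly rather than discarded.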

COMPUTERS MEDIATING HUMAN COMMUNICATION

What part do computers play in these considerations? They have two distinct roles, which can be distinguished by the nature of the vocabularies used in each case.

If the intention is to effect communication between people at a distance by simulating face-to-face interactions, as with the videotelephone, computers can manage communications and deal with data compression and expansion. Such transformations can be seen as vocabulary transformations, but the vocabularies concerned have no connection whatever with the meanings of the messages conveyed. Instead, they are defined in terms of the purely physical characteristics of the signals - the changes in air pressure for the sound signal, and the brightness and colour of a point in a picture for the vision.

In contrast, communication through a medium such as electronic mail deals with units - characters and words - which have direct semantic significance related to the content of the message. The vocabularies used in these cases are similarly directly related to the content of the messages transmitted, as the semantic constituents of the messages are directly represented. This is not to say that the machine "understands" the communication in any way, but that the units are there and accessible for manipulation if useful transformations can be designed. Significant translation effort becomes necessary if communication is required between people using different vocabularies, or if the media involved cannot carry the full vocabulary. The example of emphasised speech constrained to pass through a character channel falls into this category.

The two roles differ in two significant ways. Perceptually, the direct transmission of vision and sound achieved by using the physical vocabularies of audio and video signals and data compression supports more natural communication than is possible with electronic mail. But the more semantically significant vocabularies used with the simpler communications methods make it possible to manage vocabulary translations which are certainly impossible at present with the more realistic signals.

This suggests that to approach HHI in cases where direct communication is not possible, methods depending on semantically appropriate vocabularies should be sought. If it is desired to transmit emphasised speech, then it follows that the sender should explicitly encode both the words (as at present, perhaps using the conventional typewriter keys) and the emphasis information - for example, by providing a separate emphasis key to be pressed when required. By this means the information needed for a step towards HHI has been acquired.
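
To continue the earlier sketch (again with invented function names, and with rendering choices made purely for illustration), a message captured in this explicit two-channel form can then be presented to suit the recipient, without building a special-purpose system for each case:

    # A message as captured from the sender: the words typed, together with the
    # state of the (hypothetical) emphasis key recorded alongside each word.
    message = [("Mary", False), ("had", True), ("a", False),
               ("little", False), ("lamb", False)]

    def render_as_text(message):
        """For a recipient reading text: emphasised words set in capitals."""
        return " ".join(word.upper() if emphasised else word
                        for word, emphasised in message)

    def render_for_speech(message):
        """For a recipient listening to synthetic speech: per-word directives
        which a synthesiser could map onto pitch, loudness, or duration."""
        return [(word, "stressed" if emphasised else "neutral")
                for word, emphasised in message]

    print(render_as_text(message))      # Mary HAD a little lamb
    print(render_for_speech(message))   # [('Mary', 'neutral'), ('had', 'stressed'), ...]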

Once acquired, the information must be preserved and communicated, and we return to the question of vocabularies. In an attempt to explore design methods which could be used in developing computer software for rehabilitation purposes, Creak and Sheehan (Creak, Sheehan, 1991) suggest a systematic approach to describing signals which occur in computer systems which manage communication. The basis of their method is a careful definition of how all components of a signal are encoded as the signal passes through the system. Signals are described in terms of certain attributes of their encoding media, which may or may not be changed by system components and at interfaces between components. If loss of some aspect of the communication is to be avoided, the description of the signal should be unchanged across every interface except in matters which reflect the properties of the media concerned (in effect a requirement that plugs and sockets must match exactly), and some of its attributes must be preserved within the system components. By defining the attributes broadly it is hoped to include all significant aspects of the communication. This view is not put forward as a design technique, but rather as an aid to system specification: by discussing each interface or component with the attributes in mind, attention is drawn to possible areas of concern. An example of this method used in a discussion of alternative keyboard interfaces (Creak, 1998) illustrates its usefulness.

The attribute at the highest level is called the "human form" of the signal, and is a description of how the person sending the message intended it to be expressed. If this is framed to include the non-verbal information of HHI, then the specification will either be constructed to transmit all the information through the system, or it will become clear where the information is lost. This approach to the problem brings into prominence the necessity of encoding the non-verbal information. It is customary to encode words as characters, but by the definition of HHI this is only part of the message to be conveyed. The vocabularies discussed here must be constructed by defining patterns in the signals defined in this analysis, and therefore complement the earlier proposal.
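
As a very loose illustration of how such a specification might be checked (the attribute names below are invented for the purpose and are not Creak and Sheehan's notation), each signal can be described at each point in the system, and every interface examined to confirm that the description is unchanged apart from the properties of the medium itself:

    # A rough description of a signal at one point in the system; "human_form"
    # is the highest-level attribute, recording what the sender meant to express.
    def describe(medium, human_form, channels):
        return {"medium": medium,          # e.g. "keyboard", "character stream"
                "human_form": human_form,  # the sender's intended expression
                "channels": channels}      # e.g. {"words", "emphasis"}

    def interface_matches(sent, received):
        """An interface is acceptable if the descriptions on either side agree
        in everything except the medium itself (the plugs-and-sockets test)."""
        strip = lambda d: {k: v for k, v in d.items() if k != "medium"}
        return strip(sent) == strip(received)

    at_keyboard = describe("keyboard with emphasis key",
                           "emphasised sentence", {"words", "emphasis"})
    on_the_wire = describe("plain character stream",
                           "emphasised sentence", {"words"})  # emphasis channel lost

    print(interface_matches(at_keyboard, on_the_wire))   # False: the loss is made visible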

CONCLUSION

This sort of approach has not commonly been adopted in rehabilitation computing - perhaps simply because few people have asked for the full human-form description. A case has been made elsewhere (Creak, 1997) for the importance of incorporating HHI ideas at the earliest stages of design. Lacking other instructions, the computists have done what they are accustomed to doing, which is to make computers do what they do very well: accepting streams of input text and doing things with them which primarily serve the purposes of communicating with computers. Insofar as text is involved, the computer has, perhaps, been thought of as a convenient way to prepare text for printing - which is to say, for communicating with a machine. The author is then left with the task of conveying the meaning, using the stratagems practised in the printing trade, and computer interfaces have been provided to help with that.

In contrast, communication with people has received much less attention. There is no provision for anything but simple text in the common forms of electronic mail, so those who use it have invented their own parallel emotion channel in the form of "smileys", encoded (like the ordinary text) in terms of the basic character channel. The multimedia approaches used in such activities as computer conferencing do not address the details of conveying non-textual (or similar) information; though they can transmit a full audio or video channel, there is little attempt to sort out the different vocabularies which have been mentioned.

The neglect can be seen as a matter of concentration on HCI rather than HHI. Current computer systems are efficient at controlling machines, but for any but the simplest implementations of human interactions their communication methods are inadequate. The gap can sometimes be patched up by simply copying all available information (a video stream, for example) and relying on the recipient to perform the analysis, but that only works if the people at the other end are able to do so. If the abilities of sender and recipient do not match, it might be impossible for the recipient to perform the required analysis, and alternative methods are required. Effort directed at identifying and separating the vocabularies used in communications could be well spent.

REFERENCES

ITD. (1996). _Information Technology and Disabilities Guidelines for Authors_. (Online). Available: http://www.rit.edu/~easi/itd/guidelines.html

Abadjieva, E., Murray, I.R., Arnott, J.L. (1993). An enhanced development system for emotional speech synthesis for use in vocal prostheses. In _ECART 2: Proceedings of the European Conference on Rehabilitation Technology_, paper number 1.2.

Creak, G.A. (1997, June). Notes for a seminar: Insights from a system specification aid. _SIGCAPH Newsletter_, (58), 8-13.

Creak, G.A. (1998, Spring). Novel approaches to using keyboards. _Communication Outlook_, 18(2/3), 28-40.

Creak, G.A., Sheehan, R. (1991). _The representation of information in rehabilitation computing_. Auckland Computer Science Report #54, Auckland University Computer Science Department.

Lamport, L. (1986). _LaTeX: A document preparation system_. Addison-Wesley.

Raggett, D. (1997). _HTML Reference Specification (REC-html32)_. World-Wide Web Consortium.

SIGCHI. (1992). _Curriculum for human-computer interaction_. ACM SIGCHI Curriculum Development Group.