Reviewed by
Scott Buckley
Northern Illinois University
November 1997

Maybury, M. T. (Ed.). (1997). Intelligent Multimedia Information Retrieval. Menlo Park, CA: AAAI Press.


Someday, computers will give us easy access to extensive, searchable archives of mixed text, graphics, sounds, narrations, and video footage. At least that's been the dream for the past ten years or so. This collection of twenty-two papers explores the software challenges involved in fulfilling that promise. The articles are divided into seven sections that together represent three major functions of multimedia delivery systems:

    Processing and indexing data sources
    Searching and retrieving specified data
    Presenting the search tools and results to the user

The book makes clear that the technical complexities of each function are formidable and that practical, fully functional systems will not be available for many years. From the standpoint of interface design, however, it offers a unique perspective that may prove valuable to software developers today.

The word "intelligent" in the book's title describes the biggest obstacle in developing an ideal multimedia archival system. It is relatively straightforward to watch video clips or listen to sound files and attach lists of words that describe the contents. These indexes, however, will be subjective, and can never fully describe the associated media clip. To be truly useful, a multimedia system must allow a user to define patterns of sound and/or imagery for which the retrieval engine will then search. Moreover, the retrieval engine must be capable of understanding and applying the semantics of human language and meaning.

Suppose that a person were to ask a computer-based multimedia system to find all media sources that refer to clocks. The system would need to understand the concept "clock" and select media files that include any object that visually resembles a clock, as well as any text or sound files which mention the word "clock" itself, or the names of specific types of clock such as "hourglass," "sundial," and so forth. Clearly, the development of multimedia retrieval systems is closely tied to the field of artificial intelligence.
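The text side of the clock example can be sketched in a few lines of Python. This is only an illustration of the idea, not a method taken from the book: the concept table, the function names, and the sample archive are all hypothetical, and the far harder problem of recognizing objects that visually resemble a clock is left out entirely.

```python
# Hypothetical concept table (an assumption for illustration); a real system
# would need a much richer model of language and meaning.
CONCEPT_TERMS = {
    "clock": {"clock", "hourglass", "sundial"},
}

def expand_query(concept):
    """Return the terms to search for, including the concept word itself."""
    key = concept.lower()
    return CONCEPT_TERMS.get(key, {key})

def find_matches(concept, annotations):
    """Return sorted file names whose annotation text mentions any term."""
    terms = expand_query(concept)
    hits = []
    for name, text in annotations.items():
        # Crude tokenization: lowercase, drop commas and periods, split on spaces.
        cleaned = text.lower().replace(",", " ").replace(".", " ")
        if set(cleaned.split()) & terms:
            hits.append(name)
    return sorted(hits)

# Hypothetical archive: file names mapped to their text or transcript.
ARCHIVE = {
    "garden.txt": "A sundial stands in the middle of the garden.",
    "kitchen.wav": "The egg timer on the shelf is an hourglass.",
    "harbor.mov": "Footage of boats leaving the harbor.",
}

matches = find_matches("clock", ARCHIVE)
```

Even this toy version shows why a plain keyword index falls short: neither matching file contains the word "clock," so both are found only because the query was expanded to related terms.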

Thirteen of the papers in the book focus on approaches to analyzing sound or image streams prior to retrieval. Although these articles are highly technical, more general approaches to user interface design emerge. For example, the first chapter describes the QBIC system used in several IBM products to abstract and search image databases for specific visual patterns. Part of the QBIC system is a drawing tool that permits the user to sketch patterns for which the retrieval engine will search. Likewise, Chapter 5 discusses Sagebook, a system for browsing, searching, and customizing examples of graphics used to visually represent data. The authors spend considerable time discussing implications for the user interface. One particular concern is that users often don't know how to describe what they want. The authors conclude that a successful interface must assist the user in refining queries and understanding the search tools.

The emphasis of the second half of the book moves from data analysis to retrieval and query techniques. Here again, the papers stress the need for programming techniques based on artificial intelligence. Topics include the use of intelligent agents for searching and retrieval; interactive visualizations of data sets during the query and retrieval processes; and hypertext systems that build adaptive models of the user based on navigation patterns.

While these programming models are advanced, the authors raise many user interface considerations that apply to design work today. Chapter 18, for example, describes HYNECOSUM, a program developed for use by medical staff in a hospital environment. The program builds a model of the user and modifies the interface according to the user's experience level. In testing the program, the author found that users were confused when the contents of screens seemed to change inexplicably. To minimize confusion, the program was modified to notify the user of the proposed changes and ask permission to alter the interface. Such an approach could well apply to applications developed with current authoring tools that use conditional branching structures.
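The notify-and-ask pattern is simple enough to sketch with an ordinary conditional branch. The Python below is a hypothetical illustration, not HYNECOSUM's actual design: the class, the method names, and the experience levels are assumptions made for the example.

```python
class AdaptiveInterface:
    """Toy interface that changes its experience level only with consent."""

    def __init__(self, ask_user, level="novice"):
        # ask_user is a hypothetical callback: it receives a message string
        # and returns True if the user approves the proposed change.
        self.ask_user = ask_user
        self.level = level

    def propose_level(self, new_level):
        """Describe the proposed change and apply it only if the user agrees."""
        if new_level == self.level:
            return False  # nothing to change, so nothing to ask
        message = f"Switch the interface from {self.level} to {new_level} mode?"
        if self.ask_user(message):
            self.level = new_level
            return True
        return False  # user declined; the screens stay as they were


# Usage: one user who declines every change and one who accepts.
cautious = AdaptiveInterface(ask_user=lambda message: False)
cautious.propose_level("expert")

willing = AdaptiveInterface(ask_user=lambda message: True)
willing.propose_level("expert")
```

The point of the design is that the screen never changes without the user having seen an explanation first, which addresses the confusion reported in testing.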

The final two chapters of the book describe experiments designed to study how users engage with computer interfaces. Of particular interest is the final chapter, in which subjects are given an assortment of computer text, graphic, and video files describing events in the life of a fictitious race car driver. The subjects are required to open the files to answer a list of questions concerning details of the driver's life.

In the study, the subjects showed a strong preference for using text files, even when searching for details that might have been more easily found in a graphics file. Also, the subjects tended to access one file at a time, opening and closing windows as required, rather than tiling multiple windows to assist in visual cross-referencing of the information. This suggests the need for multimedia systems to present tiled, multi-window interfaces that assist the user in establishing associations between different types of media.

Software developers looking for simple, practical guidelines to interface design should probably avoid this book. The concepts and methodologies presented are abstract and theoretical, better suited to researchers in programming logic and artificial intelligence. Meaningful guidelines for interface design can be found, but only with significant effort.

However, those interested in the future of interface design should read this book. Most of the systems described have been implemented in pilot projects with promising, if limited, results. More importantly, one common theme emerges from the various articles: For computers to fulfill their promise as information resources, artificial intelligence techniques must be employed in the processing and retrieval of data, as well as in the interface tools for query and display. With ever-increasing processing power and more sophisticated programming techniques, the promise may well be fulfilled. If nothing else, a software developer who understands the potential of intelligent computer interfaces may be better able to use conventional techniques to improve the effectiveness of today's software, while waiting for the more powerful tools of the future.