# Multimodal Presentation of Information in Ambient Intelligence

C. Jacquet, Y. Bellik, Y. Bourda, J.P. Sansonnet

## Abstract

This work deals with the design of multimodal information systems in ambient intelligence. Its agent architecture is based on KUP, an alternative to traditional software architecture models for human-computer interaction. The KUP model is accompanied by an algorithm for choosing and instantiating interaction modalities. The model and the algorithm have been implemented in a platform called PRIAM, with which we have performed experiments in pseudo-real scale.

## Introduction

Users of public places often have difficulty obtaining the information they need. For instance, a passenger arriving in an airport does not know where his boarding gate is located. To provide users with potentially useful information, staff generally place information devices in specific locations: screens, loudspeakers, interactive information kiosks, or simply display panels. However, these information sources give non-targeted, general-purpose information suitable for anyone. Consequently, they are generally overloaded with information items, which makes them difficult to read. Yet a given user is generally interested in only one information item: finding it among a vast quantity of irrelevant items can be long and tedious (see fig. 1).

Figure 1: Three travellers looking for the boarding gates at the Charles-De-Gaulle Airport (Paris).

Indeed, there is no point in presenting information that nobody is interested in. We therefore propose a ubiquitous information system capable of providing personalized information to mobile users. For instance, monitors placed throughout an airport could provide nearby passengers with information about their flights. Only the information items relevant to the people located in front of a screen would be displayed, which would improve the screen's readability and reduce the users' cognitive load.

On the other hand, classical information devices are often not suited to handicapped people. For instance, an information screen is useless to a blind person; similarly, a deaf person cannot hear information given by a loudspeaker. For these reasons, we focus on multimodal information presentation: a given device will provide information to a user only if its output modality is compatible with the user's input modalities. This way, the system avoids situations in which people cannot perceive the information items.

## The KUP model

As people rapidly move from place to place in public spaces, they will not necessarily be able to perceive a presentation device (look at a monitor or listen to a loudspeaker) when a given information item is made available. In consequence, the system must ensure that this information item is presented to them later, when a suitable device becomes available. This leads us to consider two unsynchronized phases:

1. in a first phase, an information item is "conceptually" provided to the user,
2. in a second phase, this information item is physically presented to the user, through a suitable device and modality (text on a screen, speech synthesis from a loudspeaker, etc.)

To manage these two phases, we have introduced the KUP model. The KUP model is a software architecture model for ambient intelligence systems. It takes three logical entities into account:

• knowledge sources, for instance the information source about flight delays in an airport. They are denoted by $K_\ell$,
• logical entities representing users, denoted by $U_\ell$,
• logical entities representing presentation devices, denoted by $P_\ell$.

These logical entities correspond one-to-one to physical counterparts, respectively:

• the spatial perimeter (zone) in which a certain knowledge is valid, denoted by $K_\varphi$,
• human users, denoted by $U_\varphi$,
• physical presentation devices, denoted by $P_\varphi$.

In the first phase, a knowledge source $K_\ell$ sends an information item to the user entity $U_\ell$. In the second phase, the user entity $U_\ell$ asks a presentation entity $P_\ell$ to present the information item. This results in a presentation device $P_\varphi$ presenting the information for the human user $U_\varphi$.
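The two unsynchronized phases can be illustrated with a minimal sketch. The class and method names below (`KnowledgeSource`-style entities, `receive`, `device_available`, the sample flight message) are illustrative assumptions, not part of the published model; the point is only that the user entity stores items received in phase 1 until a compatible device becomes available in phase 2.

```python
class PresentationEntity:
    """Logical counterpart (P_l) of a physical presentation device (P_phi)."""
    def __init__(self, name, modality):
        self.name = name
        self.modality = modality  # e.g. "text" or "speech"

    def present(self, item):
        # Phase 2: the physical device P_phi renders the item for U_phi.
        return f"[{self.name}/{self.modality}] {item}"


class UserEntity:
    """Logical counterpart (U_l) of a human user (U_phi)."""
    def __init__(self, input_modalities):
        self.input_modalities = set(input_modalities)
        self.pending = []  # items "conceptually" provided but not yet presented

    def receive(self, item):
        # Phase 1: a knowledge source K_l sends an information item to U_l.
        self.pending.append(item)

    def device_available(self, device):
        # Phase 2: when a compatible device appears, U_l asks its P_l
        # to present the stored items; the two phases are unsynchronized.
        if device.modality in self.input_modalities:
            shown = [device.present(i) for i in self.pending]
            self.pending.clear()
            return shown
        return []


user = UserEntity(input_modalities={"text", "speech"})
user.receive("Flight AF123: boarding at gate 42")     # phase 1
screen = PresentationEntity("screen-1", "text")
print(user.device_available(screen))                  # phase 2
```

An incompatible device (e.g. a loudspeaker for a deaf user) simply returns nothing, and the item stays pending until a suitable device comes within reach.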

Most software architecture models for HCI (e.g. MVC, Seeheim, ARCH and PAC) rely on logical representations for the functional core and the interface only. There is no active logical representation of the user. In contrast, this entity lies at the center of the KUP model (see fig. 2).

Figure 2: In classical architecture models, the user is not logically represented. In KUP, a user entity lies at the center of the system.

## Choosing a modality

The problem that we have to solve is as follows: a given user wishes to have a given semantic unit (information) presented on a given presentation device. The system must choose a modality, and instantiate it, in order to present the semantic unit. The modality and its instantiation must be compatible with all of the following:

• the user’s capabilities (e.g. one cannot use a visual modality if the user is blind) and preferences (e.g. if a user prefers text to graphics, the system must try and satisfy this wish),
• the presentation device capabilities (e.g. a monochrome screen is not capable of performing color output),
• the semantic unit’s capability to convey its informational content in different modalities.

If there are several possibilities, the system should choose the user’s preferred solution among them.

To solve this problem, we associate a profile with the user, with the presentation device and with the semantic unit. These profiles describe interaction capabilities and, possibly, preferences, i.e. which modalities can be used and which attribute values are possible. The solution must comply with each profile, and therefore lies at the "intersection" of the three profiles.

We define a profile as a weighting tree of all available modalities. A real number between 0 and 1 is associated with each node (modality) of the tree: 0 means that the corresponding modality (or the corresponding sub-tree) cannot be used; 1 means that it can be used; values in between indicate a preference level. For instance, in the profile of a blind person, the sub-tree corresponding to visual modalities is weighted by 0, so that it cannot be used. The nodes' weights will determine the choice of a modality.

Similarly, attributes are "weighted" too, which will help instantiate the chosen modality. More precisely, each possible value of an attribute is given a weight between 0 and 1, with the same meaning as above. Formally, a weight function is associated with the attribute, whose domain is the attribute's possible values and whose codomain is the interval [0, 1].

Fig. 3 shows an example of a partial profile (it only contains two concrete modalities). The profile describes a user with a visual impairment whose native tongue is English. The node weights are shown in white characters inside black ovals. Since the user is visually impaired, but not blind, the weight of the visual modality is low, but not zero.

Figure 3: A partial profile (for the sake of clarity, some attribute weight functions are not shown).
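Such a profile can be sketched as a small data structure. The nested-dict encoding, the helper name `node` and the concrete weights below are illustrative assumptions (the paper does not prescribe a representation); the shape mirrors fig. 3: a penalized visual branch, a fully usable auditory branch, and attribute weight functions mapping values to [0, 1].

```python
def node(weight, children=None, attributes=None):
    """One node of the weighted modality tree. `weight` is in [0, 1];
    `attributes` maps attribute names to weight functions (value -> [0, 1])."""
    return {"weight": weight,
            "children": children or {},
            "attributes": attributes or {}}


# Partial profile of a visually impaired English speaker (cf. fig. 3):
# the visual branch is usable but strongly penalized, auditory is preferred.
user_profile = node(1.0, children={
    "visual": node(0.2, children={
        "text": node(1.0, attributes={
            # Attribute weight functions (illustrative): prefer large
            # font sizes and English text.
            "font_size": lambda size: 1.0 if size >= 20 else 0.3,
            "language": lambda lang: 1.0 if lang == "en" else 0.1,
        }),
    }),
    "auditory": node(1.0, children={
        "speech": node(1.0, attributes={
            "language": lambda lang: 1.0 if lang == "en" else 0.1,
        }),
    }),
})
```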

To select a modality, the system has to take the three profiles into account (user’s, presentation device’s, semantic unit’s). To this end, we define the notion of intersection of profiles. The intersection of n profiles $p_1, \dots, p_n$ is a profile (i.e. a weighted modality tree), in which weights are defined as follows:

• the weight of a node is the product of the n weights of the same node in the profiles $p_1, \dots, p_n$,
• the weight function of an attribute is the product of the n weight functions of the same attribute in the profiles $p_1, \dots, p_n$.

We call it an intersection because it has natural semantics: a given node is weighted by 0 in the resulting profile if and only if at least one of the intersected profiles weights that node by 0. The resulting profile thus describes which modalities can be used to present a given semantic unit to a given user, on a given presentation device. It also contains the information needed to determine the values of the attributes of the chosen modality (instantiation).

First, the system has to choose a concrete modality, i.e. one of the leaves of the tree. To do this, it evaluates each leaf. The evaluation of a leaf is a real number that accounts for the weights assigned to all its ancestors in the weighted tree. If an internal node has a null weight, the corresponding sub-tree cannot be used, so all its leaves must evaluate to zero. We could therefore define the evaluation of a leaf as the product of all its ancestor node weights. However, leaves with many ancestors would then tend to have smaller evaluations than leaves with fewer ancestors. To avoid this bias, we define the evaluation of a concrete modality (i.e. a leaf) as the geometric mean of all its ancestor modalities' weights (including its own weight). More precisely, if $w_1, \dots, w_m$ are the node weights along the path going from the root (weight $w_1$) to the concrete modality (weight $w_m$), then the evaluation is:

$e = \sqrt[m]{w_1 \times w_2 \times \cdots \times w_m}$

The system then chooses the concrete modality with the highest evaluation. Fig. 4 gives an overview of the various steps described above.
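Intersection and evaluation can be sketched together. The flat path-to-weight encoding below, and the three example profiles, are illustrative assumptions (a user with a strongly penalized visual branch, a screen that cannot produce sound, and a semantic unit expressible in both modalities); the `intersect` and `evaluate` functions implement the product of weights and the geometric mean described above.

```python
import math

def intersect(*profiles):
    """Intersection of profiles: the weight of each node is the product
    of its weights in all profiles. Each profile maps a path of modality
    names (a tuple, from the root) to a weight in [0, 1]."""
    result = {}
    for path in set().union(*profiles):
        w = 1.0
        for p in profiles:
            w *= p.get(path, 1.0)
        result[path] = w
    return result

def evaluate(profile, leaf_path):
    """Evaluation of a concrete modality (a leaf): the geometric mean of
    the node weights along the path from the root to the leaf."""
    weights = [profile[leaf_path[:i + 1]] for i in range(len(leaf_path))]
    return math.prod(weights) ** (1.0 / len(weights))

# Illustrative profiles: visually impaired user, silent screen, and a
# semantic unit that can be rendered as text or speech.
user   = {("visual",): 0.2, ("visual", "text"): 1.0,
          ("auditory",): 1.0, ("auditory", "speech"): 1.0}
device = {("visual",): 1.0, ("visual", "text"): 1.0,
          ("auditory",): 0.0, ("auditory", "speech"): 1.0}
unit   = {("visual",): 1.0, ("visual", "text"): 1.0,
          ("auditory",): 1.0, ("auditory", "speech"): 1.0}

inter = intersect(user, device, unit)
leaves = [("visual", "text"), ("auditory", "speech")]
best = max(leaves, key=lambda leaf: evaluate(inter, leaf))
print(best)
```

Here speech evaluates to zero (the device's auditory branch is weighted 0, so the whole sub-tree is excluded), and text is chosen despite the user's low visual weight: the intersection rules out impossible modalities, while the geometric mean ranks the remaining ones.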

Figure 4: Overview of the algorithm for choosing a suitable modality. First, profiles are intersected, which gives out a list of usable modalities. Each possible instantiation of these modalities is evaluated, so as to choose the best one.

## Evaluation

We have implemented and evaluated the framework described above in a platform called PRIAM, for PResentation of Information in AMbient intelligence. The goal of the evaluations is to demonstrate the value of dynamic information presentation systems for mobile users. The evaluations are based on screen displays. Proximity between screens and users is sensed with infrared badges. Other techniques could have been used, such as RFID, but infrared offers a significant benefit: it detects not only people's proximity, but also their orientation. This way, someone who is very close to a screen but has her back turned to it is not detected.

We performed an evaluation to assess the impact of dynamic information display on item lookup time. 16 subjects had to find an information item among a list of other similar items. We proposed two different tasks:

• to find a mark obtained at an examination,
• to find the details about a flight.

We measured the lookup time for each user, with respect to the number of users standing in front of the list. There were 1 to 8 simultaneous users (see fig. 5), which seems a realistic upper bound on the number of people who can gather around the same display panel. In control experiments, users were presented with fixed lists containing 450 examination marks or 20 flight details. With the dynamic system, by contrast, the display panel showed only the information relevant to the people standing nearby (i.e. 1 to 8 items).

Figure 5: Mark lookup in a list (in the back) or on a screen (on the left). This is a picture from the experiment video.

This experiment showed that information lookup was far quicker when information display was dynamic:

• for mark lookup (see fig. 6), lookup times were 51 % to 83 % shorter (depending on the number of users), and on average 72 % shorter,
• for flight lookup (see fig. 7), lookup times were 32 % to 75 % shorter (depending on the number of users), and on average 52 % shorter.
Figure 6: Mark lookup time, with respect to the number of people. The vertical bars represent standard deviations, the dots average values.

Figure 7: Flight information lookup time, with respect to the number of people. The vertical bars represent standard deviations, the dots average values.

The evaluations have shown the benefits of dynamic information display for mobile users: it allows very quick lookup of information in lists. However, people were generally disturbed by items dynamically appearing and vanishing, which caused complete redisplays each time because the lists were constantly kept sorted. This problem could be addressed by inserting transitions when adding and removing items, or by appending new items unsorted at the bottom of the lists.

## Conclusion and perspectives

We have proposed a model and an algorithm that enable the design of multimodal information presentation systems. These systems can be used to provide information to mobile users. They intelligently make use of public presentation devices to propose personalized information. We have performed evaluations in pseudo-real conditions, which leads us to consider the following perspectives:

• On a given screen, it could be interesting to sort the displayed semantic units according to criteria other than alphabetical or chronological order. A priority level could thus be given to each semantic unit. This would, for instance, allow higher-priority semantic units (e.g. flights about to depart shortly, or information about lost children) to appear first. Similarly, there could be priorities among users (e.g. handicapped people or premium subscribers would be higher-priority groups). Semantic units' priority levels would then be modulated by the users' own priorities.
• In this paper, proximity was binary. Actually, it is possible to define several degrees of proximity, or even a measure of distance. These degrees or distances could be used as parameters of the instantiation process. For instance, text displayed on a screen could be bigger when people are farther away.
• If a user is alone in front of a screen, only her own information item is displayed, for instance the destination of her plane. This can raise privacy concerns if someone is watching from behind. However, this can be addressed by displaying one or two randomly chosen irrelevant items on the screen, thus confusing ill-intentioned observers.
• Our first experiments took place in simulated environments (a room and a corridor in our lab). In the short term, we therefore plan to carry out real-scale experiments, for instance in an airport or a train station. Their goal will not be to test and validate the algorithms, whose behaviour we have already verified with the simulator and the experiments, but rather:
  • to evaluate the overall usability of the system: how do users react to such a highly dynamic system?
  • to study the sociological impact of this system,
  • to test the platform's usability: is it easy to create an application? what are the guidelines to follow?
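The distance-based instantiation mentioned among the perspectives (larger text for people farther away) can be sketched in a few lines. The linear mapping and all constants below are illustrative assumptions, not taken from the paper; any monotonic mapping from distance to an attribute value would fit the instantiation process.

```python
def font_size(distance_m, base_pt=16, pt_per_metre=6, max_pt=72):
    """Toy instantiation rule: scale the font size linearly with the
    viewing distance (in metres), capped at a maximum size."""
    return min(max_pt, base_pt + pt_per_metre * distance_m)

print(font_size(1))   # nearby user: small text
print(font_size(10))  # distant user: capped at max_pt
```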
