F. Bouchet, J-P Sansonnet
The need for an assistance corpus
Various studies have shown that ordinary users are still unsatisfied with the traditional help systems available for
computer software (Help files, keyword searches or long FAQs). We consider the possibility of enhancing their experience
by using Embodied Conversational Agents (whose benefits in Human-Computer Interaction have been demonstrated) embedded to the
To offer efficient assistance with the problems of those ordinary users, we have to design Rational Assisting Agents
able to understand both the problems users are facing and the way they express their need for assistance. While at least some
of the problems can be forecast by analyzing software applications, the second point clearly cannot be dealt with without
a corpus of real people's requests collected in situ.
Corpus collection and building
The Daft corpus is a set of 11,000 isolated assistance requests (in French), built in 3 steps:
- Original collection: over two years, about 100 human subjects were asked to use several applications
(from simple applets to dynamic websites) embedding an ECA powered by the first version of the DAFT system,
providing 5,000 original typed requests.
- Coverage enhancement: to improve the coverage of the assistance language domain, the corpus size was significantly
increased by adding different formulations of original requests inspired by thesauri.
- Application domain expansion: by adding 3,000 questions from the FAQs of text processing systems, we addressed the concern
raised by the "simplicity" of some of the applications used in the original collection process and reinforced the proportion
To characterize the Daft corpus, and indirectly to validate its specificity, we compared it to similar task-oriented
corpora from a speech acts point of view. We used 3 other corpora for this comparison:
- the Switchboard corpus: 200,000 manually annotated utterances from task-oriented phone conversations,
- the MapTask corpus: 128 dialogues in which one person has to draw a map following another person's instructions,
- the Bugzilla corpus: 1,200,000 comments from 128,000 bug reports created during the development of the Mozilla
We used Searle's classical taxonomy of speech acts as the reference, converting the original taxonomies of the corpora into
this one to be able to compare their distributions:
Despite the difficulty of converting some acts (which explains the additional "unknown" category), the Daft corpus clearly appears different from the other ones. We can especially note:
- A majority of directives (57%), due to the high number of orders and questions. This can be explained
by the fact that users are more direct when interacting with a computer than with other humans (as is the case in the 3 other corpora).
- A rather low proportion of assertives (13%), as users seem to prefer expressing their feelings and states of mind (29%)
rather than more objective and neutral facts.
- Very few commissives (1%), which can be explained by the nature of the user-agent relationship, where the latter is subordinate to the former.
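The comparison described above (map each corpus's own tags onto Searle's categories, then compare proportions) can be sketched as follows. This is only an illustrative sketch: the tag names in TAG_TO_SEARLE and the toy annotations are hypothetical and are not the mappings or data actually used for the Daft, Switchboard, MapTask or Bugzilla corpora. Tags with no mapping fall into the additional "unknown" category.

```python
from collections import Counter

# Searle's five classes of speech acts.
SEARLE = ("assertive", "directive", "commissive", "expressive", "declarative")

# Hypothetical mapping from a source taxonomy's tags to Searle's classes
# (an assumption for illustration, not the mapping used by the authors).
TAG_TO_SEARLE = {
    "statement": "assertive",
    "question": "directive",
    "order": "directive",
    "promise": "commissive",
    "feeling": "expressive",
}

def searle_distribution(tags, mapping=TAG_TO_SEARLE):
    """Return the proportion of each Searle class among the given tags.

    Tags absent from the mapping are counted as "unknown".
    Classes that never occur are omitted from the result.
    """
    counts = Counter(mapping.get(tag, "unknown") for tag in tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    categories = (*SEARLE, "unknown")
    return {cat: counts[cat] / total for cat in categories if counts[cat]}

if __name__ == "__main__":
    # Toy annotations, invented for the example only.
    toy_tags = ["order", "question", "statement", "feeling", "backchannel"]
    for category, share in searle_distribution(toy_tags).items():
        print(f"{category}: {share:.0%}")
```

Keeping the mapping as plain data makes it easy to write one table per source taxonomy and to see which tags end up in "unknown".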
Conversational activities of the Daft corpus
During the corpus collection phase, human subjects were informed that they had some tasks to do, for which they could ask for
help (if needed) from an artificial assistant agent. However, they were completely free in their actions and, in particular,
could type whatever they wanted without any constraint. Various behaviors were observed, with some users ending up abandoning
their original task, and the collected corpus reflects this diversity.
After randomly extracting two independent subsets, each one tenth of the corpus size, we manually grouped requests by
similar activity, which allowed us to distinguish 4 main activities:
- Control activity: direct commands that make the agent interact with the application software by itself.
- Direct assistance activity: explicit help requests from the user.
- Indirect assistance activity: user's judgments about the software/agent, revealing an actual need for assistance.
- Chat activity: where the user generally focuses more on the agent than on the application itself, and which can
itself be divided into:
  - reactions to an agent's answer: a set of ways to agree/disagree, expressions of incredulity ("doubt it"), lack of understanding ("you lost me") or insistence ("please answer").
  - communicative functions: forms used to start/end the communication, as well as some phatic acts ("are you there?").
  - dialogue with the agent: from orders ("shut up") to questions ("do you have a soul?") and from threats ("don't force me to kill you!") to compliments ("you look cute").
  - comments on the application: without any assistance value ("this page is nice").
  - others: "I'm an ordinary user", "I want to do a cognitive gift"...
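The sampling step and the four activities described above can be sketched as follows. This is a minimal sketch under stated assumptions: "independent" is read here as two separate random draws (which may overlap), and the function name, the seed and the placeholder requests are hypothetical. The manual grouping itself is not automated; the Activity enumeration only names the labels an annotator would assign.

```python
import random
from enum import Enum

class Activity(Enum):
    """The 4 main conversational activities distinguished in the corpus."""
    CONTROL = "control"
    DIRECT_ASSISTANCE = "direct assistance"
    INDIRECT_ASSISTANCE = "indirect assistance"
    CHAT = "chat"

def draw_subsets(requests, denominator=10, n_subsets=2, seed=0):
    """Draw n_subsets random subsets, each 1/denominator of the corpus size.

    Each subset is sampled without replacement, but the subsets are drawn
    separately, so a request may appear in more than one of them
    (an assumption about what "independent" means in the text).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    size = len(requests) // denominator
    return [rng.sample(requests, size) for _ in range(n_subsets)]

if __name__ == "__main__":
    # Placeholder requests standing in for the real corpus.
    corpus = [f"request {i}" for i in range(11000)]
    first, second = draw_subsets(corpus)
    print(len(first), len(second))
    print([activity.value for activity in Activity])
```

Annotating two separately drawn subsets gives a simple check that the activity categories found in one sample also cover the other.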
The existence of chat and control subcorpora reveals that users actually expect the agent not only to provide them with
assistance, but also to do things in the application by itself (and for them), as well as to be able to react to
comments not related to the global task they are working on (a phenomenon certainly reinforced by the use of a visible embodied
For more information and online documentation: see The Daft project homepage
F. Bouchet, Caractérisation d’un Corpus de Requêtes d’Assistance,
RECITAL'07, Toulouse, June 5-8th 2007
F. Bouchet, J-P. Sansonnet, Étude d’un corpus de requêtes en langue naturelle pour des agents assistants,
WACA 2006, Toulouse, October 26-27th 2006
F. Bouchet, Conception d'un langage de requêtes pour un agent conversationnel assistant,
Master Research Thesis (in French), September 2006