were asked to assess relevance but the test collection relevance assessments (see footnote 30) were used to generate
expansion terms. This was to ensure that the terms used for expansion were the same for all users, and
were the same as in the experienced user simulation. This aspect of the experiment was hidden from the
searchers.
For all queries, the users failed to reach the potential effectiveness of the simulated user and on the
whole failed even to reach the level of AQE. So although IQE can improve retrieval effectiveness and
can demonstrate consistent improvement over a set of queries, the subjects in this set of experiments
failed to demonstrate the ability to make good term selections. This is a vital point for IR: if IQE is to
realise the experimental potential demonstrated in Harman's earlier experiments, it is necessary to
facilitate the selection of good query terms. How this process of iteratively developing a query can be
made easier requires a more careful analysis of what processes users follow within IQE. We look at this
in the next section.
5.4 Using IQE
In this section we present three investigations on user behaviour when interacting with an IQE system.
The results from these investigations are not consistent. However, the very lack of consistency across
the experiments highlights important aspects of IQE and user interaction. They also highlight the fact
that it is difficult to predict, or make assumptions about, what functionality users want from IQE or IR
systems.
Beaulieu [Beau97], as part of the ongoing work on the Okapi probabilistic system, carried out an
investigation of three interfaces to IR systems. One of these offered only AQE; two offered IQE. The
systems, unlike many query expansion systems, were evaluated not in the laboratory
but in operational use: the systems were used as an interface to a university library
catalogue.
The first interface offered only AQE. For each document viewed, the user was asked whether the
document was similar to the documents s/he would like to retrieve. If the answer was yes, the user
was offered the option of searching for similar documents. The query modification was hidden
from the user; users saw only the results of the new search. In operational trials, the uptake rate
(the proportion of users trying the AQE option) was around 33%, and this led to retrieval of further
relevant items in around 50% of the searches (see footnote 31).
The first IQE system was based on a series of overlapping windows with separate windows for query,
relevant titles, and the retrieved set of titles. The user was asked the same relevance question as in the
AQE case ("Is this the sort of thing you are looking for? Y/N"). If the user answered yes, the document
title was added to a list of titles of relevant documents. Users requested term suggestions by the use of
an Expand Search button that caused the system to extract the top 20 expansion terms for display to the
user. Users could then select those terms that they would like to use in a modified query. Uptake on this
system was only 11%, and query expansion led to the retrieval of further relevant documents in only
31% of the searches in which users tried IQE.
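The term-suggestion step described above can be illustrated with a short sketch. The text does not say how the interface ranked candidate terms, so the ranking below is an assumption: it scores each term from the documents marked relevant by r times a Robertson/Sparck Jones relevance weight, a selection value of the kind commonly associated with probabilistic systems such as Okapi. The function names, parameters, and toy figures are all hypothetical.

```python
import math
from collections import Counter


def rsj_weight(r, n, R, N):
    """Robertson/Sparck Jones relevance weight with 0.5 smoothing.

    r: relevant documents containing the term
    n: documents in the collection containing the term
    R: documents marked relevant by the user
    N: documents in the collection
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


def suggest_terms(relevant_docs, doc_freq, num_docs, query_terms, k=20):
    """Return the top k candidate expansion terms for display to the user.

    Assumption: candidates are ranked by r * rsj_weight (an "offer weight");
    the ranking actually used by the interface is not given in the text.
    relevant_docs is a list of token lists, one per document marked relevant.
    """
    R = len(relevant_docs)
    r_counts = Counter()
    for doc in relevant_docs:
        r_counts.update(set(doc))  # count each term once per document

    scored = []
    for term, r in r_counts.items():
        if term in query_terms:
            continue  # terms already in the query are not suggested again
        score = r * rsj_weight(r, doc_freq[term], R, num_docs)
        scored.append((score, term))

    # Highest score first; ties broken alphabetically for a stable display
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


# Toy data (invented for illustration only)
relevant = [
    ["query", "expansion", "feedback"],
    ["query", "expansion", "library"],
    ["expansion", "catalogue"],
]
doc_freq = {"query": 50, "expansion": 10, "feedback": 5,
            "library": 400, "catalogue": 300}
suggestions = suggest_terms(relevant, doc_freq, 1000, {"query"})
```

In an IQE interface the returned list would be shown to the user for selection; an AQE system would instead add the top-ranked terms to the query without displaying them.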
The results are significant for a number of reasons, relating to both the performance and behaviour of
the IQE system. The take-up rate (the proportion of users using query expansion) and the increase in relevant
documents found after query expansion were both lower in the IQE system than with AQE. Users
tended to select terms very strictly, with 50% of users reporting that they found it difficult to select
appropriate terms, and around 25% of users editing their original query rather than modifying their
query through the IQE facility.
A third interface was developed to give the user more information on which to base their term
selection. A number of changes were made to the system design:
i. the overlapping windows design was replaced by a multiple-pane, single-window design.
ii. an interactive thesaurus component was added which allowed the users to view terms related
to the initial query terms.
30 These were the relevance assessments associated with the WSJ test collection, rather than the assessments given
by the users in the course of the experiment.
31 Measured by analysis of search logs.