The queries that do well with pseudo feedback are those that already retrieve relevant
documents close to the top of the document ranking. Conversely, the queries that suffer from pseudo
relevance feedback are those that are already performing poorly; making these queries even worse may
hinder the adoption of pseudo feedback as a standard retrieval technique. An alternative to pseudo
feedback, suggested by Buckley and Gay [BG94], is to perform a high recall search and then a high
precision search on the retrieved documents, thus trying to help poor queries before improving the
order of the retrieved documents.
4 Summary of automatic techniques for relevance feedback
In this section we summarise the work on automatic RF techniques. The large majority of work on
automatic query modification shows that it can be an effective, practical solution for improving
the quality of online searching, and it has been demonstrated to work well under a number of
conditions. In particular, it is a very useful technique for improving the performance of short queries or
queries which provide poor initial rankings. The basic approach of reweighting and expanding queries,
using terms drawn from the relevant documents, works well, with the major contribution often coming
from the expansion component of the query modification [SB90], although this may be collection
dependent.
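The combination of reweighting and expansion can be illustrated with a minimal sketch. It uses a
Rocchio-style update as a stand-in, which is an assumption made for illustration and not a method
prescribed here; the function name, the parameters alpha and beta, and the cutoff n_expand are all
hypothetical.

```python
from collections import Counter


def expand_and_reweight(query, relevant_docs, n_expand=2, alpha=1.0, beta=0.75):
    """Rocchio-style sketch (illustrative assumption, not the survey's method).

    query:         dict mapping term -> weight
    relevant_docs: list of dicts mapping term -> weight
    """
    if not relevant_docs:
        return dict(query)

    # Average term weight over the documents marked relevant.
    centroid = Counter()
    for doc in relevant_docs:
        for term, weight in doc.items():
            centroid[term] += weight / len(relevant_docs)

    # Reweighting: original terms are boosted by their weight in relevant documents.
    new_query = {t: alpha * w + beta * centroid.get(t, 0.0) for t, w in query.items()}

    # Expansion: add the n_expand highest-weighted terms not already in the query.
    candidates = sorted(
        ((t, w) for t, w in centroid.items() if t not in query),
        key=lambda tw: (-tw[1], tw[0]),
    )
    for term, weight in candidates[:n_expand]:
        new_query[term] = beta * weight
    return new_query
```

Setting n_expand to 0 gives reweighting alone, which makes it possible to compare the
contribution of the two components on a given collection.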
Although there has been a large volume of theoretical work on RF, in the foundations to the
probabilistic model for example, there remain a number of basic questions for which there are only
heuristic solutions. For example, if we choose to add only a number of terms to the query, how should
we choose how many terms to add? Similarly, how should we rank terms to give an optimal list of
expansion terms? Functions such as F4 that order terms by their discriminatory power are typically used
for this purpose, but the actual performance given by these functions, and by query expansion in general,
is variable and is affected by the collection, query and retrieval system used. Although the probabilistic
model (section 2.2.3) gives a strong theoretical basis for ranking documents after relevance information
has been provided, there is a lack of theoretical evidence to predict what makes a good set of expansion
terms for a given combination of collection, query and system.
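As a rough illustration of term ranking, the sketch below orders candidate terms by an F4-style
weight and keeps the top k. It assumes the commonly used form of F4 with the 0.5 correction; the
function names and the cutoff k are hypothetical, and the choice of k is exactly the heuristic
decision discussed above.

```python
import math


def f4_weight(r, n, R, N):
    """F4-style relevance weight with the usual 0.5 correction (assumed form).

    r: relevant documents containing the term
    n: documents in the collection containing the term
    R: known relevant documents
    N: documents in the collection
    """
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return math.log(numerator / denominator)


def rank_expansion_terms(term_stats, R, N, k):
    """term_stats maps term -> (r, n); returns the k best (term, weight) pairs."""
    scored = [(term, f4_weight(r, n, R, N)) for term, (r, n) in term_stats.items()]
    # Highest weight first; ties are broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]
```

A term that occurs in most of the relevant documents but in few others receives a large positive
weight, while a term that is common across the collection but rare among the relevant documents
receives a negative one.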
One potential solution to this problem is to involve the user in the process of modifying the query. In
section 1 we argued that one of the benefits of RF is that it requires minimal effort from the user: a user
only has to identify relevant material, not describe it. However, we may gain a better representation of
what material is likely to be relevant if we allow the user more control over the term selection process
and also if we pay more attention to the tasks a user is trying to achieve with a system. These interactive
aspects of RF are the topic of the next section.
5 Interactive query modification
All the methods for query modification described previously automatically extract terms from
documents and add some or all of them to the query. A natural alternative is to allow users to select the
terms to be added: interactive query expansion (IQE). The user, who has the best insight for
determining relevance, then has more control over which terms are added to the query. The strength
that is claimed for IQE is that the user can select better query expansion terms than the system. In this
section we shall look at the basic research on IQE (section 5.1), examine how terms should be ranked
for presentation to the user (section 5.2), and compare the effectiveness of IQE against automatic query
expansion (AQE) (section 5.3).
5.1 Fundamentals of IQE
In addition to investigating ranking functions for query expansion, Harman [Har88] investigated the
possible effectiveness of an interactive approach to query expansion. The experiments she carried out
were designed to test how effective query expansion could be if the user selected expansion terms from
a list of terms that were pre-selected by the system.
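The general shape of such an interaction can be sketched as follows. This is an illustrative
outline only, not Harman's experimental setup: the function name, the list_size cutoff and the
user_selects callback (which stands in for the user's judgement, or a simulation of it) are all
hypothetical.

```python
def interactive_expand(query_terms, ranked_candidates, user_selects, list_size=20):
    """Sketch of interactive query expansion (hypothetical names throughout).

    query_terms:       terms already in the query
    ranked_candidates: candidate terms, best first, as ordered by the system
    user_selects:      callable returning True for terms the user accepts
    list_size:         how many candidates the system shows to the user
    """
    # The system pre-selects the list: top-ranked terms not already in the query.
    shown = [t for t in ranked_candidates if t not in query_terms][:list_size]

    # The user decides which of the displayed terms are added.
    chosen = [t for t in shown if user_selects(t)]
    return list(query_terms) + chosen
```

Automatic query expansion corresponds to the special case in which user_selects accepts every
displayed term, so the same function can be used to compare the two approaches.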
She performed an initial experiment, on the Cranfield 1400 test collection, in which a variable number
of possible expansion terms were added to the query, with no reweighting of the query terms. This
experiment gave two main conclusions.
First, she found that different methods of sorting the expansion terms gave different performance: some