This led to the notion of relevance feedback (RF): users marking documents as relevant to their needs
and presenting this information to the IR system. The system can then use this information
quantitatively, by retrieving more documents like the relevant documents, and qualitatively, by retrieving
documents similar to the relevant ones before other documents. The process of RF is usually presented
as a cycle of activity: an IR system presents a user with a set of retrieved documents, the user indicates
those that are relevant and the system uses this information to produce a modified version of the query.
The modified query is then used to retrieve a new set of documents for presentation to the user. This
process is known as an iteration of RF.
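The cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy collection, the term-overlap ranking in `retrieve`, and the rule in `modify_query` (append the most frequent terms of the documents marked relevant) are hypothetical stand-ins chosen for brevity, not the query modification techniques surveyed in this paper.

```python
from collections import Counter


def retrieve(query_terms, collection, k=3):
    """Rank documents by how many distinct query terms they contain (toy scoring)."""
    scored = []
    for doc_id, text in collection.items():
        overlap = len(set(text.lower().split()) & set(query_terms))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties are broken by document id so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def modify_query(query_terms, relevant_ids, collection, n_new_terms=2):
    """Assumed modification rule: append the most frequent new terms
    found in the documents the user marked as relevant."""
    counts = Counter()
    for doc_id in relevant_ids:
        counts.update(collection[doc_id].lower().split())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    new_terms = [term for term, _ in ranked if term not in query_terms]
    return list(query_terms) + new_terms[:n_new_terms]


def rf_iteration(query_terms, relevant_ids, collection):
    """One iteration of RF: modify the query, then retrieve again."""
    modified = modify_query(query_terms, relevant_ids, collection)
    return modified, retrieve(modified, collection)


# Hypothetical four-document collection.
COLLECTION = {
    "d1": "relevance feedback improves query formulation",
    "d2": "query expansion adds feedback terms",
    "d3": "cooking recipes for pasta",
    "d4": "feedback terms and expansion methods",
}

initial_results = retrieve(["query"], COLLECTION)
# Suppose the user marks d2 as relevant.
modified_query, new_results = rf_iteration(["query"], ["d2"], COLLECTION)
```

In this toy run the modified query retrieves a document (`d4`) that the initial query missed, and the document marked relevant moves to the top of the ranking, which mirrors the quantitative and qualitative uses of relevance information described above.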
The mechanism by which an IR system uses the relevance information given by the user is the main
focus of this paper. The paper covers several aspects of RF: the representations used in RF, how these
representations determine the way a query is modified, and the role of interaction in RF. Section 2
presents a brief discussion of the retrieval process as a whole and outlines how RF has been
incorporated into the major retrieval models. In section 3 we discuss extensions and modifications to
the traditional models of RF.
Historically, most RF approaches have been based on automatic techniques for modifying queries. In
section 4 we summarise these approaches. More recently, a number of researchers have examined the
role of the user in RF and have presented techniques designed to increase the interaction between the
user and system in RF. These interactive techniques are the main topic of section 5. In section 6 we
describe interfaces specifically designed to facilitate RF, in section 7 we outline some aspects of the
user that are important to RF, and we conclude this overview in section 8.
2 The information retrieval process
The IR process is composed of four main technical stages. The first stage, indexing the document
collection, during which the documents are prepared for use by an IR system, is discussed in section
2.1. Document retrieval, the process of selecting which documents to display to the user, is described in
section 2.2. The presentation of retrieved documents and the evaluation of the retrieval results are
discussed briefly in sections 2.3 and 2.4 respectively. In the section on retrieval we shall outline the
basic approaches to RF in the major retrieval models. In section 2.5 we shall summarise the difference
between these main approaches to RF.
2.1 Indexing
For small collections of documents it may be possible for an IR system to assess each document in turn,
deciding whether or not it is likely to be relevant to a user's query. However, for larger collections,
especially in interactive systems, this becomes impractical. Hence it is usually necessary to transform the
raw document collection into an easily accessible representation, one that can be used to target those
documents that are most likely to be relevant, for example those documents that contain at least one word
that appears in the user's query.
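One common way to obtain such a representation is an inverted index, which maps each term to the set of documents containing it. The sketch below is a minimal illustration rather than a description of any particular system; the function names and the whitespace-based splitting are assumptions made for brevity.

```python
from collections import defaultdict


def build_inverted_index(collection):
    """Map each lower-cased term to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in collection.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def candidate_documents(index, query):
    """Return documents containing at least one query term,
    without scanning every document in the collection."""
    candidates = set()
    for term in query.lower().split():
        # .get avoids creating empty entries for unseen terms.
        candidates |= index.get(term, set())
    return candidates


# Hypothetical three-document collection.
docs = {
    "d1": "information retrieval systems",
    "d2": "relevance feedback in retrieval",
    "d3": "interface design for users",
}
index = build_inverted_index(docs)
hits = candidate_documents(index, "relevance retrieval")
```

The index is built once, so each query only touches the entries for its own terms instead of every document.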
This transformation from a document text to a representation of a text is known as indexing the
documents. There are a variety of indexing techniques but the majority rely on selecting good document
descriptors, such as keywords, or terms, to represent the information content of documents. A good
descriptor for IR is a term that helps describe the information content of the document but is also one
that helps differentiate the document from other documents in the collection. A good descriptor, then,
has a certain discriminatory power¹. This power of a term in discriminating documents can be used to
differentiate between relevant and non relevant documents, as will be discussed in the section on
retrieval.
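As a rough illustration of discriminatory power, the sketch below computes inverse document frequency (IDF), a widely used measure under which terms that occur in few documents score higher. Using IDF here is an assumption made for illustration; the text above does not commit to a particular measure, and the function name and sample collection are hypothetical.

```python
import math


def inverse_document_frequency(term, collection):
    """IDF = log(N / n_t), where N is the number of documents and
    n_t is the number of documents containing the term."""
    n_docs = len(collection)
    n_containing = sum(
        1 for text in collection.values() if term in text.lower().split()
    )
    if n_containing == 0:
        # Assumed convention: a term absent from the collection gets weight 0.
        return 0.0
    return math.log(n_docs / n_containing)


# Hypothetical collection: "the" occurs everywhere, "feedback" in one document.
sample = {
    "d1": "the retrieval model",
    "d2": "the feedback cycle",
    "d3": "the user interface",
    "d4": "the query",
}
idf_common = inverse_document_frequency("the", sample)
idf_rare = inverse_document_frequency("feedback", sample)
```

A term that appears in every document receives a weight of zero because it cannot help tell documents apart, whereas the rarer term receives a positive weight.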
Figure 1 outlines the basic steps in transforming a document into an indexed form. The first stage is to
convert the document text (Document text, Figure 1a) into a stream of terms, typically converting all
the terms into lower case and removing punctuation characters (Tokenisation, Figure 1b).
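The tokenisation step can be sketched as follows. This is a minimal version assuming ASCII punctuation and whitespace-separated words; the function name `tokenise` is hypothetical, and real systems typically handle hyphens, apostrophes and non-ASCII text with more care.

```python
import string


def tokenise(text):
    """Lower-case the text, delete ASCII punctuation, and split on whitespace."""
    remove_punctuation = str.maketrans("", "", string.punctuation)
    return text.lower().translate(remove_punctuation).split()


tokens = tokenise("Relevance feedback, in IR!")
```

Deleting punctuation outright joins the parts of hyphenated words (for example, "trade-off" becomes "tradeoff"); replacing punctuation with spaces instead would split such words into separate terms.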
¹ See [VR79], Chapter 2, for a more detailed explanation of the trade-off between the descriptive and
discriminatory power of terms.