[Plot: average precision (y-axis, 0 to 18) against feedback iterations (x-axis, 0 to 4) for four series: Freezing, Test/control, Residual (removal) and Residual (no removal).]
Figure 13: Average precision over 4 iterations of feedback
2.5 Summary of RF
In this section we summarise the main points from the previous sections and outline some of the
major issues in the core RF models. Section 2.5.1 summarises the comparison between Boolean and
best match models, section 2.5.2 compares the types of best match model, and section 2.5.3
compares the two main components of RF: query term reweighting and query expansion.
2.5.1 Boolean vs Best match
Although Boolean models are still popular and have strong advocates, e.g. [FST+99], in general there
are many advantages to best match models over exact match models. The first advantage is that the
user does not need to generate a query expression in the same way as with the Boolean model. Instead
they can enter a natural language expression. This means that users can initiate retrieval sessions
without knowledge of the collection, previous searching experience or experience in creating Boolean
queries.
A second advantage is that ranking documents allows users to interact with the system in a more
meaningful fashion [Beau97]: documents are presented in order of match, and documents are not
excluded if they miss out elements of the query.
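The contrast between exact match and best match retrieval can be illustrated with a minimal sketch. It assumes a toy collection and scores documents simply by the number of distinct query terms they contain (coordination-level matching); the function names, the documents and the scoring rule are illustrative assumptions, not the weighting schemes of any model discussed in this paper.

```python
def tokenize(text):
    """Lower-case whitespace tokenisation (an illustrative simplification)."""
    return text.lower().split()


def boolean_and(query_terms, docs):
    """Exact match: keep only documents that contain every query term."""
    return [doc_id for doc_id, text in docs.items()
            if all(term in set(tokenize(text)) for term in query_terms)]


def best_match(query_terms, docs):
    """Best match: rank documents by how many distinct query terms they contain.

    Documents matching only part of the query are kept, just ranked lower.
    """
    scores = {}
    for doc_id, text in docs.items():
        doc_terms = set(tokenize(text))
        score = sum(1 for term in set(query_terms) if term in doc_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical toy collection and query.
docs = {
    "d1": "relevance feedback improves ranked retrieval",
    "d2": "boolean retrieval uses exact match",
    "d3": "feedback in ranked retrieval systems",
}
query = ["relevance", "feedback", "retrieval"]

print(boolean_and(query, docs))  # only documents matching all three terms
print(best_match(query, docs))   # all partially matching documents, ranked
```

In this sketch the Boolean conjunction returns only the document containing all three terms, whereas the ranked list also includes the documents that miss one or two query terms.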
Thirdly, the system can automatically alter a query through RF. The main strength of best match
models is that they allow for iterative improvement, often using similar techniques to modify
queries as to retrieve documents. For RF in particular, ranking models mean that, after initial
querying, the user can interact without further describing the information for which they are
searching. The RF algorithms discussed in the main body of this paper deal almost exclusively
with best match algorithms. In the next section we shall look at the relative performance of the
best match models discussed previously.
2.5.2 Relative performance of best match models
In [SB90] Salton and Buckley investigated the relative performance of 12 feedback algorithms on six
standard test collections (see footnote 18). Several of the feedback algorithms (Ide dec hi, F4,
Rocchio, and three versions of Rocchio with scaling factors for query, relevant and non relevant
set) have already been discussed.
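As a reminder of how two of these algorithms modify a query, the following minimal sketch implements the commonly cited textbook forms of Rocchio and Ide dec hi over sparse term-weight vectors. The function names, the default scaling factors and the choice to drop non-positive weights are assumptions made for illustration; the exact variants and parameter settings evaluated in [SB90] may differ.

```python
from collections import defaultdict


def add_scaled(target, vector, scale):
    """Add scale * vector to target, term by term."""
    for term, weight in vector.items():
        target[term] += scale * weight


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Textbook Rocchio: move the query towards the mean relevant vector
    and away from the mean non relevant vector.

    alpha, beta and gamma are illustrative defaults, not values from [SB90].
    """
    new_query = defaultdict(float)
    add_scaled(new_query, query, alpha)
    for doc in relevant:
        add_scaled(new_query, doc, beta / len(relevant))
    for doc in non_relevant:
        add_scaled(new_query, doc, -gamma / len(non_relevant))
    # Assumption: terms whose weight is not positive are dropped.
    return {t: w for t, w in new_query.items() if w > 0}


def ide_dec_hi(query, relevant, ranked_non_relevant):
    """Textbook Ide dec hi: add all relevant vectors (unnormalised) and
    subtract only the highest ranked non relevant vector."""
    new_query = defaultdict(float)
    add_scaled(new_query, query, 1.0)
    for doc in relevant:
        add_scaled(new_query, doc, 1.0)
    if ranked_non_relevant:
        add_scaled(new_query, ranked_non_relevant[0], -1.0)
    return {t: w for t, w in new_query.items() if w > 0}


# Hypothetical judgements: two relevant documents, one non relevant.
query = {"feedback": 1.0}
relevant = [{"feedback": 1.0, "relevance": 1.0}, {"relevance": 1.0}]
non_relevant = [{"hosting": 1.0}]

print(rocchio(query, relevant, non_relevant))
print(ide_dec_hi(query, relevant, non_relevant))
```

In both cases the term "relevance" enters the modified query although it was absent from the original, so the sketch shows query expansion and term reweighting happening in the same step.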
A further version of the Ide scheme was also used: the Ide regular scheme [IdS71], which uses all
retrieved non relevant documents. The Ide regular is based on the Rocchio formula but omits the
[Footnote 18: CACM, CISI, Cranfield, Inspec, MEDLARS and NPL collections. These are relatively short document
collections ranging from 1,033 documents (MEDLARS) to 12,684 documents (INSPEC).]