combination. The overall system then selected which retrieval algorithm was giving the best
performance for the user at each feedback iteration.
Smeaton, [Sme98], suggests that retrieval strategies which are conceptually independent should work
better in combination, and that retrieval strategies that work to the same general level of effectiveness
should be suitable for use in conjunction, but again this is not guaranteed to work. In particular, the
results presented in [Sme98] indicated that conceptual independence of techniques in retrieving
different documents did not appear to make a significant difference in an experimental setting. However,
support for the claim that conceptually independent strategies combine well is to be found in [RLVR02a].
3.4.3 Multiple feedback algorithms
For RF, a natural combination of evidence is to combine the results of different feedback methods. This
could involve either combining the rankings given by different RF methods run on the same original
query and relevance assessments, or combining the modified queries produced by several RF methods.
This would be similar in spirit to Belkin et al.'s data fusion approach described in section 3.4. Lee,
[Lee98], examined the former approach, combining rankings from multiple feedback functions; this
will be discussed separately in section 3.5, in the discussion of relevance feedback without relevance
information, as this was the main area of Lee's work.
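The first option above, combining the rankings given by different RF methods, can be illustrated with a
minimal sketch. It assumes each RF method returns a score per document, and it uses CombSUM with
min-max normalisation as the fusion rule. CombSUM is chosen here only for illustration; it is not
necessarily the combination function used in [Lee98]. The function names and the example scores are
hypothetical.

```python
def min_max(scores):
    """Rescale one method's scores to [0, 1] so methods on different scales are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents scored equally: this method contributes no preference.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs):
    """Fuse several score dictionaries by summing normalised scores (CombSUM).

    A document missing from a run simply receives no contribution from it.
    Returns document ids ordered by decreasing fused score, ties broken by id.
    """
    fused = {}
    for scores in runs:
        for doc, s in min_max(scores).items():
            fused[doc] = fused.get(doc, 0.0) + s
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical scores from two RF methods run on the same query and assessments.
run_a = {"d1": 3.0, "d2": 2.0, "d3": 1.0}
run_b = {"d2": 10.0, "d3": 5.0, "d4": 0.0}
fused_ranking = comb_sum([run_a, run_b])
```

In this example d2 is ranked first in the fused list because both methods score it highly, whereas d1
is supported by only one method.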
3.4.5 Summary of combination of evidence for RF
Combination of evidence has the potential to be a powerful technique for RF. However, the majority of
techniques attempted have shown that combination of evidence is a very variable technique for initial
retrieval: it will improve some queries but degrade the performance of others. It is also
very difficult to predict what evidence to combine for different collections or queries. Using relevance
information (section 3.4.1) to guide the combination process does seem to overcome at least some of
these difficulties.
3.5 Relevance feedback without relevance information
RF, as described so far, depends on a user providing relevance assessments for a sample of the
retrieved documents. An alternative approach, known variously as pseudo, blind or ad hoc RF, employs
RF techniques to automatically improve a ranking before any documents have been shown to the user.
In this technique the system generates a document ranking from the initial query, selects a small number
of documents from the top of the ranking, then initiates an iteration of RF by assuming these top ranked
documents are all relevant (the pseudo relevant documents). The new query, generated by RF, is then
used to produce a new document ranking which is shown to the user. The rationale behind pseudo RF is
that an iteration of feedback, based on the documents most similar to the user's initial query, will give a
better initial ranking of documents.
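The steps of pseudo RF described above can be sketched as follows. This is a toy illustration under
stated assumptions, not the method of any cited work: documents are scored by a simple weighted term
frequency match rather than the probabilistic model, expansion terms are chosen by how many pseudo
relevant documents contain them, and the expansion weight, the parameters k and n_terms, and all
function names are hypothetical.

```python
from collections import Counter


def score(query, doc):
    """Toy matching score: sum over query terms of weight * term frequency."""
    tf = Counter(doc.split())
    return sum(weight * tf[term] for term, weight in query.items())


def rank(query, docs):
    """Return document indices by decreasing score (ties keep collection order)."""
    return sorted(range(len(docs)), key=lambda i: -score(query, docs[i]))


def pseudo_rf(query, docs, k=2, n_terms=2):
    """One iteration of pseudo RF: treat the top k documents as relevant."""
    initial = rank(query, docs)
    pseudo_relevant = [docs[i] for i in initial[:k]]
    # Count, for each term, how many pseudo relevant documents contain it.
    counts = Counter(t for d in pseudo_relevant for t in set(d.split()))
    candidates = [(t, c) for t, c in counts.items() if t not in query]
    # Most widely shared terms first; alphabetical order breaks ties.
    candidates.sort(key=lambda tc: (-tc[1], tc[0]))
    new_query = dict(query)
    for term, count in candidates[:n_terms]:
        # Assumed weighting: expansion terms count at most half an original term.
        new_query[term] = 0.5 * count / k
    return new_query, rank(new_query, docs)


docs = [
    "jaguar car engine speed",
    "jaguar car dealer price",
    "jaguar cat jungle habitat",
    "car engine repair manual",
]
query = {"jaguar": 1.0, "car": 1.0}
new_query, new_ranking = pseudo_rf(query, docs)
```

Here the two top ranked documents are assumed relevant without any user input, and the terms they
contribute move the fourth document above the third in the new ranking produced by pseudo RF.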
This technique was first suggested by Croft and Harper, [CH79], as a means of estimating probabilities
within the probabilistic model for an initial search (see footnote 23). It has since been widely investigated as a
technique for improving document rankings. Croft and Harper also pointed out that this method
of improving a document ranking can suffer from one major flaw: query drift. Query drift occurs when
few or none of the documents used for RF are relevant. In this case, RF will add terms to the
query that are poor at detecting relevance, and hence at retrieving relevant documents.
The pseudo RF technique, then, works well for `good' initial queries (those that are good at retrieving
relevant documents) and poorly for `bad' initial queries (those that are bad at retrieving relevant
documents). There are two possible solutions to this problem: either improve the initial ranking, so that
there is a greater likelihood of relevant documents being used to modify the query, or improve the
detection of relevant features, i.e. develop better RF techniques.
Mitra et al., [MSB98], have attempted, with some success, to rectify query drift by improving the
precision at the top of the document ranking, increasing the likelihood of actual relevant material being
contained within the set of pseudo relevant documents, and hence decreasing the likelihood of query
drift. Their experiments used two approaches, a set of Boolean filters and term correlation information,
to prioritise retrieval of documents that cover all aspects of a query. They found that their approaches
23 As a replacement for the idf term weighting function which is traditionally used when there is no relevance
information.