work well for manually and automatically created filters; however, around 25% of the queries still suffer 
from query drift. 
Buckley et al., [BSA+95], also looked at improving precision at the top of the initial document ranking. 
They used massive query expansion (500 terms and ten phrases, i.e. commonly occurring pairs of words) 
from the top 30 retrieved documents. Their experiments produced significantly better results than with 
no feedback, particularly with respect to the precision of the new document ranking. 
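The expansion step can be illustrated with a minimal sketch. It assumes documents are already tokenised and ranked, and it scores candidate terms by raw frequency in the top-ranked documents; this scoring, and the names `expand_query`, `top_docs` and `n_expansion_terms`, are assumptions made for illustration rather than the weighting used in [BSA+95].

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, top_docs=30, n_expansion_terms=500):
    """Pseudo RF: treat the top-ranked documents as relevant and append
    their most frequent terms to the query.

    ranked_docs is a list of token lists, best-ranked first. Scoring terms
    by raw frequency is an assumption of this sketch.
    """
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(doc)
    original = set(query_terms)
    # Candidate terms in decreasing frequency, skipping terms already in the query.
    candidates = [t for t, _ in counts.most_common() if t not in original]
    return list(query_terms) + candidates[:n_expansion_terms]


# Toy ranking: only the first two documents are treated as relevant.
ranked = [
    ["relevance", "feedback", "query", "expansion"],
    ["query", "expansion", "terms", "ranking"],
    ["football", "results"],
]
expanded = expand_query(["query"], ranked, top_docs=2, n_expansion_terms=2)
```

Because no user judges the top documents, any non-relevant document among them contributes expansion terms as well, which is one way query drift can arise.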
Most other researchers have concentrated on improving the feedback used in the pseudo RF 
approaches. Efthimiadis and Biron, [EB94], for example, found in their experiments that standard RF 
techniques used in pseudo RF experiments performed only slightly worse than experiments using RF 
from users and with no feedback. However, the actual performance varied according to the algorithm 
used to rank terms for query expansion. Robertson et al., [RWJ+95], also found increased performance 
over no feedback, especially when using passages rather than the whole document, to select expansion 
terms. 
In [Lee98], Lee proposed an ad hoc RF technique based on multiple RF techniques. The basic 
hypothesis is that, as different RF techniques may produce different modified queries, and different 
queries will retrieve different documents, then using a combination of RF techniques to modify queries 
will retrieve more of the relevant documents. An initial experiment was carried out treating the top 30 
documents as relevant and using a vector space retrieval function. This experiment compared the 
documents retrieved after performing pseudo retrieval using a Rocchio technique, Ide dec-hi, F4, a 
variant of F4 (footnote 24), and a simplified version of Fuhr's RPI formula, [FB91], Equation 17. 
    
    
\[
w_{qi} = \log \frac{p_i (1 - q_i)}{q_i (1 - p_i)}, \qquad
p_i = \frac{\sum_{r=1}^{n_{rel}} w_{ri}}{n_{rel}}, \qquad
q_i = \frac{\sum_{n=1}^{n_{nonrel}} w_{ni}}{n_{nonrel}}
\]
Equation 17: Version of RPI used in [Lee98] 
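As a worked illustration, Equation 17 can be read as the log odds ratio of a term's mean weight in the relevant documents (p_i) against its mean weight in the non-relevant documents (q_i). The sketch below follows that reading. The function name `rpi_weight`, the assumption that document term weights lie in [0, 1], and the clamping constant `eps` are not taken from [Lee98]; the clamp is added only to avoid taking the logarithm of zero.

```python
import math


def rpi_weight(rel_weights, nonrel_weights, eps=1e-6):
    """Query term weight under the reading of Equation 17 given above.

    rel_weights / nonrel_weights hold the term's weight in each relevant /
    non-relevant document (assumed to lie in [0, 1]).
    eps is a hypothetical clamp that keeps p and q away from 0 and 1.
    """
    p = sum(rel_weights) / len(rel_weights)        # mean weight, relevant docs
    q = sum(nonrel_weights) / len(nonrel_weights)  # mean weight, non-relevant docs
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return math.log(p * (1 - q) / (q * (1 - p)))


# A term that is heavier in the relevant set receives a positive weight.
w = rpi_weight([0.8, 0.6, 0.7], [0.1, 0.3, 0.2])
```

A term with the same mean weight in both sets receives a weight of zero, and a term that is heavier in the non-relevant set receives a negative weight.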
This experiment validated Lee's initial hypothesis: different RF techniques retrieved different 
documents although the different RF algorithms performed at approximately the same level of retrieval 
effectiveness. The similarity of the documents retrieved by each RF algorithm varied according to the 
RF technique used (e.g. the two F4 techniques retrieved very similar documents but Rocchio compared 
with the modified F4 formula only had about 50% of documents in common). The difference between 
the various RF techniques was also reflected in the query terms used to expand the query. 
A second experiment combined the rankings, after normalisation of similarity values, obtained from the 
different modified query vectors. Combination of the rankings can provide significant improvements in 
effectiveness over the single RF methods. However, combining more algorithms is not always better: 
combinations of two or three RF algorithms generally performed better than combinations of four or 
five RF algorithms. Given that the algorithms produce different rankings after the new retrieval, one might 
expect that the more the rankings differ, the better the combined performance. However, Lee's 
experiments did not demonstrate this conclusively. 
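The combination step can be sketched as follows. The text states only that similarity values were normalised before the rankings were combined; the min-max normalisation and the summation of normalised scores used here are assumptions of this sketch, as are the names `normalise` and `combine_runs` and the toy scores.

```python
def normalise(scores):
    """Min-max normalise one run's similarity values into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: give every document the same normalised value.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine_runs(runs):
    """Sum normalised scores across runs and return documents best-first."""
    combined = {}
    for run in runs:
        for doc, s in normalise(run).items():
            combined[doc] = combined.get(doc, 0.0) + s
    # Sort by decreasing combined score, breaking ties by document id.
    return sorted(combined, key=lambda d: (-combined[d], d))


# Toy runs from two hypothetical RF-modified queries (document id -> similarity).
rocchio_run = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
f4_run = {"d2": 0.9, "d4": 0.8, "d1": 0.1}
ranking = combine_runs([rocchio_run, f4_run])
```

Under this scheme a document ranked well by both runs (here `d2`) rises above a document ranked first by only one of them, which is the intuition behind combining evidence from different RF techniques.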
Although the pseudo RF techniques described in this section can improve retrieval performance over 
not using pseudo RF, the problem still remains that it is a variable technique: some queries will be 
improved, others will be harmed. Several of the authors mentioned indicate that uncovering more 
details about the collection statistics, documents being used for RF and query characteristics may be 
used to predict which queries should be used for pseudo RF. For example, Lindquist et al., [LGF97], 
investigated various parameters for automatic RF using the vector space model and found optimal 
performance was gained using between 5 and 20 documents and between 1 and 20 terms for feedback. They also provide 
support for weighting new query terms against original query terms, using within-document term 
frequency and thresholding the query terms (only performing relevance feedback on queries that have 
terms with a high idf value). This leads to the suggestion that certain characteristics of a term may be 
good at predicting how the query is likely to improve given expansion by that term, which may be 
useful in pseudo feedback. 
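The thresholding idea can be sketched as a simple gate applied before pseudo RF. The idf formula (natural log of collection size over document frequency), the threshold value of 2.0, the treatment of unseen terms, and the function names are all assumptions made for illustration; they are not the settings reported in [LGF97].

```python
import math


def idf(term, doc_freq, n_docs):
    """Inverse document frequency; terms absent from the collection score 0.0
    (an assumption of this sketch, so that they never trigger feedback)."""
    df = doc_freq.get(term, 0)
    return math.log(n_docs / df) if df else 0.0


def should_apply_pseudo_rf(query_terms, doc_freq, n_docs, idf_threshold=2.0):
    """Apply pseudo RF only if at least one query term has a high idf value."""
    return any(idf(t, doc_freq, n_docs) >= idf_threshold for t in query_terms)


# Toy collection statistics: term -> number of documents containing it.
doc_freq = {"the": 900, "system": 300, "retrieval": 40}
n_docs = 1000

specific_query = should_apply_pseudo_rf(["retrieval", "system"], doc_freq, n_docs)
general_query = should_apply_pseudo_rf(["the", "system"], doc_freq, n_docs)
```

In this toy collection the query containing the rarer term "retrieval" passes the gate, whereas the query made up only of common terms does not.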
                                                           
24 [Rob86], Equation 12,