recall precision figures. This method of evaluation is, then, biased somewhat towards queries that 
have more relevance assessments or those that perform poorly during initial iterations. An 
alternative, e.g. [SB90], is to use only the residual collection for both the rankings before and after 
feedback. This makes the two rankings directly comparable, but the method is really only 
suitable for small numbers of feedback iterations; otherwise the number of relevant documents in 
the residual collection can become relatively small and unrepresentative of the entire set of 
relevant documents. 
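
The residual comparison described above can be sketched as follows. This is a minimal illustration rather than the procedure of [SB90]: the document identifiers, the toy rankings, the helper names `residual` and `precision_at_k`, and the choice of precision at a fixed cut-off in place of full RP figures are all assumptions made for the example.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k documents in `ranking` that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


def residual(ranking, relevant, seen):
    """Remove the documents already shown to the user from both the
    ranking and the set of relevant documents."""
    seen = set(seen)
    return [d for d in ranking if d not in seen], set(relevant) - seen


# Hypothetical data: eight documents, four of them relevant.
initial = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
relevant = {"d1", "d3", "d6", "d8"}
seen = initial[:3]  # documents assessed and used for feedback
after_feedback = ["d1", "d3", "d6", "d8", "d2", "d4", "d5", "d7"]

# Both rankings are reduced to the same residual collection,
# so the figures computed on them are directly comparable.
before_rank, residual_relevant = residual(initial, relevant, seen)
after_rank, _ = residual(after_feedback, relevant, seen)

p_before = precision_at_k(before_rank, residual_relevant, 2)
p_after = precision_at_k(after_rank, residual_relevant, 2)
```

Only two relevant documents remain in the residual collection of this toy example, which illustrates why the figures become less representative as more documents are removed at each iteration.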
  
freezing. The method known as freezing is based on the rank position of documents and comes in 
two forms: full freezing and modified freezing. In full freezing the rank positions of the top n 
documents, the ones used to modify the query, are frozen. The remaining documents are 
re-ranked and RP figures are calculated over the whole ranking. As the only documents to change 
rank position are those ranked below the top n (the top n being the ones used for RF), any change 
in RP happens as a result of the change of rank position of the unseen relevant documents. There 
is, then, no ranking effect. In modified freezing, the rank positions are frozen at the rank position 
of the last marked relevant document. 
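
The two forms of freezing can be sketched as below. This is an illustrative reading of the description above, not a reference implementation: the function names, the toy rankings and the convention that nothing is frozen when no document in the top n is marked relevant are assumptions.

```python
def full_freeze(initial, reranked, n):
    """Keep the top-n documents of the initial ranking in place and
    append the remaining documents in their post-feedback order."""
    frozen = initial[:n]
    frozen_set = set(frozen)
    return frozen + [d for d in reranked if d not in frozen_set]


def modified_freeze(initial, reranked, n, marked_relevant):
    """Freeze only down to the last document in the top n that was
    marked relevant (assumption: freeze nothing if none was marked)."""
    cut = 0
    for i, d in enumerate(initial[:n]):
        if d in marked_relevant:
            cut = i + 1
    return full_freeze(initial, reranked, cut)


# Hypothetical data: the user assesses the top 3 and marks only d2.
initial = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
reranked = ["d6", "d2", "d8", "d1", "d3", "d4", "d5", "d7"]

full = full_freeze(initial, reranked, 3)
modified = modified_freeze(initial, reranked, 3, {"d2"})
```

In both cases RP figures would then be calculated over the whole combined ranking, so that only the movement of documents below the frozen section affects the result.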
The disadvantage of freezing approaches is that at each successive iteration of feedback a higher 
proportion of relevant documents are frozen. This means that the frozen section of the ranking 
contributes more to recall precision at later iterations of RF, so although RF may work better at 
these later iterations, it can appear to be performing more poorly due to the higher contribution of 
the frozen documents. 
In the previous discussion on the residual method of evaluating feedback runs, we mentioned that 
the residual collection method was forced to eliminate queries once all the relevant documents had 
been found. For the freezing methods, once all the relevant documents have been found for a 
query, recall precision figures can still be calculated. However, the recall precision figures will not 
change once all the relevant documents have been frozen. Intuitively this seems correct: once we 
have found all the relevant documents for a query, feedback does not improve or worsen retrieval 
effectiveness.  
  
test and control groups. In this technique, the document collection is randomly split into two 
collections: the test group and the control group. Query modification is performed by RF on the 
test group and the new query is then run against the control group. RP is calculated only on the 
control group, so there is no ranking effect. Successive queries can be run against the control group 
to assess modified queries on what can be regarded as a complete document collection, unlike the 
residual ranking method. Unlike the freezing methods, all relevant documents in the control group 
are free to move within the document ranking. This means that recall precision figures, before and 
after query modification, are directly comparable. 
The difficulty with this evaluation method is splitting the collection. It is easy to randomly split a 
document collection (e.g. by putting all even-numbered documents in the test group and all 
odd-numbered documents in the control group). However, a random split will not ensure that the 
relevant documents are evenly split between the two collections. Neither will it ensure that the 
relevant documents in the test group are representative of those in the control group. Other factors 
such as document length or distribution of index terms may also be important to the RF method 
being tested, and may not be equally split between the two collections. 
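
The splitting problem can be seen in a small sketch. The document numbers, the set of relevant documents and the helper names below are hypothetical, chosen only to show that a mechanically even split of the collection need not split the relevant documents evenly.

```python
def split_even_odd(doc_ids):
    """Put even-numbered documents in the test group and
    odd-numbered documents in the control group."""
    test = [d for d in doc_ids if d % 2 == 0]
    control = [d for d in doc_ids if d % 2 == 1]
    return test, control


def count_relevant(group, relevant):
    """Number of relevant documents that fall in a group."""
    return sum(1 for d in group if d in relevant)


# Hypothetical collection of ten documents, four of them relevant.
doc_ids = list(range(1, 11))
relevant = {2, 4, 6, 7}

test_group, control_group = split_even_odd(doc_ids)
relevant_in_test = count_relevant(test_group, relevant)
relevant_in_control = count_relevant(control_group, relevant)
```

Here the two groups are the same size, yet three of the four relevant documents fall in the test group and only one in the control group, so RP figures on the control group would rest on a single relevant document.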
Each of these methods has advantages and disadvantages but all are standard methods of assessing RF 
algorithms. However, they only compare the performance of the algorithms in an idealised setting. For 
example, it is usual to use the same number of documents per feedback iteration to modify the query. A 
user, however, is unlikely to examine an identical number of documents per search iteration. Also, RF 
experiments based on recall precision assume complete knowledge of the document collection: a fixed 
set of relevant documents is known beforehand. In interactive searching this is also unrealistic, as what a 
user finds relevant may change over time, e.g. [Kuh93, Ell89, SW99, Vak00a]. Additional methods are 
required to test the effectiveness of RF algorithms in more realistic settings.  
A final point regarding these measures of RF evaluation is that they may not be directly comparable: 
each measure may appear to give different results depending on how the results are compared and on 
what factors affect the retrieval. An example of this is given in Table 3 which shows the results of RF 