on the same collection [15], but evaluated using the three RF evaluation schemes. An initial document 
ranking, for each query, was obtained using the idf weighting function. This was followed by four 
iterations of RF; in each iteration the top 6 expansion terms, based on an F4 ranking of expansion 
terms, were added to the query, and 50 new documents were used for feedback. After feedback, all 
query terms were weighted using the idf weighting scheme and these values were used to score 
documents. Table 3 gives the percentage change in average precision, over no feedback, after four 
iterations of feedback using each of the three evaluation techniques. 
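The term selection step described above can be sketched as follows. This is only an illustrative sketch: it assumes the commonly used Robertson/Sparck Jones form of the F4 weight with the 0.5 correction, and the function names, the `term_stats` layout and the rule of skipping terms that occur in no relevant document are hypothetical choices made for the example, not details taken from the experiment.

```python
import math


def f4_weight(r, n, R, N):
    """F4 relevance weight with the usual 0.5 correction (assumed form).

    r: known relevant documents containing the term
    n: documents in the collection containing the term
    R: known relevant documents
    N: documents in the collection
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((R - r + 0.5) * (n - r + 0.5)))


def top_expansion_terms(term_stats, R, N, query_terms, k=6):
    """Return the k best new terms by F4 weight.

    term_stats maps term -> (r, n). Terms already in the query, and terms
    seen in no relevant document (an assumption of this sketch), are skipped.
    """
    candidates = {
        term: f4_weight(r, n, R, N)
        for term, (r, n) in term_stats.items()
        if term not in query_terms and r > 0
    }
    return sorted(candidates, key=candidates.get, reverse=True)[:k]
```

A term that is concentrated in the known relevant documents receives a high weight, whereas a term that is frequent across the whole collection receives a low one, so the default `k=6` keeps the six most discriminating candidates.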
AP 88                 Full        Residual      Residual        Test and
                      freezing    collection    collection      control
                                  (removal)     (no removal)
%age increase over    +2.9%       -77.0%        -25.0%          +21.5%
no feedback
Table 3: Example RF evaluation 
As can be seen from Table 3, the results vary according to how each scheme describes the retrieval 
effectiveness of the system. Full freezing (column 2) gives a small increase in the effectiveness of the 
system. The test and control method gives a larger percentage increase in effectiveness (column 5). 
These two approaches give different absolute performance figures (average precision) because they use 
different data to calculate idf values and F4 values, and do not have identical terms in the collection. 
The test and control method used two fewer queries (as all the relevant documents for these queries 
appeared in the test collection), and several of the queries were expanded by terms that appeared in 
the test collection but not the control collection [16]. These differences cause the different 
performance figures for the two evaluation methods. 
The residual collection method (column 3) gives a large drop in retrieval effectiveness. This is because 
the residual collection method eliminates queries that have no relevant documents in the residual 
section of the collection. Queries for which all relevant documents have been retrieved in early 
iterations of feedback are therefore removed from the evaluation, and the queries that are still used 
to calculate average precision are the ones for which the system finds it difficult to retrieve the 
remaining relevant documents [17]. If we do not remove queries when all relevant documents are found 
and instead use the RP figures from the previous iteration, then we obtain the figure in column 4 for 
residual collection. This is an attempt to soften the effect of removing queries that perform well. This 
also shows a drop in retrieval effectiveness, but not so severe a drop as in column 3. The drop in 
retrieval effectiveness is caused, again, by the queries for which the system finds it difficult 
to retrieve all relevant documents. 
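The difference between the two residual figures can be illustrated with a small sketch. It assumes non-interpolated average precision as the per-query measure, and the function names and data layout are hypothetical; the point is only to show how removing exhausted queries, or carrying their previous figure forward, changes the mean.

```python
def average_precision(ranking, relevant):
    """Non-interpolated average precision; None if there is nothing to find."""
    if not relevant:
        return None
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def residual_average_precision(ranking, relevant, seen):
    """Score a query on the residual collection: documents already used for
    feedback (`seen`) are dropped from the ranking and from the relevant set."""
    residual = [doc for doc in ranking if doc not in seen]
    return average_precision(residual, relevant - seen)


def mean_over_queries(scores, previous=None):
    """Average per-query scores.

    previous=None: queries with no residual relevant documents are removed.
    Otherwise such queries reuse their score from the previous iteration.
    """
    kept = []
    for query, score in scores.items():
        if score is None:
            if previous is None or previous.get(query) is None:
                continue
            score = previous[query]
        kept.append(score)
    return sum(kept) / len(kept) if kept else None
```

With removal, the mean is taken only over the queries that still have relevant documents left to find, which are the hard ones; with no removal, the successful queries still contribute their earlier figure, so the mean falls less sharply.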
An alternative method of examining RF performance is to plot the average precision values at each 
iteration of feedback, as in Figure 13. We can see that the different methods give differently shaped 
graphs. The freezing graph gives slight, but steady, increases in retrieval effectiveness at each 
iteration of feedback. The test and control method gives an initial large increase followed by a 
decrease at the last iteration of feedback. The residual methods, however, give graphs that differ 
markedly from the other two but are similar in shape to each other: a large initial decrease followed 
by increases in performance at later iterations. 
The graphs can be used to highlight interesting areas where RF is working well or where it is 
operating poorly. However, as with recall and precision, the graphs can be misleading: all four lines 
plotted in Figure 13 are evaluating the same feedback technique on the same collection. The point is 
that the evaluation measures capture different aspects of feedback: freezing measures cumulative 
effectiveness, residual collection measures the effectiveness of retrieving only the remaining 
relevant documents, and test and control measures the relative performance of the modified queries 
produced at each iteration. 
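One way to see why the schemes measure different things is to look at the list of documents each one actually scores after a feedback iteration. The sketch below is a simplification with hypothetical function names. In particular, a real test and control run would rerun the modified query against the control half of the collection with that half's own term statistics, rather than filtering a single ranking as is done here.

```python
def freezing_ranking(frozen, new_ranking):
    """Full freezing: documents already shown keep their ranks at the top;
    the modified query only orders the documents below them."""
    frozen_set = set(frozen)
    return list(frozen) + [doc for doc in new_ranking if doc not in frozen_set]


def residual_ranking(frozen, new_ranking):
    """Residual collection: documents already shown are removed entirely."""
    frozen_set = set(frozen)
    return [doc for doc in new_ranking if doc not in frozen_set]


def control_ranking(new_ranking, control_docs):
    """Test and control (simplified): only documents in the control half,
    which was never used for feedback, are scored."""
    return [doc for doc in new_ranking if doc in control_docs]
```

For example, with `frozen = ["d1", "d2"]` and a modified-query ranking of `["d3", "d1", "d4", "d2"]`, freezing scores `["d1", "d2", "d3", "d4"]`, so earlier successes still count, while the residual scheme scores only `["d3", "d4"]`.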
                                                           
[15] AP (Associated Press) collection 1988. 
[16] This was also true for one of the original query terms. 
[17] The remaining queries may also include some queries that have a large number of relevant documents, but this 
is unlikely to be the case in this test as 200 documents have been used for feedback whereas the queries have an 
average of only 35 relevant documents per query. 