methods for sorting terms were better than other methods. Second, and more importantly for IQE, the 
performance of query expansion varied according to how many terms were added to the query. For the 
Cranfield 1400 collection, expansion by 20 terms gave optimal effectiveness.  
She performed a further experiment in which the system selected expansion terms from a list of those 
terms that occurred in at least one of the unseen relevant documents. This simulated a 'perfect' choice 
of expansion terms on behalf of the user: the system only added terms that would retrieve unseen 
relevant documents. This approach (IQE simulated) was compared against the performance given by 
expansion using the top 20 expansion terms (AQE). 
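
The difference between the two conditions can be sketched as follows. This is a minimal illustration, not Harman's implementation: the function names are hypothetical, the ranking function (number of seen relevant documents containing the term) is only a placeholder for whichever term-sorting method is used, and we assume that the simulated user filters the same top-ranked list that AQE would add in full.

```python
from collections import Counter

def rank_candidate_terms(seen_relevant_docs):
    """Rank candidate terms by how many seen relevant documents contain them.

    Placeholder ranking only; any term-sorting method could be substituted.
    """
    counts = Counter()
    for doc in seen_relevant_docs:
        counts.update(set(doc))  # count each term once per document
    # Descending count, then alphabetical, so the order is deterministic.
    return sorted(counts, key=lambda t: (-counts[t], t))

def aqe_terms(ranked_terms, k=20):
    """AQE: add the top k ranked terms without any filtering."""
    return ranked_terms[:k]

def simulated_iqe_terms(ranked_terms, unseen_relevant_docs, k=20):
    """Simulated IQE: keep only those top-k terms that occur in at least
    one unseen relevant document (the 'perfect' user's choice)."""
    useful = set().union(*unseen_relevant_docs)
    return [t for t in ranked_terms[:k] if t in useful]

# Toy data, invented for illustration only.
seen = [["wing", "flow", "shock"], ["wing", "flow"], ["wing", "panel"]]
unseen = [["flow", "shock", "drag"]]
ranked = rank_candidate_terms(seen)
auto = aqe_terms(ranked, k=3)
perfect = simulated_iqe_terms(ranked, unseen, k=3)
```

Under this reading the simulated user can never add more terms than AQE does, which is consistent with a smaller average number of expansion terms per query.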
This IQE simulated approach reduced the number of expansion terms from the 20 that were added in 
the AQE version to an average of 12 terms per query. Comparing AQE and IQE simulated, Harman 
found that, although the AQE worked well and gave large overall improvements in retrieval 
effectiveness, the IQE simulated expansion was capable of improving these results further. In addition, 
the IQE simulated expansion was more consistent in improving performance. This latter finding is 
important. Automatic query expansion (AQE) shows good overall performance when averaged over a 
set of queries, but the increase is variable: some queries do very well with AQE, whereas others 
improve very little or suffer a degradation in performance. IQE as Harman deployed it, on the other 
hand, improves more of the queries. 
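
The distinction between average improvement and consistency can be made concrete with a small sketch. The function name and the per-query scores below are invented for illustration; they are not results from [Har88].

```python
def summarise_per_query(baseline, expanded):
    """Compare two runs query by query.

    baseline, expanded: per-query effectiveness scores (e.g. average
    precision), in the same query order.
    """
    diffs = [e - b for b, e in zip(baseline, expanded)]
    return {
        "mean_change": sum(diffs) / len(diffs),
        "improved": sum(d > 0 for d in diffs),
        "degraded": sum(d < 0 for d in diffs),
        "unchanged": sum(d == 0 for d in diffs),
    }

# Hypothetical scores for four queries.
baseline = [0.25, 0.5, 0.5, 0.25]
expanded = [0.75, 0.25, 0.5, 0.5]
summary = summarise_per_query(baseline, expanded)
```

Here the mean change is positive even though one query is degraded and one is unchanged, which is the kind of variability that an average over the query set conceals.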
Harman also explored two alternative sources of terms for query expansion: expansion by term 
variants and expansion by nearest neighbours. The first method, expanding the query by all variants of the 
query terms, showed little improvement when performed automatically, i.e. when all variants of the 
query terms were added. However, using the 'perfect user' strategy, Harman did obtain significant improvements. 
The second strategy, expansion by similar terms as given by co-occurrence information, showed 
a drop in performance when performed automatically but an increase when performed in the simulation 
of a perfect user. Harman also demonstrated that combining query expansion techniques can further 
improve performance. 
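
As an illustration of the second strategy, the sketch below finds the nearest neighbours of a query term from document-level co-occurrence. The similarity measure (the Dice coefficient over document frequencies) and the function name are assumptions made for this example; the text above does not specify which co-occurrence measure Harman used.

```python
from collections import defaultdict

def nearest_neighbours(docs, term, k=3):
    """Return up to k terms that co-occur most strongly with `term`.

    Strength is the Dice coefficient 2*co(t, term) / (df(t) + df(term)),
    an assumed measure chosen for illustration.
    """
    df = defaultdict(int)  # number of documents containing each term
    co = defaultdict(int)  # number of documents containing both t and `term`
    for doc in docs:
        terms = set(doc)
        for t in terms:
            df[t] += 1
        if term in terms:
            for t in terms - {term}:
                co[t] += 1
    scores = {t: 2 * c / (df[term] + df[t]) for t, c in co.items()}
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Toy collection, invented for illustration only.
docs = [["wing", "flow", "drag"], ["wing", "flow"], ["flow", "heat"], ["wing", "drag"]]
neighbours = nearest_neighbours(docs, "wing", k=2)
```

Automatic expansion would add all returned neighbours to the query, whereas the simulated perfect user would keep only those that occur in unseen relevant documents.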
Harman's 1988 experiments only examined query expansion: the expansion terms were not weighted 
according to their utility in retrieving relevant documents. In [Har92b] she ran a series of experiments 
on the same collection as in [Har88], the Cranfield 1400 collection, to determine the relative 
effectiveness of expansion and reweighting. She showed that, on this collection at least, expanding the 
query is more important than only reweighting query terms. Combining both techniques gives the best 
overall performance. The relative merits of term reweighting and expansion may differ in degree between 
collections and models, but the overall finding probably holds in general. She also demonstrated that multiple iterations of 
RF can increase performance over single iterations, so RF is useful over the course of a search. 
The work on AQE demonstrated that, although RF can improve retrieval effectiveness, it is variable 
across queries: some queries do very well with relevance feedback whereas others can show degraded 
performance. In IQE it might be reasonable to assume that a user can negate this variability by selecting 
only good RF terms and ignoring the non-relevant ones. This potential benefit raises a number of 
questions regarding how good AQE methods are for IQE purposes. In the following sections we shall 
examine how ranking terms for IQE can affect performance, and the relative effectiveness of AQE and 
IQE. 
5.2 Ranking expansion terms in IQE 
It may be that the traditional term ranking algorithms used for AQE perform differently when used 
by real subjects. That is, techniques that are successful in automatically selecting expansion terms may 
not be suitable as a basis for term selection by a user. One reason is that a user's choice of a term 
may not be based only on retrieval effectiveness. A user may, for example, choose 
fewer expansion terms because of the increased effort of term selection, or may choose terms that refine 
rather than modify a search topic. 
Efthimiadis [Efth93, Efth95] examined eight term ranking algorithms and investigated their 
performance in an IQE environment, in which users performing real searches made the relevance 
assessments and selected the terms. Four of these algorithms (F4, F4.modified²⁶, w_i(p_i - q_i)²⁷, and 
                                                           
26 F4.modified is the version of the F4 weighting function that adds 0.5 to each cell in the numerator and 
denominator to prevent zero entries (section 2.2.3). 
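
The footnote above can be sketched in code. The symbols r, n, R and N follow the conventional relevance-weighting notation and are assumptions here, since this section does not define them. The w_i(p_i - q_i) function is shown under the further assumption that p_i and q_i are estimated as the proportions of relevant and non-relevant documents containing the term; the defining footnote is not reproduced in this section, so that part should be read as illustrative only.

```python
import math

def f4_modified(r, n, R, N):
    """F4 relevance weight with 0.5 added to each cell.

    r: relevant documents containing the term
    n: documents containing the term
    R: known relevant documents
    N: documents in the collection
    """
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return math.log(numerator / denominator)

def w_p_minus_q(r, n, R, N):
    """w_i(p_i - q_i), with assumed estimates p_i = r/R and
    q_i = (n - r)/(N - R); requires R > 0 and N > R."""
    p = r / R
    q = (n - r) / (N - R)
    return f4_modified(r, n, R, N) * (p - q)

# Hypothetical counts for a collection of 1400 documents.
strong = w_p_minus_q(r=8, n=10, R=10, N=1400)
weak = w_p_minus_q(r=2, n=10, R=10, N=1400)
```

The 0.5 correction keeps the weight finite even when a cell is zero, for example when every known relevant document contains the term (r = R).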