methods for sorting terms were better than other methods. Second, and more importantly for IQE, the
performance of query expansion varied according to how many terms were added to the query. For the
Cranfield 1400 collection, expansion by 20 terms gave optimal effectiveness.
She performed a further experiment in which the system selected expansion terms from a list of those
terms that occurred in at least one of the unseen relevant documents. This simulated a perfect choice
of expansion terms on behalf of the user: the system added only terms that would retrieve unseen
relevant documents. This approach (IQE simulated) was compared against the performance given by
expansion using the top 20 expansion terms (AQE).
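The difference between the two conditions can be illustrated with a minimal sketch. It assumes that the simulated user filters the same top-ranked candidate list that AQE would add in full, and that each unseen relevant document is available as a set of terms; the function names and the toy data are hypothetical and are not taken from Harman's experiments.

```python
def aqe_terms(ranked_terms, top_k=20):
    """AQE: add the top_k ranked candidate terms without any filtering."""
    return list(ranked_terms[:top_k])


def simulated_iqe_terms(ranked_terms, unseen_relevant_docs, top_k=20):
    """Simulated 'perfect user': keep only those top_k candidates that occur
    in at least one unseen relevant document (assumed selection rule)."""
    # Union of all terms appearing in any unseen relevant document.
    useful = set().union(*unseen_relevant_docs)
    return [t for t in ranked_terms[:top_k] if t in useful]


# Toy data (hypothetical): ranked candidates and two unseen relevant documents.
ranked = ["wing", "flow", "shock", "boundary", "layer"]
unseen_relevant = [{"flow", "mach"}, {"layer", "flow"}]

aqe = aqe_terms(ranked, top_k=4)
iqe = simulated_iqe_terms(ranked, unseen_relevant, top_k=4)
```

Under these assumptions the simulated selection can only return a subset of the AQE terms, which is consistent with the smaller number of expansion terms reported for the simulated condition.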
This IQE simulated approach reduced the number of expansion terms from the 20 that were added in
the AQE version to an average of 12 terms per query. Comparing AQE and IQE simulated, Harman
found that, although AQE worked well and gave large overall improvements in retrieval
effectiveness, the IQE simulated expansion was capable of improving these results further. In addition,
the IQE simulated expansion was more consistent in improving performance. This latter finding was
important: automatic query expansion (AQE) shows good overall performance when averaged over a
set of queries, but the increase is variable. Some queries do very well with AQE, whereas others
improve very little or suffer a degradation in performance. IQE as Harman deployed it, on the other
hand, improves more of the queries.
Harman explored two alternatives for obtaining terms for query expansion: expansion by term
variants and expansion by nearest neighbours. The first method, expanding the query by all variants of
the query terms, showed little improvement when performed automatically, i.e. when all variants of the
query terms were added. However, using the 'perfect user' strategy, Harman did obtain significant
improvements. The second strategy, expansion by similar terms as given by co-occurrence information,
showed a drop in performance when performed automatically but an increase when performed in the
simulation of a perfect user. Harman also demonstrated that combining query expansion techniques can
further improve performance.
Harman's 1988 experiments examined only query expansion: the expansion terms were not weighted
according to their utility in retrieving relevant documents. In [Har92b] she ran a series of experiments
on the same collection as in [Har88], the Cranfield 1400 collection, to determine the relative
effectiveness of expansion and reweighting. She showed that, on this collection at least, expanding the
query is more important than only reweighting query terms, and that combining both techniques gives
the best overall performance. The relative merits of term reweighting and expansion may differ between
collections and models, but the general pattern probably holds. She also demonstrated that multiple
iterations of RF can increase performance over single iterations, so RF is useful over the course of a search.
The work on AQE demonstrated that, although RF can improve retrieval effectiveness, it is variable
across queries: some queries do very well with relevance feedback whereas others can show degraded
performance. In IQE it might be reasonable to assume that a user can negate this variability by selecting
only good RF terms and ignoring the non-relevant ones. This potential benefit raises a number of
questions regarding how good AQE methods are for IQE purposes. In the following sections we shall
examine how ranking terms for IQE can affect performance, and the relative effectiveness of AQE and
IQE.
5.2 Ranking expansion terms in IQE
It may be that the traditional term ranking algorithms used for AQE will perform differently when used
by real subjects. That is, techniques that are successful in automatically selecting expansion terms may
not be suitable as a basis for a user selecting terms. One reason is that a user's choice of
term may not be based only on retrieval effectiveness. A user may, for example, choose
fewer expansion terms due to the increased effort of term selection, or may choose terms that refine
rather than modify a search topic.
Efthimiadis [Efth93, Efth95] examined eight term ranking algorithms and investigated their
performance in an IQE environment, in which users performing real searches made the relevance
assessments and selected the terms. Four of these algorithms (F4, F4.modified[26], w_i(p_i - q_i)[27], and
[26] F4.modified is the version of the F4 weighting function that adds 0.5 to each cell in the numerator and
denominator to prevent 0 entries (section 2.2.3).
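As an illustration of the 0.5 correction described in this footnote, the following is a minimal sketch that assumes the standard Robertson and Sparck Jones form of F4. The symbols are assumptions not defined in this section: r is the number of known relevant documents containing the term, R the number of known relevant documents, n the number of documents containing the term, and N the collection size.

```python
import math


def f4_modified(r, R, n, N):
    """F4 relevance weight with 0.5 added to each cell (assumed standard form).

    The 0.5 terms keep every cell of the contingency table positive, so the
    weight stays finite even when r == 0 or r == R.
    """
    numerator = (r + 0.5) / (R - r + 0.5)            # odds of the term in relevant documents
    denominator = (n - r + 0.5) / (N - n - R + r + 0.5)  # odds in non-relevant documents
    return math.log(numerator / denominator)


# Without the correction, r = 0 would require log(0); with it the weight is finite.
weight_unseen = f4_modified(r=0, R=5, n=10, N=100)
weight_all = f4_modified(r=5, R=5, n=10, N=100)
```

A term that occurs in every known relevant document receives a higher weight than one that occurs in none, which is the ordering a term ranking based on this function relies on.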