An alternative approach to exploiting term dependency is term clustering: grouping sets of related
terms with a view to selecting query expansion terms from these sets. This can be achieved without
relevance information (using only statistical information on term similarity to choose expansion terms)
or with relevance information (using a combination of collection-dependent information and
information from the relevant set to choose expansion terms). Both methods typically rely on
term co-occurrence to generate clusters, but the term co-occurrence methods used in the
literature have generally not produced convincing results [PW91].
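As an illustration of the variant that uses no relevance information, the sketch below groups terms by document co-occurrence and expands a query with the other members of each query term's cluster. It is a minimal sketch under stated assumptions: the Dice coefficient as the similarity measure, single-link grouping at a fixed threshold, the helper names (`postings`, `cluster_terms`, `expansion_terms`) and the toy documents are all hypothetical choices, not the specific methods reviewed in [PW91].

```python
from itertools import combinations


def postings(docs):
    """Map each term to the set of ids of documents containing it."""
    index = {}
    for doc_id, terms in enumerate(docs):
        for t in set(terms):
            index.setdefault(t, set()).add(doc_id)
    return index


def dice(docs_a, docs_b):
    """Dice co-occurrence coefficient between two terms' document sets (assumed measure)."""
    if not docs_a and not docs_b:
        return 0.0
    return 2 * len(docs_a & docs_b) / (len(docs_a) + len(docs_b))


def cluster_terms(docs, threshold=0.5):
    """Hypothetical single-link clustering: terms whose Dice score >= threshold share a cluster."""
    index = postings(docs)
    parent = {t: t for t in index}  # union-find forest over terms

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in combinations(sorted(index), 2):
        if dice(index[a], index[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for t in index:
        clusters.setdefault(find(t), set()).add(t)
    return list(clusters.values())


def expansion_terms(query, clusters):
    """Candidate expansion terms: cluster-mates of any query term, excluding the query itself."""
    q = set(query)
    expanded = set()
    for c in clusters:
        if c & q:
            expanded |= c
    return expanded - q


docs = [
    ["ship", "boat", "sea"],
    ["ship", "boat", "harbour"],
    ["car", "road", "traffic"],
    ["car", "road"],
]
print(sorted(expansion_terms(["ship"], cluster_terms(docs, 0.5))))  # ['boat', 'harbour', 'sea']
print(sorted(expansion_terms(["ship"], cluster_terms(docs, 0.9))))  # ['boat']
```

Raising the threshold from 0.5 to 0.9 shrinks the expansion list from three terms to one, which shows how strongly the outcome depends on the chosen similarity cut-off.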
The methods for incorporating term dependence outlined in this section have not produced the increase
in retrieval performance that might be expected. This may partly be due to the computational limitations
of calculating and storing dependence information. Although term-independence methods, such as
the F4 term weighting scheme, do not explicitly capture term dependence, they do implicitly capture
some degree of term co-occurrence. That is, although term-independence methods do not calculate
explicit co-occurrence values, one would expect the terms in the expansion list to have
a greater than average degree of co-occurrence. This is because good discriminators of relevance
are those terms that appear more frequently in relevant than in non-relevant documents, and so they
tend to appear together in the same relevant documents. How to use this co-occurrence information
successfully, and in a computationally efficient manner, remains an open research question.
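The argument can be checked on a toy example. The sketch below ranks candidate expansion terms by the Robertson/Sparck Jones F4 weight (with the conventional 0.5 correction) and then compares the average pairwise co-occurrence of the top-ranked terms with that of the whole vocabulary. The corpus, the relevance judgements and the use of the Dice coefficient to measure co-occurrence are illustrative assumptions.

```python
import math
from itertools import combinations


def f4_weight(r, n, R, N):
    """Robertson/Sparck Jones F4 weight with the conventional 0.5 correction.
    r: relevant docs containing the term   n: docs containing the term
    R: relevant docs in total              N: docs in the collection"""
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


def postings(docs):
    """Map each term to the set of ids of documents containing it."""
    index = {}
    for doc_id, terms in enumerate(docs):
        for t in set(terms):
            index.setdefault(t, set()).add(doc_id)
    return index


def rank_expansion_terms(docs, relevant_ids, k=3):
    """Rank terms occurring in the relevant set by F4 weight (ties broken alphabetically)."""
    index = postings(docs)
    N, R = len(docs), len(relevant_ids)
    scores = {t: f4_weight(len(ids & relevant_ids), len(ids), R, N)
              for t, ids in index.items() if ids & relevant_ids}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def mean_pairwise_dice(terms, index):
    """Average Dice co-occurrence coefficient over all pairs of the given terms."""
    pairs = list(combinations(terms, 2))
    return sum(2 * len(index[a] & index[b]) / (len(index[a]) + len(index[b]))
               for a, b in pairs) / len(pairs)


DOCS = [
    ["rf", "query", "expansion", "weight"],   # 0 relevant
    ["rf", "query", "expansion"],             # 1 relevant
    ["rf", "expansion", "user"],              # 2 relevant
    ["query", "database", "sql"],             # 3
    ["weight", "training", "neural"],         # 4
    ["user", "interface", "design"],          # 5
    ["database", "index"],                    # 6
    ["neural", "network"],                    # 7
]
RELEVANT = {0, 1, 2}

top = rank_expansion_terms(DOCS, RELEVANT)
index = postings(DOCS)
print(top)  # ['expansion', 'rf', 'query']
print(mean_pairwise_dice(top, index) > mean_pairwise_dice(sorted(index), index))  # True
```

No co-occurrence statistics enter the F4 ranking itself; the higher co-occurrence of the selected terms follows from their concentration in the same relevant documents.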
3.2 The dynamic nature of information seeking
Implicit in much of the early work on RF is the assumption that users have a fixed information need:
that the information for which they are searching does not change over the course of a search. Whilst
this may be true in certain cases, evidence from a range of studies on information seeking, e.g. [Kuh93,
Ell89, SW99], shows that information needs should be regarded as transient, developing entities rather
than fixed requests.
The techniques discussed previously modify queries based on the difference between relevant and
non-relevant documents, but they do not consider when a document was marked relevant: a document
marked relevant at the start of a search contributes as much to RF as a document marked relevant at the
current iteration. If we assume that users' information needs are static, this is correct. However, if
the user's need is developing or changing throughout the search, then documents that were assessed as
relevant early in the search may not be good examples of what the user currently regards as
relevant. Campbell, in a series of papers on developing information needs, has addressed this issue
through the notion of Ostensive Relevance [Cam95, Cam99, CVR96].
The basic premise behind Ostensive Relevance [Cam95] is that documents selected at the current
iteration of RF are the best indicators of what the user finds relevant; documents assessed as relevant in
previous iterations are decreasingly useful at describing the user's information need. Relevant documents,
then, are not seen as a set of equally important documents but as sets of documents of varying importance.
In [CVR96] Campbell and Van Rijsbergen propose an extension to the probabilistic model of retrieval
that incorporates an ageing component into term weighting. When calculating the weight of a term, this
ageing component takes into account when the documents containing the term were assessed relevant: if the
documents were marked relevant at an early stage in the search, the term receives a lower weight
than if the documents had been assessed relevant in recent iterations. The ageing component can be tuned to
differentiate more or less strongly between older and more recent documents. In [Cam99] a preliminary
test of this approach indicated that ostensive weighting can achieve improvements in fewer search iterations
than non-ostensive approaches. Ruthven et al. also found ostensive weighting to be beneficial for
query expansion [RLVR02b].
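The effect of an ageing component can be illustrated with a deliberately simplified scheme. In the sketch below, each document judged relevant contributes evidence to its terms scaled by `decay ** age`, where `age` is the number of iterations since the judgement. This exponential decay, the `decay` parameter and the function name are assumptions for illustration, not the formulation in [CVR96]. Setting `decay = 1.0` weights all judgements equally, as a non-ostensive approach would, while smaller values differentiate more strongly between older and more recent documents.

```python
def ostensive_term_weights(judgements, current_iteration, decay=0.5):
    """Aggregate term evidence from relevance judgements, discounting older ones.

    judgements: list of (iteration, terms) pairs, one per document marked relevant.
    decay: hypothetical ageing parameter in (0, 1]; 1.0 disables ageing.
    This exponential discount is an illustrative assumption, not Campbell's model.
    """
    weights = {}
    for iteration, terms in judgements:
        age = current_iteration - iteration      # 0 for the current iteration
        contribution = decay ** age              # older judgements count for less
        for term in set(terms):
            weights[term] = weights.get(term, 0.0) + contribution
    return weights


# A search whose focus drifts from jaguar the car to jaguar the animal.
judgements = [
    (1, ["jaguar", "car"]),
    (2, ["jaguar", "car", "engine"]),
    (3, ["jaguar", "cat", "habitat"]),
]
flat = ostensive_term_weights(judgements, current_iteration=3, decay=1.0)
aged = ostensive_term_weights(judgements, current_iteration=3, decay=0.5)
print(flat["car"], flat["cat"])   # 2.0 1.0
print(aged["car"], aged["cat"])   # 0.75 1.0
```

Without ageing, the two early car-related judgements still dominate; with ageing, the single most recent judgement already outweighs them.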
Standard RF techniques, such as Rocchio or F4, will also adapt to changing information needs, but they
require more evidence to do so, since enough new evidence must accumulate to outweigh
the old evidence. Campbell's ageing component reduces the mass of evidence required to shift a query
towards the new information need.
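The difference in how much evidence is needed can be seen with a positive-feedback-only Rocchio update (the non-relevant component is omitted for brevity). In the hypothetical two-term example below, three early judgements point towards one term and a single recent judgement towards another. The `decay` weighting of the relevant centroid is an assumed exponential ageing scheme, and `alpha = 1.0`, `beta = 0.75` are illustrative values rather than values from the source.

```python
def aged_centroid(relevant, current_iteration, decay=1.0):
    """Weighted mean of relevant document vectors.

    relevant: list of (iteration_judged, vector) pairs, vectors of equal length.
    Each vector is weighted by decay ** age; decay=1.0 gives the plain mean
    used by standard Rocchio. The exponential ageing is a hypothetical choice.
    """
    dims = len(relevant[0][1])
    totals = [0.0] * dims
    mass = 0.0
    for iteration, vector in relevant:
        w = decay ** (current_iteration - iteration)
        mass += w
        for i, x in enumerate(vector):
            totals[i] += w * x
    return [t / mass for t in totals]


def rocchio(query, relevant, current_iteration, alpha=1.0, beta=0.75, decay=1.0):
    """Rocchio update using relevant documents only (gamma = 0)."""
    centroid = aged_centroid(relevant, current_iteration, decay)
    return [alpha * q + beta * c for q, c in zip(query, centroid)]


# Term dimensions: [car, cat]. Three old judgements favour "car", one new one "cat".
relevant = [(1, [1.0, 0.0]), (2, [1.0, 0.0]), (3, [1.0, 0.0]), (4, [0.0, 1.0])]
print(aged_centroid(relevant, 4))              # [0.75, 0.25]
print(aged_centroid(relevant, 4, decay=0.5))   # cat component now exceeds car
print(rocchio([1.0, 0.0], relevant, 4, decay=0.5))
```

With plain averaging, three more "cat" documents would have to be judged relevant before the centroid leaned towards the new need; with the assumed ageing, the single recent judgement is enough.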
Berger and Van Bommel [BVB97] present a model with similar aims. Their work is specifically
aimed at characterising the content of documents through hyperindices: hypertext representations of
document indexes, such as the one shown in Figure 14.