An alternative approach to exploiting term dependency is term clustering: grouping sets of related 
terms with a view to selecting query expansion terms from these sets. This can be achieved without 
relevance information (using only statistical information on term similarity to choose expansion terms) 
or with relevance information (using a combination of collection-dependent information and 
information from the relevant set to choose expansion terms). Both these methods will typically rely on 
term co-occurrence methods to generate clusters, but the term co-occurrence methods used in the 
literature have generally not provided convincing results [PW91].  
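As a rough illustration of the variant that uses no relevance information, the sketch below scores term pairs by document-level co-occurrence and ranks candidate expansion terms by their similarity to the query terms. The Dice coefficient, the function names and the toy documents are assumptions made for this example; the explicit clustering step is omitted, and this is not the method of any particular study cited here.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_similarity(docs):
    """Dice similarity for every pair of terms that share at least one document."""
    doc_freq = defaultdict(int)   # documents containing each term
    pair_freq = defaultdict(int)  # documents containing both terms of a pair
    for doc in docs:
        terms = sorted(set(doc))
        for t in terms:
            doc_freq[t] += 1
        for a, b in combinations(terms, 2):
            pair_freq[(a, b)] += 1
    return {pair: 2 * n / (doc_freq[pair[0]] + doc_freq[pair[1]])
            for pair, n in pair_freq.items()}

def expansion_terms(query_terms, sims, k=3):
    """Rank non-query terms by summed similarity to the query terms."""
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in query_terms and b not in query_terms:
            scores[b] += s
        elif b in query_terms and a not in query_terms:
            scores[a] += s
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Toy collection (hypothetical data): each document is a list of index terms.
docs = [
    ["relevance", "feedback", "query"],
    ["relevance", "feedback", "expansion"],
    ["query", "expansion", "term"],
    ["image", "retrieval"],
]
sims = cooccurrence_similarity(docs)
print(expansion_terms({"feedback"}, sims, k=2))
```

In this toy collection "relevance" ranks first because it appears in every document that contains "feedback". On real collections such purely statistical associations are much noisier, which is in keeping with the unconvincing results reported above.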
The methods for incorporating term dependence outlined in this section have not produced the increase 
in retrieval performance that might be expected. This may be due partly to the computational limitations 
of calculating and storing dependence information. Although the term independence methods, such as 
the F4 term weighting scheme, do not explicitly capture term dependence, they do implicitly capture 
some degree of term co-occurrence. That is, although the term independence methods do not calculate 
explicit values for co-occurrence, one would expect the terms in the term expansion list to have 
a greater than average degree of term co-occurrence. This is because good discriminators of relevance 
are those terms that appear more frequently in the relevant than in the non-relevant documents, and 
such terms will therefore tend to appear together in the relevant documents. How to use 
this co-occurrence information successfully, and in a computationally efficient manner, remains an open 
research question. 
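To make the argument about discriminators concrete, the sketch below computes the F4 weight in its commonly cited form, with the usual 0.5 correction. The parameter names and the example counts are assumptions for illustration; where the definition of F4 given elsewhere in this document differs, that definition takes precedence.

```python
import math

def f4_weight(r, n, R, N):
    """F4 relevance weight of a term, with 0.5 added to each count.

    r: relevant documents containing the term
    n: documents in the collection containing the term
    R: relevant documents
    N: documents in the collection
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

# Hypothetical counts: 10 relevant documents in a collection of 1000.
concentrated = f4_weight(r=8, n=20, R=10, N=1000)   # mostly in relevant docs
widespread = f4_weight(r=1, n=500, R=10, N=1000)    # common everywhere
print(concentrated, widespread)
```

A term found in 8 of the 10 relevant documents receives a positive weight, while a term spread across half the collection receives a negative one. Any two terms with high weights must each occur in most of the relevant documents, so they necessarily co-occur there, even though no co-occurrence value is ever calculated.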
3.2 The dynamic nature of information seeking
Implicit in much of the early work on RF is the assumption that users have a fixed information need: 
that the information for which they are searching does not change over the course of a search. Whilst 
this may be true in certain cases, evidence from a range of studies on information seeking, e.g. [Kuh93, 
Ell89, SW99], shows that information needs should be regarded as transient, developing entities rather 
than as fixed requests. 
The techniques discussed previously modify queries based on the difference between relevant and 
non-relevant documents, but they do not consider when a document was marked relevant: a document 
marked relevant at the start of a search contributes as much to RF as a document marked relevant at the 
current iteration. If we assume that users' information needs are static then this is correct. However, if 
the user's need is developing or changing throughout the search, then documents that were assessed as 
being relevant early in the search may not be good examples of what the user currently regards as 
relevant. Campbell, in a series of papers on developing information needs, has addressed this issue 
through the notion of Ostensive Relevance [Cam95, Cam99, CVR96].  
The basic premise behind Ostensive Relevance [Cam95] is that documents selected at the current 
iteration of RF are the best indicators of what the user finds relevant; documents assessed as relevant in 
previous iterations are decreasingly useful at describing a user's information need. Relevant documents, 
then, are not seen as a set of equally important documents but as sets of documents of varying importance. 
In [CVR96] Campbell and Van Rijsbergen produce an extension to the probabilistic model of retrieval 
that incorporates an 'ageing' component to term weighting. When calculating the weight of a term, this 
ageing component takes into account when the documents containing the term were assessed relevant: if the 
documents were marked relevant at an early stage in the search then the term receives a lower weight 
than if the documents were assessed relevant in recent iterations. The ageing component can be tuned to 
differentiate more or less strongly between older and more recent documents. In [Cam99] a preliminary 
test of this approach indicated that ostensive weighting can improve searches in fewer search iterations 
than non-ostensive approaches. Ruthven et al. also showed ostensive weighting to be beneficial for 
query expansion [RLVR02b]. 
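The idea of an ageing component can be sketched as follows. This is a minimal illustration only, not the formulation in [CVR96]: the exponential decay, the `decay` parameter and the function name are all assumptions. Each relevant document contributes evidence discounted by the number of iterations since it was assessed, and a term's score is the discounted share of relevance evidence that contains it.

```python
def ostensive_term_evidence(term, assessments, current_iteration, decay=0.5):
    """Aged fraction of the relevance evidence that contains `term`.

    assessments: list of (iteration_marked, set_of_terms), one entry per
        document the user marked relevant.
    decay: hypothetical tuning parameter in (0, 1]; 1.0 disables ageing,
        smaller values favour recent assessments more strongly.
    """
    total = 0.0
    with_term = 0.0
    for iteration_marked, terms in assessments:
        age = current_iteration - iteration_marked
        weight = decay ** age          # older assessments count for less
        total += weight
        if term in terms:
            with_term += weight
    return with_term / total if total else 0.0

# Hypothetical search: the user's interest drifts from cars to cats.
assessments = [
    (1, {"jaguar", "car"}),
    (1, {"jaguar", "engine"}),
    (3, {"jaguar", "cat"}),
]
for decay in (1.0, 0.5):
    car = ostensive_term_evidence("car", assessments, 3, decay)
    cat = ostensive_term_evidence("cat", assessments, 3, decay)
    print(f"decay={decay}: car={car:.3f} cat={cat:.3f}")
```

With `decay=1.0` the terms "car" and "cat" are indistinguishable, since each occurs in one of three relevant documents. With `decay=0.5` the recently assessed "cat" document dominates, which is the tunable differentiation between older and more recent documents described above.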
Standard RF techniques, such as Rocchio or F4, will also adapt to changing information needs, but they 
will require more evidence to do so, as new evidence must accumulate before it outweighs 
the old evidence. Campbell's ageing component reduces the mass of evidence required to shift a query 
towards the new information need. 
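The difference in the evidence required can be illustrated with a toy simulation. The sketch below isolates the positive feedback component of a Rocchio-style update, the centroid of the relevant documents, and ignores the original query and the non-relevant documents. The two-dimensional topic vectors, the `decay` factor and the function names are assumptions made for the example, not Campbell's formulation.

```python
def feedback_centroid(history, decay=1.0):
    """Weighted centroid of relevant document vectors.

    history: list of vectors, one relevant document per iteration, oldest first.
    decay: hypothetical ageing factor; 1.0 gives the plain unweighted centroid.
    """
    latest = len(history) - 1
    weights = [decay ** (latest - i) for i in range(len(history))]
    total = sum(weights)
    dims = len(history[0])
    return [sum(w * vec[d] for w, vec in zip(weights, history)) / total
            for d in range(dims)]

def iterations_to_shift(decay, old_docs=3, max_new=20):
    """Count new-topic documents needed before the centroid favours the new topic."""
    old_topic, new_topic = [1.0, 0.0], [0.0, 1.0]
    history = [old_topic] * old_docs
    for k in range(1, max_new + 1):
        history.append(new_topic)
        centroid = feedback_centroid(history, decay)
        if centroid[1] > centroid[0]:
            return k
    return None  # the centroid never shifted within max_new iterations

print(iterations_to_shift(decay=1.0))  # plain centroid
print(iterations_to_shift(decay=0.5))  # aged centroid
```

After three documents on the old topic, the plain centroid needs four documents on the new topic before the new topic dominates, whereas the aged centroid with `decay=0.5` shifts after one.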
Berger and Van Bommel [BVB97] present a model with similar aims. Their work is specifically 
aimed at characterising the content of documents through hyper-indices: hypertext representations of 
document indexes, such as the one shown in Figure 14. 