the combination of evidence in RF: combining multiple queries, retrieval algorithms or feedback 
algorithms, and in section 3.5 we discuss pseudo RF: employing RF without the user's involvement. 
3.1 Dependence between terms 
The vector space and probabilistic models assume that terms are independent of each other; that is, the 
presence of one term in a document does not alter the probability of seeing another term in the same 
document. Although this simplifying assumption has facilitated the construction of successful retrieval 
systems, it is not true. Words are related by use, for example in phrases, and their similarity of 
occurrence in documents can reflect underlying semantic relations between terms.  
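
The independence assumption can be checked on a small example. The following is a minimal sketch, assuming a hypothetical toy collection in which each document is represented as a set of index terms (the documents and function names are illustrative and are not taken from any of the cited work). If two terms were independent, the fraction of documents containing both would be close to the product of the fractions containing each.

```python
# Hypothetical toy collection: each document is the set of index terms it contains.
docs = [
    {"information", "retrieval", "feedback"},
    {"information", "retrieval"},
    {"information", "theory"},
    {"relevance", "feedback"},
    {"retrieval", "evaluation"},
    {"query", "expansion"},
]


def occurrence_prob(term, docs):
    """Fraction of documents that contain `term`."""
    return sum(term in d for d in docs) / len(docs)


def joint_prob(a, b, docs):
    """Fraction of documents that contain both `a` and `b`."""
    return sum(a in d and b in d for d in docs) / len(docs)


p_a = occurrence_prob("information", docs)           # 3 of 6 documents
p_b = occurrence_prob("retrieval", docs)             # 3 of 6 documents
p_ab = joint_prob("information", "retrieval", docs)  # 2 of 6 documents
expected_if_independent = p_a * p_b                  # 0.25

# The observed joint probability exceeds the value predicted by independence,
# so in this collection seeing one term raises the chance of seeing the other.
print(p_ab, expected_if_independent)
```

In this toy collection the two terms co-occur more often than independence would predict, which is the kind of regularity that the dependence-based approaches below try to exploit.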
Incorporating information on co-occurrence patterns of terms in documents may improve retrieval 
effectiveness, as indicated by the Association Hypothesis [VR79]:  
 If an index term is good at discriminating relevant from irrelevant documents 
then any closely associated index term is also likely to be good at this.  
Authors such as Spiegel and Bennet [SB64] suggested, as early as 1964, that dependency information of 
this kind may be used to choose further search terms for query expansion. Not all query expansion 
based on dependence information is used for RF; for example, we could use dependency information to 
automatically expand initial queries in the absence of relevance information from the user. However, 
three investigations of dependency information with an RF connection are outlined below. 
Van Rijsbergen, Harper and Porter [VRHP81] proposed using a maximum spanning tree (MST) in 
which each node represents a term and each link represents the association or similarity between the 
two terms. The MST links each term to its most similar terms as measured by the association measure. 
The association measure used in [VRHP81] was the Expected Mutual Information Measure (EMIM), 
based on the probability distribution of the two terms. The MST can potentially be used in 
many ways to expand a query. In [VRHP81] the most similar terms to the query terms (the ones directly 
linked in the MST) are added to the query. The query and expansion terms in [VRHP81] are also 
reweighted using a weight based on the F4 weight. On the whole, Van Rijsbergen et al. show that their 
term dependence approach performs better than the F4 term independence weighting scheme. They also 
demonstrate the relative robustness of the MST approach: although the EMIM-based MST gives 
superior results, alternative association measures do not give significantly different results. 
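
The MST expansion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [VRHP81]: EMIM is computed here as the mutual information between the presence and absence of two terms over a hypothetical toy collection, the tree is built with Kruskal's algorithm on edges sorted by decreasing association, and the F4-based reweighting step is omitted. The names emim, maximum_spanning_tree and expand_query are illustrative.

```python
import math
from itertools import combinations

# Hypothetical toy collection: each document is the set of index terms it contains.
docs = [
    {"information", "retrieval", "feedback"},
    {"information", "retrieval"},
    {"information", "theory"},
    {"relevance", "feedback"},
    {"retrieval", "evaluation"},
    {"query", "expansion"},
]


def emim(a, b, docs):
    """Mutual information between the presence/absence of terms a and b."""
    n = len(docs)
    total = 0.0
    for in_a in (True, False):
        for in_b in (True, False):
            joint = sum((a in d) == in_a and (b in d) == in_b for d in docs) / n
            p_a = sum((a in d) == in_a for d in docs) / n
            p_b = sum((b in d) == in_b for d in docs) / n
            if joint > 0:  # a zero-probability cell contributes nothing
                total += joint * math.log(joint / (p_a * p_b))
    return total


def maximum_spanning_tree(terms, weight):
    """Kruskal's algorithm over edges taken in order of decreasing weight."""
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent links to the root, halving the path as we go.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    edges = sorted(combinations(sorted(terms), 2),
                   key=lambda e: weight(*e), reverse=True)
    tree = []
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two components, so keep it
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


def expand_query(query, tree):
    """Add every term that is directly linked in the tree to a query term."""
    expanded = set(query)
    for a, b in tree:
        if a in query:
            expanded.add(b)
        if b in query:
            expanded.add(a)
    return expanded


terms = set().union(*docs)
tree = maximum_spanning_tree(terms, lambda a, b: emim(a, b, docs))
expanded = expand_query({"retrieval"}, tree)
print(sorted(expanded))
```

Swapping the weight function passed to maximum_spanning_tree is enough to try a different association measure, which is the kind of comparison the robustness result above refers to.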
Smeaton and Van Rijsbergen [SVR83] investigate query expansion and term reweighting using term 
dependence. Their investigation centred around three methods of query expansion: the MST approach 
of Van Rijsbergen et al., a Nearest Neighbours (NN) approach (which added the terms that were 
statistically most similar to a query term), and query expansion from a list of possible expansion 
terms taken from the relevant documents. The third technique, expansion with terms from relevant 
documents, is similar to the term independence approaches outlined in section 2. The results from these experiments were 
largely negative. Query expansion via the MST generally degraded performance over the unexpanded 
query, as did expansion via the NN or expansion terms chosen from the relevant documents. One 
striking feature was that the performance degradation increased as the number of terms added to the 
query increased. Smeaton and Van Rijsbergen point to the difficulty in estimating probabilities as the 
main reason for this failure.  
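
The Nearest Neighbours idea can be sketched as follows. This is a minimal illustration and not the procedure of [SVR83]: the similarity measure is assumed here to be the Dice coefficient over the sets of documents in which two terms occur, the toy collection is hypothetical, and the parameter k (the number of neighbours added per query term) and the function names are introduced only for illustration.

```python
# Hypothetical toy collection: each document is the set of index terms it contains.
docs = [
    {"information", "retrieval", "feedback"},
    {"information", "retrieval"},
    {"information", "theory"},
    {"relevance", "feedback"},
    {"retrieval", "evaluation"},
    {"query", "expansion"},
]


def doc_set(term, docs):
    """Indices of the documents that contain `term`."""
    return {i for i, d in enumerate(docs) if term in d}


def dice(a, b, docs):
    """Dice coefficient between the document sets of two terms (assumed measure)."""
    docs_a, docs_b = doc_set(a, docs), doc_set(b, docs)
    if not docs_a and not docs_b:
        return 0.0
    return 2 * len(docs_a & docs_b) / (len(docs_a) + len(docs_b))


def nn_expand(query, docs, k):
    """Add to the query the k terms most similar to each query term."""
    vocabulary = set().union(*docs)
    expanded = set(query)
    for q in query:
        # Rank by decreasing similarity; break ties alphabetically for determinism.
        candidates = sorted(vocabulary - {q}, key=lambda t: (-dice(q, t, docs), t))
        expanded.update(candidates[:k])
    return expanded


for k in (0, 1, 2):
    print(k, sorted(nn_expand({"retrieval"}, docs, k)))
```

Raising k admits progressively weaker neighbours into the query, which gives one way to examine how behaviour changes as more terms are added.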
In [Bha92] Bhatia also presented a model of dependence trees for query expansion to incorporate 
user-specific information. Bhatia suggests that the dependence tree approach can be improved not only 
by being more selective about which terms appear in the tree but also by weighting the links between 
elements in the tree according to user preference. The claim is that although spanning trees can suggest 
expansion terms based on statistical similarity, they do not suggest them based on conceptual similarity.  
The solution presented is to elicit from the user what concepts are present in documents and how they 
relate to each other (how similar or dissimilar they are). This can be used to develop a new spanning tree that 
more accurately reflects the user's personal constructs, based on concepts rather than explicitly 
mentioned terms. A spanning, or dependence, tree would have to be constructed for each user, but the 
argument is that it would better support the user's searching and choice of terms.  