EMIM^28) have already been discussed. The fifth, Porter's algorithm [PG88], is similar to the F1
function (section 2.2.3), placing emphasis on terms that occur frequently in the relevant set. This is
shown in Equation 18.
$$\mathrm{Porter}_i = \frac{r_i}{R} - \frac{n_i}{N}$$
Equation 18: Porter term weighting function
where   r_i = number of relevant documents containing term i
R = number of relevant documents
n_i = number of documents containing term i
N = number of documents in the collection
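As an illustration of Equation 18, the weight can be computed directly from the four counts defined above. The following is a minimal sketch only: the function name `porter_weight` and the example counts are hypothetical and are not taken from [PG88] or from Efthimiadis's data.

```python
def porter_weight(r_i: int, R: int, n_i: int, N: int) -> float:
    """Porter weight for term i (Equation 18): r_i/R - n_i/N."""
    if R <= 0 or N <= 0:
        raise ValueError("R and N must be positive")
    return r_i / R - n_i / N


# Hypothetical counts for three terms: term -> (r_i, n_i).
counts = {"retrieval": (8, 120), "feedback": (5, 20), "system": (9, 900)}
R, N = 10, 1000  # assumed sizes of the relevant set and the collection

# Rank terms by decreasing Porter weight.
porter_ranked = sorted(
    counts,
    key=lambda t: porter_weight(counts[t][0], R, counts[t][1], N),
    reverse=True,
)
```

In this made-up example the term "system" occurs in most relevant documents but is equally common in the collection as a whole, so its weight is zero and it is ranked last.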
The sixth algorithm, the ZOOM frequency measure [Mar82], ranks terms by their total frequency of
occurrence in the retrieved set. All within-document occurrences are counted, so a term that occurs
several times in one document contributes each occurrence to its total. Ties between equally frequent
terms are resolved by ranking terms alphabetically.
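The ZOOM ordering described above can be sketched as a sort on total frequency with an alphabetical tie-break. This is an illustrative reading of the description, not the original implementation; the function name `zoom_rank` and the toy documents are assumptions.

```python
from collections import Counter


def zoom_rank(docs):
    """Rank terms by total frequency over a set of documents.

    docs is a list of token lists, one per retrieved document. Every
    within-document occurrence is counted; ties are broken alphabetically.
    """
    totals = Counter()
    for tokens in docs:
        totals.update(tokens)
    # Sort by descending total frequency, then ascending term.
    return sorted(totals, key=lambda term: (-totals[term], term))


# Hypothetical retrieved set of three tokenised documents.
zoom_docs = [
    ["query", "term", "term"],
    ["expansion", "term", "query"],
    ["expansion"],
]
zoom_ranked = zoom_rank(zoom_docs)
```

Here "expansion" and "query" both occur twice, so the alphabetical rule places "expansion" first.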
The seventh algorithm, r-lohi, ranks terms according to their frequency of occurrence in the relevant set
of documents, resolving ties by the tf value of the terms (low tf to high tf). The final algorithm, r-hilo, is
identical to r-lohi except that it resolves ties by ranking from high tf to low tf.
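The two r-based orderings differ only in the direction of the tie-break, which a single sort key can express. The sketch below assumes that "frequency of occurrence in the relevant set" means r, the number of relevant documents containing the term; the function name `r_rank` and the example statistics are hypothetical.

```python
def r_rank(stats, high_tf_first=False):
    """Rank terms by r, breaking ties on tf.

    stats maps term -> (r, tf). With high_tf_first=False ties run from
    low tf to high tf (r-lohi); with high_tf_first=True they run from
    high tf to low tf (r-hilo).
    """
    sign = -1 if high_tf_first else 1
    return sorted(stats, key=lambda t: (-stats[t][0], sign * stats[t][1]))


# Hypothetical statistics: "a" and "b" are tied on r and differ on tf.
r_stats = {"a": (3, 10), "b": (3, 4), "c": (5, 7)}
r_lohi_ranked = r_rank(r_stats)
r_hilo_ranked = r_rank(r_stats, high_tf_first=True)
```

Only the order of the tied terms "a" and "b" changes between the two rankings; "c" is first in both because it has the highest r.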
In the data collection section of these experiments, Efthimiadis's subjects were asked to mark all
potentially useful expansion terms and the five best terms. The terms were selected from documents
that the user had assessed as relevant during relevance feedback. Efthimiadis evaluated the performance
of the eight term ranking algorithms by comparing the rankings given for each query against the list
generated by the users. For this, he used three criteria.
i.  Comparing the system's and users' rankings of term utility. The first test looked at where the
user-selected terms appeared in the system's ranking of terms (the top 25 terms given by EMIM, Porter, etc.).
Term ranking algorithms that place more user-selected terms near the top of the ranking are better than
those that place user-selected terms further down the ranking.
The most finely grained test split the system-generated list of terms into three sections (top, middle,
bottom). The user-selected terms showed a distribution of 20%-30%-50% (20% of terms in the bottom
third of the system ranking, 30% in the middle third, 50% in the top third) for all measures except ZOOM
(with a distribution of 30%-30%-40%) and r-hilo (40%-30%-30%). The wpq, EMIM and r-lohi measures
performed at very similar levels, followed by Porter and, slightly behind, the two F4 variants. The same
analysis was performed for the five best terms identified by the users, and showed similar results: wpq,
EMIM and r-lohi performed best, followed by Porter, then the F4 variants, and finally ZOOM and r-hilo.
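The thirds analysis can be sketched as follows: each user-selected term is assigned to the top, middle or bottom third of the system ranking, and the counts are converted to proportions. This is a minimal sketch under assumptions: the function name, the handling of list lengths not divisible by three, and the toy data are not from Efthimiadis's study. Note that the function returns proportions in the order top, middle, bottom, whereas the percentages above are quoted bottom first.

```python
def thirds_distribution(system_ranking, user_terms):
    """Proportion of user-selected terms in each third of a system ranking.

    Returns [top, middle, bottom]. User terms absent from the system
    ranking are ignored (an assumption of this sketch).
    """
    n = len(system_ranking)
    counts = [0, 0, 0]
    for pos, term in enumerate(system_ranking):
        if term in user_terms:
            # pos * 3 // n maps positions 0..n-1 onto the thirds 0, 1, 2.
            counts[pos * 3 // n] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0, 0.0, 0.0]


# Hypothetical system ranking of nine terms and four user-selected terms.
system_terms = list("abcdefghi")
selected = {"a", "b", "e", "i"}
distribution = thirds_distribution(system_terms, selected)
```

For this toy input two of the four selected terms fall in the top third and one in each of the others.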
ii.  Examining the top five ranked terms. The second analysis examined the top five terms in each ranking
to compare the similarity of the term rankings. The results showed that pairs of algorithms (wpq and
EMIM, F4 and F4.modified, Porter and ZOOM) were very similar. The terms of r-lohi are similar to those of
wpq and EMIM, whilst those of r-hilo are closer to those of ZOOM than to anything else. In certain
cases, e.g. wpq and EMIM, the top five terms are almost identical, with only the ranking differing
slightly. The major differences were between the F4 cases (mostly influenced by n) and the other
algorithms (mostly influenced by r, and differing from one another only when r is tied).
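One simple way to quantify how similar two algorithms' top five terms are is to count the terms they share, ignoring order. The source does not state which similarity measure was used, so the overlap count below is an assumption, as are the function name and the example rankings.

```python
def top_k_overlap(ranking_a, ranking_b, k=5):
    """Number of terms shared by the top k of two rankings (order ignored)."""
    return len(set(ranking_a[:k]) & set(ranking_b[:k]))


# Hypothetical rankings from two algorithms over the same candidate terms.
ranking_x = ["t1", "t2", "t3", "t4", "t5", "t6"]
ranking_y = ["t2", "t1", "t3", "t6", "t4", "t5"]
shared = top_k_overlap(ranking_x, ranking_y)
```

In this example the two top-five lists share four terms and differ mainly in order, which is the pattern reported for wpq and EMIM.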
iii. Mean rank position of the users' five best terms. The rank positions of the users' five best terms
were summed to determine which algorithms gave the best ranking of these important terms. The results
(wpq, EMIM > r-lohi, Porter > F4.modified > F4 > ZOOM > r-hilo) also highlight differences between
pairs of algorithms, but there were no significant differences between the superior wpq, EMIM, r-lohi
and Porter algorithms.
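The rank-sum criterion can be sketched as below: a lower sum means the algorithm placed the users' best terms nearer the top of its ranking. The treatment of best terms that the algorithm did not rank at all is not described in the source; assigning them a rank one past the end of the list is an assumption of this sketch, as are the names and data.

```python
def rank_sum(system_ranking, best_terms, missing_rank=None):
    """Sum of the 1-based rank positions of the users' best terms.

    Terms absent from the system ranking receive missing_rank, which
    defaults to len(system_ranking) + 1 (an assumption of this sketch).
    """
    position = {term: i + 1 for i, term in enumerate(system_ranking)}
    if missing_rank is None:
        missing_rank = len(system_ranking) + 1
    return sum(position.get(term, missing_rank) for term in best_terms)


# Hypothetical system ranking and user-chosen best terms.
ranked_terms = ["a", "b", "c", "d", "e", "f"]
best_sum = rank_sum(ranked_terms, ["a", "c", "f"])
```

Dividing the sum by the number of best terms gives the mean rank position named in the criterion; since every query has five best terms, the sum and the mean order the algorithms identically.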
Each of these analyses was designed to test how good the algorithm was at ranking terms for IQE. In
each case wpq and EMIM performed best, with Porter and the F4 variants performing well. The ZOOM
                                                                                                                                                                      
27. Abbreviated, for convenience, to wpq (section 2.2.3).
28. Section 3.1.