Although indexing makes it possible to access information from very large document collections, the 
conversion from a document text to a list of weighted keywords does result in a loss of information. 
Writing a document is an intentional process; a document is intended to convey a message. The 
translation to a list of keywords retains the essential building blocks of the message, the terms 
themselves, but the message(s) that the author intended cannot be accessed by the retrieval mechanism. 
The effect of this loss of information may be ameliorated or exacerbated by the use of controlled 
vocabularies (predefined sets of indexing terms), [Ing92, Chap 3]. However, the fact remains that when 
we talk of representing the information content of documents we are only representing the components 
of the message, not the message itself. 
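The following minimal Python sketch makes the loss concrete. It assumes, purely for illustration, that terms are lower-cased words counted without weighting, stemming or stop-word removal; it is not the indexing scheme of any particular system. Two sentences that convey opposite messages reduce to exactly the same keyword representation:

```python
import re
from collections import Counter

def index_terms(text: str) -> Counter:
    """Reduce a text to a bag of lower-cased word counts (illustrative tokenizer only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

a = "The dog bit the man."
b = "The man bit the dog."

# Opposite messages, identical keyword representations: word order is discarded.
print(index_terms(a) == index_terms(b))  # True
```

A retrieval mechanism that sees only these term counts has no way to tell the two documents apart. In that sense the components of the message are represented, but the message itself is not.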
The reduction of the document text into a series of keywords also transforms the task of an IR system 
from retrieving information to retrieving objects that contain information. Some authors argue that 
objects such as documents cannot be held to contain information as such; rather, information is a change 
in a cognitive, or internal, state brought about by exposure to the contents of these objects. The 
following early quote by Maron, [Mar64], illustrates this concern: 
"...information is not a stuff contained in books as marbles might be contained in 
a bag - even though we sometimes speak of it in that way. It is, rather, a 
relationship. The impact of a given message on an individual is relative to what 
he already knows, and of course, the same message could convey different 
amounts of information to different receivers, depending on each one's internal 
model or map." 
The degradation of the document text, necessary for computation, and the subjectivity of relevance 
result in a layer of indirection between the user and the documents. The goal of the IR system is to 
bridge this gap between the user and potentially relevant material. Indexing techniques identify and 
highlight potentially good indicators of relevant material, and retrieval techniques use these indicators 
of relevance to select which documents to present to the user. How individual retrieval systems use 
these indicators to retrieve documents is the topic of the next section. 
2.2 Retrieval and feedback
Retrieval is the process of matching a representation of an information need, usually a user-supplied 
query, to an indexed document representation. Queries are indexed in the same way as documents 
and compared with a document index to determine whether a document is likely to be relevant to a query. 
How the indexed query is compared with the indexed document differentiates the major retrieval 
models. In this section we shall briefly outline the four main models of retrieval: Boolean, vector space, 
probabilistic, and logical, and describe the basic approaches to RF in each of the models. 
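The following hypothetical Python sketch illustrates the point that queries and documents pass through the same indexing step. The `index` function, the small stop-word list and the term-overlap comparison are all placeholders chosen for illustration. The `overlap` function in particular is the part that each of the retrieval models replaces with its own matching rule.

```python
import re

# Illustrative stop-word list only; real systems use much larger, curated lists.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def index(text: str) -> set[str]:
    """Hypothetical indexing step applied identically to documents and queries."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}

def overlap(query_terms: set[str], doc_terms: set[str]) -> int:
    """Placeholder comparison: the number of shared terms. Each model differs here."""
    return len(query_terms & doc_terms)

docs = {
    "d1": "Relevance feedback in information retrieval",
    "d2": "A history of library catalogues",
}
q = index("information retrieval feedback")
scores = {d: overlap(q, index(text)) for d, text in docs.items()}
print(scores)  # {'d1': 3, 'd2': 0}
```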
2.2.1 Boolean model 
The first operational retrieval model was the Boolean model, based on Boolean logic. In this model 
queries consist of keywords combined by the user with the conjunctive (AND), disjunctive (OR) or negation 
(NOT) operators. This is an exact match model: the system only retrieves those documents that exactly 
match the user's query formula. For example, for the query `information AND retrieval AND system' 
the system will return all documents that contain the three words `information', `retrieval' and `system', 
whereas the query `information OR (retrieval AND system)' will return those documents that contain 
the word `information' and those documents that contain both `retrieval' and `system'.   
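The two example queries can be evaluated with a minimal Python sketch over a toy inverted index. The four documents and their ids are invented for illustration. Each term maps to the set of documents that contain it, so AND, OR and NOT become set intersection, union and complement against the whole collection:

```python
import re
from collections import defaultdict

# Toy collection, invented for illustration.
docs = {
    1: "information retrieval system",
    2: "information theory",
    3: "retrieval system design",
    4: "database system",
}

# Inverted index: term -> set of ids of the documents containing that term.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in re.findall(r"[a-z]+", text.lower()):
        inverted[term].add(doc_id)

def postings(term: str) -> set[int]:
    """Documents containing the term (empty set if the term is unseen)."""
    return inverted.get(term, set())

all_ids = set(docs)

# information AND retrieval AND system
q1 = postings("information") & postings("retrieval") & postings("system")
# information OR (retrieval AND system)
q2 = postings("information") | (postings("retrieval") & postings("system"))
# NOT as complement against the collection, e.g. system AND NOT information
q3 = postings("system") & (all_ids - postings("information"))

print(sorted(q1))  # [1]
print(sorted(q2))  # [1, 2, 3]
print(sorted(q3))  # [3, 4]
```

Each answer is a plain set. The model itself gives no basis for ordering the documents within it, and the results are sorted by id here only for display.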
The Boolean model has been used in a large number of online public access catalogue (OPAC) 
systems but has been shown to suffer from a number of difficulties. Firstly, traditional Boolean 
systems do not use term weights and consequently return the complete set of documents that match the 
query as an unordered set. This means that users may have to add or remove terms, or generate more 
complex query expressions to reduce the set of retrieved documents to a manageable size. Willie and 
Bruza, [WB95], argue that the problems with interacting with Boolean systems are not only a matter of 
the formal query language but a conceptual problem: the Boolean model does not lend itself to 
supporting how users think about searching and their individual search techniques. A further problem 
with Boolean systems is that the order in which operators are applied may not be consistent across 
systems, with the result that different systems may retrieve different documents for the same query, 
[Borg96]. Nevertheless, Boolean systems do remain popular with users, perhaps because of the explicit 
control that these systems offer the user. Web search engines often allow Boolean-style 
querying performed on an underlying best-match model (see section 2.2.2). 
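The precedence problem noted by [Borg96] can be shown with a hypothetical Python sketch. Consider the unparenthesised query `information OR retrieval AND system'. One evaluator applies operators strictly left to right, while another gives AND higher precedence than OR. The toy collection and both evaluators are assumptions for illustration and are not taken from any particular system. NOT is omitted for brevity.

```python
import re

# Toy collection, invented for illustration.
docs = {
    1: "information retrieval system",
    2: "information theory",
    3: "retrieval system design",
    4: "database system",
}

def postings(term: str) -> set[int]:
    """Ids of documents whose text contains the term."""
    return {d for d, text in docs.items() if term in re.findall(r"[a-z]+", text.lower())}

OPS = {"AND": set.intersection, "OR": set.union}

def left_to_right(tokens: list[str]) -> set[int]:
    """Apply operators strictly in the order they appear in the query."""
    result = postings(tokens[0])
    for op, term in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, postings(term))
    return result

def and_first(tokens: list[str]) -> set[int]:
    """Give AND higher precedence than OR: OR together the AND-groups."""
    result: set[int] = set()
    group = postings(tokens[0])
    for op, term in zip(tokens[1::2], tokens[2::2]):
        if op == "AND":
            group = group & postings(term)
        else:  # OR closes the current AND-group and starts a new one
            result |= group
            group = postings(term)
    return result | group

query = "information OR retrieval AND system".split()
print(sorted(left_to_right(query)))  # [1, 3]
print(sorted(and_first(query)))      # [1, 2, 3]
```

The two evaluators disagree on document 2. This is exactly the kind of discrepancy that can arise when systems apply operators in different orders.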