Introduction to Information Retrieval

Information retrieval (IR) is concerned with providing users with easy access to the information they are interested in. It is used primarily in search engines.

“Information retrieval deals with the representation, storage, organization of, and access to information items such as documents, Web pages, online catalogs, structured and semi-structured records, multimedia objects. The representation and organization of the information items should be such as to provide the users with easy access to information of their interest.”

After speech, text is the primary way we store information. As collections of text grew larger and larger, we started indexing them, just as a librarian indexes books by subject matter.
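To make the librarian analogy concrete, here is a minimal sketch of an inverted index, a structure that maps each word to the documents containing it. The function name `build_index` and the three sample documents are made up for illustration, and the tokenization (lowercasing and splitting on whitespace) is a simplifying assumption; real systems do considerably more.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization: lowercase, then split on whitespace.
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


# Hypothetical toy collection, keyed by document id.
docs = {
    1: "cricket bat and ball",
    2: "the bat is a flying mammal",
    3: "river bank erosion",
}

index = build_index(docs)
print(sorted(index["bat"]))   # [1, 2]
print(sorted(index["bank"]))  # [3]
```

Looking up a word is now a single dictionary access instead of a scan over every document, which is the same saving a subject catalog gives a librarian.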

It used to be said that “it is the province of knowledge to speak and the privilege of wisdom to listen”, but now the saying has become “it is the province of knowledge to write and the privilege of wisdom to query”. In other words, almost everything is now stored as text, which forms our knowledge base, and to extract specific knowledge from it we have to be wise enough to express our need as a query.


What makes Information Retrieval hard?

  1. Unstructured Text: Most information is written in natural language. It is not stored in a structured database format.
  2. Requires understanding of semantics: For example, cafe, restaurant, and hotel might all refer to a similar kind of place.
  3. Ambiguous nature of words: For example, the word bat may refer to a cricket bat or to the mammal. Similarly, bank and mouse can each refer to two entirely different things. This ambiguity makes it difficult to retrieve relevant information.
  4. Web pages change rapidly: Many new websites are created every day, which makes retrieving information time-consuming.
  5. Fake information on the Web: Not all information available on the Internet comes from trusted sources.
  6. New pages are not linked: A recently created web page, even if it contains the most relevant information, might not be discovered, because it takes some time before other pages link to it.
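The semantic and ambiguity problems in points 2 and 3 show up even in a tiny example. The sketch below uses plain exact-word matching; the function `keyword_search` and the four sample documents are hypothetical and exist only to illustrate the failure modes, not to represent how a real search engine works.

```python
def keyword_search(query, docs):
    """Return ids of documents that contain every query word exactly."""
    terms = query.lower().split()
    return [
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.lower().split() for term in terms)
    ]


# Hypothetical toy collection, keyed by document id.
docs = {
    1: "best restaurant near the station",
    2: "cosy cafe with free wifi",
    3: "buy a cricket bat online",
    4: "the bat is a nocturnal mammal",
}

# Semantics: document 1 is about a similar kind of place but is missed.
print(keyword_search("cafe", docs))  # [2]

# Ambiguity: both senses of "bat" come back, whatever the user meant.
print(keyword_search("bat", docs))   # [3, 4]
```

Exact matching has no notion that cafe and restaurant are related, and no way to tell which bat the user had in mind.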

Aim of Information Retrieval

The task of an IR system is to return the documents most relevant to a query. Relevance is subjective: two people may search for the same term with different expectations. Relevance includes:

  1. Being on proper subject
  2. Being timely
  3. Being authoritative
  4. Satisfying the goal of the user

The main focus of information retrieval should be user happiness.

The IR System

