Project Report
Submitted in Partial Fulfillment of the Requirements for the Degree of
Bachelor of Technology
Submitted by
Barnan Das
&
Tanmoy Pal
Under the guidance of
Dr. Pabitra Mitra
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
CERTIFICATE
This is to certify that the project report entitled “Development of Bengali Language Stemmer” is a record of bona fide work carried out by Mr. Barnan Das and Mr. Tanmoy Pal of Bengal College of Engineering and Technology, Durgapur, under my supervision and guidance as part of their Final Year Project (2009) at the Indian Institute of Technology, Kharagpur.
Dr. Pabitra Mitra
Dept. of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
Date:
Place: Kharagpur
ABSTRACT
Ever since people began to realize the importance of information, it has been necessary to archive it in a way that makes it easy to retrieve later. The advent of computers made it possible to store large amounts of data, and retrieving that data efficiently became a necessity. The field of Information Retrieval (IR) emerged in the 1950s, and since then many IR systems have been developed; today they are used every day by millions of people all over the world. Because English is so widely used, most IR systems, whether web-based or stand-alone, are built for English documents and content, and little has been done for Bengali documents. Bengali is the fourth largest language in the world, so there is a great need for technology that can process Bengali text. A particularly important task is building a search engine for Bengali documents, and many of the technologies such a search engine requires have yet to be developed for Bengali. The goal of this project is to develop some of these technologies, focusing primarily on algorithms for stemming.
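To make the term concrete: a stemmer reduces inflected word forms to a common stem, so that, for example, ছেলেরা ("the boys") and ছেলেকে ("to the boy") can both match ছেলে ("boy") in a search index. The sketch below is a minimal longest-match suffix stripper. It assumes a small, hypothetical list of common Bengali inflectional suffixes and a hypothetical minimum stem length; the names `strip_suffix` and `stem_tokens` are illustrative only, and this is not the algorithm developed in this project.

```python
# Hypothetical suffix list for illustration only (plural, classifier,
# objective and locative endings); NOT the rule set used in this project.
SUFFIXES = ["গুলো", "গুলি", "দের", "রা", "টা", "টি", "কে", "তে"]

# Try longer suffixes first (longest-match rule).
SUFFIXES_BY_LENGTH = sorted(SUFFIXES, key=len, reverse=True)

# Assumption: never strip a word down to fewer than 2 Unicode code points.
MIN_STEM_LEN = 2


def strip_suffix(word: str) -> str:
    """Remove the longest listed suffix if the remaining stem is long enough."""
    for suffix in SUFFIXES_BY_LENGTH:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word  # no suffix matched, or the stem would be too short


def stem_tokens(text: str) -> list[str]:
    """Split on whitespace and stem each token independently."""
    return [strip_suffix(token) for token in text.split()]


if __name__ == "__main__":
    print(stem_tokens("ছেলেরা বইগুলো ছেলেকে"))
```

Trying longer suffixes first is the usual longest-match rule; it starts to matter once the list holds overlapping endings, such as a bare genitive র alongside দের. A table this small will also under-stem and over-stem many real words, which is why a practical Bengali stemmer needs more than a fixed suffix list.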