Santosh K
IIITH Hyderabad, India
Rahul Sharma
IIITH Hyderabad, India
Chiranjeev Sharma
IIITH Hyderabad, India
ABSTRACT
This paper presents a study on sentiment analysis and opinion mining in Hindi on product reviews. We experimented with several methods, mainly focusing on lexical based approaches. Different lexicons were used on same data set to analyse the significance of lexical based approaches.
2.1 Lexicon
Two different lexicons were used in order to test the efficiency of the lexical based approach for sentiment analysis. Each lexicon contains Adjectives and Adverbs and their corresponding positive and negative scores. HSL lexicon has positive, negative and objective score, where as HSWN lexicon has only positive and negative scores. The scores are the probability values of a word being used in a positive, negative or objective (neutral) sense. For any given word in the lexicon, the sum of all the scores is 1. The total score of a word w is given by, total score(w) = P (p) + P (n) + P (o) (1)
General Terms
Languages, Unsupervised
Keywords
Opinion Mining, Sentiment Analysis
1. INTRODUCTION
In view of the growing content on web in various Indian languages, there is a need for an analysis of the data from various sources like blogs, product reviews and other social networking websites. This classification can be useful in product analysis, marketing strategies, advertisements and other user specific recommendation systems. Sentiment analysis has been done in English and other languages. But it is fairly new in Hindi and other Indian languages. In this paper we propose a method to classify the reviews in to either positive or negative using a lexicon. Two different lexicons, HSL (Hindi Subjective Lexicon)1 [1] and HSWN (Hindi Sentence WordNet)2 were used and each lexicon contains Adjectives, Adverbs and their corresponding scores.
where, P(p), P(n) and P(o) is the probability of word w
References: [1] P. Arora, A. Bakliwal, and V. Varma. Hindi subjective lexicon generation using wordnet graph traversal. In CICLing, 2012. [2] A. Bakliwal, P. Arora, and V. Varma. Hindi subjective lexicon : A lexical resource for hindi polarity classification. In LREC, 2012. 4.1 Analysis on the usage of PoS tagger It can be observed from Table 2 and 3 that the use of Hindi PoS tagger lead to decrease in performance by 3 to 5% for HSL lexicon and no significant change in performance for HSWN lexcicon. In case of the merged lexicon (Table 4), the