Movie Rating and Review Summarization in Mobile Environment
Chien-Liang Liu, Wen-Hoar Hsaio, Chia-Hoang Lee, Gen-Chi Lu, and Emery Jou
Abstract—In this paper, we design and develop a movie-rating and review-summarization system for a mobile environment. The movie rating is derived from the results of sentiment classification, and the condensed descriptions of movie reviews are generated by feature-based summarization. We propose a novel approach based on latent semantic analysis (LSA) to identify product features. Furthermore, we use the product features obtained from LSA to reduce the size of the summary. The system design takes both sentiment-classification accuracy and system response time into account. The rating and review-summarization system can easily be extended to other product-review domains.
Index Terms—Feature extraction, natural language processing (NLP), text analysis, text mining.
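As a rough illustration of how LSA can group related feature terms, the following is a minimal sketch and not the authors' implementation. It assumes a term-by-sentence count matrix, a truncated SVD computed with NumPy, and cosine similarity to a seed term in the latent space. The toy sentences, the stopword list, and the function names `build_term_sentence_matrix` and `related_terms` are hypothetical and chosen only for this example.

```python
import numpy as np

# Toy review sentences, invented for illustration (not data from the paper).
sentences = [
    "the actor gave a great performance",
    "the actress and the actor had great chemistry",
    "the performance of the cast was moving",
    "the soundtrack and the music were beautiful",
    "the music fits every scene",
]
# Hypothetical stopword list for this toy corpus.
stopwords = {"the", "a", "and", "of", "was", "were", "had", "gave", "every"}


def build_term_sentence_matrix(sentences, stopwords):
    """Return (vocab, matrix) where matrix[i, j] counts term i in sentence j."""
    vocab = sorted({w for s in sentences for w in s.split() if w not in stopwords})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.split():
            if w in index:
                matrix[index[w], j] += 1.0
    return vocab, matrix


def related_terms(seed, vocab, matrix, k=2, top_n=3):
    """Rank terms by cosine similarity to `seed` in a k-dimensional LSA space."""
    # Truncated SVD: keep only the k largest singular values.
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    term_vecs = u[:, :k] * s[:k]
    # Normalize rows so that dot products are cosine similarities.
    norms = np.linalg.norm(term_vecs, axis=1)
    norms[norms == 0] = 1.0
    unit = term_vecs / norms[:, None]
    sims = unit @ unit[vocab.index(seed)]
    order = np.argsort(-sims)
    return [(vocab[i], float(sims[i])) for i in order if vocab[i] != seed][:top_n]


vocab, matrix = build_term_sentence_matrix(sentences, stopwords)
print(related_terms("music", vocab, matrix))
```

On this toy corpus, the terms returned for the seed `music` come from the music-related sentences, because terms that co-occur are mapped close together in the latent space. The number of latent dimensions `k` and the use of raw counts rather than a weighting scheme are assumptions of the sketch.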
I. INTRODUCTION
PEOPLE’s opinions have become an extremely important source of information for various services in ever-growing social networks. In particular, online opinions have turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities, and manage their reputations. Meanwhile, cellular phones have become a vital part of our lives, and the mobile platform is currently one of the most popular platforms in the world. However, the digital content that can be displayed on a cellular phone is limited in size, since cellular phones are physically small. Hence, a mechanism that provides users with condensed descriptions of documents will facilitate the delivery of digital content on cellular phones. This paper explores and designs a mobile system for movie rating and review summarization in which semantic orientation of comments, the limitation of small display capability of
Chia-Hoang Lee received the Ph.D. degree in computer science from the University of Maryland, College Park, in 1983. He is currently a Professor with the Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan. He was a Faculty Member with the University of Maryland and Purdue University, West Lafayette, IN. His current research interests include artificial intelligence, human–machine interface systems, natural-language processing, and opinion mining.
Chien-Liang Liu received the M.S. and Ph.D. degrees in computer science from National Chiao Tung University, Hsinchu, Taiwan, in 2000 and 2005, respectively. He is currently a Postdoctoral Researcher with the Department of Computer Science, National Chiao Tung University. His current research interests include machine learning, natural-language processing, and data mining.
Gen-Chi Lu received the Master’s degree in computer science from National Chiao Tung University, Hsinchu, Taiwan, in 2009. He is currently an Engineer with the Global Legal Division iTEC, Hon Hai Precision Industry Company Ltd., Taipei, Taiwan. His current research interests include natural-language processing, opinion mining, and full-text search.
Wen-Hoar Hsaio received the B.S.
degree from the Department of Computer Science and Information Engineering, Chung Cheng Institute of Technology, National Defense University, Taipei, Taiwan, in 1980 and the M.S. degree in 1996 from the Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan, where he is currently working toward the Ph.D. degree with the Department of Computer Science. His current research interests include information retrieval, web mining, and machine learning.
Emery Jou received the B.S. degree in physics from Tsing Hua University, Hsinchu, Taiwan, the M.S. degree in computer science from the University of Texas at Austin, and the Ph.D. degree in computer science from the University of Maryland, College Park. He is currently a Research Scientist with the Institute for Information Industry, Taipei, Taiwan. He was with several Wall Street firms in the United States for more than 12 years (i.e., Morgan Stanley and JPMorganChase) as a System Architect for Security Transaction Processing through Single Sign-on and Public Key Infrastructure. He was also with Thales nCipher, Cambridge, U.K., where he was engaged in Tape Storage Data Encryption and Key Management Systems. In 2009, he was a Visiting Professor with the College of Computer Science, National Chiao Tung University, Hsinchu. He was also a consultant for the Industrial Technology Research Institute, Hsinchu.