Deriving Marketing Intelligence from Online Discussion
Natalie Glance, nglance@intelliseek.com
Matthew Hurst, mhurst@intelliseek.com
Kamal Nigam, knigam@intelliseek.com
Matthew Siegler, msiegler@intelliseek.com
Robert Stockton, rstockton@intelliseek.com
Takashi Tomokiyo, ttomokiyo@intelliseek.com

Intelliseek Applied Research Center
Pittsburgh, PA 15217

ABSTRACT

Weblogs and message boards provide online forums for discussion that record the voice of the public. Woven into this mass of discussion is a wide range of opinion and commentary about consumer products. This presents an opportunity for companies to understand and respond to the consumer by analyzing this unsolicited feedback. Given the volume, format and content of the data, the appropriate approach to understand this data is to use large-scale web and text data mining technologies. This paper argues that applications for mining large volumes of textual data for marketing intelligence should provide two key elements: a suite of powerful mining and visualization technologies and an interactive analysis environment which allows for rapid generation and testing of hypotheses. This paper presents such a system that gathers and annotates online discussion relating to consumer products using a wide variety of state-of-the-art techniques, including crawling, wrapping, search, text classification and computational linguistics. Marketing intelligence is derived through an interactive analysis framework uniquely configured to leverage the connectivity and content of annotated online discussion.

Categories and Subject Descriptors: H.3.3: Information Search and Retrieval
General Terms: Algorithms, Experimentation
Keywords: text mining, content systems, computational linguistics, machine learning, information retrieval

from online public communications. For example, there are message boards devoted to a specific gaming platform, newsgroups centered around a particular make and model of motorcycle, and



