
Database Relationship

Linking Named Entities to Any Database
Avirup Sil∗, Temple University, Philadelphia, PA, avi@temple.edu
Yinfei Yang, St. Joseph’s University, Philadelphia, PA, yangyin7@gmail.com
Ernest Cronin∗, St. Joseph’s University, Philadelphia, PA, ernest.cronin@gmail.com
Penghai Nie, St. Joseph’s University, Philadelphia, PA, nph87903@gmail.com
Ana-Maria Popescu, Yahoo! Labs, Sunnyvale, CA, amp@yahoo-inc.com
Alexander Yates, Temple University, Philadelphia, PA, yates@temple.edu

Abstract

Existing techniques for disambiguating named entities in text mostly focus on Wikipedia as a target catalog of entities. Yet for many types of entities, such as restaurants and cult movies, relational databases exist that contain far more extensive information than Wikipedia. This paper introduces a new task, called Open-Database Named-Entity Disambiguation (Open-DB NED), in which a system must be able to resolve named entities to symbols in an arbitrary database, without requiring labeled data for each new database. We introduce two techniques for Open-DB NED, one based on distant supervision and the other based on domain adaptation. In experiments on two domains, one with poor coverage by Wikipedia and the other with near-perfect coverage, our Open-DB NED strategies outperform a state-of-the-art Wikipedia NED system by over 25% in accuracy.
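To make the task in the abstract concrete, the following is a minimal sketch of schema-agnostic candidate lookup: given a mention string and an arbitrary relational database, it collects every row containing a text cell that matches the mention. It illustrates the task setup only and is not the authors' method. The function names, the toy tables, and the exact-match rule are all assumptions made for the example.

```python
import sqlite3


def candidate_rows(conn, mention):
    """Return (table, rowid, column, value) for each text cell equal to the mention.

    Hypothetical helper: matching is a case-insensitive exact comparison,
    chosen only to keep the illustration short.
    """
    needle = mention.strip().lower()
    hits = []
    # Discover the schema at run time so nothing is hard-coded per database.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            query = f'SELECT rowid, "{column}" FROM "{table}"'
            for rowid, value in conn.execute(query):
                if isinstance(value, str) and value.strip().lower() == needle:
                    hits.append((table, rowid, column, value))
    return hits


def build_demo_db():
    """Build a toy in-memory database (assumed data, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE restaurant (name TEXT, city TEXT)")
    conn.execute("CREATE TABLE movie (title TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO restaurant VALUES (?, ?)",
        [("Casablanca", "Philadelphia"), ("Blue Door", "Sunnyvale")])
    conn.executemany(
        "INSERT INTO movie VALUES (?, ?)",
        [("Casablanca", 1942), ("Brazil", 1985)])
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    for hit in candidate_rows(demo, "Casablanca"):
        print(hit)
```

In the toy data the mention "Casablanca" matches both a restaurant row and a movie row. Choosing between such candidates from the surrounding text, without labeled examples for the new database, is the disambiguation problem the abstract describes.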

referents, but exclusive focus on Wikipedia as a target for NED systems has significant drawbacks: despite its breadth, Wikipedia still does not contain all or even most real-world entities mentioned in text. As one example, it has poor coverage of entities that are mostly important in a small geographical region, such as hotels and restaurants, which are widely discussed on the Web. 57% of the named entities in the Text Analysis Conference’s (TAC) 2009 entity linking task refer to an entity that does not appear in Wikipedia (McNamee et al., 2009). Wikipedia is clearly a highly valuable resource, but it should not be thought of as the only one. Instead of relying



References:
Kedar Bellare and Andrew McCallum. 2007. Learning extractors from unlabeled text using relevant databases. In Sixth International Workshop on Information Integration on the Web.
Kedar Bellare and Andrew McCallum. 2009. Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment. In Empirical Methods in Natural Language Processing (EMNLP-09).
Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. 2010. A theory of learning from different domains. Machine Learning, 79:151–175.
John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In EMNLP.
Razvan Bunescu and Raymond Mooney. 2007. Learning to extract relations from the web using minimal supervision. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL07).
R. Bunescu and M. Pasca. 2006. Using encyclopedic knowledge for named entity disambiguation. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-06).
Ying Chen and James Martin. 2007. Towards Robust Unsupervised Personal Name Disambiguation. In EMNLP, pages 190–198.
Silviu Cucerzan. 2007. Large-scale named entity disambiguation based on Wikipedia data. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 708–716.
Nilesh N. Dalvi, Ravi Kumar, Bo Pang, and Andrew Tomkins. 2009. Matching Reviews to Objects using a Language Model. In EMNLP, pages 609–618.
Nilesh N. Dalvi, Ravi Kumar, and Bo Pang. 2012. Object matching in tweets with spatial models. In WSDM, pages 43–52.
Hal Daumé III, Abhishek Kumar, and Avishek Saha. 2010. Frustratingly easy semi-supervised domain adaptation. In Proceedings of the ACL Workshop on Domain Adaptation (DANLP).
D. Downey, M. Broadhead, and O. Etzioni. 2007. Locating complex named entities in web text. In Procs. of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007).
Anthony Fader, Stephen Soderland, and Oren Etzioni. 2009. Scaling Wikipedia-based named entity disambiguation to arbitrary web text. In Proceedings of the WikiAI 09 - IJCAI Workshop: User Contributed Knowledge and Artificial Intelligence: An Evolving Synergy.
Xianpei Han and Jun Zhao. 2009. Named entity disambiguation by leveraging Wikipedia semantic knowledge. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM), pages 215–224.
Johannes Hoffart, Mohamed Amir Yosef, Ilaria Bordino, Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, and Gerhard Weikum. 2011. Robust Disambiguation of Named Entities in Text. In EMNLP, pages 782–792.
Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
Fei Huang and Alexander Yates. 2009. Distributional representations for handling sparsity in supervised sequence labeling. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, and Soumen Chakrabarti. 2009. Collective annotation of Wikipedia entities in web text. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 457–466.
Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2011. Lexical Generalization in CCG Grammar Induction for Semantic Parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
M.E. Lesk. 1986. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of the SIGDOC Conference.
Thomas Lin, Mausam, and Oren Etzioni. 2012. Entity linking at web scale. In Knowledge Extraction Workshop (AKBC-WEKEX), 2012.
D.C. Liu and J. Nocedal. 1989. On the limited memory method for large scale optimization. Mathematical Programming B, 45(3):503–528.
G.S. Mann and D. Yarowsky. 2003. Unsupervised personal name disambiguation. In CoNLL.
Paul McNamee, Mark Dredze, Adam Gerber, Nikesh Garera, Tim Finin, James Mayfield, Christine Piatko, Delip Rao, David Yarowsky, and Markus Dreyer. 2009. HLTCOE Approaches to Knowledge Base Population at TAC 2009. In Text Analysis Conference.
Rada Mihalcea and Andras Csomai. 2007. Wikify!: Linking documents to encyclopedic knowledge. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM), pages 233–242.
Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics (ACL-2009), pages 1003–1011.
Patrick Pantel and Ariel Fuxman. 2011. Jigs and Lures: Associating Web Queries with Structured Entities. In ACL.
L. Ratinov, D. Roth, D. Downey, and M. Anderson. 2011. Local and global algorithms for disambiguation to Wikipedia. In Proc. of the Annual Meeting of the Association of Computational Linguistics (ACL).
Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Proceedings of the Sixteenth European Conference on Machine Learning (ECML-2010), pages 148–163.
Avi Silberschatz, Henry F. Korth, and S. Sudarshan. 2010. Database System Concepts. McGraw-Hill, sixth edition.
Daniel S. Weld, Raphael Hoffmann, and Fei Wu. 2009. Using Wikipedia to Bootstrap Open Information Extraction. In ACM SIGMOD Record.
Limin Yao, Sebastian Riedel, and Andrew McCallum. 2010. Collective cross-document relation extraction without labelled data. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP-2010), pages 1013–1023.
Yiping Zhou, Lan Nie, Omid Rouhani-Kalleh, Flavian Vasile, and Scott Gaffney. 2010. Resolving surface forms to Wikipedia topics. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling), pages 1335–1343.
