
truth discovery

Truth Discovery with Multiple Conflicting Information Providers on the Web




Xiaoxin Yin
UIUC
xyin1@cs.uiuc.edu

Jiawei Han
UIUC
hanj@cs.uiuc.edu

Philip S. Yu
IBM T. J. Watson Res. Center
psyu@us.ibm.com

ABSTRACT

The world-wide web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the web.
Moreover, different web sites often provide conflicting information on a subject, such as different specifications for the same product. In this paper we propose a new problem called Veracity, i.e., conformity to truth, which studies how to find true facts from a large amount of conflicting information on many subjects that is provided by various web sites.
We design a general framework for the Veracity problem, and invent an algorithm called TruthFinder, which utilizes the relationships between web sites and their information, i.e., a web site is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy web sites. Our experiments show that TruthFinder successfully finds true facts among conflicting information, and identifies trustworthy web sites better than the popular search engines.
Keywords: data quality, web mining, link analysis.
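The mutual dependence described in the abstract (a site is trustworthy if its facts are true, and a fact is likely true if trustworthy sites provide it) can be illustrated with a small iterative computation. The following is a minimal sketch, not the authors' TruthFinder implementation: the function name `estimate_trust`, the log-transformed trust score, the logistic squashing, the `dampening` and `initial_trust` values, and the sample claims are all assumptions made for illustration, and the sketch ignores any relationships between conflicting facts.

```python
import math
from collections import defaultdict


def estimate_trust(claims, initial_trust=0.9, dampening=0.3, rounds=20):
    """Score sites and facts from (site, subject, value) claims.

    Hypothetical helper: the scoring rules below are illustrative
    assumptions, not the formulas of any published algorithm.
    """
    facts_by_site = defaultdict(set)
    sites_by_fact = defaultdict(set)
    for site, subject, value in claims:
        fact = (subject, value)
        facts_by_site[site].add(fact)
        sites_by_fact[fact].add(site)

    # Every site starts with the same assumed trustworthiness.
    trust = {site: initial_trust for site in facts_by_site}
    confidence = {}
    for _ in range(rounds):
        # A fact gains confidence from each trustworthy site providing it.
        for fact, sites in sites_by_fact.items():
            support = sum(
                -math.log(1.0 - min(trust[s], 1.0 - 1e-9)) for s in sites
            )
            confidence[fact] = 1.0 / (1.0 + math.exp(-dampening * support))
        # A site's trust is the average confidence of the facts it provides.
        for site, facts in facts_by_site.items():
            trust[site] = sum(confidence[f] for f in facts) / len(facts)

    # For each subject, keep the value with the highest confidence.
    best = {}
    for (subject, value), score in confidence.items():
        if subject not in best or score > confidence[(subject, best[subject])]:
            best[subject] = value
    return trust, confidence, best


# Made-up claims: three sites agree on one author set, one site disagrees.
claims = [
    ("site_a", "book_1", "authors_x"),
    ("site_b", "book_1", "authors_x"),
    ("site_c", "book_1", "authors_x"),
    ("site_d", "book_1", "authors_y"),
]
trust, confidence, best = estimate_trust(claims)
print(best["book_1"], round(trust["site_a"], 3), round(trust["site_d"], 3))
```

In this toy input the value backed by three sites ends up with higher confidence than the value backed by one, and the sites providing it end up with higher trust. With richer data, a site's trust is also shaped by its claims on other subjects, which is what lets a minority of reliable sites outweigh a majority of unreliable ones.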

Example 1: Authors of books. We tried to find out who wrote the book “Rapid Contextual Design” (ISBN: 0123540518). We found many different sets of authors from different online bookstores, and we show several of them in Table 1. From the image of the book cover we found that A1 Books provides the most accurate information. In comparison, the information from Powell’s books is incomplete, and that from Lakeside books is incorrect.
Web site
A1 Books
Powell’s books
Cornwall books
Mellon’s books
Lakeside books
Blackwell



