A Parameterized Approach to Spam-Resilient Link Analysis of the Web


IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 20, NO. 10, OCTOBER 2009

James Caverlee, Member, IEEE, Steve Webb, Member, IEEE,
Ling Liu, Senior Member, IEEE, and William B. Rouse, Fellow, IEEE
Abstract: Link-based analysis of the Web provides the basis for many important applications (like Web search, Web-based data mining, and Web page categorization) that bring order to the massive amount of distributed Web content. Due to the overwhelming reliance on these important applications, there is a rise in efforts to manipulate (or spam) the link structure of the Web. In this manuscript, we present a parameterized framework for link analysis of the Web that promotes spam resilience through a source-centric view of the Web. We provide a rigorous study of the set of critical parameters that can impact source-centric link analysis and propose the novel notion of influence throttling for countering the influence of link-based manipulation. Through formal analysis and a large-scale experimental study, we show how different parameter settings may impact the time complexity, stability, and spam resilience of Web link analysis. Concretely, we find that the source-centric model supports more effective and robust rankings in comparison with existing Web algorithms such as PageRank.

Index Terms: Internet search, information search and retrieval, information storage and retrieval, information technology and systems, distributed systems, systems and software, Web search, general, Web-based services, online information services.

1 INTRODUCTION

The Web is arguably the most massive and successful distributed computing application today. Millions of Web servers support the autonomous sharing of billions of Web pages. From its earliest days, the Web has been the subject of intense focus for organizing, sorting, and understanding its massive amount of data.



