IMPLEMENTING K-MEANS ITERATION CLUSTERING IN HADOOP MAP/REDUCE

A PROJECT REPORT

Submitted by

ABIRAMI. R (100105311003)
BABY LALITHA. S (100105311008)
NITHYA. C (100105311043)
RADHA. S (100105311048)

in partial fulfillment for the award of the degree of

BACHELOR OF ENGINEERING
in
COMPUTER SCIENCE AND ENGINEERING

NANDHA COLLEGE OF TECHNOLOGY, ERODE
ANNA UNIVERSITY :: CHENNAI 600 025
APRIL 2014

BONAFIDE CERTIFICATE
Certified that this project report “IMPLEMENTING K-MEANS ITERATION CLUSTERING IN HADOOP MAP/REDUCE” is the bonafide work of “ABIRAMI.R (100105311003), BABY LALITHA.S (100105311008), NITHYA.C (100105311043), RADHA.S (100105311048)” who carried out the project work under my supervision.
SIGNATURE

Mr. M. VIJAYAKUMAR, M.E.,
HEAD OF THE DEPARTMENT
Associate Professor,
Department of Computer Science and Engineering,
Nandha College of Technology,
Erode - 52.

SIGNATURE

Mr. P. KUMAR, M.E., MISTE., (Ph.D).,
SUPERVISOR
Assistant Professor,
Department of Computer Science and Engineering,
Nandha College of Technology,
Erode - 52.

Submitted for the university project viva-voce examination held on ………….....

--------------------------------- ----------------------------------
Internal Examiner External Examiner

ACKNOWLEDGEMENT
We express our thanks to our beloved Chairman of Sri Nandha Educational Trust Thiru. V. Shanmugan, B.Com., and our beloved Secretaries, Thiru. S. Nandhakumar Pradeep, M.B.A., of Sri Nandha Educational Trust and Thiru. S. Thirumoorthi, B.P.T., of Sri Nandha



