Aditya Jadhav, Mahesh Kukreja
E-mail: aditya.jadhav27@gmail.com & mr_mahesh_in@yahoo.co.in
Abstract: In the information industry, a huge amount of data is widely available, and there is an imminent need to turn such data into useful information. Data Mining fulfils this need: it is the process of exploring and analyzing large quantities of data by automatic or semi-automatic means. On a single system with few processors, both the processing speed and the size of the data that can be processed at one time are limited. Both limits can be raised if data mining is carried out in parallel by coordinated systems connected over a LAN. The problem with this solution, however, is that a LAN is not elastic: the number of systems among which the work is distributed cannot be changed according to the size of the data to be processed. Our main aim is to distribute the data to be analyzed across various nodes in a cloud. Various algorithms must be implemented to achieve optimum data distribution and efficient data mining according to the user's requirements.
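The central idea of the abstract, splitting a dataset across a number of nodes that can grow or shrink with the size of the data, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `nodes_needed` and `partition`, the fixed per-node capacity, and the contiguous, near-equal chunking strategy are all hypothetical choices made for the example.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def nodes_needed(num_records: int, capacity_per_node: int) -> int:
    """Hypothetical sizing rule: use just enough nodes that none exceeds its capacity."""
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    return max(1, math.ceil(num_records / capacity_per_node))


def partition(records: Sequence[T], num_nodes: int) -> List[List[T]]:
    """Split records into num_nodes contiguous chunks whose sizes differ by at most one."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    base, extra = divmod(len(records), num_nodes)
    chunks: List[List[T]] = []
    start = 0
    for i in range(num_nodes):
        # The first `extra` nodes each take one additional record.
        size = base + (1 if i < extra else 0)
        chunks.append(list(records[start:start + size]))
        start += size
    return chunks


if __name__ == "__main__":
    data = list(range(10))
    n = nodes_needed(len(data), capacity_per_node=4)  # ceil(10 / 4) = 3 nodes
    print(n, partition(data, n))  # 3 [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

On a LAN the node count is effectively fixed, whereas in an elastic cloud `nodes_needed` could be re-evaluated whenever the data grows and the pool of nodes resized before calling `partition`.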
3. Elasticity: Computing resources can be rapidly increased or decreased as needed, and released for other uses when they are no longer required.
4. Pay as you go: Users pay only for the resources they actually use, and only for the time they use them.
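The pay-as-you-go rule amounts to summing, over every resource, its rate multiplied by the time it was actually used. The sketch below is a minimal illustration under assumptions: the `Usage` record, the per-hour rates and the resource names are hypothetical, and no rounding to whole billing units is applied (real providers define their own billing granularity).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Usage:
    resource: str          # hypothetical resource name, e.g. "vm.small"
    rate_per_hour: float   # assumed price per hour of use
    hours_used: float      # time the resource was actually allocated


def pay_as_you_go_cost(usages: Iterable[Usage]) -> float:
    """Charge only for resources actually used, and only for the time they were used."""
    total = 0.0
    for u in usages:
        if u.rate_per_hour < 0 or u.hours_used < 0:
            raise ValueError("rates and hours must be non-negative")
        total += u.rate_per_hour * u.hours_used
    return total


if __name__ == "__main__":
    bill = [
        Usage("vm.small", rate_per_hour=0.10, hours_used=5.0),
        Usage("vm.large", rate_per_hour=0.40, hours_used=2.5),
    ]
    print(f"{pay_as_you_go_cost(bill):.2f}")  # 1.50
```

Because a released resource simply stops accumulating `hours_used`, elasticity and pay-as-you-go work together: scaling down immediately stops the charge for the released capacity.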
1.2 Virtualization
In computing, virtualization is the creation of a virtual (rather than actual) version of something, such as a hardware platform, an operating system, a storage device or network resources. Virtualization can be viewed as part of an overall trend in enterprise IT that includes autonomic computing, a scenario in which the IT environment will be able to manage itself based on perceived activity, and utility computing, in which computer processing power is seen as
ISSN (Print): 2278-5140, Volume 1, Issue 2, 2012