Left Wing Extremism

WORDNET BASED DOCUMENT CLUSTERING
Ashok Chirla, Computer Science Engineering, V.R. Siddhartha Engineering College, Kanuru, Vijayawada, A.P., India (ashok.chirla@gmail.com)

Abstract— Document clustering is considered an important tool in the current era of rapidly expanding information. It is the process of grouping text documents into categories, and it has found applications in domains such as information retrieval and web information systems. Ontology-based computing is emerging as a natural evolution of existing technologies for coping with the information onslaught. In this dissertation, background knowledge derived from WordNet, used as an ontology, is applied while preprocessing documents for clustering. Document vectors constructed from WordNet synsets are used as the input for clustering, and clustering with k-means is compared with clustering with bisecting k-means. A document categorization tool is also developed that summarizes the hierarchy of concepts obtained from WordNet during the clustering phase; its GUI shows the association between WordNet concepts and the documents belonging to each concept.

Keywords: Document clustering, Ontology, BOW, POS Tagging, Stemming, Labeling, bisecting k-means algorithm.
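The preprocessing step in the abstract (stemming terms, then replacing them with WordNet synsets to build document vectors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: a tiny hand-written table (`SYNSETS`, `HYPERNYMS`) stands in for WordNet so the code runs on the Python standard library alone, `stem` is a crude placeholder for a real stemmer, and POS tagging and word sense disambiguation are omitted. The optional hypernym step, which also counts each concept's parent, is one possible way of exposing the concept hierarchy the abstract mentions.

```python
import re
from collections import Counter

# ASSUMPTION: this hand-written table is a stand-in for WordNet so the sketch
# runs with the standard library only; all entries are illustrative.
# Stemmed term -> synset id (synonyms map to the same id).
SYNSETS = {
    "car": "car.n.01",
    "automobil": "car.n.01",  # "automobile(s)" after crude stemming
    "truck": "truck.n.01",
    "engin": "engine.n.01",
    "appl": "apple.n.01",
    "banana": "banana.n.02",
    "fruit": "fruit.n.01",
}
# ASSUMPTION: illustrative hypernym links, synset id -> parent synset id.
HYPERNYMS = {
    "car.n.01": "motor_vehicle.n.01",
    "truck.n.01": "motor_vehicle.n.01",
    "apple.n.01": "edible_fruit.n.01",
    "banana.n.02": "edible_fruit.n.01",
}
STOPWORDS = {"a", "an", "and", "are", "has", "is", "of", "the", "with"}


def stem(word):
    """Crude suffix stripper; a placeholder for a real stemmer such as Porter's."""
    for suffix in ("ing", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def concept_vector(text, use_hypernyms=True):
    """Return a bag of concepts: synset ids for known terms, stems otherwise."""
    counts = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOPWORDS:
            continue
        term = stem(token)
        concept = SYNSETS.get(term, term)
        counts[concept] += 1
        # Hypernym expansion: also count the parent concept, if one is known.
        if use_hypernyms and concept in HYPERNYMS:
            counts[HYPERNYMS[concept]] += 1
    return counts


if __name__ == "__main__":
    docs = ["The car has an engine", "Automobiles and trucks",
            "Apples and bananas are fruits"]
    for doc in docs:
        print(doc, "->", dict(concept_vector(doc)))
```

Run as a script, the first two documents share `car.n.01` even though one says "car" and the other "automobiles", and both contribute to `motor_vehicle.n.01` through the hypernym step; a plain bag of words would see no overlap between them. With NLTK and its WordNet data installed, `nltk.corpus.wordnet.synsets(word)` returns the candidate synsets for a word and `Synset.hypernyms()` returns a synset's parents; choosing the right sense among the candidates is a separate problem.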
I. INTRODUCTION
With the abundance of text documents available through the Web and corporate document management systems, the partitioning of document sets into previously unseen categories ranks high on the priority list for many applications, such as business intelligence systems. Nowadays, the problem is often not accessing text information but selecting the relevant documents [2].
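Partitioning a document set into previously unseen categories is the job of the clustering step, and the abstract compares k-means with bisecting k-means for it. The sketch below shows both over dense document vectors using cosine similarity, in the general spirit of the bisecting procedure discussed in [3]. The function names and the following choices are assumptions made for illustration, not details taken from this paper: the largest cluster is always the one split, each bisection keeps the best of `trials` 2-means runs scored by summed similarity to the centroid, and the toy vectors are invented.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=20, rng=None):
    """Basic k-means with cosine similarity; returns k lists of point indices."""
    rng = rng or random.Random(0)
    centers = [points[i] for i in rng.sample(range(len(points)), k)]
    assign = None
    for _ in range(iters):
        # Assign each point to its most similar center (ties go to the lower index).
        new_assign = [max(range(k), key=lambda c: cosine(p, centers[c])) for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = centroid(members)
    return [[i for i, a in enumerate(assign) if a == c] for c in range(k)]


def overall_similarity(points, cluster):
    """Sum of cosine similarities between each member and the cluster centroid."""
    center = centroid([points[i] for i in cluster])
    return sum(cosine(points[i], center) for i in cluster)


def bisecting_kmeans(points, k, trials=5, seed=0):
    """Repeatedly split one cluster with 2-means until k clusters exist.

    ASSUMPTION: "split the largest cluster" and "best of `trials` runs" are
    illustrative choices, not settings reported in the paper.
    """
    rng = random.Random(seed)
    clusters = [list(range(len(points)))]
    while len(clusters) < k:
        target = max(clusters, key=len)
        if len(target) < 2:
            break  # nothing left to split
        sub = [points[i] for i in target]
        best = None
        for _ in range(trials):
            halves = kmeans(sub, 2, rng=rng)
            if not all(halves):
                continue  # skip degenerate splits with an empty half
            score = sum(overall_similarity(sub, h) for h in halves)
            if best is None or score > best[0]:
                best = (score, halves)
        if best is None:
            break
        clusters.remove(target)
        # Map indices within `sub` back to indices within `points`.
        clusters.extend([target[i] for i in h] for h in best[1])
    return clusters


if __name__ == "__main__":
    # Invented 4-dimensional concept vectors: two "vehicle" axes, two "fruit" axes.
    docs = [[1, 1, 0, 0], [2, 1, 0, 0], [0, 0, 1, 2], [0, 0, 2, 1], [1, 2, 0, 0]]
    print(sorted(sorted(c) for c in bisecting_kmeans(docs, 2)))  # -> [[0, 1, 4], [2, 3]]
```

Calling `kmeans(docs, 2)` directly gives the flat k-means counterpart for comparison. Because bisecting k-means proceeds by successive 2-way splits, the sequence of splits could also be kept if a cluster hierarchy is wanted.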

The steady development of computer hardware technology in the last few years has led to large supplies of powerful and affordable computers, data collection equipment, and storage media. These technologies provide good support to the database and information industry and make a huge number of databases and information repositories…



References:
[1] A. Hotho, A. Maedche and S. Staab (2001), “Ontology-based Text Clustering”, in Proceedings of the IJCAI-2001 Workshop on Text Learning: Beyond Supervision.
[3] M. Steinbach, G. Karypis and V. Kumar (2001), “A Comparison of Document Clustering Techniques”, Technical Report 00-034, Department of Computer Science and Engineering, University of Minnesota.
[4] C. Fellbaum (2005), “WordNet and wordnets”, in K. Brown et al. (eds.), Encyclopedia of Language and Linguistics, Second Edition, Oxford: Elsevier, pp. 665–670.
[9] S. C. Punitha, K. Mugunthadevi and M. Punithavalli (2011), “Impact of Ontology based Approach on Document Clustering”, International Journal of Computer Applications, 22(2):22–26, May 2011, Foundation of Computer Science.
[10] S. Scott and S. Matwin (1997), “Text Classification Using WordNet Hypernyms”, Computer Science Dept., University of Ottawa, Ottawa, Canada.

You May Also Find These Documents Helpful

  • Best Essays

    INFS1602 Assignment A

    • 3808 Words
    • 16 Pages

    16. X Ning, H. J. (2008). RSS: A Framework Enabling Ranked Research on the Semantic Web. Information Processing and Management.…

  • Satisfactory Essays

    Pt1420 Unit 1 Assignment

    • 303 Words
    • 2 Pages

    The objective is to discover terms that have a similar meaning or importance to the given term. The Concept Insights service performs applied analysis and ordering of documents chosen by the user. The service builds a conceptual model based on the given documents and uses the model to search for conceptually similar documents. The relations between the documents are displayed in a graph that is also presented to the user. The system downloads information from the free online encyclopedia…

  • Powerful Essays

    Costco Wholesale

    • 1824 Words
    • 9 Pages

    Since 1983, Costco Wholesale has risen to the top as the most proficient, efficient, and effective wholesale distributor in the world. By using a strategy based on ultra-low prices, a limited selection of nationally branded and private-label products, a treasure-hunt shopping environment, low operating costs, and geographic expansion, Costco has been able to distinguish itself from its competitors as the leading wholesale provider in the world.…

  • Better Essays

    Created in many different forms and formats, data is collected, processed, stored, and retrieved by businesses to support the many informational needs of organizations. Business data enters an organization's information system through software applications. The software applications process and code the data in proprietary formats that are difficult to extract or report on without the help of sophisticated report writers or data extraction tools. Data is the heart of any business. Without good data turned into information, management cannot make the proper decisions. The advances in computer processing power, storage capabilities, and the development of more ways to add information to data have paved the way for a radically new approach to collecting, storing, retrieving, and reporting business information: to build an entire information…

    • 1645 Words
    • 7 Pages
  • Good Essays

    Concept Briefing

    • 670 Words
    • 3 Pages

    A catalog is a register of all bibliographic items found in the library. Items can be any kind of library-based material (book, magazine, audiobook, etc.). Bibliographic control, cataloging teaches us, encompasses all the activities involved in creating, organizing, managing, and maintaining the file of an entity record. To maintain consistency across multiple matching entities, catalogers use the process of collocation to bring them together. The better the catalog, the higher the credibility a library has with its users. Users are more content with fast, accurate, and effective retrieval of information.…

  • Good Essays

    My essay consists of information about the Canadian Confederation. I included facts with references in my bibliography. The essay covers the conferences that led to the establishment of the Canadian Confederation. I added the positive effects of the Canadian Confederation and the problems the colonies were facing before Confederation took place. It also has a concluding paragraph that includes some of my opinions and why I think it was a good thing that the Canadian Confederation happened.…

    • 711 Words
    • 3 Pages
  • Satisfactory Essays

    the similarity of medical reports is evaluated by calculating semantic characteristics and syntactic similarity. It relies on an upgraded radiology-specific ontology to measure semantic similarity relationships between unstructured mammographic report concepts. Meanwhile, [7] improved the vector cosine similarity model by using (is-a) relationships to measure the degree of similarity. For a fixed concept, after examining all the possible paths, they concluded that the shortest similarity vector should be selected for each document, after which the cosine of the angle between the vectors is calculated to determine the degree of similarity. Testing was done by comparing multiple clinical context reports using anatomy and imaging procedures…

    • 117 Words
    • 1 Page
  • Better Essays

    Non-hierarchical cluster analysis (often known as the k-means clustering method) forms a grouping of a set of units into a predetermined number of groups, using an iterative algorithm that optimizes a chosen criterion. Starting from an initial classification, units are transferred from one group to another or swapped with units from other groups until no further improvement can be made to the criterion value. There is no guarantee that the solution thus obtained will be globally optimal; by starting from a different initial classification, it is sometimes possible to obtain a better classification. However, starting from a good initial classification greatly increases the chances of producing an optimal or near-optimal solution.…

    • 2267 Words
    • 10 Pages
  • Powerful Essays

    Topic maps are a new ISO standard for describing knowledge structures and associating them with information resources. As such they constitute an enabling technology for knowledge management. Dubbed "the GPS of the information universe", topic maps are also destined to provide powerful new ways of navigating large and interconnected corpora.…

    • 1640 Words
    • 7 Pages
  • Powerful Essays

    Ghjkjh

    • 8647 Words
    • 35 Pages

    References: [1] S. Abney. Partial parsing via finite-state cascades. In Workshop on Robust Parsing, 8th European Summer School in Logic, Language and Information, 1996. [2] R. Agrawal, S. Rajagopalan, R. Srikant, and Y. Xu. Mining newsgroups using networks arising from social behavior. In Proceedings of the Twelfth International World Wide Web Conference (WWW2003), 2003. [3] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In J. B. Bocca, M. Jarke, and C. Zaniolo, editors, Proc. 20th Int. Conf. Very Large Data Bases, VLDB, pages 487–499. Morgan Kaufmann, 12–15 1994. [4] R. Baumgartner, S. Flesca, and G. Gottlob. Declarative information extraction, Web crawling, and recursive wrapping with Lixto. Lecture Notes in Computer Science, 2173, 2001. [5] K. D. Bollacker, S. Lawrence, and C. L. Giles. CiteSeer: An autonomous web agent for automatic retrieval and identification of interesting publications. In Agents ’98, pages 116–123, 1998. [6] H. Chen, J. Hu, and R. W. Sproat. Integrating geometric and linguistic analysis for e-mail signature block parsing. ACM Transactions on Information Systems, 17(4):343–366, 1999. [7] W. W. Cohen. Data integration using similarity joins and a word-based information representation language. ACM Transactions on Information Systems, 18(3):288–321, 2000. [8] W. W. Cohen, L. S. Jensen, and M. Hurst. A flexible learning system for wrapping tables and lists in HTML documents. In Proceedings of The Eleventh International World Wide Web Conference (WWW-2002), Honolulu, Hawaii, 2002. [9] M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, and S. Slattery. Learning to construct knowledge bases from the World Wide Web. Artificial Intelligence, 118(1–2):69–113, 2000. [10] N. Glance and W. Cohen. BoardViewer: Meta-search and community mapping over message boards. Intelliseek Technical Report, 2003.…

  • Powerful Essays

    The scope of this paper is to provide an introduction to cluster analysis by giving a general background for…

    • 10565 Words
    • 43 Pages
  • Powerful Essays

    References: [1] Weijie Su, Xin Jin, “Hidden Markov Model with Parameter-Optimized K-means Clustering for Handwriting Recognition”, International Conference on Internet Computing and Information Services, pp. 435–438, 2011…

    • 2858 Words
    • 12 Pages
  • Powerful Essays

    4. Chen Y, Rege M, Dong M, Fotouhi F (2007) Deriving semantics for image clustering from accumulated user feedbacks. In: 15th ACM Int. Conf. on Multimedia, Augsburg, Germany, pp. 313–316…

    • 9915 Words
    • 40 Pages
  • Powerful Essays

    H.3.3 [Information Search and Retrieval]: Retrieval models. J.4 [Social and Behavioral Sciences]: Sociology. Algorithms, Experimentation. Mobile Phone Data, Semantic Label, Trajectory Data Analysis.…

    • 1498 Words
    • 6 Pages
  • Powerful Essays

    [6] Larsen, Jan. Lars Hansen, Kai. Szymkowiak Have, Anna. Christiansen, Torben. Kolenda, Thomas. "Webmining learning from the World Wide Web". Computational Statistics & Data Analysis. 38. 2002. pp. 517–532.…

    • 3132 Words
    • 13 Pages