07004272

2014 IEEE International Conference on Big Data

Random Walks on Adjacency Graphs for Mining
Lexical Relations from Big Text Data
Shan Jiang
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, IL, 61801 USA
sjiang18@illinois.edu
ChengXiang Zhai
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, IL, 61801 USA
czhai@cs.uiuc.edu

Abstract—Lexical relations, or semantic relations of words, are useful knowledge fundamental to all applications since they help to capture inherent semantic variations of vocabulary in human languages. Discovering such knowledge in a robust way from arbitrary text data is a significant challenge in big text data mining. In this paper, we propose a novel general probabilistic approach based on random walks on word adjacency graphs to systematically mine two fundamental and complementary lexical relations, i.e., paradigmatic and syntagmatic relations between words from arbitrary text data. We show that representing text data as an adjacency graph opens up many opportunities to define interesting random walks for mining lexical relation patterns, and propose specific random walk algorithms for mining paradigmatic and syntagmatic relations. Evaluation results on multiple corpora show that the proposed random walk-based algorithms can discover meaningful paradigmatic and syntagmatic relations of words from text data.
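
As a rough illustration of the idea of a word adjacency graph, the following minimal sketch builds a directed graph in which an edge u -> v is counted whenever word v immediately follows word u, and treats the normalized edge counts as one-step random walk probabilities. It is not the algorithm proposed in this paper: the function names (build_adjacency_graph, step_distribution, right_context_overlap), the whitespace tokenization, the toy sentences, and the overlap score are all assumptions made here for illustration only. The one-step distribution loosely corresponds to syntagmatic association (words that occur next to each other), while the overlap of two words' distributions loosely corresponds to paradigmatic similarity (words that occur in similar contexts).

```python
from collections import Counter, defaultdict


def build_adjacency_graph(sentences):
    """Count a directed edge u -> v each time word v immediately follows word u."""
    graph = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()  # assumption: naive whitespace tokenization
        for left, right in zip(words, words[1:]):
            graph[left][right] += 1
    return graph


def step_distribution(graph, word):
    """One-step random walk: probability of moving from `word` to each right neighbor."""
    counts = graph.get(word, Counter())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {neighbor: count / total for neighbor, count in counts.items()}


def right_context_overlap(graph, a, b):
    """Hypothetical paradigmatic score: shared probability mass of two right contexts."""
    pa = step_distribution(graph, a)
    pb = step_distribution(graph, b)
    return sum(min(pa[w], pb[w]) for w in pa.keys() & pb.keys())


# Toy corpus, invented for this example.
sentences = ["the cat eats fish", "the dog eats meat", "the cat drinks milk"]
graph = build_adjacency_graph(sentences)

print(step_distribution(graph, "the"))            # right neighbors of "the"
print(right_context_overlap(graph, "cat", "dog"))  # shared right-context mass
```

In this toy corpus, "cat" and "dog" never appear next to each other, yet both are followed by "eats", so they receive a nonzero overlap score. A walk that also uses left contexts, or takes more than one step, would be a natural extension under the same assumptions.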

I. INTRODUCTION
The dramatic growth of text data creates great opportunities for applying computational methods to mine “big text data” to discover all kinds of useful knowledge and support many data analytics applications. Unfortunately, text data are unstructured, and effective discovery of knowledge from text data requires the computer to understand natural languages, which is known to be an extremely difficult task. In this paper, we study how to mine two fundamental and complementary types of interesting semantic relations between words from arbitrary text data in a
