Presented by:
Manish Sood, Founder & CEO, Reltio, Inc. manish@reltio.com October, 2012
Image: "Data Deluge," Brett Ryder, The Economist, Feb. 2010
Agenda
1. What is Big Data?
2. What is NoSQL, and how does it differ from relational DBs?
3. What is Hadoop (HDFS and MapReduce)?
4. MDM and Big Data – a Case Study
Confidential and Proprietary – please do not distribute without prior permission
Trend – Growing Data Sets
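Figure: Data volume over time, rising from terabytes to petabytes, exabytes, and zettabytes across the Mainframe, PC, Internet, Mobile, and Machine eras, driven successively by Transactions, Interactions, and Machine To Machine data; 1.4 zettabytes of enterprise data in 2011.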
Zettabyte = 1,000,000,000,000,000,000,000 bytes (10^21). Graph based on IDC and UC Berkeley data growth estimates. Source: IDC & CosmoBC.com: http://techblog.cosmobc.com/2011/08/26/data-storage-infographic/
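To make the scale of these units concrete, here is a minimal sketch that converts between decimal (SI) byte units, assuming each step (terabyte, petabyte, exabyte, zettabyte) is a factor of 1,000 as in the zettabyte definition above. The names `UNITS`, `to_bytes`, and `convert` are hypothetical helpers for illustration only.

```python
# Decimal (SI) byte units: each step up is a factor of 1,000.
# (Binary units such as TiB or PiB would use powers of 1,024 instead.)
UNITS = {
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}


def to_bytes(amount, unit):
    """Convert an amount expressed in a decimal byte unit to bytes."""
    return amount * UNITS[unit]


def convert(amount, from_unit, to_unit):
    """Convert an amount from one decimal byte unit to another."""
    return to_bytes(amount, from_unit) / UNITS[to_unit]


if __name__ == "__main__":
    # The 1.4 zettabytes of enterprise data cited for 2011, re-expressed.
    print(f"{to_bytes(1.4, 'zettabyte'):.2e} bytes")
    print(f"{convert(1.4, 'zettabyte', 'terabyte'):,.0f} terabytes")
```

Re-expressed this way, 1.4 zettabytes is roughly 1.4 billion terabytes, which is why the chart needs a logarithmic-style axis to show all four units.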
Trend – Information Connectivity
Figure: Information connectivity over time (1990, 2000, 2010, 2020) across Web 1.0, Web 2.0, Web 3.0, and the Internet of Things; sources shown include text files, RDBMS, hypertext, blogs, social networks, tagging, folksonomies, user-generated content, RDF, and the Semantic Web.
Trend – Data Complexity
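Figure: Performance versus data complexity, progressing from text files and lists (the majority of webpages) through relational databases and social networks to the Internet of Things, with the high-complexity end annotated "Custom work".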
Characteristics of Big Data
Velocity: tens of billions of daily records
Volume: from terabytes to petabytes
Variety: multi-structured data
Value ($): business insights
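The velocity figure is easier to reason about as a sustained ingest rate. The sketch below is illustrative arithmetic only: it assumes records arrive evenly over a 24-hour day and takes 10 billion and 50 billion records per day as hypothetical points within "tens of billions". The helper name `records_per_second` is hypothetical.

```python
# Average ingest rate implied by a daily record volume,
# assuming (hypothetically) that records arrive evenly across the day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def records_per_second(records_per_day):
    """Average sustained rate needed to keep up with a daily record volume."""
    return records_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # Two illustrative points within "tens of billions of daily records".
    for daily in (10e9, 50e9):
        rate = records_per_second(daily)
        print(f"{daily:,.0f} records/day ~ {rate:,.0f} records/second")
```

Even at the low end, 10 billion records per day averages about 115,741 records per second, before accounting for peaks, which real workloads rarely spread evenly.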
Big data is where the data volume, acquisition velocity, or data representation limits the ability to perform effective analysis using traditional relational approaches or requires the use of significant
Links: Inderpal Bhandari, VP & Chief Data Officer, Express Scripts October, 2012