Jan. 2013
What is big data?
“Big data is not a precise term; rather it's a characterization of the never ending accumulation of all kinds of data, most of it unstructured. It describes data sets that are growing exponentially and that are too large, too raw or too unstructured for analysis using relational database techniques. Whether terabytes or petabytes, the precise amount is less the issue than where the data ends up and how it is used.” (Quoted from EMC’s report “Big data: Big opportunity to create business value”.)
With the explosive growth of mobile networks, cloud computing and internet technology, more and more kinds of information have appeared. In the past, many terabytes of data could be a disaster for any company, because storing and processing them meant high storage costs and high-performance CPUs. Nowadays, however, companies are discovering facts in this data that they had never thought about before. They have started to use data analytics technology to find business value in these terabytes or petabytes of data, so what once looked like a disaster now appears to be a big opportunity.
Data is not limited to structured data. Big data can be categorized into three types: structured data, unstructured data and semistructured data (please see Chart I). As the internet and the mobile internet developed rapidly, unstructured and semistructured data in particular exploded. For example, a bank could analyze unstructured data to find out why its customer churn increased.
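To make the three categories concrete, here is a minimal Python sketch. It is not taken from the EMC report: the records are hypothetical bank-customer examples invented for illustration, and `classify` is a hypothetical, deliberately rough heuristic. A structured record has a fixed schema (like a row in a relational table), a semistructured record carries its own keys and may vary from record to record (JSON here), and unstructured data is free text with no schema at all.

```python
import json

# Hypothetical examples of the three data types (illustrative only).

# Structured: fixed columns, as in a row of a relational table.
structured_row = ("C1001", "Alice", 5200.00)  # (customer_id, name, balance)

# Semistructured: self-describing keys; fields may differ between records.
semistructured_text = '{"customer_id": "C1001", "channel": "mobile", "tags": ["complaint"]}'

# Unstructured: free text with no schema, e.g. a customer complaint.
unstructured_text = "I am closing my account because the mobile app keeps crashing."


def classify(item):
    """Rough heuristic (an assumption, not a standard): tuples are treated as
    structured rows, strings that parse as JSON objects as semistructured,
    and everything else as unstructured."""
    if isinstance(item, tuple):
        return "structured"
    if isinstance(item, str):
        try:
            parsed = json.loads(item)
        except json.JSONDecodeError:
            return "unstructured"
        if isinstance(parsed, dict):
            return "semistructured"
    return "unstructured"


for item in (structured_row, semistructured_text, unstructured_text):
    print(classify(item))
```

Running it prints `structured`, `semistructured` and `unstructured` in turn. The complaint text is the kind of data a bank would have to mine, for instance with text analytics, to learn why customers leave, because no column in a relational table records that reason directly.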
Most definitions of big data focus on the size of the data. However, size, or volume, is not the only characteristic of big data. There are two other characteristics: variety and velocity. Variety means that big data is generated from many different sources, so data is no longer limited to structured data. According to EMC’s report, most big data is unstructured. Velocity means the speed at which data is produced. Data