Big data refers to datasets whose size exceeds the ability of typical database software tools to capture, store, manage, and analyze. The definition is deliberately a moving target: as technology advances, the size a dataset must reach to qualify as big data will increase as well.
Imagine all the information you alone generate each time you swipe your credit card, post to social media, drive your car, leave a voicemail, or visit a doctor. Now try to imagine your data combined with the data of all humans, corporations, and organizations in the world! From healthcare to social media, from business to the auto industry, humans are now creating more data than ever before.
To help us talk about “big data,” IBM data scientists break it down into four dimensions: volume, velocity, variety, and veracity.

Volume: Scale of Data
Big data is big. An estimated 2.5 quintillion bytes (roughly 2.5 billion gigabytes) of data are created every day. By 2020, we are expected to be creating 40 zettabytes (40 trillion gigabytes) of information, 300 times the amount of data that existed in 2005. Why are we producing so much data? For starters, 6 billion of the world’s 7 billion people now have cell phones. As infrastructure becomes increasingly available and affordable, data-generating cell phone use such as text messaging is bound to keep growing rapidly.
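The unit conversions behind these figures are easy to check. The short Python sketch below is our own illustration, using decimal SI prefixes (1 GB = 10^9 bytes, 1 ZB = 10^21 bytes); none of the variable names come from the text:

```python
# Back-of-the-envelope check of the data-volume figures quoted above.
# Decimal (SI) prefixes throughout: 1 GB = 10**9 bytes, 1 ZB = 10**21 bytes.

QUINTILLION = 10**18
GIGABYTE = 10**9
ZETTABYTE = 10**21

daily_bytes = 2.5 * QUINTILLION               # ~2.5 quintillion bytes created per day
daily_gb = daily_bytes / GIGABYTE             # -> 2.5e9, i.e. ~2.5 billion gigabytes

projected_2020_bytes = 40 * ZETTABYTE         # projected 40 ZB by 2020
projected_2020_gb = projected_2020_bytes / GIGABYTE   # -> 4.0e13, i.e. 40 trillion GB

baseline_2005_zb = 40 / 300                   # implied 2005 baseline if 40 ZB is 300x growth

print(f"Data per day:        {daily_gb:.2e} GB")
print(f"Projected by 2020:   {projected_2020_gb:.2e} GB")
print(f"Implied 2005 total:  {baseline_2005_zb:.2f} ZB")
```

Running it confirms the parentheticals above: 2.5 quintillion bytes is about 2.5 billion gigabytes, and 40 zettabytes is 40 trillion gigabytes.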
The amount of information being collected is so huge that it is overwhelming traditional database management tools and rendering them obsolete. The need for new ways of handling big data helps explain the growing demand for data scientists: by 2015, the U.S. is projected to gain 1.9 million new IT jobs, out of 4.4 million created globally.
Velocity: Analysis of Streaming Data
The sheer speed at which we create data today is a major driver of big data. The New York Stock Exchange alone captures one terabyte of trade information during each trading session.
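To get a feel for what that figure means as a sustained rate, here is a minimal Python sketch. It assumes a standard 6.5-hour NYSE trading session (9:30 a.m. to 4:00 p.m.); the session length is our assumption, not a figure from the text:

```python
# Back-of-the-envelope: 1 TB per trading session expressed as a sustained data rate.
# Assumes a 6.5-hour NYSE session; decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).

TERABYTE = 10**12
session_seconds = 6.5 * 3600                  # 6.5 hours -> 23,400 seconds

bytes_per_second = TERABYTE / session_seconds
megabytes_per_second = bytes_per_second / 10**6

print(f"Sustained rate: {megabytes_per_second:.1f} MB/s")   # ~42.7 MB/s
```

In other words, one terabyte per session works out to a continuous stream of roughly 43 megabytes every second, for the entire trading day.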