Friday, March 7, 2008

Just How Much Data Is There?

About a year ago, EMC released an interesting research project concerning the “size of the digital universe”, that is, how much data is really out there. “The Expanding Digital Universe: A Forecast of Worldwide Information Growth Through 2010” (2007, March) [link] is one of those articles that should be required reading for all CIS-minded individuals. The bottom line of the research is the prediction that “Between 2006 and 2010, the information added annually to the digital universe will increase more than six fold from 161 exabytes to 988 exabytes” (my emphasis).

(Very likely, this will be one of those blog posts that I will look back on and laugh at. However, as this is still only 2008, it may be beneficial to exercise our minds around the EMC prediction.)

The operative word in the above citation is “added”. So it is not that there will be 988 exabytes (EB) worldwide; it is that in 2010 alone, 988 exabytes will be added to what is already there. The obvious conclusion is that, in terms of data storage, there must be far more than that even now. Further, 1,000 exabytes carry the moniker “zettabyte” [Wikipedia] (ZB). Thus, the next question must be “what exactly is a zettabyte?” Herein lies the problem. The minute we start speaking in astronomical terms, eyes glaze over. The challenge is not whether we can conjure up larger and larger numbers, or names for numbers; rather, the challenge is whether we can understand what those numbers mean.
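For readers who like to check the arithmetic, here is a minimal sketch of what the quoted figures imply. Only the 161 EB and 988 EB values come from the report; the steady year-over-year compounding is my own simplifying assumption, not something the report states.

```python
# Figures quoted from the EMC report (exabytes added per year).
ADDED_2006_EB = 161
ADDED_2010_EB = 988
YEARS = 2010 - 2006

# How many times larger the 2010 figure is than the 2006 figure.
growth_factor = ADDED_2010_EB / ADDED_2006_EB

# Assumption: growth compounds at a constant rate each year.
annual_rate = growth_factor ** (1 / YEARS) - 1

# 1 ZB = 1,000 EB, so express the 2010 figure in zettabytes.
added_2010_zb = ADDED_2010_EB / 1000

print(f"Growth factor, 2006 to 2010: {growth_factor:.2f}x")
print(f"Implied annual growth rate:  {annual_rate:.0%}")
print(f"2010 additions:              {added_2010_zb:.3f} ZB")
```

Under that assumption, the amount of data added each year grows by a bit more than half again every year, which is how 161 EB turns into nearly a zettabyte in four years.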

Zettabyte?

In our entry-level computer classes, we often describe byte quantities this way: 1 kB is the equivalent of a single typed page of text; therefore, 1 MB is the equivalent of a large book (minus the images, of course), and 1 GB is the equivalent of 1,000 books, or a large pickup truck filled with books. Most students own, or have experience with, devices such as the iPhone with 16 GB of storage, which can hold hours of video, images, music, games, and more. Hence, the gigabyte is a concept we can now get our minds around.

Continuing, since a terabyte (TB) is 1,000 GB, by our analogy 1 TB is the equivalent of 1,000 pickup trucks filled with books; perhaps the flight deck of an aircraft carrier would fit the bill nicely. The petabyte (PB), being 1,000 TB, would then be 1,000 aircraft carriers, each covered with 1,000 pickup trucks, each loaded with 1,000 books of 1,000 pages of text. Know-center has a fun summary of how to get our minds around 988 exabytes [link]; however, while a ZB may indeed be a pile of books stacked all the way to Pluto, I must confess that I do not know how far it is to Pluto, so the analogy is lost on me.

So What?

Since the number of books or trucks or aircraft carriers is really irrelevant, the angle of approach ought not to be “how many?” but rather “what can I do with it?” According to the EMC research, “over 95% of the digital universe is ‘unstructured data’ – meaning its content cannot be truly represented by its location in the computer record, such as name, address, or date of last transaction” [p13]. What this means is that while we may create 1 ZB (1,000 EB) of data in 2010, only about 50 EB of it, the remaining 5%, is actually locatable. Think of it like this: suppose you worked and earned a dollar and then got paid only five cents, the rest being lost to eternity. Some would look at this and be discouraged.
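As a quick check on that proportion, the sketch below splits a round 1 ZB into its unstructured and structured parts. It treats the report's “over 95%” as exactly 95%, which is my simplification, so the structured figure is best read as an upper bound.

```python
EB_PER_ZB = 1000  # decimal prefixes: 1 ZB = 1,000 EB

created_eb = 1 * EB_PER_ZB   # round figure for data created in 2010
UNSTRUCTURED_PERCENT = 95    # assumption: "over 95%" taken as exactly 95%

# Integer arithmetic keeps the result exact.
unstructured_eb = created_eb * UNSTRUCTURED_PERCENT // 100
structured_eb = created_eb - unstructured_eb

print(f"Created:      {created_eb:>5,} EB")
print(f"Unstructured: {unstructured_eb:>5,} EB")
print(f"Structured:   {structured_eb:>5,} EB")
```

The structured share comes to 50 EB out of 1,000 EB, the same five cents on the dollar as in the wage analogy.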

I look at this and see opportunity.

If there is one area in IT with a desperate need for quality individuals, it is database management. There is so much data floating around and virtually no management of it; most companies and countries understand that what matters is not the data but the data management. So the need is definitely there. The problem is that the supply is not.

When we look at computer education across the nation, while Computer Science (CS) programs are still taking a hit, Computer Information Systems (CIS) programs are staging a comeback. Debra Pearlman wrote two excellent articles for eWeek on this matter (for information on CS enrollments, see “CS Degree Interest Plummeted Since 2000” [eWeek, 2008, March 4], but make sure to remove all sharp objects from the room; for information on CIS career opportunities, see “Tech Job Sector Growing at Record Paces Through 2016” [eWeek, 2007, December 6]).

So, how many zettabytes will be created by 2016? Perhaps by then we will have an iPod that can contain enough music videos to keep one entertained until the sun implodes. Perhaps we will be able to pick up a 1 yottabyte flash drive at Office Depot. Or not. But the reality is that whatever that number will be, the number that needs to be impacted is the 95%.

That is the opportunity.