Does big data herald the end of the map paradigm?
I first came across the term Big Data in 2011, when a McKinsey report (Big data: The next frontier for innovation, competition and productivity) landed on my desk. If McKinsey was interested in this phenomenon, then it was worth further study. Later texts framed big data in terms of the three Vs: Volume, Velocity and Variety (further definition can be found in Big Data Analytics, Russom, 2011, TDWI Research).
Big data has been here before, of course; remember when we outran the 80-column punched cards for recording digital information, which were themselves a memory extension? Not all of the social-networking, crowd-sourced information is relevant to a business; what we need is the ability to leverage this data effectively when making organisational decisions.
There are two parameters of big data that are important in making sense of it all: the first is time and the second is location. This blog is about location. So let's define the big data explosion in geographic terms: if each square metre of the earth's surface represents 1 MB of digital data, then in 2010 the entire globe was carpeted in data. By 2020 some 1,700 earths will be carpeted (after www.chforum.org).
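The arithmetic behind that analogy is easy to check. A short sketch follows; note that the 1 MB-per-square-metre figure and the 1,700-earth projection are the analogy's own numbers, not measurements, and the surface area is a rounded constant:

```python
# Illustrative arithmetic for the "data carpet" analogy.
# Assumptions: 1 square metre of surface = 1 MB (decimal megabyte),
# Earth's surface area taken as roughly 5.1e14 square metres.

EARTH_SURFACE_M2 = 5.1e14   # ~510 trillion square metres
MB = 1e6                    # 1 megabyte in bytes

# One fully carpeted earth at 1 MB per square metre:
one_earth_bytes = EARTH_SURFACE_M2 * MB
print(f"one carpeted earth = {one_earth_bytes / 1e21:.2f} zettabytes")

# The projection of ~1,700 carpeted earths by 2020:
earths_2020 = 1700
print(f"1,700 earths = {one_earth_bytes * earths_2020 / 1e21:.0f} zettabytes")
```

On these assumptions one carpeted earth works out to about half a zettabyte, so the 2020 projection is on the order of hundreds of zettabytes.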
So if location is important, how are we squaring up to the challenge? You can judge. Apart from the obvious glitches with satnavs, there are more serious failures in our use of location to make decisions. The attack on the Chinese embassy in Belgrade (May 1999), reported in the Washington Post, was a spectacular example. The CIA was trying to pinpoint key Serbian targets to attack, and was looking for the Serbian Ministry of Supply, a new building. It had two maps: a 1992 paper map based on a Russian survey, which showed the Chinese embassy in a different part of the town, and a 1997 paper map showing a set of new buildings in another district, but without detailed address information. Human beings made a set of assumptions, and the guided missiles hit the target, killing three people. It was the wrong building.
With geographic information systems we should be better equipped, but these have largely been created to replicate the paper-map paradigm. Information about buildings, roads, rivers, railways and other features is arranged in layers, and these layers are internally indexed to produce a real-world picture; this requires extensive processing that does not scale to big data problems. I am reminded of Box's 1976 axiom: 'all models are wrong but some are useful'. Our current paradigm requires human interaction to make sense of it.
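To make the layer paradigm concrete, here is a minimal sketch of it; the data model, feature names and coordinates are all hypothetical, and real GIS software adds per-layer spatial indexes on top of this basic structure:

```python
# Hypothetical sketch of the classic map-layer model: each feature class
# lives in its own thematic layer, and a simple "what is at this
# location?" query must inspect every layer in turn.

from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    bbox: tuple  # (min_x, min_y, max_x, max_y) bounding box

# Layers keyed by theme, as on a traditional paper map.
layers = {
    "buildings": [Feature("embassy", (10, 10, 12, 12))],
    "roads":     [Feature("main street", (0, 11, 50, 11.5))],
    "rivers":    [Feature("river", (0, 0, 50, 2))],
}

def features_at(x, y):
    """Brute-force point query: every layer, every feature is inspected."""
    hits = []
    for layer_name, features in layers.items():
        for f in features:
            min_x, min_y, max_x, max_y = f.bbox
            if min_x <= x <= max_x and min_y <= y <= max_y:
                hits.append((layer_name, f.name))
    return hits

print(features_at(11, 11))  # a point inside both a building and a road
```

Even in this toy form the cost of a query grows with the number of layers and features, and the answer is a list of fragments that a human still has to assemble into a picture — the scalability and interpretation problem described above.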
When I look at the McKinsey report, the techniques it describes for analysing big data are based on algorithms and computer science (particularly machine learning): natural language engines, pattern recognition, cluster analysis and data fusion. On spatial analysis McKinsey states: 'a set of techniques, some applied from statistics, which analyse the topological, geometric, or geographic properties encoded in a data set. Often the data for spatial analysis come from geographic information systems (GIS) that capture data including location information, e.g., addresses or latitude/longitude coordinates.' Reading McKinsey, it is striking how far the GIS community is from machine learning. Too much of what we do is based on the manual tending of thematic maps. It is apparent that we have to change our basic map paradigm if location information is going to contribute to making sense of the data avalanche being collected by the crowd-sourced social-network community.