Artist Pablo Picasso once said, “Computers are useless. They can only give you answers.” In the days since this famous quote, computers have slimmed down in size, grown more nimble in processing speed, and become mainstays in both corporate and personal lives. And, unlike in Picasso’s day, with the right kind of data analysis, computers are able to provide the questions that users may never have thought to ask.
To understand how those questions become conceptualized in today’s world, we need to go deeper into our need for data. Basically, human beings analyze data in two modes: active and passive. The active mode occurs when we require specific known answers from the data, such as “What is the current GDP of Ireland?” or “What is the average salary made by all employees working in Finance directly under the CFO?” As cloud services become more pervasive, we find ourselves asking more on-demand active questions such as “Who wrote the song currently playing on the radio?” and “What were the coordinates where I took the last photo currently sitting on my iPhone?” The active mode is the part of analysis that stakeholders and consumers find invaluable, that continues to feed analytic research, and that defines the “useless” computer to which Pablo Picasso was referring.
The “passive” mode of data collection refers to analyzing data for data’s sake, and usually makes sense only when viewed in hindsight or when attempting to form patterns out of the data itself. For example, passive data is the footage from a webcam in a New York apartment hallway, or the moment-by-moment sensor feeds about traffic on a particular highway. In these cases, passive data often holds very little importance to humans until something significant occurs, such as a three-hour Monday morning traffic jam whose root cause needs to be found, or a two-minute robbery within sight of a webcam. The clusters within passive data collection help us draw conclusions from the patterns that form, allowing us to ask questions we may otherwise have ignored.
Despite its overall usefulness, passive data collection has a clear drawback. Consider the incessant monitoring across the multitude of highways, building hallways, weather forecasts, blogs, tweets, photos and other activities within nature and society. After a short while, finding relevant events within a data stream becomes akin to finding a needle in a haystack; there is just too much noise to analyze. This exponential growth of noise, dubbed the “data deluge”, overwhelms organizations because, with so much information on demand at any one moment, we don’t know where to focus. We don’t know the questions that the data is telling us to ask.
Although the data onslaught seems overwhelming at first blush, there are at least three proposed methods for alleviating it: (1) develop deduplication and backup strategies for eliminating noisy or irrelevant data, (2) increase the technological storage space available to house the ever-expanding data, and (3) design algorithms and models that mimic the human ability to filter out the patterns within the raw data.
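To make methods (1) and (3) a little more concrete, here is a minimal Python sketch using the highway sensor example. Everything in it is an assumption made for illustration: the (minute, vehicles per minute) record format, the function names, the made-up readings, and the simple "distance from the average" rule, which is a crude stand-in for the far richer pattern-finding models that method (3) has in mind.

```python
from statistics import mean, stdev


def deduplicate(records):
    """Drop exact repeat records, keeping the first copy and the original order."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


def flag_unusual(records, threshold=2.0):
    """Return records whose value lies more than `threshold` sample standard
    deviations from the mean of all values (a simple z-score rule)."""
    values = [value for _, value in records]
    if len(values) < 2:
        return []
    average = mean(values)
    spread = stdev(values)
    if spread == 0:
        return []  # every reading is identical, so nothing stands out
    return [(minute, value) for minute, value in records
            if abs(value - average) / spread > threshold]


# Hypothetical feed: (minute, vehicles per minute). Minute 1 was sent twice.
feed = [(0, 40), (1, 42), (1, 42), (2, 41), (3, 39),
        (4, 43), (5, 5), (6, 41), (7, 40)]

clean = deduplicate(feed)
print(flag_unusual(clean))
```

On this small feed, the duplicate reading is dropped and only the sudden collapse in traffic at minute 5 is flagged, which is the kind of event that prompts the question "what happened on the highway just then?"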
These solutions are gaining traction with large organizations. IBM, for example, has teams dedicated to developing new technologies for data compression, storage, and retention. More far-reaching is its Smarter Planet Initiative, a long-term strategy to connect the large patterns of global markets, workflows, infrastructures, and natural processes into intelligent systems. As CEO Sam Palmisano stated in his “Welcome to the Decade of Smart” remarks in January 2010:
“Trillions of digital devices, connected through the Internet, are producing a vast ocean of data. And all this information–from the flow of markets to the pulse of societies–can be turned into knowledge because we now have the computational power and advanced analytics to make sense of it. With this knowledge we can reduce costs, cut waste, and improve the efficiency, productivity and quality of everything from companies to cities.”
This attempt to reduce the passive data stream, to have intelligent systems pose the questions the data is displaying, will take effort and time and will affect society across technology, business, and geopolitical lines. But organizing the data into a meaningful pattern is a worthy ambition that rises above these inevitable roadblocks and has the potential to improve mankind beyond our current imagination.
I like this idea of a world that can take its wealth of passive data, filter out the noise, and then discover the meaningful questions behind it. Pablo Picasso, who is also quoted as saying “Art is the elimination of the unnecessary”, would probably agree.