Originally published March 8, 2010 at Dataspora.
In this blog post I’ll attempt to sketch the forces behind what I’m calling, somewhat sensationally, the Data Singularity, and then (in a following post) discuss what I see as its consequences.
In a nutshell, the Data Singularity is this: humans are being spliced out of the data-driven processes around us, and frequently we aren’t even at the terminal node of action. International cargo shipments, high-frequency stock trades, and genetic diagnoses are all made without us.
Absent humans, these data and decision loops have far less friction; they become constrained only by the costs of bandwidth, computation, and storage, all of which are dropping exponentially.
The result is an explosion of data thrown off from these machine-mediated pipelines, along with data about those flows (and data about that data, and so on). The machines all around us — our smart phones, smart cars, and fee-happy bank accounts — are talking, and increasingly we’re being left out of the conversation.
So whether or not the Singularity is Near, the Data Singularity is here, and its consequences are being felt.
But before I discuss these consequences, I’d like to expand on the premise. The world wasn’t always drowning in this data deluge, so how did we get here?
I. Data at the Speed of Speech
For most of human history, information traveled no faster than the sound of the human voice. The origin of human language was the original singularity: it marked the birth of a non-biological information channel, distinct from our DNA.
But despite this achievement, the production of information — whether farmers’ almanacs or merchants’ ledgers — was still constrained by the costs of ink and parchment and the write-speed of the human hand.
All 70,000 volumes of the Library of Alexandria, the collected body of human knowledge in antiquity, could fit on two thumb drives today.
Thus the transmission and production of data, when it was done at all, was painstaking in form, small in scale, and occurred between people.
People --> People
II. Data at the Speed of Light
With the telegraph, for the first time, data flowed at the speed of light.
In the late 18th century, the first substantive telegraph line connected Paris to Lille, 210 kilometers to its north, using optical semaphores rather than electrical currents to communicate. Yet while data hopped between stations at light speed, it had to be routed by human operators at each station.
Centuries earlier, the printing press had dramatically reduced the production costs of information. Still, human authors transmitted their hand-drafted manuscripts to typesetters, who set type with fonts designed for human eyes.
III. Programmable Looms and Reading Machines
Punch cards represented the movement of data away from human-readable, anthropocentric substrates, onto a medium designed principally for consumption by machines.
Punch cards were developed in early 18th-century France to control industrial looms.
Now, machines were the final terminus of data transmission. This act of communicating with our machines, programming them, was at the heart of Charles Babbage’s Analytical Engine, which came more than a century later.
People --> Machines
IV. Phonographs and Recording Machines
Developing on the other side of the communication spectrum were machines that excelled at writing and storing data.
The modern rotating disk drive seems inspired less by punch cards than by Thomas Edison’s cylinder machines, better known as phonographs.
The human voice was a natural data format, and if early pioneers had a vision for the modern human-machine interface, I imagine it would have been to program machines by voice. It’s a vision that still eludes us.
By the middle of the 20th century, a slew of semiconductor technologies emerged to close the loop of data generation: we had machines that produced digital data, and machines that continuously consumed it, without human intervention.
Machines --> Machines
These technologies also sparked the beginning of a less-celebrated, but equally important exponential curve: the falling cost of data storage.
V. Listening to the Pulse of the Planet
The exponential drop in data storage costs has meant that logging historical data about a process, or billions of processes, is economically feasible.
I conjecture that the largest share of data on the planet sits in log files; these are the EKGs of the server farms that manage our cell phones, our e-mail accounts, and every other facet of our online existence — and which consume 3% of the US energy budget.
Ubiquitous networking and cheap bandwidth have meant these pools of storage are no longer isolated on individual sensors, phones, or servers, but form the tributaries feeding an ocean of data in the Cloud.
And yet, funneling these massive volumes of data creates enormous technological pressures, against which companies struggle. So why keep the data?
Because inside these log files, amidst the myriad conversations recorded between machines, lies the pulse of their customers.
Collectively, these logs reveal the pulse of the planet — flight delays, package shipments, job losses, and human sentiments.
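As a toy illustration of what “signal in the logs” means — the log format, fields, and values below are entirely hypothetical — a few lines of Python suffice to tally events from raw server-log lines, turning a machine-to-machine conversation into a crude pulse of a service’s health:

```python
from collections import Counter

# Hypothetical web-server log lines: timestamp, HTTP status code, request path.
log_lines = [
    "2010-03-08T09:00:01 200 /checkout",
    "2010-03-08T09:00:02 500 /checkout",
    "2010-03-08T09:00:03 200 /search",
    "2010-03-08T09:00:04 500 /checkout",
]

# Count responses by status code; a spike in 500s is one such "pulse".
status_counts = Counter(line.split()[1] for line in log_lines)
print(status_counts)  # Counter({'200': 2, '500': 2})
```

Real pipelines differ only in scale: the same tally, run continuously over billions of lines, is what surfaces the flight delays and job losses mentioned above.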
And as I’ll discuss in my next post, those who can extract a meaningful signal from this thunderous cacophony — the analysts, statisticians, and data scientists — are uniquely positioned to change the world.