In the last year, the data scientist has been called “the sexiest job of the 21st century.” But if data is the new oil, and data scientists are its petrochemical high priests, who are the oil riggers? Who are the roughnecks doing the dirty work to get data pipelines flowing, unpacking bytes, transforming formats, loading databases?
They are the data engineers, and their brawny skills are more critical than ever. As the era of Big Data pivots from research to development, from theoretical blueprints to concrete infrastructure, the notional demand for data science is being dwarfed by the true need for data engineering.
A stark but recurring reality in the business world is this: when it comes to working with data, statistics and mathematics are rarely the rate-limiting elements in moving the needle of value. Most firms’ unwashed masses of data sit far lower on Maslow’s hierarchy at the level of basic nurture and shelter. What is needed for this data isn’t philosophy, religion, or science — what’s needed is basic, scalable infrastructure.
It’s the data engineers who can build this infrastructure, and they represent the true talent shortage of Silicon Valley and beyond. Their unsexy but critical skills include crafting Hadoop pipelines, programming job schedulers, and parsing broad classes of data (timestamps, currencies, latitude and longitude coordinates), which are the screws, bolts, and ball bearings in the industrial age of data.
Let us now praise these unsung heroes, the data engineers, who are building the invisible but essential digital underground.
“On a scale of 1-10 of impatience, the best entrepreneurs are an 11.” - Tom Stemberg, Founder of Staples
Curiosity and impatience make for great founder traits, but they often pull in different directions.
Curiosity compels you to sit and study a problem, to voraciously consume every article and reference you can find to wrap your head around a big idea or an imagined future (self-driving cars, space elevators, or self-destructing sexts).
Impatience gets you up out of your chair to do something about it: hire, fundraise, sell, and evangelize.
Curiosity is for academics, impatience for executives, but start-up founders need to be both dreamers and doers, straddling the worlds of ideas and reality.
(Image credit: A. Koblin for Radiohead)
This is a phrase that has stuck with me since Tim O’Reilly uttered some form of it two years ago. Tim was talking about online cartography, saying it’s not the maps that matter: it’s getting to our destination. Maps are a half-step short of that goal. And in a world of navigational algorithms and self-driving cars, maps become less useful as tools.
Likewise, data visualization is a halfway house: a stopping place on the path from data to decision.
“You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.”
As we spend an increasing fraction of our time interacting with software, from airport check-in terminals and parking meters to desktop and mobile applications, digital interface design is becoming as important as physical architecture in improving our experience of the world.
Here are Professor Ben Shneiderman’s Eight Golden Rules for optimally designing that experience (drawn from his classic text, Designing the User Interface):
Silicon Valley’s first big bang of innovation occurred in 1957, when eight engineers left Shockley Semiconductor Laboratory to form Fairchild Semiconductor. Back then, the idea of engineers being entrusted as founders of a business was heretical. Forty-one firms were asked to invest, but “none of them were interested,” according to Arthur Rock.
The belief that engineers without MBAs cannot be successful founders has faded, but what about engineers acting as investors? In my experience, the majority of investment professionals on Sand Hill Road are still non-technical.
But that is changing, in two ways.
(L to R: Mike Driscoll, Drew Conway, DJ Patil, Amy Heineike, Pete Skomoroch, Pete Warden, Toby Segaran. Credit: O’Reilly)
This past Tuesday evening at Strata I moderated an Oxford-style debate between six of the top data scientists in Silicon Valley and beyond. The motion debated was:
“In data science, domain expertise is more important than machine learning skill.”
Last Saturday, I woke up and walked down to my favorite coffee shop in San Francisco, Sightglass Coffee in SoMa.
I met up with a couple of entrepreneurs pitching an amazing idea, and while ordering some mind-buzzingly-good drip coffee, ran into a mentor of mine.
I write this because, while these interactions could have happened in the suburbs of Silicon Valley — whether the Coupa Cafe in Palo Alto or Red Rock in Mountain View — they are quintessentially enabled by four qualities of a city like San Francisco:
“If I were starting a NoSQL-in-the-enterprise startup, I would focus on ETL. ETL is a mess, and is a precursor for any fancy uses of data.” - @jaykreps
“@jaykreps ETL is the coal mining of the information age: dirty, important work that fuels the economy.” - @peteskomoroch
One of the largest obstacles facing companies that seek to derive value from data isn’t data’s size. It’s data’s dirtiness.
It’s been said before: 80% of the effort that goes into a data science project is extracting, transforming, and loading (ETL’ing) data into a system where it can be analyzed.
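To make the three ETL steps concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the sample rows, the column names, the `events` table, and the `extract`, `transform`, and `load` helpers are hypothetical, and a real pipeline would read from actual source systems rather than an inline string. The point is where the effort goes: most of the code is cleaning, not analysis.

```python
import csv
import io
import sqlite3
from datetime import datetime, timezone
from decimal import Decimal, InvalidOperation

# Hypothetical raw export: currency symbols, thousands separators, one bad row.
RAW_CSV = """ts,amount,lat,lon
2012-03-01T08:15:00Z,"$1,200.50",37.7763,-122.4085
2012-03-01T09:30:00Z,$15.00,37.4443,-122.1610
not-a-date,$3.00,0,0
"""


def extract(text):
    """Extract: read the raw rows as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: parse each field, setting aside rows that fail to parse."""
    clean, rejected = [], []
    for row in rows:
        try:
            ts = datetime.strptime(row["ts"], "%Y-%m-%dT%H:%M:%SZ")
            ts = ts.replace(tzinfo=timezone.utc)
            amount = Decimal(row["amount"].strip().lstrip("$").replace(",", ""))
            lat, lon = float(row["lat"]), float(row["lon"])
            if not (-90 <= lat <= 90 and -180 <= lon <= 180):
                raise ValueError("coordinates out of range")
        except (ValueError, InvalidOperation):
            rejected.append(row)
            continue
        # Store money as integer cents to avoid floating-point drift.
        clean.append((ts.isoformat(), int(amount * 100), lat, lon))
    return clean, rejected


def load(conn, records):
    """Load: write the cleaned records into a table that can be queried."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(ts TEXT, amount_cents INTEGER, lat REAL, lon REAL)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
clean, rejected = transform(extract(RAW_CSV))
load(conn, clean)
total_cents = conn.execute("SELECT SUM(amount_cents) FROM events").fetchone()[0]
print(len(clean), "rows loaded,", len(rejected), "rejected, total cents:", total_cents)
```

Keeping the rejected rows, rather than silently dropping them, is a deliberate choice in this sketch: dirty data is usually the first thing someone needs to inspect.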