Apache Druid Turns 10: The Untold Origin Story

“We did something crazy: we rolled our own database.” – Eric Tschetter, creator of Druid

Ten years ago today, the Druid data store was introduced to the world by its creator, Eric Tschetter, then working at a small start-up named Metamarkets. Eric had left LinkedIn six months earlier to join us as the first full-time employee; I was the CTO and co-founder, working in a shoebox office[1] off South Park in San Francisco. In his blog post, Introducing Druid: Real-Time Analytics at a Billion Rows Per Second, he shared the rationale for Druid’s creation:

“Here at Metamarkets we have developed a web-based analytics console that supports drill-downs and roll-ups of high dimensional data sets – comprising billions of events – in real-time. This [post introduces] Druid, the data store that powers our console. Over the last twelve months, we tried and failed to achieve scale and speed with relational databases (Greenplum, InfoBright, MySQL) and NoSQL offerings (HBase). So instead we did something crazy: we rolled our own database. Druid is the distributed, in-memory OLAP data store that resulted.”

The initial responses on Hacker News were predictably skeptical:

Operational intelligence and the new frontier of data

Always-on businesses such as global retailers, social media apps, transportation platforms, and financial marketplaces have mission-critical use cases that require real-time decisions on operational event streams.

Target’s supply chains must adapt to changes in store inventory, Snap’s new app launches must be debugged, Lyft’s drivers must be predictively routed to riders, and at PayPal, fraudulent payments must be flagged and blocked.

These use cases for logistics, app monitoring, and fraud detection aren’t science fiction; they’re real-world examples powered by an emerging technology stack that combines event stream processing, fast OLAP databases, interactive dashboards, and machine-learning applications. Fueled by real-time data coming from instrumented products and services, this technology stack is driving a distinct category of analytics called operational intelligence, which is complementary to traditional business intelligence.

Savor the surprises

(Image: Alexander Fleming. Photo credit: Historic UK)

I was recently listening to Mike Maples interview Andy Rachleff about the search for product-market fit, which is fitting, since Andy coined the term. Andy recounted a pearl of wisdom that Scott Cook had once given him: when doing customer research, savor the surprises. A few years into founding Intuit, Scott discovered that while Quicken was created for personal finance, half its users were businesses. Why? Most small businesses lacked formal accounting expertise, so they preferred simple software.

Savoring surprises is a simple but powerful framing, because it forces reflection. It’s a great question for a job interview (“When you first got to Google, what surprised you?”) or at a cocktail hour (“What surprised you most about Tokyo?”). Who would want to hire or hang out with someone who answers “Nothing”?

Surprises are the bits of data we don’t expect. It is cognitively taxing to retain these bits, rather than burying them to confirm what we think we already know. Savoring surprise is at the heart of the beginner’s mindset. And it is the essence of learning and discovery.

In September 1928, a scientist returned from a two-week vacation and found that a mold had contaminated his bacterial culture and, unexpectedly, killed the bacteria around it. Alexander Fleming savored this surprise rather than ignoring it, and it ultimately led to his discovery of penicillin. As he later put it, “One sometimes finds what one is not looking for.”

Geoffrey Moore on disrupting business with data intelligence

[A]ll the great business disrupters of the past decade [–] Amazon, Google, Microsoft, Apple, Tesla, Uber, Airbnb, Netflix—they are all running Systems of Observation against the data flows they are privileged to access or host, and then feeding them into Systems of Intelligence to extract insights from them.

Geoffrey Moore. Intelligent Computing Systems: How Will Enterprise Architecture Evolve?

Colin Ware on cognition

Thinking is not something that goes on entirely, or even mostly, inside people’s heads. Little intellectual work is accomplished with our eyes and ears closed. Most cognition is done as a kind of interaction with cognitive tools, pencils and paper, calculators, and, increasingly, computer-based intellectual supports and information systems. Neither is cognition mostly accomplished alone with a computer. It occurs as a process in systems containing many people and many cognitive tools. Since the beginning of science, diagrams, mathematical notations, and writing have been essential tools of the scientist. Now we have powerful interactive analytic tools, such as MATLAB, Maple, Mathematica, and S-PLUS, together with databases. The entire fields of genomics and proteomics are built on computer storage and analytic tools.

Colin Ware. Information Visualization: Perception for Design.

design principles for data pipelines


(Image: ‘Tower of Babel’ by Pieter Bruegel the Elder, 1563)

Underinvestment in and misunderstanding of ETL is a silent killer in organizations.  It’s why reports are often delayed, why answers across systems rarely agree, and why more than 50% of corporate business intelligence initiatives fail.

ETL is hard because data is messy. Even the most common attribute of data, time, has thousands of accepted dialects: “Sat Mar 1 10:12:53 PST,” “2014-03-01 18:12:53 +00:00” and “1393697573” are all equivalent. And there’s a growing chorus of other sources with even less consistency: geo-coordinates, user agent strings, country codes, and currencies. Each new data type is a layer of bricks in our collective, digital tower of Babel.
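To make the timestamp problem concrete, here is a minimal Python sketch of normalizing those three dialects into Unix epoch seconds. It assumes the year is 2014 (the first string omits it) and that “PST” means a fixed UTC-8 offset; the function name `to_epoch` and the `TZ_OFFSETS` table are hypothetical illustrations, not part of any particular pipeline. For reference, 10:12:53 PST on Saturday, March 1, 2014 is 18:12:53 UTC, which is epoch second 1393697573.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical table of zone abbreviations we choose to accept.
# Abbreviations are ambiguous in general, so this mapping is an assumption.
TZ_OFFSETS = {"PST": timedelta(hours=-8), "UTC": timedelta(0)}


def to_epoch(raw: str, default_year: int = 2014) -> int:
    """Normalize one of three timestamp dialects to Unix epoch seconds."""
    raw = raw.strip()

    # Dialect 1: bare epoch seconds, e.g. "1393697573" (assumed seconds, not ms).
    if raw.isdigit():
        return int(raw)

    # Dialect 2: ISO-8601-style with a numeric offset, e.g. "2014-03-01 18:12:53 +00:00".
    # %z accepts "+00:00" (with a colon) on Python 3.7+.
    try:
        dt = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S %z")
        return int(dt.timestamp())
    except ValueError:
        pass

    # Dialect 3: ctime-style with a zone abbreviation and no year,
    # e.g. "Sat Mar 1 10:12:53 PST". We append the assumed year ourselves.
    *fields, abbrev = raw.split()
    naive = datetime.strptime(f"{' '.join(fields)} {default_year}", "%a %b %d %H:%M:%S %Y")
    aware = naive.replace(tzinfo=timezone(TZ_OFFSETS[abbrev]))
    return int(aware.timestamp())


if __name__ == "__main__":
    for s in ["Sat Mar 1 10:12:53 PST", "2014-03-01 18:12:53 +00:00", "1393697573"]:
        print(f"{s!r:32} -> {to_epoch(s)}")
```

Even this toy version has to make two judgment calls that real pipelines face constantly: which year a yearless timestamp belongs to, and what a zone abbreviation actually means. A bare integer could also be milliseconds rather than seconds, which this sketch does not attempt to detect.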

let us now praise data engineers


In the last year, data scientist has been called “the sexiest job of the 21st century.” But if data is the new oil, and data scientists are its petrochemical high priests, who are the oil riggers? Who are the roughnecks doing the dirty work to get data pipelines flowing: unpacking bytes, transforming formats, loading databases?

They are the data engineers, and their brawny skills are more critical than ever.  As the era of Big Data pivots from research to development, from theoretical blueprints to concrete infrastructure, the notional demand for data science is being dwarfed by the true need for data engineering.

A stark but recurring reality in the business world is this: when it comes to working with data, statistics and mathematics are rarely the rate-limiting elements in moving the needle of value.  Most firms’ unwashed masses of data sit far lower on Maslow’s hierarchy at the level of basic nurture and shelter.  What is needed for this data isn’t philosophy, religion, or science – what’s needed is basic, scalable infrastructure.

It’s the data engineers who can build this infrastructure, and they represent the true talent shortage of Silicon Valley and beyond. Their unsexy but critical skills include crafting Hadoop pipelines, programming job schedulers, and parsing broad classes of data (timestamps, currencies, lat & long coordinates), which are the screws, bolts, and ball bearings in the industrial age of data.

Let us now praise these unsung heroes, the data engineers, who are building the invisible but essential digital underground.

the psychology of the enterprise buyer


Consumer startups like Facebook, Twitter, Pinterest, and even Dropbox are built by founders who wanted to “make something cool” for their own benefit. Their teams intuitively understand what works because they are their own target audience: young, tech-savvy people looking for better ways to connect, share, and organize their digital stuff.

When it comes to buyer psychology, corporations are not people

By contrast, the challenge for enterprise startups is that corporations are not really people (their legal personhood aside), and certainly not our people.

When you’re hungry for lunch, you go and buy a sandwich for a few dollars. When an enterprise is hungry for lunch, it solicits bids from multiple catering companies, negotiates for weeks to months, and signs a contract for a few million dollars.

This gap between the psychology of enterprises and the startups that sell to them is a challenge that consumer startups do not face. Worse, early team members in startups have limited enterprise experience; they are a poor fit for the process orientation and risk aversion (or, to put it more kindly, risk balancing) that are rewarded at the higher levels of corporate environments.

Less Goldilocks, more Dunder Mifflin

Lacking this enterprise DNA, younger startups often build their sales processes in the image of how startups buy rather than how enterprises buy. When startups seek to purchase a software solution, they favor simple, scalable pricing: click a box, swipe a credit card, and start running. Hence the canonical three-column pricing page (call it Goldilocks pricing) found at many SaaS companies, where the middle column invariably feels “just right.”

But large enterprise buyers are less adventure-embracing Goldilocks and more The Office’s Dunder Mifflin. They need more than three sizes of self-serve, they don’t do click-through contracts, and they rarely pay with credit cards. The reasons are both economic and cultural. Economically, as buying decisions grow larger, the cost of sales (product customization, negotiated contracts, and invoicing) becomes small relative to the size of the deal. Culturally, Fortune 500 companies expect to have a relationship.

As Box CEO Aaron Levie recently told me, “Look, when Coca-Cola writes you a big check, they want to meet you in person.”

Silicon Valley IT is not enterprise IT

Startups also often underestimate the importance of professional services and training for enterprises. They believe every company has a cadre of engineers smart enough to set up and tailor an application, and business users who can quickly figure it out and get up and running, whether the product is Google Analytics, HubSpot, or Expensify.

But this is not the case in most enterprises. The success of firms like Red Hat, MySQL AB, and, more recently, Cloudera testifies to the enormous value that lies in integration and support, even when the software (whether Linux, MySQL, or Hadoop) is free and open source.

Seasoned sales executives: The “growth hackers” of enterprise startups

As the venture investing pendulum swings back towards enterprise technology companies, founders and venture capitalists will need to augment their teams with sales executives who can nimbly step around the often woolly, sometimes mammoth challenges of contract negotiations, channel partnerships, and client services engagements. These experienced leaders will be the “growth hackers” of the enterprise realm.

This essay originally appeared in VentureBeat. 

the fuel of founders: curiosity & impatience

“On a scale of 1-10 of impatience, the best entrepreneurs are an 11.” – Tom Stemberg, Founder of Staples

Curiosity and impatience make for great founder traits, but they often pull in different directions.

Curiosity compels you to sit and study a problem, to voraciously consume every article and reference you can find to wrap your head around a big idea or an imagined future (self-driving cars, space elevators, or self-destructing sexts).

Impatience gets you up out of your chair to do something about it: hire, fundraise, sell, and evangelize.

Curiosity is for academics, impatience for executives, but start-up founders need to be both dreamers and doers, straddling the worlds of ideas and reality.

Robert Oppenheimer, the American Prometheus behind the first atomic bomb, was a dreamer, but he was also impatient. His colleague Murray Gell-Mann said he lacked the ability to sit still:

“Germans call it ‘Sitzfleisch’, ‘sitting flesh’ when you sit on a chair. As far as I know, he never wrote a long paper or did a long calculation, anything of that kind. He didn’t have the patience for that… [But] he inspired other people to do things, and his influence was fantastic.”

Impatience is the very opposite of Sitzfleisch, and without it, the Manhattan Project would have yielded nothing more than chalk dust.

Curiosity is what drew Steve Jobs to sit in on calligraphy classes at Reed; inspired Larry Ellison to study chip design at U. Chicago; compelled Bill Gates to cram for economics courses at Harvard; lured Larry and Sergey to pursue computer science Ph.D.s at Stanford.  

Impatience is what drove them all to drop out and start Apple, Oracle, Microsoft, and Google.

Silicon Valley’s cult of the drop-out pays homage to impatience (who has time for school when you’re building a billion-dollar business?) but gives short shrift to curiosity, which is the heart of innovation.

Nothing fires a healthy impatience more than the desire to see a big idea, born of deep curiosity, brought to life. As Steve Jobs said, “remembering that you are going to die” is a great motivator.