beyond hadoop: fast queries from big data

There’s an unspoken truth lurking behind the scourge of Big Data and the heralding of Hadoop as its savior: While Hadoop shines as a processing platform, it is awkward as a query tool. Hive was developed by the folks at Facebook in 2008, as a means of providing an easy-to-use, SQL-like query language that would …

how Oracle, the Goliath of data, could stumble

This week’s Oracle World was bracketed by two events. First: the unveiling of Oracle Exalytics, a beefy in-memory appliance dedicated to large-scale analytics, during Larry Ellison’s opening keynote. Second: the undressing of Oracle’s cloud computing initiatives by Marc Benioff, Salesforce’s CEO, and the unceremonious cancellation of his keynote on Wednesday morning. Both events highlight that …

the secret guild of silicon valley

A couple of weeks ago, I was drinking beer in San Francisco with friends when someone quipped: “You have too many hipsters, you won’t scale like that. Hire some fat guys who know C++.” It’s funny, but it got me thinking. Who are the “fat guys who know C++”, or as someone else put it, “the …

node.js and the javascript age

Three months ago, we decided to tear down the framework we were using for our dashboard, Python’s Django, and rebuild it entirely in server-side JavaScript, using node.js. (If there is ever a time in a start-up’s life to remodel parts of your infrastructure, it’s early on, when your range of motion is highest.) This decision …

the rise of the data web

The future of the web is data, not documents. The web has evolved from Tim Berners-Lee’s original vision of “some big, virtual documentation system in the sky” into a vibrant ecosystem of data where documents, and human actors, will play an ever smaller role. As others have noted, we’ve reached a tipping point in history: …

the seven secrets of successful data scientists

At O’Reilly’s “Making Data Work” seminar earlier this summer, I teamed up with a few other folks (data diva Hilary Mason, R extraordinaire Joe Adler, and visualization guru Ben Fry) to talk about data. What follows is a blog-ified and amended version of that talk, originally entitled “Secrets of Successful Data Scientists.” 1. Choose The …