<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><atom:link rel="hub" href="http://tumblr.superfeedr.com/" xmlns:atom="http://www.w3.org/2005/Atom"/><description>
LinkedIn | Twitter: @medriscoll
</description><title>m.e.driscoll: data utopian</title><generator>Tumblr (3.0; @medriscoll)</generator><link>http://medriscoll.com/</link><item><title>eight golden rules of interface design</title><description>&lt;p&gt;&lt;span&gt;As we spend an increasing fraction of our time interacting with software &amp;#8212; from airport check-in terminals and parking meters to desktop and mobile applications &amp;#8212; digital &lt;/span&gt;&lt;span&gt;interface design is becoming as important as physical architecture in improving our experience of the world.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;Here are Professor Ben Shneiderman&amp;#8217;s Eight Golden Rules for optimally designing that experience (drawn from his classic text, &lt;em&gt;&lt;a href="http://www.amazon.com/Designing-User-Interface-Ben-Shneiderman/dp/0201694972" target="_blank"&gt;Designing the User Interface&lt;/a&gt;&lt;/em&gt;):&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1 Strive for consistency.&lt;/strong&gt;&lt;br/&gt;&lt;span&gt;Consistent sequences of actions should be required in similar situations; identical terminology should be used in prompts, menus, and help screens; and consistent commands should be employed throughout.&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;&lt;strong&gt;2 Enable frequent users to use shortcuts.&lt;/strong&gt;&lt;br/&gt;&lt;span&gt;As the frequency of use increases, so do the user&amp;#8217;s desires to reduce the number of interactions and to increase the pace of interaction. Abbreviations, function keys, hidden commands, and macro facilities are very helpful to an expert user.&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;&lt;strong&gt;3 Offer informative feedback.&lt;/strong&gt;&lt;br/&gt;&lt;span&gt;For every operator action, there should be some system feedback. For frequent and minor actions, the response can be modest, while for infrequent and major actions, the response should be more substantial.&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;&lt;strong&gt;4 Design dialog to yield closure.&lt;/strong&gt;&lt;br/&gt;&lt;span&gt;Sequences of actions should be organized into groups with a beginning, middle, and end. The informative feedback at the completion of a group of actions gives the operators the satisfaction of accomplishment, a sense of relief, the signal to drop contingency plans and options from their minds, and an indication that the way is clear to prepare for the next group of actions.&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;&lt;strong&gt;5 Offer simple error handling.&lt;/strong&gt;&lt;br/&gt;&lt;span&gt;As much as possible, design the system so the user cannot make a serious error. 
If an error is made, the system should be able to detect the error and offer simple, comprehensible mechanisms for handling the error.&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;&lt;strong&gt;6 Permit easy reversal of actions.&lt;/strong&gt;&lt;br/&gt;&lt;span&gt;This feature relieves anxiety, since the user knows that errors can be undone; it thus encourages exploration of unfamiliar options. The units of reversibility may be a single action, a data entry, or a complete group of actions.&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;&lt;strong&gt;7 Support internal locus of control.&lt;/strong&gt;&lt;br/&gt;&lt;span&gt;Experienced operators strongly desire the sense that they are in charge of the system and that the system responds to their actions. Design the system to make users the initiators of actions rather than the responders.&lt;/span&gt;&lt;br/&gt;&lt;br/&gt;&lt;strong&gt;8 Reduce short-term memory load.&lt;/strong&gt;&lt;br/&gt;&lt;span&gt;The limitation of human information processing in short-term memory requires that displays be kept simple, multiple page displays be consolidated, window-motion frequency be reduced, and sufficient training time be allotted for codes, mnemonics, and sequences of actions. &lt;/span&gt;&lt;/p&gt;</description><link>http://medriscoll.com/post/23051214037</link><guid>http://medriscoll.com/post/23051214037</guid><pubDate>Mon, 14 May 2012 15:34:00 -0400</pubDate></item><item><title>the rise of the technical VC</title><description>&lt;p&gt;Silicon Valley&amp;#8217;s first &lt;a href="https://twitter.com/#!/medriscoll/statuses/188849469410914304" target="_blank"&gt;big bang of innovation&lt;/a&gt; occurred in 1957, when eight engineers left Shockley Transistor to form Fairchild Semiconductor.  Back then, the idea of engineers being entrusted as founders of a business was heretical.  Forty-one firms were asked to invest, but &amp;#8220;none of them were interested&amp;#8221;, according to Arthur Rock.&lt;/p&gt;
&lt;p&gt;Attitudes toward engineers without MBAs as founders have changed, but what about engineers acting as investors?  In my experience, the majority of investment professionals on Sand Hill Road are still non-technical.&lt;/p&gt;
&lt;p&gt;But that is changing, in two ways.  &lt;/p&gt;
&lt;p&gt;First, several prominent young venture capitalists with technical degrees are rising to the top of their profession.  Folks such as Kevin Efrusy (MSEE and BSEE from Stanford) and Jeremy Levine (CS degree from Duke) are ranked #9 and #10, respectively, on this year&amp;#8217;s &lt;a href="http://www.forbes.com/lists/midas/2012/midas-list-top-tech-investors_list.html" target="_blank"&gt;Midas List&lt;/a&gt; of top investors.  And at #1 this year is Jim Breyer, who earned a CS degree from Stanford and, having just turned 50, is still youthful by VC standards.&lt;/p&gt;
&lt;p&gt;Second, as technical founders have made their fortunes, many of them have joined the investing class.  Marc Andreessen and Reid Hoffman, two successful technical founders turned investors, ranked second and third on the 2012 list.&lt;/p&gt;
&lt;p&gt;And the Midas List doesn&amp;#8217;t cover the funding arena where the influence of technical founders is greatest: angel investing.  Many of the world&amp;#8217;s most successful non-professional investors &amp;#8212; Jeff Bezos, Max Levchin, Andy Bechtolsheim, Paul Graham, Bill Joy, and Marc Benioff &amp;#8212; have, with their spare change and spare time, outperformed entire funds.&lt;/p&gt;
&lt;p&gt;Silicon Valley&amp;#8217;s venture capital community is undergoing the same &amp;#8220;revenge of the nerds&amp;#8221; phenomenon that its businesses underwent in the 1960s and 70s.  Technical founders are launching companies, earning returns, and then spotting new start-ups to invest in &amp;#8212; increasingly without needing surrogates carrying MBAs.&lt;/p&gt;
&lt;p&gt;Or, perhaps more accurately: whereas the technical class was previously seen as serving the business class, now it is the business class that serves the technical class.  Mark Zuckerberg&amp;#8217;s &lt;a href="http://www.businessweek.com/technology/zuckerberg-controlling-57-of-facebook-seen-as-risk-to-investors-02022012.html" target="_blank"&gt;controlling share&lt;/a&gt; of Facebook is a testament to this new reality.&lt;/p&gt;
&lt;p&gt;The rise of the technical VC is part of a larger macro-trend that Marc Andreessen cogently captured in five words: &lt;a href="http://online.wsj.com/article/SB10001424053111903480904576512250915629460.html" target="_blank"&gt;software is eating the world.&lt;/a&gt;  &lt;/p&gt;
&lt;p&gt;One vertical after another &amp;#8212; media, travel, and (soon, I hope) health care and education &amp;#8212; is being transformed by information technology.  Those who conceive, develop, and understand software are the new masters of the universe.  And everyone else &amp;#8212; lawyers, bankers, janitors &amp;#8212; serves them.&lt;/p&gt;
&lt;p&gt;CEOs and VCs are &lt;a href="http://codeyear.com/" target="_blank"&gt;learning to code&lt;/a&gt; not because their curiosity inspires it, but because their careers depend on it.&lt;/p&gt;</description><link>http://medriscoll.com/post/22488390090</link><guid>http://medriscoll.com/post/22488390090</guid><pubDate>Sat, 05 May 2012 22:12:00 -0400</pubDate></item><item><title>dna dating</title><description>&lt;p&gt;A recent start-up, &lt;a href="http://www.nytimes.com/2012/04/08/technology/in-online-dating-taking-a-chance-on-love-and-algorithms.html" target="_blank"&gt;Yoke.me&lt;/a&gt;, is attempting to build a better dating engine using Big Data and algorithms.  But what mix of data could best be used to algorithmically identify an optimal mate?  Photos, favorite albums, and religious beliefs are a start.&lt;/p&gt;
&lt;p&gt;But how about DNA?&lt;/p&gt;
&lt;p&gt;A couple of years ago at &lt;a href="http://en.wikipedia.org/wiki/Science_Foo_Camp" target="_blank"&gt;SciFoo&lt;/a&gt;, &lt;a href="http://kiwitobes.com/" target="_blank"&gt;Toby Segaran&lt;/a&gt;, &lt;a href="https://twitter.com/#!/@ncbirofl" target="_blank"&gt;Meredith Carpenter&lt;/a&gt;, and I brainstormed about creating a start-up that would do just this.  We dubbed it GeneHarmony.&lt;/p&gt;
&lt;p&gt;Here&amp;#8217;s how it would work: to become a member, you submit a saliva sample to our genomics facility, which sequences all of your genetic quirks (since any two people&amp;#8217;s DNA is about 99.6% identical, we need only sequence the differences).&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;Once sequenced, your genome would be compared against those of all other members, with a focus on genes known to be &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/7630893" target="_blank"&gt;predictive of mate compatibility&lt;/a&gt;, and the service would return a rank-ordered list of potential dates.&lt;/p&gt;
&lt;p&gt;The principle of &amp;#8220;opposites attract&amp;#8221; is mirrored at the DNA level. Studies show that individuals who are genetically dissimilar are significantly more likely to marry (the inverse of &amp;#8220;why you shouldn&amp;#8217;t marry your cousin&amp;#8221;).&lt;/p&gt;
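&lt;p&gt;GeneHarmony never got past the brainstorm, so here is only a minimal Python sketch of the ranking step, under loud assumptions: each member is reduced to a hypothetical set of HLA alleles (the human MHC region often cited in mate-choice research), and candidates are ranked by Jaccard distance so that the most genetically dissimilar appear first. The member names, alleles, and scoring rule are invented for illustration; they are not a validated compatibility model.&lt;/p&gt;

```python
# Hypothetical members and their HLA alleles, invented for illustration only.
MEMBERS = {
    "alice": {"A*01", "A*02", "B*07", "B*08"},
    "bob": {"A*03", "A*11", "B*35", "B*44"},
    "carol": {"A*01", "A*03", "B*07", "B*44"},
}


def jaccard_distance(a, b):
    """1 minus (shared alleles / all alleles); 1.0 means no overlap at all."""
    union = a.union(b)
    if not union:
        return 0.0
    return 1.0 - len(a.intersection(b)) / len(union)


def rank_matches(member, members):
    """Return the other members ordered from most to least genetically dissimilar."""
    mine = members[member]
    scores = [
        (jaccard_distance(mine, alleles), name)
        for name, alleles in members.items()
        if name != member
    ]
    # Highest dissimilarity first; ties are broken alphabetically by name.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, round(score, 2)) for score, name in scores]
```

&lt;p&gt;With this toy data, rank_matches("alice", MEMBERS) returns [("bob", 1.0), ("carol", 0.67)]: Bob shares none of Alice&amp;#8217;s alleles, while Carol shares two of the six alleles the pair carries between them.&lt;/p&gt;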
&lt;p&gt;So much of mating is an elaborate system to uncover genetic signals. Many factors which are considered attractive &amp;#8212; facial symmetry, body shape, intelligence, body odor &amp;#8212; are ways in which humans tell suitors &amp;#8220;I have good genes.&amp;#8221;  DNA dating could cut through these perceptual inefficiencies and get right to the genetic point.&lt;/p&gt;
&lt;p&gt;Even better, members&amp;#8217; experiences could be tracked and fed back into the genetic database to create better dating models.  One could even tune the parameters depending on the kind of relationship sought: are you a 22-year-old thrill-seeker looking for fun, or an aging bachelor seeking marriage and stability?&lt;/p&gt;
&lt;p&gt;Of course, the privacy issues raised by such a service are massive. What if the site were used to settle a paternity lawsuit?  Or to target advertisements?  Facebook&amp;#8217;s privacy issues appear trivial by comparison.&lt;/p&gt;
&lt;p&gt;And yet, for most of us, selecting a partner is the most consequential decision of our lives.  Why shouldn&amp;#8217;t we leverage all of the science and technology we have to improve that choice?&lt;/p&gt;</description><link>http://medriscoll.com/post/20761339349</link><guid>http://medriscoll.com/post/20761339349</guid><pubDate>Mon, 09 Apr 2012 00:07:00 -0400</pubDate></item><item><title>the data science debate: domain expertise or machine learning?</title><description>&lt;p&gt;&lt;p class="MsoPlainText"&gt;&lt;a href="http://vplayer.oreilly.com/?chapter=http://atom.oreilly.com/atom/oreilly/videos/1076046&amp;amp;video_product=urn:x-domain:oreilly.com:product:0636920025467.VIDEO#embedded_player" target="_blank"&gt;&lt;img height="201" src="http://cdn.oreilly.com/radar/images/posts/0312-data-science-debate.jpg" width="580"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;&lt;em&gt;(Photo credit:  O&amp;#8217;Reilly Radar - See &lt;a href="http://vplayer.oreilly.com/?chapter=http://atom.oreilly.com/atom/oreilly/videos/1076046&amp;amp;video_product=urn:x-domain:oreilly.com:product:0636920025467.VIDEO#embedded_player" target="_blank"&gt;Link to Full Video&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;This past Tuesday evening at Strata I moderated an Oxford-Style debate between six of the top data scientists in Silicon Valley and beyond. The motion debated was: &lt;/p&gt;
&lt;p class="MsoPlainText"&gt;&lt;strong&gt;&lt;em&gt;&amp;#8220;In data science, domain expertise is more important than machine learning skill.&amp;#8221;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;&lt;!-- more --&gt;The topic emerged from conversations over dinner the previous night, with Kaggle&amp;#8217;s &lt;a href="http://www.wired.com/wiredenterprise/2011/12/kaggle/" target="_blank"&gt;Jeremy Howard&lt;/a&gt;, LinkedIn&amp;#8217;s &lt;a href="http://www.forbes.com/sites/danwoods/2011/11/27/linkedins-monica-rogati-on-what-is-a-data-scientist/" target="_blank"&gt;Monica Rogati&lt;/a&gt;, and some pre-debate musings of Google&amp;#8217;s Hal Varian.&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;To constrain the question, we added an additional clarification: which of these would you favor more in hiring your company&amp;#8217;s first data scientist?&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;Arguing in favor of the motion (e.g. favoring domain expertise) were: &lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.drewconway.com/zia/" target="_blank"&gt;Drew Conway&lt;/a&gt;, Ph.D. Candidate at NYU, Data Scientist at IA Ventures  &lt;/li&gt;
&lt;li&gt;&lt;a href="http://radar.oreilly.com/djpatil/" target="_blank"&gt;DJ Patil&lt;/a&gt;, Data Scientist in Residence at Greylock Partners  &lt;/li&gt;
&lt;li&gt;&lt;a href="http://thephenomlist.com/lists/8/people/32" target="_blank"&gt;Amy Heineike&lt;/a&gt;, Director of Mathematics at Quid&lt;/li&gt;
&lt;/ul&gt;&lt;p class="MsoPlainText"&gt;Weighing in against the motion (e.g. favoring machine learning skills) were:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://petewarden.typepad.com/" target="_blank"&gt;Pete Warden&lt;/a&gt;, CTO of JetPac&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.linkedin.com/in/peterskomoroch" target="_blank"&gt;Pete Skomoroch&lt;/a&gt;, Principal Data Scientist at LinkedIn  &lt;/li&gt;
&lt;li&gt;&lt;a href="http://blog.kiwitobes.com/" target="_blank"&gt;Toby Segaran&lt;/a&gt;, Author of Collective Intelligence and Google Engineer&lt;/li&gt;
&lt;/ul&gt;&lt;p class="MsoPlainText"&gt;When the Strata audience was initially polled, the vote was 53 to 40 in favor of domain expertise.  Then the debate began with comments from the audience.&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;&lt;strong&gt;The Audience:  s/MachineLearning/DomainExpertise is Easy &lt;/strong&gt;&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;We heard from &lt;a href="http://thenoisychannel.com/" target="_blank"&gt;Daniel Tunkelang&lt;/a&gt;, who argued in favor of domain expertise, stating that it was easier to learn statistics and machine learning than to acquire a lifetime of expertise and intuition (perhaps it comes easy to Dr. Tunkelang, but I&amp;#8217;m not sure how many who have attempted to consume the Elements of Statistical Learning on their own would agree).&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;&lt;a href="http://people.stern.nyu.edu/cperlich/" target="_blank"&gt;Claudia Perlich&lt;/a&gt;, a three-time winner of the KDD Nuggets competition, stood up and shared how she had won contests in domains as varied as “breast cancer, movie prediction, and sales performance – and I can tell you I knew next to nothing about those things when I started.&amp;#8221;&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;The panelists were then asked to weigh in with their thoughts.&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;&lt;strong&gt;The Panelists:  Our Opponents Have Made Our Points for Us  &lt;/strong&gt;&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;Drew Conway, whose popular &lt;a href="http://www.drewconway.com/zia/?p=2378" target="_blank"&gt;Data Science Venn Diagram&lt;/a&gt; includes &amp;#8220;substantive expertise&amp;#8221; as one of its components (and truth be told, &amp;#8220;math &amp;amp; statistics knowledge&amp;#8221;) advocated that asking good questions is the most critical element in a data science project.  And the ability to ask good questions requires domain understanding.&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;Toby Segaran relayed a story about work I had done using &lt;a href="http://www.slideshare.net/dataspora/social-network-analysis-for-telecoms" target="_blank"&gt;social network analysis for modeling telco customer churn&lt;/a&gt;.  He went on to say that, &amp;#8220;Mike, a domain expert in almost nothing, actually outperformed the domain experts.&amp;#8221;  (&lt;em&gt;ed. note: Thanks for the backhanded compliment, Toby :) &lt;/em&gt;).&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;DJ Patil read from the original LinkedIn Data Science job posting, arguing that machine learning skills were not even mentioned.  Rather they were seeking those who had curiosity and the ability to rapidly acquire domain expertise in the area of social network analysis.  He cited their hire of a theoretical physicist from Stanford, Jonathan Goldman &amp;#8212; who did the initial groundbreaking work on the People You May Know algorithm &amp;#8212; as evidence that machine learning skills were not important.&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;Pete Skomoroch fired back that &amp;#8220;since machine learning and physics are both just mathematics&amp;#8221; that Jonathan was actually just a machine learning expert by another name.  Those skills, said Skomoroch, helped him tackle and ultimately succeed in a domain in which he had little prior expertise.&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;Pete Warden, arguing for machine learning skills, cited his own experience at JetPac, his new travel site, where identifying high quality user photos was a high priority.  They &lt;a href="http://www.readwriteweb.com/archives/how_two_startups_used_games_to_beat_the_developer.php" target="_blank"&gt;hosted a competition on Kaggle&lt;/a&gt;, the machine learning contest platform, and in three weeks had built a quality ranking algorithm for just $5,000.&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;Amy Heineike then retorted that Pete Warden had actually made the case against himself.  In outsourcing their machine learning, she claimed, they underscored the importance of the one thing they could not outsource: their own domain expertise.&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;Toby Segaran agreed that company founders have excellent domain expertise: that is why they started their companies.  But when hiring a first data scientist, they need to hire for what they don&amp;#8217;t have:  machine learning skills.  (Zing!)&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;Pete Skomoroch ended the debate with a rhetorical question, asking the audience to consider the most successful companies in recent years: was human intuition or was it analytics driving them?&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;&lt;strong&gt;The Verdict:  Let Us All Now Hail Our Machine Learning Overlords&lt;/strong&gt;&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;In the end, the audience was polled again, and the results were tabulated in parallel by the panel (using what I like to call &lt;a href="https://twitter.com/#!/medriscoll/statuses/91718097320423424" target="_blank"&gt;ManReduce&lt;/a&gt;), the verdict was: 52 for domain expertise, 55 for machine learning.&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;Like any good debate topic, there is merit on both sides of the domain expertise versus machine learning proposition.  As Hal Varian said when we asked him before the panel: &amp;#8220;it depends on the structure of the problem.”  And in fairness to the debate panelists, they did not choose their positions: we assigned teams fifteen minutes before we went on stage.&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;One of the conclusions reached was that, when a problem is well-structured (or to Drew Conway&amp;#8217;s point, when a good question is posed), it is much easier for machine learning to succeed.  Kaggle&amp;#8217;s strength as a contest platform is that domain experts have already framed the problem:  they choose the features of the data to use (feature engineering or &amp;#8220;feature creation&amp;#8221;, as Monica Rogati calls it) as well as the criteria for success. This is the first, hardest step in any data science project.  After this, machine learners can step in and develop the best algorithms for classifying and predicting new data (or, less usefully, explaining old data).&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;Thus who you decide to hire as your first data scientist &amp;#8212; a domain expert or a machine learner &amp;#8212; might be as simple as this: could you currently prepare your data for a Kaggle competition?  If so, then hire a machine learner.  If not, hire a data scientist who has the domain expertise and the data hacking skills to get you there.&lt;/p&gt;
&lt;p class="MsoPlainText"&gt;&lt;em&gt;(Thanks to O&amp;#8217;Reilly Media, and Strata organizers Edd Dumbill and Alistair Croll &amp;#8212; who suggested the Oxford Debate format &amp;#8212;  for hosting a terrific conference).&lt;/em&gt;&lt;/p&gt;
&lt;/p&gt;</description><link>http://medriscoll.com/post/18784448854</link><guid>http://medriscoll.com/post/18784448854</guid><pubDate>Mon, 05 Mar 2012 03:50:00 -0500</pubDate></item><item><title>start-ups belong in cities</title><description>&lt;p&gt;&lt;img src="http://media.tumblr.com/tumblr_lzuywg9ygU1qhkweo.png"/&gt;&lt;/p&gt;
&lt;p&gt;Last Saturday, I woke up and walked down to my favorite coffee shop in San Francisco, Sightglass Coffee in SoMa.&lt;/p&gt;
&lt;p&gt;I met up with a couple of entrepreneurs pitching an amazing idea, and while ordering some mind-buzzingly-good drip coffee, ran into a mentor of mine.&lt;/p&gt;
&lt;p&gt;I write this because, while these interactions could have happened in the suburbs of Silicon Valley &amp;#8212; whether the Coupa Cafe in Palo Alto or Red Rock in Mountain View &amp;#8212; they are quintessentially enabled by four qualities of a city like San Francisco:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt; neighborhoods that mix commerce and living, that &amp;#8220;serve more than one primary function&amp;#8221;&lt;/li&gt;
&lt;li&gt; blocks that are walkable, short and broken up with alleyways and side streets&lt;/li&gt;
&lt;li&gt; buildings that mix old and new, luxury and low-rent&lt;/li&gt;
&lt;li&gt; a sufficiently dense concentration of people&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;These four qualities enable the unique vibrancy of urban neighborhoods, and were laid out by Jane Jacobs in her magnum opus &amp;#8220;The Death and Life of Great American Cities.&amp;#8221;&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;I know that, for &lt;a href="http://steveblank.com/secret-history/" target="_blank"&gt;historical reasons&lt;/a&gt;, technology start-ups began in Silicon Valley.  But there is something tragic about watching 22-year-old software engineers waiting on city corners for buses to take them to work in the suburbs.&lt;/p&gt;
&lt;p&gt;Especially when San Francisco is undergoing a renaissance of technology firms, driven by a few forces:&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;anchor firms like Twitter, Splunk, Dropbox, Zynga, and Square&lt;/li&gt;
&lt;li&gt;the flourishing of start-up neighborhoods like SoMa and, now, Dogpatch&lt;/li&gt;
&lt;li&gt;early stage VC firms with a strong SF presence, like True Ventures and OATV&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;To that end, I&amp;#8217;m a huge supporter of the &lt;a href="http://techcrunch.com/2012/01/13/ron-conway-mayor-lee-and-heather-harde-launch-sfciti-want-to-keep-sf-at-the-forefront-of-tech/" target="_blank"&gt;sf.citi initiative&lt;/a&gt;, which is helping strengthen the community of hackers, entrepreneurs, and firms who recognize the unique advantages that a city provides.&lt;/p&gt;
&lt;p&gt;To the aspiring young engineers thinking of coming West, I say: don&amp;#8217;t settle for a bagels-and-Wi-Fi ride to an office park or even a campus.  Come to a Great American City; we have amazing start-ups for you to join.&lt;/p&gt;
&lt;p&gt;To the entrepreneurs and their investors: curse the overpriced rents in San Francisco but recognize that efficient markets sometimes express a valid point.  Where there is high value there is high cost, and as Richard Florida has observed, the world&amp;#8217;s brightest and most talented people flock to cities.  So invest in them, their start-ups and the cities &amp;#8212; San Francisco, New York, Chicago, London, Beijing &amp;#8212; where they want to live.&lt;/p&gt;</description><link>http://medriscoll.com/post/18137813025</link><guid>http://medriscoll.com/post/18137813025</guid><pubDate>Thu, 23 Feb 2012 13:21:00 -0500</pubDate></item><item><title>ETL: the coal mining of the information age</title><description>&lt;p class="MsoNormal"&gt;&lt;em&gt;&amp;#8220;If I were starting a NoSQL-in-the-enterprise startup, I would focus on ETL. ETL is a mess, and is a precursor for any fancy uses of data.&amp;#8221;&lt;/em&gt; - &lt;a href="https://twitter.com/#!/jaykreps/status/116519274146832385" target="_blank"&gt;@jaykreps&lt;/a&gt;&lt;/p&gt;
&lt;p class="MsoNormal"&gt;&lt;em&gt;&amp;#8220;@jaykreps ETL is the coal mining of the information age: dirty, important work that fuels the economy.&amp;#8221;&lt;/em&gt; - &lt;a href="https://twitter.com/#!/peteskomoroch/status/116527271153643520" target="_blank"&gt;@peteskomoroch&lt;/a&gt;&lt;/p&gt;
&lt;p class="MsoNormal"&gt;One of the largest obstacles facing companies who seek to derive value from data isn&amp;#8217;t data&amp;#8217;s size.  It&amp;#8217;s data&amp;#8217;s dirtiness.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;It&amp;#8217;s been said before: &lt;a href="http://twitter.com/#!/jadler/status/141898071050162177" target="_blank"&gt;80% of the effort&lt;/a&gt; that goes into a data science project is &lt;strong&gt;e&lt;/strong&gt;xtracting, &lt;strong&gt;t&lt;/strong&gt;ransforming, and &lt;strong&gt;l&lt;/strong&gt;oading (ETL&amp;#8217;ing) data into a system where it can be analyzed.&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;This challenge is not simply a consequence of poorly structured data: free-form text records are now relatively rare.  Yet there remains a bewildering variety within well-structured, regular data.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;Take the basic dimension of time, an attribute that nearly every data set contains.  A date can be expressed as a POSIX or ISO 8601 string or as a Unix epoch integer, among myriad other forms: &lt;/p&gt;
&lt;ul&gt;&lt;li&gt;  Sat Dec 10&amp;#160;10:37:13 PST&lt;/li&gt;
&lt;li&gt;  2011-12-11T18:37:13.0+0000&lt;/li&gt;
&lt;li&gt;  1323599850&lt;/li&gt;
&lt;/ul&gt;&lt;p class="MsoNormal"&gt;And dates are just the beginning.  There are country codes, currency symbols, geospatial coordinates, and language indicators.  Beyond the data itself, there how it is delimited and encoded (including XML, the &lt;a href="http://www.dataspora.com/2009/08/xml-and-big-data/" target="_blank"&gt;clamshell plastic packaging&lt;/a&gt; of data formats).&lt;/p&gt;
&lt;p class="MsoNormal"&gt;Data platform businesses create value by reducing the friction of data flow among participants.  They do this with standards.  The financial services industry, the most mature of data verticals, has defined symbologies for equities and other tradeable instruments.  Consumer goods have UPC barcodes.  Governments have national postal codes.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;The manufacturing world has long appreciated the value of interchangeable parts, in lowering the costs of creating everything from electronics to airplanes.  Historically, standards arise in one of two ways: through the de jure recommendation of a &lt;a href="http://iso.org" target="_blank"&gt;consortium&lt;/a&gt;, or through the de facto adoption of a market leader&amp;#8217;s schema.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;As the &lt;a href="http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/" target="_blank"&gt;industrial revolution of data&lt;/a&gt; continues to unfold, we need data platforms and standards&amp;#8217; bodies to facilitate &amp;#8220;interchangeable data&amp;#8221;.  These will accelerate the growth of a new breed of data-driven applications and services.  Clean coal mining may be a fantasy, but clean data mining may yet be possible.&lt;/p&gt;</description><link>http://medriscoll.com/post/14079568288</link><guid>http://medriscoll.com/post/14079568288</guid><pubDate>Sun, 11 Dec 2011 15:22:00 -0500</pubDate></item><item><title>why everyone should be a medical data donor</title><description>&lt;p&gt;&lt;p class="MsoNormal"&gt;&lt;img src="http://media.tumblr.com/tumblr_lw0n4lI2tY1qhkweo.png"/&gt;&lt;/p&gt;
&lt;p class="MsoNormal"&gt;What happens to your medical records when you die?  &lt;a href="http://twitter.com/#!/gilelbaz" target="_blank"&gt;Gil Elbaz&lt;/a&gt; thinks you ought to donate them to science, a thought he shared with a technology audience this past week.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;It&amp;#8217;s a fascinating idea.  But why wait until you&amp;#8217;re dead?  In the age of the quantified self, why shouldn&amp;#8217;t you be able to give your DNA sequence, your diet, and your disease diagnoses to science while you&amp;#8217;re alive?  Unlike your organs, you can donate your data away and yet still keep it.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;We have companies collecting vast swaths of data about our buying, browsing, and clicking habits to sell us more stuff.  But when it comes to understanding what behaviors keep us healthy, it&amp;#8217;s a rocky landscape of HIPAA-regulated, technologically-challenged health insurers and providers.  We collect so much data about what makes us click, yet so little about makes us tick.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;There are pockets of hope.  Sites such as &lt;a href="http://www.patientslikeme.com/" target="_blank"&gt;PatientsLikeMe&lt;/a&gt; &amp;#8212; which as this writing has 122,640 patients and over a thousand conditions &amp;#8212; and &lt;a href="http://ginger.io/" target="_blank"&gt;Ginger.io&lt;/a&gt; are green sprouts in a bottom-up, democratizing data movement for health.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;Nearly eight out of ten people on the planet earth now own a mobile phone.  These phones send so-called &amp;#8220;heartbeat&amp;#8221; data to cell towers every few seconds.  Imagine if, instead, we had the true heartbeat data of the humans carrying those phones?  A simple cardiac signal can &lt;a href="http://publications.nigms.nih.gov/computinglife/signals.htm" target="_blank"&gt;betray a host of health issues&lt;/a&gt;, from stress and aging to a warning of impending stroke or heart attack.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;I know that I&amp;#8217;m not alone in being willing to give my data to medical science.  If the Fitbit or Jawbone UP had a checkbox that read &amp;#8220;donate my data&amp;#8221;, and the receiving institution was a trusted one, it could be the beginning of a valuable data bank.  If the Red Cross can convince us to stick needles in our arms to give blood, certainly we can endure bracelets on our wrists to give data.&lt;/p&gt;&lt;/p&gt;</description><link>http://medriscoll.com/post/14042364592</link><guid>http://medriscoll.com/post/14042364592</guid><pubDate>Sun, 11 Dec 2011 03:10:00 -0500</pubDate></item><item><title>lies, damned lies, and social media statistics</title><description>&lt;p&gt;&lt;img src="http://media.tumblr.com/tumblr_luz7o4KkLt1qhkweo.png"/&gt;&lt;/p&gt;
&lt;p&gt;Social media statistics &amp;#8212; shares, retweets, and likes &amp;#8212; reflect content&amp;#8217;s value the way a funhouse mirror reflects one&amp;#8217;s looks: grotesquely.  As the web lines its halls with social mirrors, these distortions are influencing the content we create and consume.&lt;/p&gt;
&lt;p&gt;One need look no further than the headlines at Hacker News for a gallery of the grotesque:  &amp;#8220;&lt;em&gt;N Reasons&amp;#8230;&lt;/em&gt;&amp;#8221;, &amp;#8220;&lt;em&gt;Why X is Wrong&lt;/em&gt;&amp;#8221;, &amp;#8220;&lt;em&gt;Free Y&lt;/em&gt;&amp;#8221;, and &amp;#8220;&lt;em&gt;How Z&amp;#8230; Cancer&lt;/em&gt;&amp;#8221;.  Many of these stories are explicitly crafted to achieve fifteen seconds of fame.&lt;/p&gt;
&lt;p&gt;I plead guilty to this seduction &amp;#8212; with &lt;a href="http://twitter.com/#!/kottke/status/57811848627634176" target="_blank"&gt;@jkottke telling me off&lt;/a&gt; as proof &amp;#8212; because it&amp;#8217;s tempting to believe that metrics are an honest measure of value.  They&amp;#8217;re not.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Social Media Statistics are Biased&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hacker News readers are not a representative audience. Because of the frenzied frequency with which they flood the voting booths of cyberspace, their influence is outsized &amp;#8212; and perversely enough, in inverse proportion to their attention spans.&lt;/p&gt;
&lt;p&gt;We need a balance against these biases.  A retweet from @timoreilly means more than one from @lolz69.  Klout has attempted, &lt;a href="http://techcrunch.com/2011/10/26/nobody-gives-a-damn-about-your-klout-score/" target="_blank"&gt;with some ignominy&lt;/a&gt;, to measure online influence. If we weighted retweet counts by influence, we might have a better measure of an article&amp;#8217;s impact.&lt;/p&gt;
&lt;p&gt;Time matters too. All content is a zero until someone reacts, so we need to gauge how quickly +1s or shares accumulate, not just their total.&lt;/p&gt;
&lt;p&gt;And positive feedback loops are everywhere.  We end up reading and sharing the same few dozen articles every day, not because these are always the most valuable, but because once they&amp;#8217;ve bubbled up into the meme pool, they get recirculated and amplified.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Be a First Follower&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The strongest signal of quality should be the content itself, not its number of shares or comments.  If you keep an open mind, you&amp;#8217;ll encounter that joy of discovery once so integral to the web.  &lt;a href="http://www.worrydream.com" target="_blank"&gt;Lovely gems&lt;/a&gt; still lurk out there.  &lt;/p&gt;
&lt;p&gt;Being the first follower takes a smidgeon of bravery.  So ignore what other people think and share something no one else has.  You&amp;#8217;ll be a democratizing force.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Connect with People, Don&amp;#8217;t Collect Them&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Few of us share our ideas, photographs, and experiences online solely to collect followers.  We do so to convince, to delight, to connect with people.  &lt;/p&gt;
&lt;p&gt;If you&amp;#8217;re a creator, never confuse numbers with the value of your creative output.  Resist the urge to chase some earlier success.  If you create something of lasting value, which has staying power after the initial spasms of interest have passed, you will engage with your audience in a way that few metrics reveal.&lt;/p&gt;
&lt;p&gt;Blogging to boost your follower count is like launching a start-up to build your bank balance:  it rarely works.  Instead, focus passionately on creating value, and the rest will come.&lt;/p&gt;</description><link>http://medriscoll.com/post/13076772411</link><guid>http://medriscoll.com/post/13076772411</guid><pubDate>Sun, 20 Nov 2011 15:54:00 -0500</pubDate></item><item><title>what to feed the mythical machine learning beast?</title><description>&lt;p&gt;&lt;p class="MsoNormal"&gt;&lt;img src="http://media.tumblr.com/tumblr_luxmanTEKd1qhkweo.jpg"/&gt;&lt;/p&gt;
&lt;p class="MsoNormal"&gt;One of the holy grails of machine learning is the creation of a system that can &amp;#8220;read the web&amp;#8221; and learn from it, as Isaac Newton read Euclid&amp;#8217;s Elements and taught himself geometry.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;Imagine a mythical beast that could speed-read one-hundred million pages per second, consuming every Wikipedia entry, every scientific article on arxiv.org, every out-of-copyright scanned book, and beyond just indexing that information, could actually reason with it.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;&lt;!-- more --&gt;Building an intelligent machine isn&amp;#8217;t &lt;strike&gt;hard&lt;/strike&gt; impossible.  It&amp;#8217;s building a learning machine, one that mirrors the magic by which a teenager learns to drive a car, play chess, or do calculus in a period of a few dozen hours &amp;#8212; that&amp;#8217;s the magic that we haven&amp;#8217;t yet figured out.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;But I wonder if some of our challenges in creating this mythical learning machine lie with what we&amp;#8217;re trying to feed the beast.  After all, the web of documents was written for human consumption.  Natural language is a lossy compression algorithm; it maps the massive varieties of our experiences into semantic text.  A high-frequency sensory stream of sights, sounds, and experiences gets hashed into &amp;#8220;cold sidewalks are slippery.&amp;#8221;&lt;/p&gt;
&lt;p class="MsoNormal"&gt;To that end, if we want machines to reason about our world, let&amp;#8217;s stop giving them our digested cud of content.  Let&amp;#8217;s provide them direct experience, via the sensor streams that our instrumented planet is emitting via weather stations, transit networks, electrical grids, smart phones, fitbits, and GPS devices. With that data, machines might begin to intuit relationships between weather and sidewalk slips &amp;#8212; in forms that are beyond our own human minds to comprehend.&lt;/p&gt;
&lt;p class="MsoNormal"&gt;It&amp;#8217;s data, not documents, that the mythical machine learning beast will eat.&lt;/p&gt;
&lt;/p&gt;</description><link>http://medriscoll.com/post/13034410709</link><guid>http://medriscoll.com/post/13034410709</guid><pubDate>Sat, 19 Nov 2011 18:52:00 -0500</pubDate></item><item><title>"We spend more time working than we do on almost any other activity in our lives. People want all..."</title><description>“We spend more time working than we do on almost any other activity in our lives. People want all that time to mean something.”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt;Laszlo Bock, &lt;a href="http://www.thinkwithgoogle.com/quarterly/people/laszlo-bock-people-ops.html" target="_blank"&gt;“Passion, Not Perks”&lt;/a&gt;&lt;/em&gt;</description><link>http://medriscoll.com/post/12697565939</link><guid>http://medriscoll.com/post/12697565939</guid><pubDate>Sat, 12 Nov 2011 13:50:13 -0500</pubDate></item><item><title>beyond hadoop: fast queries from big data</title><description>&lt;p&gt;&lt;img src="http://media.tumblr.com/tumblr_luk96hHC3b1qhkweo.png"/&gt;&lt;/p&gt;
&lt;p&gt;There&amp;#8217;s an unspoken truth lurking behind the scourge of Big Data and the heralding of Hadoop as its savior:&lt;/p&gt;
&lt;p&gt;While Hadoop shines as a processing platform, it is awkward as a query tool.&lt;/p&gt;
&lt;p&gt;Hive was developed by the folks at Facebook in 2008 as a means of providing an easy-to-use, SQL-like query language that would compile to MapReduce code.  A year later, Hive was responsible for &lt;a href="http://borthakur.com/ftp/hadoopworld.pdf" target="_blank"&gt;95% of the Hadoop jobs&lt;/a&gt; run on Facebook&amp;#8217;s servers.  This is consistent with another observation made by Cloudera&amp;#8217;s Jeff Hammerbacher: when Hive is installed on a client&amp;#8217;s Hadoop cluster, &lt;a href="http://www.dataspora.com/2009/11/sql-is-dead-long-live-sql/" target="_blank"&gt;the cluster&amp;#8217;s overall usage increases tenfold&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;That data-heavy businesses can now see into the terabytes of logs they generate is, in itself, a major step forward. Before the Hadoop era, this was difficult to impossible without a major engineering investment.  Thus Hadoop has solved the challenge of economically processing data at scale, and Hive has removed the need to hand-write MapReduce code.&lt;/p&gt;
&lt;p&gt;But there remains a painful challenge that Hive and Hadoop do not solve: speed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A Powerful But Lumbering Elephant&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;!-- more --&gt;Hadoop does not respond anywhere close to &amp;#8220;human time&amp;#8221;, a term &lt;a href="http://radar.oreilly.com/2011/09/evolution-of-data-products.html" target="_blank"&gt;that describes response thresholds&lt;/a&gt; acceptable to a human user, typically on the order of seconds.  Larry Ellison and his marketing mavens invoke a similar theme when pitching their wares as &amp;#8220;analytics at the speed of thought.&amp;#8221;&lt;/p&gt;
&lt;p&gt;Nonetheless, this sluggishness is not the fault of Hive or Hadoop per se.  If a business user asks a question about a year&amp;#8217;s worth of data with Hive, a set of MapReduce jobs will dutifully scan and process, in parallel, terabytes of data to obtain the answer.  Neither the commodity hardware that most Hadoop clusters run on nor some of Hadoop&amp;#8217;s &lt;a href="http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/" target="_blank"&gt;IO indulgences&lt;/a&gt; during job execution are to blame; these are the low-order performance bits.&lt;/p&gt;
&lt;p&gt;And while Hadoop jobs do have a fairly constant overhead &amp;#8212; with a lower bound in the range of 15 seconds &amp;#8212; this is often considered trivial within the context of the minutes or hours that most full jobs are expected to take.&lt;/p&gt;
&lt;p&gt;The higher-order bits affecting query performance are: (i) the size of the data being scanned, (ii) the nature of storage, e.g. whether it is kept on disk or in memory, and (iii) the degree of parallelization.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;An Emerging Design Pattern:  Distill, then Store &lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As a result, a common design pattern is emerging among data-heavy firms: Hadoop is used as a pre-processing tool to generate summarized data cubes, which are then loaded into an in-memory, parallelized database &amp;#8212; be it Oracle Exalytics, Netezza, Greenplum or even &lt;a href="http://corp.klout.com/blog/2011/11/big-data-bigger-brains/" target="_blank"&gt;Microsoft SQL Server&lt;/a&gt;.  Occasionally, a traditional database query layer can be bypassed altogether, and summary data cubes can be loaded directly into a desktop analytics tool such as Qlikview, Spotfire, or Tableau.&lt;/p&gt;
&lt;p&gt;At my start-up Metamarkets, we have embraced this design pattern and the role that Hadoop plays in preparing data for fast queries.  Our particular bag of tricks is best described by the &lt;a href="http://metamarketsgroup.com/blog/druid-part-i-real-time-analytics-at-a-billion-rows-per-second/" target="_blank"&gt;three principles of Druid&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Distill:&lt;/strong&gt; We roll data up to the coarsest grain at which a user might have reasonable interest.  Put simply, it is rare that one is concerned with individual events at one-second time frames.  Rolling up to groups of events, with a select set of dimensions and at minutely or hourly granularity, can distill raw data&amp;#8217;s footprint down to 1/100th of its original size.&lt;/li&gt;
&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Distribute:&lt;/strong&gt; This summarized data is spread across multiple nodes in our cluster, and the queries against it are likewise distributed and parallelized.  In our quest to break into the &amp;#8220;human time&amp;#8221; threshold, we have increased this parallelization to as many as 1000 cores, allowing each query to hit a large percentage of nodes on our cluster.  In our experience, CPUs are rarely the bottleneck for systems serving human clients, even for a cluster serving hundreds of users concurrently.&lt;/li&gt;
&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Keep in Memory:&lt;/strong&gt; We share Curt Monash&amp;#8217;s sentiment that &lt;a href="http://www.dbms2.com/2011/05/23/databases-ram/" target="_blank"&gt;traditional databases will eventually end up in RAM&lt;/a&gt;, as memory costs continue to fall.  In-memory analytics are popular because they are fast, often 100x to 1000x faster than disk.  This dramatic performance kick is what makes Qlikview such a popular desktop tool.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;The end result of these three techniques, each of which independently delivers between a 10- and 1000-fold improvement, is a platform that can run in seconds what previously took minutes or even hours in Hive.&lt;/p&gt;
&lt;p&gt;This approach, which &lt;a href="http://blog.aggregateknowledge.com/2011/09/08/our-approach/" target="_blank"&gt;we know we are not alone&lt;/a&gt; in pursuing, achieves performance that matches or exceeds that of any of the &lt;a href="http://gigaom.com/cloud/why-oracles-big-boxes-are-on-the-wrong-side-of-history/" target="_blank"&gt;big box retailers&lt;/a&gt; at a considerably lower price point.&lt;/p&gt;
&lt;p&gt;The commoditization wave that began with massive data processing, initiated by Hadoop, is migrating upwards towards query architectures. Thus the competitive differentiators are shifting away from large-scale data management and towards what might be called Big Analytics, where the next battle for profits will be fought.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(reblogged from a version I wrote at the &lt;a href="http://metamarketsgroup.com/blog/" target="_blank"&gt;Metamarkets blog&lt;/a&gt;).&lt;/em&gt;&lt;/p&gt;</description><link>http://medriscoll.com/post/12696646461</link><guid>http://medriscoll.com/post/12696646461</guid><pubDate>Sat, 12 Nov 2011 13:28:00 -0500</pubDate></item><item><title>four threats to oracle, data's big box retailer </title><description>&lt;p&gt;&lt;img src="http://media.tumblr.com/tumblr_luk9dbS2WR1qhkweo.png"/&gt; This week&amp;#8217;s Oracle OpenWorld was bracketed by two events. First: the unveiling of Oracle Exalytics, a beefy in-memory appliance dedicated to large-scale analytics, during Larry Ellison’s opening keynote. Second: the undressing of Oracle’s cloud computing initiatives by Marc Benioff, Salesforce’s CEO, and the unceremonious cancellation of his keynote on Wednesday morning.&lt;/p&gt;
&lt;p&gt;Both events highlight that when it comes to Big Data, analytics and cloud computing, Oracle is on the wrong side of history.&lt;/p&gt;
&lt;h2&gt;&lt;!-- more --&gt;Startups don&amp;#8217;t use Oracle&lt;/h2&gt;
&lt;p&gt;To glimpse the future of the data stack, Oracle need look no further than its own backyard, to what Silicon Valley start-ups are embracing: the distributed processing ecosystem of Hadoop, NoSQL data stores like MongoDB, and cloud platforms like Amazon Web Services.  As Marc Andreessen &lt;a href="http://www.businessinsider.com/boxnet-2011-9#ixzz1ZtG07jRb" target="_blank"&gt;said last week&lt;/a&gt;, &lt;strong&gt;“Not a single one of our startups uses Oracle.”&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span&gt;The truth is, Oracle can&amp;#8217;t support the kind of technology stacks embraced by startups — open-source software, elastic architectures, commodity hardware grids — because doing so would cannibalize revenue from its existing lines of business.&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;“I don’t care if our commodity X86 business goes to zero,” Ellison said in Oracle’s last earnings call, “We don’t make money selling that.”&lt;/p&gt;
&lt;p&gt;This commoditization wave has sent others, including HP, fleeing from hardware, but it has driven Oracle into the breach as a big box retailer: the company is attempting to capture higher margins on sales of SPARC architectures.&lt;/p&gt;
&lt;p&gt;But history is not on Oracle’s side.  Here are four realities that Oracle must face to maintain its unassailable position as the world’s leading data firm:&lt;/p&gt;
&lt;h2&gt;Threat #1: The future of data is distributed&lt;/h2&gt;
&lt;p&gt;&amp;#8220;Lots of little servers everywhere, lots of little databases everywhere. Your information got hopelessly fragmented in the process.&amp;#8221; – from Matthew Symonds&amp;#8217;s book &lt;em&gt;&lt;a href="http://www.amazon.com/Softwar-Intimate-Portrait-Ellison-Oracle/dp/074322504X" target="_blank"&gt;Softwar&lt;/a&gt;&lt;/em&gt; (p. 38).&lt;/p&gt;
&lt;p&gt;This is how Larry Ellison described the technology landscape of the 1990s, and his personal jihad against complexity has deepened Oracle’s distrust of distributed computing.&lt;/p&gt;
&lt;p&gt;But the tide of data isn’t turning back, and the scale is too large to contain in any box; Big Data, on the scale of hundreds of terabytes to petabytes, must be distributed across “lots of little servers.” The most viable tool available today for processing and persisting Big Data is Hadoop.&lt;/p&gt;
&lt;p&gt;Whether at the data layer — or a level above, at analytics — firms must adapt to this distributed reality and build tools that enable parallelized, many-to-many migration of data between nodes on Hadoop and those on their own platforms.&lt;/p&gt;
&lt;h2&gt;Threat #2: The future of computing is elastic&lt;/h2&gt;
&lt;p&gt;Metal server boxes don’t bend or expand; they are inelastic, both physically and economically.  In contrast, the needs of businesses are highly elastic; as companies grow, they shouldn’t have to unpack and install boxes to meet their compute needs, any more than they should install generators for more electricity.&lt;/p&gt;
&lt;p&gt;Computing is a utility, compute cycles are fungible, and firms want to pay for what they need, when it’s needed, like electricity.&lt;/p&gt;
&lt;p&gt;The ability to scale storage and compute capacity up or down, within minutes, is liberating for individuals and cost-effective for organizations, but it is impossible with a “cloud in a box.”  It is only enabled by a true cloud computing infrastructure, with virtualization and dynamic provisioning from a common pool of resources.&lt;/p&gt;
&lt;h2&gt;Threat #3: The future of applications is not on the desktop&lt;/h2&gt;
&lt;p&gt;Despite Oracle having developed the first pure network computer in 1996 (or perhaps because of this), far too many of Oracle’s supporting business applications are delivered via the desktop, rather than via web browsers.&lt;/p&gt;
&lt;p&gt;By comparison, Cloudera has created a rich web-based application for managing and monitoring all aspects of Hadoop clusters; Amazon Web Services has a fully-featured web console for interacting with its offerings; and Salesforce’s products are almost exclusively web-driven.&lt;/p&gt;
&lt;p&gt;The expressivity afforded by web browsers has risen dramatically in the last two years, particularly with the emergence of &lt;a href="http://metamarketsgroup.com/blog/node-js-and-the-javascript-age/" target="_blank"&gt;Javascript as the lingua franca&lt;/a&gt; of web application development, and improvements in Javascript engines.&lt;/p&gt;
&lt;p&gt;The same trend from desktop to browser also extends into mobile devices.  An increasingly large fraction of computing occurs on smart phones and tablets, and forward-thinking firms, like Dropbox, have built applications that cater to this reality.&lt;/p&gt;
&lt;h2&gt;Threat #4: The future of analytics is beautiful&lt;/h2&gt;
&lt;p&gt;The decades-long disappointment with business intelligence tools is due not only to their lack of brains (such that they’ve now fled to the fresh moniker of “business analytics”), but also to their absence of beauty. &lt;a href="http://shop.oreilly.com/product/0636920000617.do" target="_blank"&gt;Data is beautiful&lt;/a&gt;, as any reader of Edward Tufte can attest.&lt;/p&gt;
&lt;p&gt;When visualized thoughtfully and artfully, data has an almost hymnal power to persuade decision makers.  And when exploring data of high complexity and dimensionality, the kind that lives in Oracle’s databases, tools that accelerate the &lt;a href="http://gigaom.com/cloud/mean-time-to-pretty-chart-devops-meets-data-porn/" target="_blank"&gt;“mean time to pretty chart”&lt;/a&gt; are essential.&lt;/p&gt;
&lt;p&gt;In addition, analytics tool users are right to expect a smooth user experience on a par with other tools, whether photo editing or word processing, when they are creating and exploring data visualizations.&lt;/p&gt;
&lt;p&gt;Yet amidst all of Oracle’s presentations and marketing materials about big data and analytics, one finds &lt;a href="http://scribe.twitter.com/#!/medriscoll/status/121698127769120770" target="_blank"&gt;not a single dashboard or visualization&lt;/a&gt; to stir the senses.&lt;/p&gt;
&lt;p&gt;While Spotfire and Tableau are notable exceptions to this critique, on the whole, the tools that dot the Oracle landscape lack either brains or beauty.&lt;/p&gt;
&lt;p&gt;Enterprises will be slow to wake up to these realities, and Oracle will continue to profit handsomely from their slumber.&lt;/p&gt;
&lt;h2&gt;Fin: Oracle is ripe for attack by data services&lt;/h2&gt;
&lt;p&gt;The opportunities abound to chip away at the massive market share that Oracle now holds, providing data services to start-ups who won&amp;#8217;t buy Oracle’s capital-intensive boxes, and helping medium-sized businesses migrate to flexible, cost-effective, cloud-based alternatives.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(An earlier version of this post was published as a &lt;a href="http://gigaom.com/cloud/why-oracles-big-boxes-are-on-the-wrong-side-of-history/" target="_blank"&gt;guest column&lt;/a&gt; at GigaOm).&lt;/em&gt;&lt;/p&gt;</description><link>http://medriscoll.com/post/11163924351</link><guid>http://medriscoll.com/post/11163924351</guid><pubDate>Fri, 07 Oct 2011 21:59:00 -0400</pubDate></item><item><title>the secret guild of silicon valley</title><description>&lt;p&gt;&lt;img align="top" alt="The governors of the guild of St. Luke, Jan de Bray" height="281" src="http://upload.wikimedia.org/wikipedia/commons/archive/c/ce/20091020112114%21Jan_de_Bray_002.jpg" width="400"/&gt;&lt;/p&gt;
&lt;p&gt;A couple of weeks ago, I was drinking beer in San Francisco with friends when &lt;a href="http://twitter.com/#!/jaykreps/status/101883190515474432" target="_blank"&gt;someone quipped&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&amp;#8220;You have too many hipsters, you won&amp;#8217;t scale like that. Hire some fat guys who know C++.&amp;#8221; &lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It&amp;#8217;s funny, but it got me thinking.  Who are the &amp;#8220;fat guys who know C++&amp;#8221;, or as someone else put it, &amp;#8220;the guys with neckbeards, who keep Google&amp;#8217;s servers running&amp;#8221;? And why is it that if you encounter one, it&amp;#8217;s like pulling on a thread, and they all seem to know each other?&lt;!-- more --&gt;&lt;/p&gt;
&lt;p&gt;The reason is that the top engineers in Silicon Valley, whether they realize it or not, are part of a secret Guild.  They are a confraternity of craftsmen who share a set of traits:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Their craft is creating software&lt;/li&gt;
&lt;li&gt;Their tools of choice are C, C++, and Java &amp;#8212; not Javascript or PHP&lt;/li&gt;
&lt;li&gt;They wear ironic t-shirts, and that is the outer limit of their fashion sense&lt;/li&gt;
&lt;li&gt;They&amp;#8217;re not hipsters who live in the Mission or even in the city; they live near a CalTrain stop, somewhere on the Peninsula&lt;/li&gt;
&lt;li&gt;They meet for Game Night on Thursdays to play Settlers of Catan&lt;/li&gt;
&lt;li&gt;They are passive, logical, and Spock-like&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;They aren&amp;#8217;t interested in tweeting, blogging, or giving talks at conferences.  They care about building and shipping code.  They&amp;#8217;re more likely to be found in IRC chat rooms, filing JIRAs for Apache projects, or spinning out GitHub repos in their spare time.&lt;/p&gt;
&lt;p&gt;They are part of a nomadic band of software tradesmen, who have mentored one another over the last four decades in Silicon Valley, and they have quietly, steadily built the infrastructure behind the world&amp;#8217;s most successful companies.  When they leave &amp;#8212; as they have left places like Netscape, Sun, and Yahoo &amp;#8212; the firms they leave behind wither and die.&lt;/p&gt;
&lt;p&gt;If you want to build a technology company, you&amp;#8217;ll need to hire them, but you&amp;#8217;ll never find a member of the Guild through a recruiter.  They are being cold-called, cold-emailed, and cold-LinkedIn-messaged on a daily basis by recruiters, but their response will be similarly cold.&lt;/p&gt;
&lt;p&gt;A true member of the Guild is only ever an IM away from a new job at Facebook, Google, or the long archipelago of start-ups their fellow members are busy building.  Outwardly successful companies that fail to draw engineers from the Guild will struggle with the performance and stability of their technology &amp;#8212; as LinkedIn did in its early days and as Twitter did until recently.&lt;/p&gt;
&lt;p&gt;It&amp;#8217;s rare for an entrepreneur or executive to earn membership in the Guild, for that requires a path of apprenticeship that few have the talent or stamina for.  But it&amp;#8217;s possible to earn the respect of the Guild, and to convince its members that your company is a hall where they can gather daily to mentor and develop their craft.&lt;/p&gt;
&lt;p&gt;It begins with having an engineering-led culture, where technology decisions are made on their technical merits, never on personal grounds.  It also means allowing craftsmen to solve problems by creating new tools, rather than with just a labored application of the old.  These are values that Google and Facebook, two veritable Guild halls of the Valley, tout to any engineer who asks.&lt;/p&gt;
&lt;p&gt;Finally, the implicit compact that the Guild makes with a company is that their efforts will not be in vain.  The most powerfully attractive force for the Guild is the promise of building a product that will get into the happy hands of hundreds, thousands, or millions.  This is the coveted currency that even companies that have struggled to build an engineering reputation, like foursquare, can offer. &lt;/p&gt;
&lt;p&gt;The Guild of Silicon Valley is largely invisible, but its members&amp;#8217; affiliations have determined the rise and fall of technology giants.  The start-ups that recognize the unsung talents of its members today will be tomorrow&amp;#8217;s success stories.&lt;/p&gt;
&lt;p&gt;[ &lt;em&gt;Addendum:  George E.P. Box said &amp;#8220;All models are wrong.  Some models are useful.&amp;#8221;  While my tongue-in-cheek model of the anti-hipster Guild of Engineers has angered those who interpret it literally, my rhetorical goal is to make a point:  that the hard work of engineering isn&amp;#8217;t glamorous, and is often invisible to the media or the reigning pop culture of start-ups you&amp;#8217;ll find in San Francisco.  If you want to build a successful technology company, you would do well to target the experienced  folks who have been honing their craft in the trenches of Silicon Valley for the last few decades, and those whom they&amp;#8217;ve mentored. ]&lt;/em&gt;&lt;/p&gt;</description><link>http://medriscoll.com/post/9117396231</link><guid>http://medriscoll.com/post/9117396231</guid><pubDate>Fri, 19 Aug 2011 05:20:00 -0400</pubDate></item><item><title>The Big Data Stack, from my piece, Building Data Startups at...</title><description>&lt;img src="http://25.media.tumblr.com/tumblr_lq3p3oPyuH1qj8p0uo1_500.png"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;The Big Data Stack, from my piece, &lt;a href="http://radar.oreilly.com/2011/08/building-data-startups.html" target="_blank"&gt;Building Data Startups&lt;/a&gt; at O’Reilly Radar.&lt;/p&gt;</description><link>http://medriscoll.com/post/9062115121</link><guid>http://medriscoll.com/post/9062115121</guid><pubDate>Wed, 17 Aug 2011 21:50:12 -0400</pubDate></item><item><title>Slides from my presentation from O’Reilly’s Strata...</title><description>&lt;img src="http://24.media.tumblr.com/tumblr_lltxuc4zJ11qj8p0uo1_500.png"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;&lt;a href="http://www.slideshare.net/medriscoll/driscoll-strata-buildingdatastartups25may2011clean" target="_blank"&gt;Slides from my presentation&lt;/a&gt; from O’Reilly’s Strata Online Conference on May 25, 
2011.&lt;/p&gt;</description><link>http://medriscoll.com/post/5883154627</link><guid>http://medriscoll.com/post/5883154627</guid><pubDate>Thu, 26 May 2011 21:08:37 -0400</pubDate></item><item><title>"information is the new oil."</title><description>&lt;p&gt;This past February, I moderated an event at Stanford about Predictive Analytics.  I led with a brief introduction, followed by a discussion with &lt;span&gt;Omar Tawakol, CEO of BlueKai, &lt;/span&gt;&lt;span&gt;Scott Burke of Yahoo!, &lt;/span&gt;&lt;span&gt;Matt Barkoff, VP at Badgeville, and &lt;/span&gt;&lt;span&gt;Theresia Gouw Ranzetta, Partner at Accel Partners.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.youtube.com/watch?v=-2240_fDJQE" target="_blank"&gt;See the YouTube video here.&lt;/a&gt; &lt;/p&gt;</description><link>http://medriscoll.com/post/5539615666</link><guid>http://medriscoll.com/post/5539615666</guid><pubDate>Mon, 16 May 2011 04:08:00 -0400</pubDate></item><item><title>node.js and the javascript age</title><description>&lt;p&gt;&lt;img src="http://media.tumblr.com/tumblr_ljw0wmboYw1qhkweo.jpg"/&gt;&lt;/p&gt;
&lt;p&gt;Three months ago, we decided to tear down the framework we were using for our dashboard, Python’s Django, and rebuild it entirely in server-side Javascript, using node.js. (If there is ever a time in a start-up’s life to remodel parts of your infrastructure, it’s early on, when your range of motion is highest.)&lt;/p&gt;
&lt;p&gt;This decision was driven by a realization: the LAMP stack is dead. In the two decades since the web’s birth, there have been fundamental shifts in its make-up of content, protocols, servers, and clients. Together, these mark three ages of the web.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://metamarketsgroup.com/blog/node-js-and-the-javascript-age/" target="_blank"&gt;Read Full Post at Metamarkets&lt;/a&gt;&lt;/p&gt;</description><link>http://medriscoll.com/post/4741853977</link><guid>http://medriscoll.com/post/4741853977</guid><pubDate>Tue, 19 Apr 2011 03:03:00 -0400</pubDate></item><item><title>color: the cinderella of dataviz</title><description>&lt;p&gt;&lt;span&gt;&lt;img src="http://media.tumblr.com/tumblr_ljw07hl5NA1qhkweo.png"/&gt;Color is one of the most abused and neglected tools in data visualization. It is abused when we make poor color choices; it is neglected when we rely on poor software defaults. Yet despite its historically poor treatment at the hands of engineers and end-users alike, if used wisely, color is unrivaled as a visualization tool.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.dataspora.com/blog/how-to-color-multivariate-data/" target="_blank"&gt;Read Full Post at Dataspora&lt;/a&gt;&lt;/p&gt;</description><link>http://medriscoll.com/post/4741665885</link><guid>http://medriscoll.com/post/4741665885</guid><pubDate>Tue, 19 Apr 2011 02:47:00 -0400</pubDate></item><item><title>"The final project in every college CS course should be: contribute a feature to an open-source..."</title><description>“The final project in every college CS course should be: contribute a feature to an open-source project.”&lt;br/&gt;&lt;br/&gt; - &lt;em&gt;&lt;a href="http://twitter.com/#!/medriscoll/status/29453498356" target="_blank"&gt;via Twitter.&lt;/a&gt;&lt;/em&gt;</description><link>http://medriscoll.com/post/4741329763</link><guid>http://medriscoll.com/post/4741329763</guid><pubDate>Tue, 19 Apr 2011 02:21:31 -0400</pubDate></item><item><title>the rise of the data web</title><description>&lt;p&gt;&lt;img align="top" src="http://media.tumblr.com/tumblr_ljvxgvgxkW1qhkweo.jpg"/&gt;&lt;span&gt;The future of the web is data, not documents. The web has evolved from Tim Berners-Lee’s original vision of&lt;a href="http://www.ted.com/index.php/talks/tim_berners_lee_on_the_next_web.html" target="_blank"&gt;“some big, virtual documentation system in the sky”&lt;/a&gt; into an vibrant ecosystem of data where documents — and human actors — will play an ever smaller role.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.dataspora.com/blog/the-rise-of-the-data-web/" target="_blank"&gt;Read Full Post at Dataspora&lt;/a&gt;&lt;/p&gt;</description><link>http://medriscoll.com/post/4740884393</link><guid>http://medriscoll.com/post/4740884393</guid><pubDate>Tue, 19 Apr 2011 01:50:00 -0400</pubDate><category>bigdata</category><category>semanticweb</category></item></channel></rss>

