Hype, Hubs & Hadoop: Some Notes from Strata NY 2013 Keynotes

I didn't physically attend Strata NY + Hadoop World this year, but I did watch the keynotes from the conference. O'Reilly Media kindly makes videos of the keynotes and slides of all talks available very soon after they are given. Among the recurring themes were harangues against the hype of big data, the increasing utilization of Hadoop as a central platform (hub) for enterprise data, and the importance and potential impact of making data, tools and insights more broadly accessible within an enterprise and to the general public. The keynotes offered a nice mix of business (applied) & science (academic) talks, from event sponsors and other key players in the field, and a surprising - and welcome - number of women on stage.

Atigeo, the company where I now work on analytics and data science, co-presented a talk on Data Driven Models to Minimize Hospital Readmissions at Strata Rx last month, and I'm hoping we will be participating in future Strata events. And I'm hoping that some day I'll be on stage presenting some interesting data and insights at a Strata conference.

Meanwhile, I'll include some of my notes on interesting data and insights presented by others, in the order in which presentations were scheduled, linking each presentation title to its associated video. Unlike previous postings of notes from conferences, I'm going to leave the notes in relatively raw form, as I don't have the time to add more narrative context or visual augmentations to them.

Hadoop's Impact on the Future of Data Management
Mike Olson @mikeolson (Cloudera)

3000 people at the conference (sellout crowd), up from 700 people in 2009.
Hadoop started out as a complement to traditional data processing (offering large-scale processing).
Progressively adding more real-time capabilities, e.g. Impala & Cloudera search.
More and more capabilities migrating from traditional platforms to Hadoop.
Hadoop moving from the periphery to the architectural center of the data center, emerging as an enterprise data hub.
Hub: scalable storage, security, data governance, engines for working with the data in place
Spokes connect to other systems, people
Announcing Cloudera 5, "the enterprise data hub"
Announcing Cloudera Connect Cloud, supporting private & public cloud deployments
Announcing Cloudera Connect Innovators, inaugural innovator is Databricks (Spark real-time in-memory processing engine)

Separating Hadoop Myths from Reality
Jack Norris (MapR Technologies)

Hadoop is the first open source project that has spawned a market
3:35 compelling graph of Hadoop/HBase disk latency vs. MapR latency
Hadoop is being used in production by many organizations

Big Impact from Big Data
Ken Rudin (Facebook)

Need to focus on business needs, not the technology
You can use science, technology and statistics to figure out what the answers are, but it is still an art to figure out what the right questions are
How to focus on the right questions:
* hire people with academic knowledge + business savvy
* train everyone on analytics (internal DataCamp at Facebook for project managers, designers, operations; 50% on tools, 50% on how to frame business questions so you can use data to get the answers)
* put analysts in org structure that allows them to have impact ("embedded model": hybrid between centralized & decentralized)
Goals of analytics: Impact, insight, actionable insight, evangelism … own the outcome

Five Surprising Mobile Trajectories in Five Minutes
Tony Salvador (Intel Corporation)

Tony is director at the Experience Research Lab (is this the group formerly known as People & Practices?) [I'm an Intel Research alum, and Tony is a personal friend]
Personal data economy: system of exchange, trading personal data for value
3 opportunities
* hyper individualism (Moore's Cloud, programmable LED lights)
* hyper collectivity (student projects with outside collaboration)
* hyper differentiation (holistic design for devices + data)
Big data is by the people and of the people ... and it should be for the people

Can Big Data Reach One Billion People?
Quentin Clark (Microsoft)

Praises Apache, open source, github (highlighted by someone from Microsoft?)
Make big data accessible (MS?)
Hadoop is a cornerstone of big data
Microsoft is committed to making it ready for the enterprise
HDInsight: Azure offering for Hadoop
We have a billion users of Excel, and we need to find a way to let anybody with a question get that question answered.
Power BI for Office 365 Preview

What Makes Us Human? A Tale of Advertising Fraud
Claudia Perlich (Dstillery)

A Turing test for advertising fraud
Dstillery: predicting consumer behavior based on browsing histories
Saw 2x performance improvement in 2 weeks; was immediately skeptical
Integrated additional sources of data (10B bid requests)
Found "oddly predictive websites"
e.g., Women's health page --> 10x more likely to check out a credit card offer, order pizza online, or read about luxury cars
Large advertising scam (botnet)
36% of traffic is non-intentional (Comscore)
Co-visitation patterns
Cookie stuffing
Botnet behavior is easier to predict than human behavior
Put bots in "penalty box": ignore non-human behavior

From Fiction to Facts with Big Data Analytics
Ben Werther @bwerther (Platfora)

When it comes to big data, BI = BS
Contrasts enterprises based on fiction, feeling & faith vs. fact-based enterprises
Big data analytics: letting regular business people iteratively interrogate massive amounts of data in an easy-to-use way so that they can derive insight and really understand what's going on
3 layers: Deep processing + acceleration + rich analytics
Product: Hadoop processing + in-memory acceleration + analytics engines + Vizboards
Example: event series analytics + entity-centric data catalog + iterative segmentation

The Economic Potential of Open Data
Michael Chui (McKinsey Global Institute)

[Presentation is based on newly published - and openly accessible (walking the talk!) - report: Open data: Unlocking innovation and performance with liquid information.]

Louisiana Purchase: Lewis & Clark address a big data acquisition problem
Thomas Jefferson: "Your observations are to be taken with great pains & accuracy, to be entered intelligibly, for others as well as yourself"
What happens when you make data more liquid?

4 characteristics of "openness" or "liquidity" of data:
* degree of access
* machine readability
* cost
* rights

Benefits to open data:
* transparency
* benchmarking exposing variability
* new products and services based on open data (Climate Corporation?)

How open data can enable value creation
* matching supply and demand
* collaboration at scale
"with enough eyes on code, all bugs are shallow"
--> "with enough eyes on data, all insights are shallow"
* increase accountability of institutions

Open data can help unlock $3.2B [typo? s/b $3.2T?] to $5.4T in economic value per year across 7 domains
* education
* transportation
* consumer products
* electricity
* oil and gas
* health care
* consumer finance
What needs to happen?
* identify, prioritize & catalyze data to open
* developers, developers, developers
* talent (data scientists, visualization, storytelling)
* address privacy, confidentiality, security, IP policies
* platforms, standards and metadata

The Future of Hadoop: What Happened & What's Possible?
Doug Cutting @cutting (Cloudera)

Hadoop started out as a storage & batch processing system for Java programmers
Increasingly enables people to share data and hardware resources
Becoming the center of an enterprise data hub
More and more capabilities being brought to Hadoop
Inevitable that we'll see just about every kind of workload being moved to this platform, even online transaction processing

Designing Your Data-Centric Organization
Josh Klahr (Pivotal)

GE has created 24 data-driven apps in one year
We are working with them as a Pivotal investor and a Pivotal company, we help them build these data-driven apps, which generated $400M in the past year
Pivotal code-a-thon, with Kaiser Permanente, using Hadoop, SQL and Tableau

What it takes to be a data-driven company
* Have an application vision
* Powered by Hadoop
* Driven by Data Science

Encouraging You to Change the World with Big Data
David Parker (SAP)

Took Facebook 9 months to achieve the same number of users that it took radio 40 years to achieve (100M users)
Use cases
At-risk students stay in school with real-time guidance (University of Kentucky)
Soccer players improve with spatial analysis of movement
Visualization of cancer treatment options
Big data geek challenge (SAP Lumira): $10,000 for best application idea

The Value of Social (for) TV
Shawndra Hill (University of Pennsylvania)

Social TV Lab
How can we derive value from the data that is being generated by viewers today?
Methodology: start with Twitter handles of TV shows, identify followers, collect tweets and their networks (followees + followers), build recommendation systems from the data (social network-based, product network-based & text-based (bag of words)). Correlate words in tweets about a show with demographics about audience (Wordle for male vs. female)
1. You can use Twitter followers to estimate viewer audience demographics
2. TV triggers lead to more online engagement
3. If brands want to engage with customers online, play an online game
Real time response to advertisement (Teleflora during Super Bowl): peaking buzz vs. sustained buzz
Demographic bias in sentiment & tweeting (male vs. female response to Teleflora, others)
Influence = retweeting
Women more likely to retweet women, men more likely to retweet men
4. Advertising response and influence vary by demographic
5. GetGlue and Viggle check-ins can be used as a reliable proxy for viewership to
* predict Nielsen viewership weeks in advance
* predict customer lifetime value
* measure time shifting
All at the individual viewer level (vs. household level)
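
[The text-based (bag of words) part of the methodology above can be illustrated with a minimal sketch. This is not the Social TV Lab's actual code: the show names, the tweets, and the helper names `bag_of_words`, `cosine_similarity` and `most_similar_show` are all hypothetical, and cosine similarity is simply one common way to compare bags of words.]

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word occurrences, ignoring word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_show(show, tweets_by_show):
    """Return the other show whose tweets look most like the given show's tweets."""
    bags = {name: bag_of_words(" ".join(tweets))
            for name, tweets in tweets_by_show.items()}
    others = [name for name in bags if name != show]
    return max(others, key=lambda name: cosine_similarity(bags[show], bags[name]))


# Hypothetical tweets, invented purely for illustration.
example_tweets = {
    "cooking_show": ["loved the pasta recipe tonight", "that chef makes great pasta"],
    "baking_show": ["the cake recipe was great", "that chef can bake"],
    "football_game": ["what a touchdown", "great defense tonight"],
}

print(most_similar_show("cooking_show", example_tweets))
```

[A real system would presumably weight words (e.g., down-weighting very common ones) and combine this signal with the network-based ones mentioned in the talk.]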

Ubiquitous Satellite Imagery of our Planet
Will Marshall @wsm1 (Planet Labs)

Ultracompact satellites to image the earth on a much more frequent basis to get inside the human decision-making loop so we can help human action.
Redundancy via large # of small satellites with latest technology (vs. older, higher-reliability systems on one satellite)
Recency: shows more deforestation than Google Maps, river movement (vs. OpenStreetMap)
API for the Changing Planet, hackathons early next year

The Big Data Journey: Taking a holistic approach
John Choi (IBM)

[No slides?]
Invention of sliced bread 
Big data [hyped] as the biggest thing since the sliced bread
Think about big data as a journey
1. It's all about discipline and knowing where you are going (vs. enamored with tech)
VC $2.6B investment into big data (IBM, SAP, Oracle, … $3-4B more)
2. Understand that any of these technologies do not live in a silo. The thing that you don't want to have happen is that this thing become a science fair project. At the end of the day, this is going to be part of a broader architecture.
3. This is an investment decision, want to have a return on investment.

How You See Data
Sharmila Shahani-Mulligan @ShahaniMulligan (ClearStory Data)

The Next Era of Data Analysis: next big thing is how you analyze data from many disparate sources and do it quickly.
More data: Internal data + external data
More speed: Fast answers + discovery
Increase speed of access & speed of processing so that iterative insight becomes possible.
More people: Collaboration + context
Needs to become easier for everyone across the business (not just specialists) to see insights as insights are made available, have to make decisions faster.
Data-aware collaboration
Data harmonization
Demo: 6:10-8:30

Can Big Data Save Them?
Jim Kaskade @jimkaskade (Infochimps)

1 of 3 people in US has had a direct experience with cancer in their family
1 in 4 deaths are cancer-related
Jim's mom has chronic leukemia
Just got off the phone with his mom (it's his birthday), and she asked "what is it that you do?"
"We use data to solve really hard problems like cancer"
Cancer is 2nd leading cause of death in children
"The brain trust in this room alone could advance cancer therapy more in a year than the last 3 decades."
Bjorn Brucher
We can help them by predicting individual outcomes, and then proactively applying preventative measures.
Big data starts with the application
Stop building your big data sandboxes, stop building your big data stacks, stop building your big data hadoop clusters without a purpose.
When you start with the business problem, the use case, you have a purpose, you have focus.
50% of big data projects fail (reference?)
"Take that one use case, supercharge it with big data & analytics, we can take & give you the most comprehensive big data solutions, we can put it on the cloud, and for some of you, we can give you answers in less than 30 days"
"What if you can contribute to the cure of cancer?" [abrupt pivot back to initial inspirational theme]

Changing the Face of Technology - Black Girls CODE
Peta Clarke @volunteerbgcny (Black Girls Code - NY), Donna Knutt @donnaknutt (Black Girls Code)

Why coding is important: By 2020, 1.4M computing jobs
Women of color currently make up 3% of computing jobs in US
Goal: teach 1M girls to code by 2040
Thus far: 2 years, 2000 girls, 7 states + Johannesburg, South Africa

Beyond R and Ph.D.s: The Mythology of Data Science Debunked
Douglas Merrill @DouglasMerrill (ZestFinance)

[my favorite talk]
Anything which appears in the press in capital letters, and surrounded by quotes, isn't real.
There is no math solution to anything. Math isn't the answer, it's not even the question.
Math is a part of the solution. Pieces of math have different biases, different things they do well, different things they do badly, just like employees. Hiring one new employee won't transform your company; hiring one new piece of math also won't transform your company.
Normal distribution, bell curve: beautiful, elegant
Almost nothing in the real world, is, in fact, normal.
Power laws don't actually have means.
Joke: How do you tell the difference between an introverted and an extroverted engineer? The extroverted one looks at your shoes instead of his own.
The math that you think you know isn't right. And you have to be aware of that. And being aware of that requires more than just math skills.
Science is inherently about data, so "data scientist" is redundant
However, data is not entirely about science
Math + pragmaticism + communication
Prefers "Data artist" to data scientist
Fundamentally, the hard part actually isn't the math, the hard part is finding a way to talk about that math. And, the hard part isn't actually gathering the data, the hard part is talking about that data.
The most famous data artist of our time: Nate Silver.
Data artists are the future.
What the world needs is not more R, what the world needs is more artists (Rtists?)
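
[A small sketch of the "power laws don't actually have means" point, added for illustration and not taken from the talk. Strictly, the claim holds for a Pareto distribution whose shape parameter alpha is at most 1; for larger alpha the mean exists, though sample means can still converge slowly. The function names and the chosen alpha values are my own assumptions.]

```python
import math
import random


def pareto_mean(alpha, x_min=1.0):
    """Theoretical mean of a Pareto (power-law) distribution.

    The mean is finite only when the shape parameter alpha exceeds 1.
    """
    if alpha <= 1:
        return math.inf
    return alpha * x_min / (alpha - 1)


def running_means(alpha, n, seed=0):
    """Sample mean after each of n draws from a Pareto(alpha) distribution."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for i in range(1, n + 1):
        total += rng.paretovariate(alpha)  # every draw is >= 1
        means.append(total / i)
    return means


if __name__ == "__main__":
    # Compare the theoretical mean with the sample mean at two sample sizes.
    for alpha in (3.0, 1.5, 0.9):
        means = running_means(alpha, 100_000)
        print(alpha, pareto_mean(alpha), means[999], means[-1])
```

[With alpha at or below 1, the sample mean tends to keep drifting upward as rare, enormous draws arrive, which is one concrete sense in which "the math that you think you know isn't right".]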

Is Bigger Really Better? Predictive Analytics with Fine-grained Behavior Data
Foster Provost (NYU | Stern)

[co-author of my favorite book on Data Science]
Agrees with some of the critiques made by previous speaker, but rather likes the term "data scientist"
Shares some quotes from Data Science and its relationship to Big Data and Data-Driven Decision Making
Gartner Hype Cycle 2012 puts "Predictive Analytics" at the far right ("Plateau of Productivity")  
[it's still there in Gartner Hype Cycle 2013, and "Big Data" has inched a bit higher into the "Peak of Inflated Expectations"]
More data isn't necessarily better (if it's from the same source, e.g., sociodemographic data)
More data from different sources may help.
Using fine-grained behavior data, learning curves show continued improvement to massive scale.
1M merchants, 3M data points (? look up paper)
But sociodemographic + pseudo social network data still does not necessarily do better
See Pseudo-Social Network Targeting from Consumer Transaction Data (Martens & Provost)
Seem to be very few case studies where you have really strong best practices with traditional data juxtaposed with strong best practices with another sort of data.
We see similar learning curves with different data sets, characterized by massive numbers of individual behaviors, each of which probably contains a small amount of information, and the data items are sparse.
See Predictive Modelling with Big Data: Is Bigger Really Better? (Enrique Junque de Fortuny, David Martens & Foster Provost)
Others have published work on Fraud detection (Fawcett & FP, 1997; Cortes et al, 2001), Social Network-based Marketing (Hill, et al, 2006), Online Display-ad Targeting (FP, Dalessandro, et al., 2009; Perlich, et al., 2013)
Rarely see comparisons

Take home message:
The Golden Age of Data Science is at hand.
Firms with larger data assets may have the opportunity to achieve significant competitive advantage.
Whether bigger is better for predictive modeling depends on:
a) the characteristics of the data (e.g., sparse, fine-grained data on consumer behavior)
b) the capability to model such data

An Excellent Primer on Data Science and Data-Analytic Thinking and Doing

O'Reilly Media is my primary resource for all things Data Science, and the new O'Reilly book on Data Science for Business by Foster Provost and Tom Fawcett ranks near the top of my list of their relevant assets. The book is designed primarily to help businesspeople understand the fundamental principles of data science, highlighting the processes and tools often used in the craft of mining data to support better business decisions. Among the many gems that resonated with me are the emphasis on the exploratory nature of data science - more akin to research and development than engineering - and the importance of thinking carefully and critically ("data-analytically") about the data, the tools and overall process.

The book references and elaborates on the Cross-Industry Standard Process for Data Mining (CRISP-DM) model to highlight the iterative process typically required to converge on a deployable data science solution. The model includes loops within loops to account for the way that critically analyzing data models often reveals additional data preparation steps that are needed to clean or manipulate the data to support the effective use of data mining tools, and how the evaluation of model performance often reveals issues that require additional clarification from the business owners. The authors note that it is not uncommon for the definition of the problem to change in response to what can actually be done with the available data, and that it is often worthwhile to consider investing in acquiring additional data in order to enable better modeling. Valuing data - and data scientists - as important assets is a recurring theme throughout the book.

As a practicing data scientist, I find the book's emphasis on the expected value framework - associating costs and benefits with different performance metrics - to be a helpful guide in ensuring that the right questions are being asked, and that the results achieved are relevant to the business problems that motivate most data science projects. And as someone whose practice of data science has recently resumed after a hiatus, I found the book very useful as a refresher on some of the tools and techniques of data analysis and data mining ... and as a reminder of potential pitfalls such as overfitting models to training data, not appropriately taking into account null hypotheses and confidence intervals, and the problem of multiple comparisons. I've been using the scikit-learn package for machine learning in Python in my recent data modeling work, and some of the questions and issues raised in this book have prompted me to reconsider some of the default parameter values I've been using.
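
[To make the expected value idea concrete, here is a minimal sketch of my own. It is not code from the book, and the confusion-matrix counts and dollar payoffs are hypothetical. The expected value per instance is the sum, over the four classification outcomes, of the outcome's probability times its cost or benefit.]

```python
def expected_value(counts, payoffs):
    """Expected value per instance: sum of p(outcome) * payoff(outcome).

    counts  -- how often each outcome (tp, fp, fn, tn) occurred on test data
    payoffs -- benefit (positive) or cost (negative) of each outcome
    """
    total = sum(counts.values())
    return sum(counts[k] / total * payoffs[k] for k in counts)


# Hypothetical confusion-matrix counts and payoffs (in dollars).
counts = {"tp": 30, "fp": 10, "fn": 20, "tn": 40}
payoffs = {"tp": 50.0, "fp": -5.0, "fn": 0.0, "tn": 0.0}

ev = expected_value(counts, payoffs)
print(f"expected value per instance: {ev:.2f}")
```

[Comparing this number across candidate models, rather than comparing accuracy alone, ties the evaluation back to the business problem.]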

The book includes a nice mix of simplified and real-world examples to motivate and clarify many of the common problems and techniques encountered in data science. It also offers appropriately simplified descriptions and equations for the mathematics that underlie some of the key concepts and tools of data science, including one of the clearest definitions of Bayes' rule and its application in constructing Naive Bayes classifiers I've seen. The figures (such as the one above) add considerable clarity to the topics covered throughout the book. I particularly like the chapter highlighting the different visualizations - profit curves, lift curves, cumulative response curves and receiver operating characteristic (ROC) curves - that can be used to help compare and effectively communicate the performance of models. [Side note: it was through my discovery of Tom Fawcett's excellent introduction to ROC analysis that I first encountered the Data Science for Business book. In the interest of full disclosure, I should also note that Tom is a friend and former grad school colleague (and fellow homebrewer) from my UMass days].
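
[For reference, Bayes' rule and the Naive Bayes simplification can be written as follows. The notation is my own and may differ from the book's: c is a class, E is the evidence, and e_1, ..., e_k are the individual features that make up E. The first line is Bayes' rule; the second adds the "naive" assumption that the features are conditionally independent given the class.]

```latex
p(C = c \mid E) = \frac{p(E \mid C = c)\, p(C = c)}{p(E)}

p(C = c \mid e_1, \dots, e_k) \propto p(C = c) \prod_{i=1}^{k} p(e_i \mid C = c)
```

[Because p(E) is the same for every class, a classifier only needs to compare the right-hand side of the second line across classes and pick the largest.]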

The penultimate chapter of the book is on Data Science and Business Strategy, in which the authors elaborate on the importance of making strategic investments in data, data scientists and a culture that enables data science and data scientists to thrive. They note the importance of diversity in the data science team, the variance in individual data scientist capabilities - especially with respect to innate creativity, analytical acumen, business sense and perseverance - and the tendency toward replicability of successes in solving data science problems, for both individuals and teams. They also emphasize the importance of attracting a critical mass of data scientists - to support, augment and challenge each other - and progressively systematizing and refining various processes as the data science capability of a team (and firm) matures ... two aspects whose value I can personally attest to based on my own re-immersion in a data science team.

Health, science, knowledge, access and elitism: Lawrence Lessig and science as remix culture

I have been an admirer and supporter of Lawrence Lessig's crusade for copyright reform and promotion of remix culture for many years. In a recent talk at CERN, Lessig applied his arguments for a fairer interpretation of fair use in the arts world to opening up the architectures for knowledge access in the world of science. The Harvard Law School professor made a compelling case for the ethical obligation of scientists [at least those in academia] to provide universal access to the knowledge they discover, and chastised those who practice exclusivity - those who choose elite-nment over enlightenment - as "wrong".

I initially discovered the talk by following a @BoingBoing tweet to a two-paragraph blog post about Lessig on science, copyright and the moral case for open access, which included an embedded 50-minute video of Lessig's presentation at CERN on 18 April 2011 entitled "The Architecture of Access to Scientific Knowledge: Just How Badly We Have Messed This Up".

I rarely take the time to watch any videos, and having seen many of Lessig's talks about copyright reform - live and online - I was preparing to simply retweet the link, and move on. But having been thoroughly irritated by a personal encounter with barriers to knowledge access during the [free] webcast from the otherwise enlightening and engaging Behavioral Informatics for Health event earlier this week, I was motivated to see and hear what Lessig had to show and tell. I was excited to discover that Lessig's talk was far more relevant to health and medicine - and the kind of universal access to crucial information that might help those outside of elite schools and hospitals better achieve positive health outcomes - than I initially anticipated.

Before sharing some of Lessig's insights and observations, I want to share the source of my personal irritation in encountering preventative measures erected to limit access to one of the two journals being showcased at the behavioral informatics event, a special issue on Cyberinfrastructure for Consumer Health from the American Journal of Preventive Medicine. When I investigated options for accessing some of the interesting articles being mentioned during the event, I discovered that


AJPM pricing options for individuals include a 12-month subscription to the journal for $277, or the purchase of individual articles for $31.50 each. The special issue being showcased at the event included 27 articles, which translates into a total cost of $850 for purchasing this one issue of the journal, whose mission is "the promotion of individual and community health".

In contrast, all the articles from the inaugural issue of the other journal being showcased at the event, Translational Behavioral Medicine, are freely available online, a policy much more in alignment with its mission:

TBM is an international peer-reviewed journal that offers continuous, online-first publication. TBM's mission is to engage, inform, and catalyze dialogue between the research, practice, and policy communities about behavioral medicine. We aim to bring actionable science to practitioners and to prompt debate on policy issues that surround implementing the evidence. TBM's vision is to lead the translation of behavioral science findings to improve patient and population outcomes.

I hope to post another blog entry with some notes from the behavioral informatics event, but in this post, I want to continue on with some of Lessig's commentary about science, knowledge, access and elitism. I'll embed a copy of the video below, follow it with some notes and partial transcriptions I made while watching, and finish off with a brief riff on science as a remix culture.

The Architecture of Access to Scientific Knowledge from lessig on Vimeo.

Lessig begins by talking about two motivations for his talk. The first is the late Supreme Court Justice Byron White, who was considered a liberal when appointed to the court by President John Kennedy in 1962, but became progressively more conservative, as evidenced in his authoring of the majority opinion in the 1986 case of Bowers v Hardwick, which upheld laws criminalizing sodomy, and included the following statement:

Against this background, to claim that a right to engage in such conduct [sodomy] is "deeply rooted in this Nation's history and tradition" or "implicit in the concept of ordered liberty" is, at best, facetious.

Lessig calls this the White effect:

To be liberal / progressive is always relative to a moment, and that moment changes, and too many are liberal / progressive no more.

The second, more recent, motivation was a Harvard Gazette article about Gita Gopinath, a macro-economist at Harvard who was born in India. After mentioning that Gopinath, a tenured professor, would like to have more time to read books that are not textbooks, the article concluded with the following sentence:

Still, the shelves in her new office are nearly bare, since, said Gopinath, “Everything I need is on the Internet now.”

Lessig notes:

If you're a member of the knowledge elite, then you have effectively free access to all of this information, but if you're from the rest of the world, not so much.

He goes on to observe:

The thing to recognize is that we built this world, we built this architecture for access. This flows from the deployment of copyright, but here, copyright to benefit publishers, not to enable authors. Not one of these authors gets money from copyright, not one of them wants the distribution of their articles limited, not one of them has a business model that turns upon restricting access to their work, not one of them should support this system.

As a knowledge policy, for the creators of this knowledge, this is crazy.

Lessig tells the story of his third daughter, who was diagnosed with jaundice shortly after her birth, and the concern he felt when the doctor expressed unexpected concern about possible complications. Due to his status as a Harvard professor, he had institutional access to many relevant articles in medical journals. When he calculated the cost for purchasing the 20 articles he tracked down, it would have cost $435 for someone who did not enjoy his level of elite status.

Even those journals which granted free access sometimes engaged in regulating access to parts of articles. For example, a February 2002 article on "Hyperbilirubinemia in the Term Newborn" in American Family Physician was available for free ... except for a crucial missing chart:

Management of Hyperbilirubinemia in Healthy Term Newborns

The rightsholder did not grant rights to reproduce this item in electronic media. For the missing item, see the original print version of this publication.

Rather than architecting systems to maximize access to knowledge, Lessig suggests that "we are architecting access to maximize revenue". He also shares a chart from An Open Letter to All University Presidents and Provosts Concerning Increasingly Expensive Journals by Theodore Bergstrom & R. Preston McAfee on Journal Prices by Publisher and Discipline Type that shows the cost-per-page of purchasing articles from for-profit journals was 5 times higher, on average, than the cost in not-for-profit journals, leading him to wonder whether academia is creating its own RIAA:

Really Important Academic Archive: RIAA for the Academy?

Lessig is co-founder of the Science Commons, a translation of the Creative Commons license to promote open access in the scientific community, with four key principles:

  1. Open access to literature
  2. Access to research tools
  3. Data should be in the public domain
  4. Open cyberinfrastructure

Lessig championed the Public Library of Science (PLoS) as an exemplar of these principles. Personally, I am very excited about the PLoS publication of a landmark study this week on Sharing Data for Public Health Research by Members of an International Online Diabetes Social Network, by Weitzman, et al., based on data from the TuDiabetes online community, and another recent study by Wicks, et al., based on PatientsLikeMe community members with amyotrophic lateral sclerosis (ALS) published - and freely available - in the journal Nature Biotechnology, Accelerated clinical discovery using self-reported patient data collected online and a patient-matching algorithm.

Having recently read a critique of Science 2.0, cataloging the shortcomings and/or failures of several traditional for-profit publishers to effectively capitalize on the Web 2.0 platform, it is encouraging to see some promising progress in sharing knowledge about chronic conditions in the not-for-profit world.

Lessig proceeds to review some of the issues surrounding the use - and misuse - of copyright in the arts, but I have already written about many of his arguments and examples from that domain in my notes from his keynote at the 2009 Seattle Green Festival. I'll simply note that in viewing his examples in this context, I was struck by the revelation that on a very basic level,

science is a remix culture

Traditionally, much of science has been the exclusive domain of professional scientists, who typically go to great lengths to cite prior work that is related to the experiments and results they report in peer-reviewed publications (indeed, some of the peers reviewing work submitted for publication are among those who are - or [feel they] should be - cited). With the rare exceptions of paradigm shifts, most of science is incremental in nature, and each increment represents a remix with a few added ingredients.

There are several promising signs that people without PhDs, MDs and other "terminal" credentials can participate more fully in the scientific discovery and dissemination process. I enumerated several of these efforts in an earlier post on platform thinking, but in the context of health and medicine - and Harvard - I do want to mention Doc Searls' recent post on Patient-driven health care in which he expands the idea of the patient as a platform and mentions efforts by Jon Lebkowsky and others to promote a vendor relationship management (VRM) model in which patients - and the data about their conditions - will be better able to participate in peer-to-peer collaborations with health care and health information technology professionals.

Lessig laments the current system in which authors - and peer reviewers - of scientific publications do much of the work for free, while for-profit publishers derive nearly all of the financial benefits, and do so through restricting access to the knowledge produced by the authors. Given that much of the data used in the experiments reported in professional medical publications comes from patients (the PatientsLikeMe and TuDiabetes studies being particularly notable examples), it makes all the more sense to make the results of these experiments available to all patients ... and at some point, we all are - or will be - patients who might benefit from universal access to this knowledge.

The Power of Pull: Institutions as Platforms for Promoting Individual Passions

There are a number of interesting and provocative ideas in The Power of Pull, by John Hagel III, John Seely Brown and Lang Davison. I've already tweeted about a number of articles by the authors - based on their book - that highlight the importance of physical places, the ways we can shape serendipity and the essence of leadership as connecting people with similar and complementary passions. Here I will focus primarily on what I see as one of the most radical ideas in the book:

Rather than molding individuals to fit the needs of the institution, institutions will be shaped to provide platforms to help individuals achieve their full potential by connecting with others and better addressing challenging performance need ... Rather than individuals serving the needs of institutions, our institutions will be crafted to serve the needs of individuals.

When I first read this passage (on page 8), I was excited about encountering another example of platform thinking. However, I also thought that it was extremely idealistic, and even though I tend to be extremely idealistic, I was very skeptical about applying this idea to the business world ... or at least about its prospects for realization.

Having also recently read Douglas Rushkoff's book, Life Inc.: How the World Became a Corporation and How to Take It Back, I came to understand the history of the corporation as an entity designed for extraction, exploitation and externalization, existing for the primary benefit of shareholders who often have no stake in the actual work done by the corporation or its employees. I believe many corporations today - as well as many other organizations (and individuals) that Rushkoff argues have adopted a corporatist perspective - exhibit these behaviors, but as many investment offerings warn: past performance is not a reliable indicator of future performance.

In The Power of Pull, these engines of extraction are described as "push" organizations, with centralized decision-makers utilizing top-down approaches to forecast demand and supply passive consumers with products and services. Employees of such organizations are treated as standardized parts of a predictable machine, who suppress their intrinsic creative instincts in return for extrinsic rewards, resulting in a "curious combination of boredom and stress".

However, the authors argue that a Big Shift is underway, where knowledge and power are devolving from large, centralized and stable "stocks" toward smaller, decentralized and uncertain "flows". This shift is being propelled, in part, by technology, and is increasingly disrupting economics and politics and the traditional institutions that participate in these domains. Organizations that have succeeded through achieving scalable efficiency will increasingly need to promote more scalable learning, which will call for a new set of perspectives and practices.

Many of these perspectives and practices will flourish along the edges rather than at the core of organizations: 

Edges are places that become fertile ground for innovation because they spawn significant new unmet needs and unexploited capabilities and attract people who are risk takers. Edges therefore become significant drivers of knowledge creation and economic growth, challenging and ultimately transforming traditional arrangements and approaches.

This shift of focus - and prospects for value creation - from the core to the edge will require new approaches:

Rather than trying to pull the edges into the core, as many management pundits recommend, the key institutional challenge will be to develop mechanisms to pull the core out to the most promising edges.

And the best way to pull the core of an organization toward its edges is to more fully draw the core potential within individuals to the surface(s), which can only be done by tapping into their passions and creating a trusting environment in which they are continually willing to stretch themselves toward the edge - or, ideally, beyond the edge.

To build this level of trust, we must begin the process of reintegrating ourselves, and often, in the process, rediscovering ourselves, so that we can present ourselves more fully and authentically to others around us. ... It requires us to get in touch with ourselves, to relearn how to be, in order to more effectively become.

The authors conclude with a compelling vision for integrating the personal with the professional: as institutions evolve to provide "platforms for individuals to amplify the power of pull", we will have "the ability to shape a world that encourages and celebrates our efforts to become who we were meant to be". As I said at the outset, this is an incredibly idealistic perspective, but having finished the book, I'm more willing to believe in the prospect of its realization ... or as David Whyte might put it, its incarnation.

The poetry of David Whyte - who also often writes about flows - came to mind at several times during my reading of this book, and this passage at the end reminded me of one of my favorite poems, Working Together, which he wrote to commemorate the presentation of The Collier Trophy to The Boeing Company marking the introduction of the advanced 777 widebody twinjet. I'm not sure where Boeing stands on the push vs. pull spectrum, but their willingness to hire David Whyte, who was described by former CEO Phil Condit as "a storyteller, someone from outside our system saying that there are other ways of looking at the way we do things" - very much in the spirit of The Power of Pull - leads me to suspect that they may be more open to transformation than some other large institutions. In any case, the poem seems like an appropriate ending for this post.

We shape our self
to fit this world
and by the world
are shaped again.

The visible
and the invisible
working together
in common cause,
to produce
the miraculous.

I am thinking of the way
the intangible air
passed at speed
round a shaped wing
easily
holds our weight.

So may we, in this life
trust
to those elements
we have yet to see
or imagine,
and look for the true
shape of our own self
by forming it well
to the great
intangibles about us.

Remembering Community: Fixing the Future via Community Currency at Hour Exchange Portland

Community is more like something that we're remembering than something that we're creating all over again.

I was inspired by physical therapist and sailor Stephen Beckett's words at the end of a segment of David Brancaccio's upcoming special edition of PBS Now, Fixing the Future, shown on tonight's PBS Newshour. I'd forgotten how much I enjoyed the series, and how disappointed I was when Now and Bill Moyers Journal were cancelled. I was grateful to have another glimpse, and look forward to watching the full segment online this weekend (our local PBS affiliate, KCTS, does not appear to be carrying the show).

The Newshour segment profiles Hour Exchange Portland, where members of the community contribute and receive services in an exchange that lies entirely outside the traditional financial / banking industry:

We believe in people.

We believe everyone has knowledge and skills that someone in the community can use. We help people find what they need and give what they can. We are neighbors helping neighbors help themselves. We are a community service exchange.

We believe no one is more valuable than you, and neither is their time more valuable. At Hour Exchange Portland everyone's time is equal, an hour for an hour. If you give an hour of your time helping someone, providing a service, then you can receive an hour of someone else's time who provides a service you need. Time is what our members exchange. We are a community currency based on time. We believe all people are created equal, and so is our time. Our time is priceless.

I won't say too much more about the segment, but will include another one of my favorite excerpts - and embed the 7-minute video - below. The entire hour-long version of Fixing the Future can be found online or seen on many PBS stations this week (at least, outside of Seattle).

DAVID BRANCACCIO: Are you connecting with other people? Are you meeting other people through this?

JENNIFER LUNDEN: This is like the new kind of community. In this country, we have lost a lot of the sense of community, and people are so focused on just surviving economically or doing better than their neighbors economically. We're so focused on stuff, that we have completely lost our sense of community. And Hour Exchange is a way that I have a built-in community. There are about 600 members that I can go to and ask for help.


STEPHEN BECKETT: We just have this arbitrary economic system that we all have -- you know, have grown up in and believe in and contribute to and work in. If it's not working anymore, then let's do something different. I think the seeds already are planted and sprouted and well on their way.

Indeed, a crisis is a terrible thing to waste.

Preemptive Foreclosure: Problems with GoDaddy Domain Name Registration Service

I am frustrated with preemptive actions taken by GoDaddy this week, effectively impounding my web site without adequate notification for at least two days. The web site has since been released, but I wanted to share my experience in case it helps others make better informed choices about domain name registration services ... and to release some of my frustration. I would be very interested in any recommendations for other reasonably priced services that offer better customer service.

My domain name is currently registered with GoDaddy, and was set to auto-renew on October 25. Unfortunately, my Visa credit card - which was the card on file at GoDaddy for auto-renewals - was hacked last week, and the old card number was cancelled on October 23. On October 26, I received the following email from GoDaddy, with the subject "Product Failed Billing Notification":

Dear Joe McCarthy,
Customer Number: ******

According to the terms of our agreement(s), we tried to bill your Visa card ending in the last two digits 89 in the amount of $41.96 for the item(s) below, but our billing attempt failed. This could be for a variety of reasons, including an invalid or expired credit card on file.


Product Name                                      Next Billing Date   Qty   Amount
.COM Bulk Domain Name Renewal (1-5) (recurring)   10/30/2010          1     $11.62
.NET Bulk Domain Name Renewal (1-5) (recurring)   10/30/2010          1     $15.17
.ORG Bulk Domain Name Renewal (1-5) (recurring)   10/30/2010          1     $15.17


If an item has already expired, it is noted above as "CANCELLED" and can no longer be renewed. PLEASE NOTE: Once an item has been cancelled, all related data – Web site files, emails, databases, etc. – is removed from our server and cannot be recovered.

If there is a date in the "Next Billing Date" column, we will hold your item(s) and attempt to bill again on the date shown, OR you can renew now and qualify for bonus savings.

[instructions / link for online renewal omitted]
Thanks as always for being a Go Daddy customer.

Sincerely,
GoDaddy.com, Inc.

In reading this email, I didn't interpret "hold your item(s)" as "impound your item(s)". I interpreted it as "we will hold your item(s) - rather than try to resell your items". I figured that GoDaddy would simply try to bill the card again on October 30, by which time I hoped to have a new Visa card number that I could substitute for the old number in my account information.

I was surprised when I was contacted by someone today who told me that my web site appeared to be for sale:


Since I have not yet received my new Visa card, I immediately logged into my account to enter a different card number. I then called GoDaddy customer service to (a) determine what notification I missed or misinterpreted that should have informed me of the impending impoundment, and (b) find out how long it would take for the content on my web site to be released / reinstated.

During my phone call with the GoDaddy customer service representative, we reviewed two preceding email messages I had received (on September 25 and October 20), notifying me of the upcoming auto-renewal for the domain, and we reviewed the message I included above. The only signal of impending impoundment the representative could point me to was the use of "hold" I alluded to above, i.e., "hold" = "impound" rather than "hold" = "will not resell" ... and I don't know how accurate the "will not resell" interpretation was, either. The representative never offered an apology, repeatedly referring me to their "terms of service", but did try to offer some sympathy using the analogy of a wireless carrier suspending service for a mobile phone. I responded that the public nature of my web site having been "parked" (as shown in the screenshot above) seemed more like a bank posting a foreclosure sign out in front of my house. I'm glad the most recent email message from GoDaddy only thanked me for being "a customer", vs. the thanks I often receive from other companies for being "a valued customer", as I don't feel that my business is valued by GoDaddy.

Ironically, I do feel like a valued customer at Optify, a Seattle-based real-time marketing service. It was Jennifer, a customer service representative at Optify, who called me to let me know about the inaccessibility of my web site content (and/or availability of my domain name) - and more generally to see if I had any questions about Optify. I say this is ironic because I signed up for the free trial of Optify on Monday, shortly after reading an article about them in TechCrunch, but I was using my Interrelativity web site primarily to experiment with and learn more about the service, more out of curiosity than with any serious marketing intent. That said, the prompt and helpful followup by Jennifer has proven unexpectedly valuable, and exemplifies a strong customer orientation and valuation.

Fortunately, after getting off the phone with Jennifer and updating my credit card information at GoDaddy, the content of my web site was reinstated and available again within an hour or so. While I no longer use the site for any direct commercial purposes - as I liked to say when I joined Nokia back in 2006, the business of Interrelativity failed, but the dream lives on - my Interrelativity homepage continues to serve as the central online hub for my professional activities, including links to my projects and papers as well as my accounts on a variety of social media services (such as Twitter, SlideShare and this blog). Given that I am currently in another career transition, exploring new professional opportunities [self-promotional link to my resume / CV (PDF)], the preemption of my web content for 2 days while I am sending links to my homepage to prospective employers may have cost me valuable "impressions".

And, as I noted above, I am also exploring a transition to a new domain name registration and hosting service, so I would welcome any recommendations from others who have enjoyed better customer service from other companies.

Empowered: More Platform Thinking, De-Bureaucratization and Redistribution of Agency

The new book, Empowered, by Josh Bernoff and Ted Schadler of Forrester Research, proclaims an inspiring message: social media is increasingly empowering customers to draw attention to their problems, and the best way for businesses to provide effective solutions is to empower their employees with the same tools. The book makes a strong case for universal employee empowerment by including numerous case studies of companies that have benefited from successfully empowering their employees, as well as a few cases where companies suffered as a result of bureaucratic encumbrances. The main quibble I have with the book is the use of what I consider to be questionable quantitative data, but I don't see that data as essential to the empowering message or case studies presented.

The book describes four technology trends - the proliferation of smart mobile devices, pervasive video, cloud computing services and social technology - and presents a number of case studies about how people are taking advantage of these trends to achieve their goals, sometimes to the detriment of institutions that are not yet taking advantage of them. The authors argue that employee empowerment is more of a management challenge than a technical challenge at this stage, and they effectively highlight the ways that proactive employees - called HEROes (Highly Empowered and Resourceful Operatives) - can use the same tools that empower customers to respond more effectively to their needs. I see many similarities between HEROes and the e-Patients ("engaged, empowered, equipped and expert") I first discovered via Regina Holliday, "e-Patient Dave" deBronkart, Susannah Fox and other Health 2.0 heroes who are advocating platform thinking, de-bureaucratization and the redistribution of agency. [Update: just saw a tweet by @ReginaHolliday to another new book, The Empowered Patient, by Julia Hallisy, suggesting even more convergence - and momentum - in this area.] At the risk of adding the ubiquitous version number to yet another class of agency, I found myself thinking that perhaps we're also entering the era of Employee 2.0.

The book starts off with a case study involving Heather Armstrong, a mommyblogger and author with over a million followers on her Twitter account (@dooce), who experiences a series of mechanical and customer service problems with her new Maytag washing machine during the first few months after her second child was born. In a blog post containing a capital letter or two, capturing the series of problems and failed solutions, she writes about an exchange with an unempowered customer service representative:

And here's where I say, do you know what Twitter is? Because I have over a million followers on Twitter. If I say something about my terrible experience on Twitter do you think someone will help me? And she says in the most condescending tone and hiss ever uttered, "Yes, I know what Twitter is. And no, that will not matter."

I read this and immediately experienced a visceral "Uh, oh..." moment, sort of like watching a horror movie where the naive victim-to-be is about to open a door you just know they shouldn't. As anticipated, she then proceeds to share her frustrations with Maytag with her Twitter followers in a series of status updates. It is difficult to directly measure the long-term influence of this negative publicity, but I would imagine that many of Heather Armstrong's followers were / are young mothers with significant laundering needs who might also be in the market for a washing machine, and would be considerably less likely to purchase a Maytag after reading about her experiences.

This experience is contrasted with that of Josh Korin (@joshkorin), a recruiter with a more modest Twitter following (596) at the time of a suboptimal experience with an Apple iPhone purchased at BestBuy. Like Heather, Josh tweeted about his frustrations with customer service - they initially offered to replace his new iPhone with a Blackberry, even though he'd purchased the insurance plan. However, BestBuy had an empowered TwelpForce in place that monitors and responds promptly to customer service problems expressed in social media streams (e.g., tweets addressed to @bestbuy or with the #bestbuy or #twelpforce hashtag). Even though Josh posted these messages on a Saturday, he promptly received responses from BestBuy CMO Barry Judge (@bestbuycmo) and empowered "community connector" Coral Biegler (@coral_bestbuy), and an iPhone replacement was arranged that Sunday, transforming a disgruntled customer into an advocate.

The second part of the book explores another acronymized set of concepts, IDEA: Identify mass influencers, Deliver groundswell customer service, Empower customers with mobile information, and Amplify the voice of your fans. I like the ideas [pun partially intended] in this section, and found the additional case studies presented interesting and compelling. However, this is where I encountered questionable data on peer influence metrics, which is based on Forrester's North American Technographics Empowerment Online Survey, Q4 2009 (US). The normal biases that arise in self-reporting (people generally tend to present themselves and their actions in a favorable light) are compounded when one is asking people - in an online survey - about how much online influence they have. I would expect natural "inflationary pressures" would lead respondents to overestimate the number of friends and followers they have, the frequency with which they post social media messages (e.g., Facebook or Twitter status updates) and the percentage of those messages that are about products and services.

To their credit, Forrester provides disclaimers on its web page for the survey, which very carefully highlight the sources of sample bias:

Please note that this was an online survey. Respondents who participate in online surveys have in general more experience with the Internet and feel more comfortable transacting online. The data is weighted to be representative for the total online population on the weighting targets mentioned, but this sample bias may produce results that differ from Forrester’s offline benchmark survey. The sample was drawn from members of MarketTools’ online panel, and respondents were motivated by receiving points that could be redeemed for a reward. The sample provided by MarketTools is not a random sample.

Taking a cue from Malcolm Gladwell's 2000 book, The Tipping Point, the potentially biased survey data is used primarily to establish categories of Mass Connectors - the 6.2% of online users who generate 80% of the online impressions (status updates) across social media streams, each clocking in with an average of 537 followers and making an estimated 18,600 impressions per year - and Mass Mavens - the 13.4% of online users who generate 80% of the online posts (blog posts, blog comments, discussion forum posts, and product reviews), clocking in with 54 product or service-related posts per year (vs. the overall average of 6 per year).

Now, just to be clear, as someone who ardently believes that all studies and models are wrong [including my own], but some are useful, I believe that these are useful categories, and while I might question the actual numbers, I do believe that some people are more influential - as mavens and/or connectors - than others. However, I think it's important to note that there are significant questions about the extent of influence mavens and connectors have. For example, Clive Thompson's Fast Company article, Is the Tipping Point Toast?, contrasts Gladwell's focus on an elite few with Duncan Watts' more expansive idea of the connected many with respect to the sources of real influence in society. And given more recent views expressed by Gladwell this week in a New Yorker article on Twitter, Facebook and social activism: Why The Revolution Will Not Be Tweeted, I suspect he may have reservations about his categories of influentials being mapped onto social media at all.

The reason I delve so deeply into this issue is that I actually believe that the influence of the connected many is better aligned with the overall message of Empowered than the elite few, and that the authors do themselves - and their message - a disservice via this detour in an otherwise engaging and enlightening book. They talk of efficiency in many places where I think they - and their readers (and clients) - would be best served by focusing on effectiveness (as Tom DeMarco effectively focuses on in his book, Slack). Should HEROes only focus on addressing their efforts toward the Mass Mavens and/or Mass Connectors? That would be efficient, I suppose, but would probably not be very effective.

As an example, another compelling case study described in Empowered is the experience of Dave Carroll, a "not-very-well-known local musician" from Halifax, Nova Scotia, whose guitar was allegedly broken by United Airlines baggage handlers at Chicago O'Hare International Airport on March 31, 2008. Dave responded by recording and posting a trilogy of songs, United Breaks Guitars, on YouTube (the first one, which now has over 9 million views, is embedded below).

As far as I can tell, Dave Carroll - while certainly talented - was probably not very influential at the time he recorded that music video, and if United customer service HEROes (if they exist[ed]) were to focus their efforts primarily on Mass Mavens or Mass Connectors, the empowered response by Dave Carroll may have still slipped under their radar. And yet his video turned out to be very influential: according to the authors, Sysomos estimates that positive sentiment for United Airlines in the blogosphere decreased from 34% to 28% and negative sentiment increased from 22% to 25%, while the proportion of positive stories about United in traditional media went from 39% to 27% with negative stories rising from 18% to 23%. [I recommend Dan Greenfield's analysis of the social media impact of the United Breaks Guitars video at SocialMediaToday for anyone interested in more details.]

I've written before about how everyone's a customer. I think the central message of Empowered is - or should be - every customer matters.

In another inspiring case study - and this is the last one I'll share here - Kira Wampler, former online engagement leader for the small business division of Intuit (maker of QuickBooks) and now a principal at Ants Eye View, said that her primary customer service goal at Intuit was not to deflect as many calls as possible, but "how do I get you unstuck as quickly as possible?" This reflects a wisdom so clearly articulated in Kathy Sierra's Creating Passionate Users blog, e.g., her post on keeping users engaged, in which she so pithily promotes an empowerment strategy: Give users a way to kick ass.

As Josh Bernoff and Ted Schadler convincingly show, customers have never been so empowered to "kick ass" as they are now. I hope that more businesses will follow their prescriptions to "unleash your employees, energize your customers and transform your business" ... or, as Kathy Sierra might put it, Give employees a way to kick ass!

Stuff-Centered Sociality: Commerce, Conversations and Conservation at a Garage Sale

Items on display at a garage sale provide myriad conversation contexts for buyers and sellers alike. Amy and I host or participate in a garage sale (or a tag sale as they're known back east) every few years, but it wasn't until this past weekend that I was struck by the way these events - and the stuff being bought and sold at them - offer rich opportunities for telling and hearing the stories of our lives.

Among the stories we heard from buyers who stopped by our garage sale were

  • a Russian woman who searches garage sales for winter coats she sends back to family and friends back home (she bought an old coat of Amy's)
  • a recently retired man who seeks out toys and games to entertain the children of his ungrateful children (i.e., his grandchildren) when they come visit
  • a nanny who visits garage sales with the children she watches in order to have some contact with adults (Amy gave the kids who accompanied her a few stuffed animals)
  • an Apache man who had made a pact with his now deceased mother not to attend her funeral so that her image - and spirit - would always be alive with her son (I'm not sure if he bought anything)

But the garage sale didn't just evoke stories from buyers; the sellers also had their stories to tell, e.g.,

  • a stuffed bear elicited a story about how it was given to our young son over 5 years ago by a paramedic during his ride in an ambulance (his parents had perceived the bear with significantly more sentimentality than he does at this point)
  • a lined flannel shirt I used to wear when we lived in colder climates evoked a few stories about how harsh those winters could be (and often were)
  • children's books prompted exchanges of stories by both sellers and buyers for whom the books had been meaningful in someone's childhood (the prospective buyers projecting an anticipated layer of meaning for their children)

And the stories were not only exchanged between buyers and sellers. I overheard several of the buyers - also known as neighbors - exchanging stories among themselves that were prompted by items on display, and as a multi-family garage sale, several of the sellers also shared stories about their own and each other's stuff: places we've lived, travels we've taken, hobbies we've engaged in ... and disengaged from.

I've been reading, thinking and writing about various manifestations of object-centered sociality for several years now. In a recent post on a thematic variation I call place-centered sociality, I highlighted a distinction between two dimensions of the idea of object-centered sociality: socializing about objects - e.g., talking about places and things through which we share some kind of connection - and socializing with objects - e.g., having a deep relationship with the place or thing itself. In the context of the garage sale, I heard stories representing both forms of object-centered sociality: stories in which the object for sale provided a pretext for a story in which the object was somewhat peripheral, and others in which the object was the centerpiece of the story. A garage sale also reflects the wisdom expressed in The Cluetrain Manifesto: markets are conversations:

For thousands of years, we knew exactly what markets were: conversations between people who sought out others who shared the same interests. Buyers had as much to say as sellers. They spoke directly to each other without the filter of media, the artifice of positioning statements, the arrogance of advertising, or the shading of public relations.

These were the kinds of conversations people have been having since they started to talk. Social. Based on intersecting interests. Open to many resolutions. Essentially unpredictable. Spoken from the center of the self. "Markets were conversations" doesn’t mean "markets were noisy." It means markets were places where people met to see and talk about each other’s work.

The Cluetrain authors focus more on marketing - and its transformation via the empowerment of previously passive recipients of broadcast messages - than markets, per se, although they do talk about eBay as an example of a "virtual flea market". I've had very limited experience with eBay, but my impression is that most socializing there is more about the transactions than the stories surrounding the objects being bought and sold.

At several points throughout the two days, I was reminded of Dan McAdams' insightful book, The Stories We Live By, and his thesis that it is the stories we make up about ourselves that give our lives meaning, unity and purpose. We start composing these stories in late adolescence - certainly by the time we are signing each other's high school senior yearbooks - and continue adding to and/or revising the narrative throughout much of our lives. Different life stages tend to provoke different editorial strategies, and the various characters we include in the unfolding story personify our basic human needs for power and love, and the things we buy and sell constitute props in the story. McAdams notes that "the good life story is one of the most important gifts we can ever offer each other" ... and these gifts were being exchanged freely by all parties throughout the garage sale. I was tempted to title this post "The Stories We Sell By" or "Sales-Centered Sociality" ... but focusing on stuff seemed more appropriate.

I imagine that this is partly due to a priming effect: I recently listened to a KUOW Speakers Forum presentation by Annie Leonard, author of The Story of Stuff - a critique of our materials economy - and I think she would approve of garage sales ... aside from the fact that garages house cars which promote driving which contributes to suburban sprawl and other modern practices that degrade the natural environment. On the plus side, garage sales reduce the extraction, production, distribution, consumption and disposal inherent in our acquisition of new stuff by enabling new people to reuse old stuff, overcoming perceived obsolescence by transferring old stuff to new people at a different stage of life for whom the stuff will be perceived as fresh and useful. But I think the main contribution is that the stories and conversations that arise around the reselling of old stuff are much more likely to contribute to our happiness than the kinds of social benefits that typically surround transactions involving new stuff.


[In checking The Story of Stuff blog just now, I see that Saturday, May 15, was Give Your Stuff Away Day. As I alluded to above, Amy ended up giving several items away during the two-day sale, and we will be giving the rest of the stuff away.]

Serendipity Platforms, Unintended Consequences and Explosive Positivity at Web 2.0 Expo

The keynotes on Day 1 of the Web 2.0 Expo in San Francisco exposed a number of common threads, perhaps best summarized by a quote attributed to Tim O'Reilly by conference co-chair Sarah Milstein:

We're trying to maximize the surface area of serendipity.

The official theme of the event is "Platforms for Growth", and all of the keynotes so far have included observations and insights into the kind of platform thinking and its often unintended - and primarily positive - consequences that Tim and various friends of O'Reilly have been espousing and practicing for some time now.

I'm not actually at the conference but have been following it remotely through the web20tv LiveStream. However, I've been taking lots of notes, and will condense them into a coherent collection below, augmented by links and other embeddable goodies when I can find them. I'll link each of the speaker's names to their profile on the Web 2.0 Expo site, which has - or will have - slides and videos of their talks.

Ben Huh, CEO of the Cheezburger Network, spoke about how providing platforms for people to promulgate humor - such as LOLcats, FAIL blog and There, I Fixed It - has resulted in 19,000 submissions per day, 15 million views per month and an immeasurable impact on mutual inspiration and the wealth of networks. He contrasted the construction of Internet culture - a bottom-up process involving hackers, software, subversiveness and co-created and occasionally co-opted meaning - with popular culture - a top-down process that has brought us sitcoms, evening news and Geraldo Rivera. Perhaps due to the humorous nature of the content shared on Ben's platforms, or the occasional dropouts in the live webcast, at one point I wasn't quite sure if he was extolling the virtues of cloud computing or clown computing ... and this prompted additional musing during his talk about other approximate anagrammatical homophones such as subversiveness vs. subservience, and conservative vs. conversative, as well as Victor Borge's quote "Laughter is the shortest distance between two people."

Lili Cheng, General Manager of Microsoft's Future Social Experiences (FUSE) Labs, demonstrated three new web applications created by her team as they confront the challenges of timeliness, unpredictable growth, experimental systems and quality data. Bing Twitter, announced at the Web 2.0 Summit in October, allows users to track trending topics and search for status updates on Twitter. Docs, announced at the f8 Facebook Developers Conference two weeks ago, allows Facebook users to share Microsoft Office documents on the web. The newest and most interesting app is Spindex, which Lili acknowledged was a "nerd name" for social personal index, and allows tracking of trending and subtrending topics that are popular among one's friends (her demo tracked trends of a mutual friend, Marc Smith, but I couldn't make out the actual content). Spindex and Docs are both in early beta / invitation-only mode, and Spindex currently requires a Windows Live ID for authentication (Docs requires Facebook authentication). I had previewed her Web 2.0 Expo PowerPoint slides earlier in the day, as they had shown up in her Facebook news feed, and I thought I'd beat the rush by submitting a request for an invitation. Unfortunately, I haven't received an invite code for Spindex or Docs (for which I requested an invite two weeks ago), but I'm hoping to try them out soon.

Paul Buchheit, founder of FriendFeed, which was acquired by Facebook, was interviewed by Sarah Milstein, during which he described the recently announced Open Graph protocol as an attempt to simplify the development and use of Facebook applications. Paul championed the widespread provision of lightweight, spontaneous interaction gestures such as liking and quick and easy comments as a way to promote conversations and connections across the web. When asked by Sarah about whether the proliferation of such mechanisms would promote a larger number of ultimately shallower connections, Paul responded that they provide more context for future, deeper conversations to unfold. When asked who he admires, Paul responded that he just follows random links, noting he had just enjoyed reading a blog post by an author he hadn't heard of in a browser tab that he'd opened two weeks ago (reflecting one of my common practices).

June Cohen, Executive Producer of TED, shared the philosophy and practices of radical openness adopted by the organizers of the TED conference series. Despite their concerns about their conscious evolution from conference to media company to platform, their steadfast commitment to their core mission - "ideas worth spreading" - enabled them to progressively provide a model "platform for growth", and she said that all of the unintended consequences have been explosively positive. Noting that taped lectures are not an obvious source of viral content, and providing content for free is not an obvious business model, she reported that the first year they provided their TED talks online (2007?), they increased the ticket prices by 50% (to $6,000), and they still sold out within a week. Although production of high quality videos (and conferences) is expensive, they have found that whenever you have people who are passionate about what they are doing, you can find a sponsor who wants to reach that audience.

A year ago, TED launched the TED Open Translation Project, in which 4,000 volunteers have translated 9,000 videos into 77 languages, and the translated text words / symbols are linked to the segments in the video in which they were originally spoken. More recently, they have licensed the TED brand free of charge to organizers of independent TED conferences; the TEDx series has included 1000 events in 70 countries and 35 languages attended by 50,000 people ... and the original TED organizers are learning from the experimentation carried out by the independent organizers ... demonstrating that a global platform creates a global team. And just today, they announced the TED Open TV Project, with 20 global partners who have agreed to show TED talks without interruptions or commercials. The TED strategic plan: listen - and respond - to what people want.

Eric Ries, Venture Advisor and evangelist of the Lean Startup, said that we need to stop wasting people's time building products that no one wants and learn how to pivot: building, measuring and learning, being willing to change directions as we learn from customers, and iterating through these three stages as quickly as we can. He argued that if we really believe the world needs to change in a fundamental way - and many entrepreneurs are driven by some variant of this motivation - we can't afford to rely on faith-based initiatives, i.e., we cannot rely solely on our own intuitions, but must rigorously validate our "solutions" with real customers as early - and often - as possible. Warning against achieving failure - successfully executing a plan that leads you over a cliff - he emphasized the importance of articulating a compelling vision and building a sustainable organization to support the new product or service in the face of extreme uncertainty, the "soil in which all entrepreneurs live".

I found much of Eric's talk compelling, and yet I also found many of Don Norman's recent arguments about Technology First, Needs Last: The Research-Product Gulf to be even more compelling:

Design research is great when it comes to improving existing product categories, but essentially useless when it comes to breakthroughs [e.g., flush toilets, indoor plumbing, electric lighting, automobiles, airplanes, or modern telecommunication]. ... New conceptual breakthroughs are invariably driven by the development of new technologies. The new technologies, in turn, inspire technologists to invent things. Not sometimes because they themselves dream of having their capabilities, but many times simply because they can build them. In other words, grand conceptual inventions happen because technology has finally made them possible. Do people need them? That question is answered over the next several decades as the technology moves from technical demonstration, to product, to failure, or perhaps to slow acceptance in the commercial world where slowly, after considerable time, the products and applications jointly evolve, and slowly the need develops.

I suppose one difference is in scale, with respect to impact, time frame and return on investment. If you need to actually make some money on your idea in a relatively short period of time, you may want to adopt the lean startup model, and prepare to accept customer-driven compromises along the path toward your grand vision. The next presentation seemed to offer a middle way.

Ge Wang, Assistant Professor of Music at Stanford University's Center for Computer Research in Music and Acoustics (CCRMA) and Co-Founder, CTO and Chief Creative Officer of Smule, played us out. Offering an example of Don Norman's observation that new technologies inspire technologists to invent things, Ge shared the process of inside-out design behind the invention of the Ocarina iPhone application: he and his team decided that they wanted to build something musical with the iPhone, taking advantage of its various sensors (multi-touch screen, accelerometer, microphone, GPS), but they weren't initially sure what. The Ocarina app allows you to not only play the iPhone as an instrument, but create and share tablature for songs / musical pieces and to see, hear and play with an organic community of Ocarinists around the world, in a global visualization (and auralization?) of imperfect harmony. Among the unintended consequences was the adoption of the instrument by a nose flautist.

Another Smule iPhone application, the Sonic Lighter, has also yielded unanticipated consequences. Sonic Lighter creates a real-time visual and aural simulation of a lighter that responds to tilting, blowing into the microphone and being positioned near another iPhone that is running the app (which creates a flamethrower effect). It also creates a dot on the Sonic Lighter Ignition Map whenever it is lit and then blown out, which has led to an entire category of unanticipated effects now called Lighter Art (I love the double entendre ... and enjoy this much better than the darker art represented by the Oil Spill Crisis Map, for which one can also imagine an ignition component). The first known example of Lighter Art can be seen to the right, where someone created virtual graffiti - the word "hi" - by turning the Sonic Lighter on and off while walking around Pasadena in a pattern that sketched out the letters. Other Smule applications demonstrated include the Magic Piano (only for iPad), and I Am T-Pain, a mobile voice synthesizer / karaoke app, both of which include interactive mapping capabilities that enable people - or cats - to spontaneously play or sing duets with people they don't know.

Ruminating on Ben Huh's earlier presentation, and Victor Borge's quote about connecting via laughter, I found myself wondering whether music might represent the second shortest channel between two people ... or among larger groups.

There's no data like more open data

When I was working on natural language processing and speech recognition systems in the 90s, one of our mantras was "there's no data like more data", i.e., all things being equal, the accuracy of recognition tends to increase with the addition of more labeled data. The Linguistic Data Consortium at the University of Pennsylvania was [and, I suspect, still is] the primary source for labeled text and speech data, and it was available - for a fee - to all members, most of whom were researchers and developers in academia and industry. Three recent developments in the past week have prompted a reflection on the broader power of data ... and the people and organizations that have access to it.

The first development was a series of recent announcements about the broader availability of Twitter data. One announcement was that the U.S. Library of Congress was acquiring the entire Twitter public archive:

Every public tweet, ever, since Twitter’s inception in March 2006, will be archived digitally at the Library of Congress. That’s a LOT of tweets, by the way: Twitter processes more than 50 million tweets every day, with the total numbering in the billions.

We thought it fitting to give the initial heads-up to the Twitter community itself via our own feed @librarycongress. (By the way, out of sheer coincidence, the announcement comes on the same day our own number of feed-followers has surpassed 50,000. I love serendipity!)

We will also be putting out a press release later with even more details and quotes. Expect to see an emphasis on the scholarly and research implications of the acquisition.

On the one hand, I believe this is a very positive development. Google, Yahoo and Microsoft all pay for real-time access to the Twitter "firehose", and now researchers and developers with shallower pockets will be able to access the entire Twitter public data archive ... after some yet-to-be-announced delay (it's not clear when the archive will become available, how often it will be updated, or how often developers or their applications will be able to access it).

A related development, also announced during Twitter's recent developer conference (Chirp), was that Twitter is offering a stream API to supplement its REST API and Search API. As with the other APIs, there are limitations imposed on its use, lest fail whales become a significantly more common sight, but this still represents a positive development in making more data more openly accessible.

However, the co-occurrence of these announcements with speculation about President Obama's next nominee for the U.S. Supreme Court reminded me of the release of video rental data during the confirmation hearings for Robert Bork during the Reagan administration; although this data did not seem to affect the outcome, its release did lead to the Video Privacy Protection Act in 1988. Belated revelation of alleged pornographic video rental data shortly after the confirmation of Supreme Court Justice Clarence Thomas in 1991, during the George H. W. Bush administration, has given rise to speculation about whether Thomas would have been confirmed had this evidence been made available earlier in the process.

I don't want to draw too strong an analogy between private video rental records and public tweets, but given the broadening range of web services that enable people to automatically update their status[es] about their use of those services (e.g., Netflix users can automatically post their movie ratings on Facebook), I find myself speculating about how the Twitter archive might affect future judicial nominations and/or future elections for political offices ... but given my biases toward a more transparent society, I suppose that if the data is out there, I'd rather have it publicly available than have limited access to it.

And speaking of sharing updates and other data across web services, the second recent development in the realm of open data to give me pause was the set of announcements at the Facebook developers conference (f8) last week. VentureBeat's f8 roundup offers a nice summary of these announcements, which included a Graph API and a "like" button that can be used on any web site ... vastly increasing the prospects for personalization and sociality across the web ... and placing Facebook squarely in the center of this hyperpersonalized and hypersocialized network. Lili Cheng, of Microsoft's FUSE Labs, wrote about the first Facebook partnership announced - and demonstrated - during the keynote at f8, a new Facebook app for sharing documents created by her group.

As with the Twitter announcement, I see many positive possibilities in these developments, but I see an even darker shadow being cast by the Facebook announcements. Marshall Kirkpatrick at ReadWriteWeb articulated some of my concerns in a post asking Is the New Facebook a Deal with the Devil?

Facebook blew people's minds today at its F8 developer conference but one sentiment that keeps coming up is: this is scary. The company unveiled simple, powerful plans to offer instant personalization on sites all over the web, it kicked off meaningful adoption of the Semantic Web with the snap of the fingers, it revolutionized the relationship between the cookie and the log-in, it probably knocked a whole class of recommendation technology startups that don't offer built-in distribution to 400 million people right out of the market. It popularized social bookmarking and made subscribing to feeds around the web easier than ever before. And it may have created the biggest disruption to web traffic analytics in years: demographically verified visitor stats tied to people's real identities. There was so much big news that the analytics part didn't even come up in the keynote.

This is so much new technology and it's tied in so closely with one very powerful company that there is big reason to stop and consider the possible implications. There are reasons to be scared. The bargain Facebook offers is very, very compelling - but it's not a clear win for the web.

Pete Cashmore at Mashable offers a somewhat less apprehensive, or perhaps simply more capitalistic, perspective on these developments, Shocker: Facebook Does What’s Best For Facebook:

Facebook is building a database of the world’s preferences, but won’t give others access unless they promote Facebook on their sites (by using Facebook logins). ...

So Facebook is building a database of information about you, but you don’t really own it: Facebook does. ...

Bottom line: when a company solves a problem, should we be surprised that they solve it in a way that creates value for both customers and the company itself? Isn’t that how capitalism works?

Since then, I've read other commentaries that present a less apprehensive view of these developments, e.g., a comment by Austin on a post in TechCrunch Europe on Privacy issues? Google engineers leaving Facebook in droves:

There are two things going on here:

1. An iFrame on sites that points to Facebook. The iframe request is data loaded so it knows where the user came from. Facebook shows activity and friends that have interacted with the site but the data IS NOT shared. You have to be logged into facebook for it to work. It LOOKs like it is on that site but it isn’t. It is a little window into facebook on a different page.

2. Applications can ask users for access to their data through the service formerly known as ‘connect’. Each and every user has to agree to share the data. If you don’t want to share then don’t use the App.

Facebook isn’t doing anything differently then they did before, it is just easier and more integrated.

Although a subsequent commenter posted an unsubstantiated and rather abusive allegation that Austin works for Facebook (Austin's username is linked to Aqumin, a financial data and analysis firm), no one rebutted his argument.

Another positive perspective was presented in an O'Reilly Radar post by David Recordon - who does work for Facebook - on Why f8 was good for the open web:

  • No more 24-hour caching limit (as long as developers using Facebook API data are keeping it up to date and agree to remove it at a user's request).
  • An API that is realtime and isn't just about content (developers can subscribe to changes).
  • The Open Graph protocol benefits the web, not just Facebook, and is licensed under the Open Web Foundation Agreement.
  • Support for OAuth 2.0
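
As a rough illustration of what the Open Graph protocol asks of a web page (this sketch is mine, not from David's post): a page describes itself with `<meta>` tags whose `property` attribute starts with `og:`, such as `og:title`, `og:type`, `og:url` and `og:image`. The minimal Python sketch below collects those tags using only the standard library. The `OpenGraphParser` class, the `extract_open_graph` helper, the sample page and its example.com URLs are all made up for illustration.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags carry Open Graph properties.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        if prop.startswith("og:") and "content" in attr_map:
            # Keep the first value seen for each property name.
            self.properties.setdefault(prop, attr_map["content"])


def extract_open_graph(html):
    """Return a dict of og:* property names to their content values."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


# Hypothetical page, invented for this example.
SAMPLE_PAGE = """
<html><head>
<title>There's no data like more open data</title>
<meta name="description" content="Not an Open Graph tag" />
<meta property="og:title" content="There's no data like more open data" />
<meta property="og:type" content="article" />
<meta property="og:url" content="http://example.com/open-data" />
<meta property="og:image" content="http://example.com/open-data.png" />
</head><body></body></html>
"""

if __name__ == "__main__":
    for name, value in extract_open_graph(SAMPLE_PAGE).items():
        print(name, "=", value)
```

Because the markup lives in the page itself, any site or crawler can read it with a few lines like these, which is part of the argument that the protocol is not just for Facebook.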

I discovered Dave's post via Tim O'Reilly's tweet, and as one of the most prominent proponents of the open web, Tim's endorsement carries a great deal of weight (for me). He also tweeted a link to another positive perspective on the Facebook announcements, by Fred Wilson, a partner at Union Square Ventures, who raised doubts about One Graph to Rule Them All?:

These other social graphs [Twitter, Tumblr, Foursquare, Disqus, GetGlue, and others (remember] can and will grow in the wake of Facebook. I am not sure if Facebook's ambition is to create the one social graph to rule them all but if it is, I don't think they will succeed with that. If it is to empower the creation of many social graphs for various activities and to be in the center of that activity and driving it, I think they are already there and will continue to be there for many years to come.

And referencing Tim brings me to the third (and final) recent development I wanted to mention regarding open data: his keynote on where open source and open data are going in the age of the cloud at the 2010 O'Reilly MySQL Conference and Expo last week. Some of the issues he raised in his talk are reflected in a blog entry he posted last month on The State of the Internet Operating System (a "part 2" followup is promised soon). If I were to highlight one theme from the keynote, it is his statement that the future actually belongs to the data, not the database. I'll highlight a few of his more specific observations and insights below.

The 21st century data challenge is how to deliver algorithmic real-time cloud-based intelligence to mobile applications. This cloud future includes...

  • Devices acting as sensors for intelligent data collection
  • Devices whose UI is on the web rather than the device
  • Feeding data into multiple online services that will turn into a full-on sensor web
  • Setting the stage for robotics, augmented reality and the next generation of personal electronics

The Internet Operating System is a Data Operating System:

  • It helps applications find out about
    • People
    • Places
    • Things
    • Prices
    • Documents
    • Images
    • Sounds
    • Relationships
    • ...
  • and helps people interact with them through services
    • Search
    • Payment
    • Matching and Recognition
    • ...

Referencing an earlier blog post on The War for the Web, Tim asked "Who will own the Internet Operating System? Do we want anyone to own it? If not, we better get busy."

Invoking concepts from Wall Street, via the Money:Tech conference ("Where Web 2.0 meets Wall Street"), and applying them to the prospects for the open web, Tim noted that some financial companies that started out as brokers started trading for their own accounts, against their customers, and warned us to watch for this behavior on the Internet: "The giants of the internet are trading for their own accounts, building a platform on which all roads lead back to themselves."

Noting that each of the players (giants) in "the Internet Operating System game" tends to embrace open source for their own strategic reasons and is giving away something that is valuable to someone else, Tim suggested that we may see "some interesting open source moves around Microsoft's Bing search engine", and offered a partial list of potential open source supporters in different application areas:

  • Search: Microsoft
  • Maps: Microsoft, Nokia, Yelp, Foursquare
  • Speech: Nuance, Microsoft
  • Social Graph: Google
  • Payment: Paypal
  • Cloud infrastructure: VMware
  • Smartphones: Google
  • Device Operating Systems: Google

Shifting his attention from industry to government, Tim presented some Open Government Data Principles - relevant to data sharing by anyone else - composed by a group of 30 other leading strategic thinkers. Interestingly, this group did not include any government representatives.

Government data shall be considered open if it is made public in a way that complies with the principles below:

  1. Complete: All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.
  2. Primary: Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
  3. Timely: Data is made available as quickly as necessary to preserve the value of the data.
  4. Accessible: Data is available to the widest range of users for the widest range of purposes.
  5. Machine processable: Data is reasonably structured to allow automated processing.
  6. Non-discriminatory: Data is available to anyone, with no requirement of registration.
  7. Non-proprietary: Data is available in a format over which no entity has exclusive control.
  8. License-free: Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.
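
Since the principles say data is open only if it complies with all eight, they can be read as a checklist. Here is a minimal Python sketch of that reading; it is my own illustration, not something from Tim's talk or the principles' authors. The `OpenDataChecklist` class, the `failed_principles` and `is_open` helpers, and the example dataset are all hypothetical, and in practice each yes/no answer would take human judgment.

```python
from dataclasses import dataclass, fields


@dataclass
class OpenDataChecklist:
    """One yes/no answer per principle, in the order listed above."""
    complete: bool
    primary: bool
    timely: bool
    accessible: bool
    machine_processable: bool
    non_discriminatory: bool
    non_proprietary: bool
    license_free: bool


def failed_principles(checklist):
    """Return the names of the principles the dataset does not satisfy."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def is_open(checklist):
    """Data counts as open only when every principle is satisfied."""
    return not failed_principles(checklist)


# Hypothetical dataset: published only as scanned PDFs, behind a sign-up form.
scanned_reports = OpenDataChecklist(
    complete=True,
    primary=True,
    timely=True,
    accessible=True,
    machine_processable=False,  # scanned PDFs resist automated processing
    non_discriminatory=False,   # registration is required
    non_proprietary=True,
    license_free=True,
)

if __name__ == "__main__":
    print("open:", is_open(scanned_reports))
    print("failed:", failed_principles(scanned_reports))
```

Reporting which principles fail, rather than a bare yes or no, tells a publisher what to fix first.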

Toward the end of his talk, Tim referenced a recent O'Reilly Radar blog post by Nat Torkington on Truly Open Data, in which Nat notes that we have to build some tools to support open data, e.g., tools for provisioning and tracking. In short, we need to make it as easy to share data as it is to share code in the open source movement. So maybe a more appropriate title for this post would be "There's no data like more open data and tools" ... but I think I'll save that for a future followup post.