Hype, Hubs & Hadoop: Some Notes from Strata NY 2013 Keynotes

I didn't physically attend Strata NY + Hadoop World this year, but I did watch the keynotes from the conference. O'Reilly Media kindly makes videos of the keynotes and slides of all talks available very soon after they are given. Among the recurring themes were haranguing against the hype of big data, the increasing utilization of Hadoop as a central platform (hub) for enterprise data, and the importance and potential impact of making data, tools and insights more broadly accessible within an enterprise and to the general public. The keynotes offered a nice mix of business (applied) & science (academic) talks, from event sponsors and other key players in the field, and a surprising - and welcome - number of women on stage.

Atigeo, the company where I now work on analytics and data science, co-presented a talk on Data Driven Models to Minimize Hospital Readmissions at Strata Rx last month, and I'm hoping we will be participating in future Strata events. And I'm hoping that some day I'll be on stage presenting some interesting data and insights at a Strata conference.

Meanwhile, I'll include some of my notes on interesting data and insights presented by others, in the order in which presentations were scheduled, linking each presentation title to its associated video. Unlike previous postings of notes from conferences, I'm going to leave the notes in relatively raw form, as I don't have the time to add more narrative context or visual augmentations to them.

Hadoop's Impact on the Future of Data Management
Mike Olson @mikeolson (Cloudera)

3000 people at the conference (sellout crowd), up from 700 people in 2009.
Hadoop started out as a complement to traditional data processing (offering large-scale processing).
Progressively adding more real-time capabilities, e.g. Impala & Cloudera search.
More and more capabilities migrating from traditional platforms to Hadoop.
Hadoop moving from the periphery to the architectural center of the data center, emerging as an enterprise data hub.
Hub: scalable storage, security, data governance, engines for working with the data in place
Spokes connect to other systems, people
Announcing Cloudera 5, "the enterprise data hub"
Announcing Cloudera Connect Cloud, supporting private & public cloud deployments
Announcing Cloudera Connect Innovators, inaugural innovator is DataBricks (Spark real-time in-memory processing engine)

Separating Hadoop Myths from Reality
Jack Norris (MapR Technologies)

Hadoop is the first open source project that has spawned a market
3:35 compelling graph of Hadoop/HBase disk latency vs. MapR latency
Hadoop is being used in production by many organizations

Big Impact from Big Data
Ken Rudin (Facebook)

Need to focus on business needs, not the technology
You can use science, technology and statistics to figure out what the answers are, but it is still an art to figure out what the right questions are
How to focus on the right questions:
* hire people with academic knowledge + business savvy
* train everyone on analytics (internal DataCamp at Facebook for project managers, designers, operations; 50% on tools, 50% on how to frame business questions so you can use data to get the answers)
* put analysts in org structure that allows them to have impact ("embedded model": hybrid between centralized & decentralized)
Goals of analytics: Impact, insight, actionable insight, evangelism … own the outcome

Five Surprising Mobile Trajectories in Five Minutes
Tony Salvador (Intel Corporation)

Tony is director at the Experience Research Lab (is this the group formerly known as People & Practices?) [I'm an Intel Research alum, and Tony is a personal friend]
Personal data economy: system of exchange, trading personal data for value
3 opportunities
* hyper individualism (Moore's Cloud, programmable LED lights)
* hyper collectivity (student projects with outside collaboration)
* hyper differentiation (holistic design for devices + data)
Big data is by the people and of the people ... and it should be for the people

Can Big Data Reach One Billion People?
Quentin Clark (Microsoft)

Praises Apache, open source, github (highlighted by someone from Microsoft?)
Make big data accessible (MS?)
Hadoop is a cornerstone of big data
Microsoft is committed to making it ready for the enterprise
HDInsight: Azure offering for Hadoop
We have a billion users of Excel, and we need to find a way to let anybody with a question get that question answered.
Power BI for Office 365 Preview

What Makes Us Human? A Tale of Advertising Fraud
Claudia Perlich (Dstillery)

A Turing test for advertising fraud
Dstillery: predicting consumer behavior based on browsing histories
Saw 2x performance improvement in 2 weeks; was immediately skeptical
Integrated additional sources of data (10B bid requests)
Found "oddly predictive websites"
e.g., Women's health page --> 10x more likely to check out credit card offer, order online pizza, or read about luxury cars
Large advertising scam (botnet)
36% of traffic is non-intentional (Comscore)
Co-visitation patterns
Cookie stuffing
Botnet behavior is easier to predict than human behavior
Put bots in "penalty box": ignore non-human behavior

From Fiction to Facts with Big Data Analytics
Ben Werther @bwerther (Platfora)

When it comes to big data, BI = BS
Contrasts enterprises based on fiction, feeling & faith vs. fact-based enterprises
Big data analytics: letting regular business people iteratively interrogate massive amounts of data in an easy-to-use way so that they can derive insight and really understand what's going on
3 layers: Deep processing + acceleration + rich analytics
Product: Hadoop processing + in-memory acceleration + analytics engines + Vizboards
Example: event series analytics + entity-centric data catalog + iterative segmentation

The Economic Potential of Open Data
Michael Chui (McKinsey Global Institute)

[Presentation is based on newly published - and openly accessible (walking the talk!) - report: Open data: Unlocking innovation and performance with liquid information.]

Louisiana Purchase: Lewis & Clark address a big data acquisition problem
Thomas Jefferson: "Your observations are to be taken with great pains & accuracy, to be entered intelligibly, for others as well as yourself"
What happens when you make data more liquid?

4 characteristics of "openness" or "liquidity" of data:
* degree of access
* machine readability
* cost
* rights

Benefits to open data:
* transparency
* benchmarking exposing variability
* new products and services based on open data (Climate Corporation?)

How open data can enable value creation
* matching supply and demand
* collaboration at scale
"with enough eyes on code, all bugs are shallow"
--> "with enough eyes on data, all insights are shallow"
* increase accountability of institutions

Open data can help unlock $3.2B [typo? s/b $3.2T?] to $5.4T in economic value per year across 7 domains
* education
* transportation
* consumer products
* electricity
* oil and gas
* health care
* consumer finance
What needs to happen?
* identify, prioritize & catalyze data to open
* developers, developers, developers
* talent (data scientists, visualization, storytelling)
* address privacy, confidentiality, security, IP policies
* platforms, standards and metadata

The Future of Hadoop: What Happened & What's Possible?
Doug Cutting @cutting (Cloudera)

Hadoop started out as a storage & batch processing system for Java programmers
Increasingly enables people to share data and hardware resources
Becoming the center of an enterprise data hub
More and more capabilities being brought to Hadoop
Inevitable that we'll see just about every kind of workload being moved to this platform, even online transaction processing

Designing Your Data-Centric Organization
Josh Klahr (Pivotal)

GE has created 24 data-driven apps in one year
We are working with them as a Pivotal investor and a Pivotal company, we help them build these data-driven apps, which generated $400M in the past year
Pivotal code-a-thon, with Kaiser Permanente, using Hadoop, SQL and Tableau

What it takes to be a data-driven company
* Have an application vision
* Powered by Hadoop
* Driven by Data Science

Encouraging You to Change the World with Big Data
David Parker (SAP)

Took Facebook 9 months to achieve the same number of users that it took radio 40 years to achieve (100M users)
Use cases
At-risk students stay in school with real-time guidance (University of Kentucky)
Soccer players improve with spatial analysis of movement
Visualization of cancer treatment options
Big data geek challenge (SAP Lumira): $10,000 for best application idea

The Value of Social (for) TV
Shawndra Hill (University of Pennsylvania)

Social TV Lab
How can we derive value from the data that is being generated by viewers today?
Methodology: start with Twitter handles of TV shows, identify followers, collect tweets and their networks (followees + followers), build recommendation systems from the data (social network-based, product network-based & text-based (bag of words)). Correlate words in tweets about a show with demographics about audience (Wordle for male vs. female)
1. You can use Twitter followers to estimate viewer audience demographics
2. TV triggers lead to more online engagement
3. If brands want to engage with customers online, play an online game
Real time response to advertisement (Teleflora during Super Bowl): peaking buzz vs. sustained buzz
Demographic bias in sentiment & tweeting (male vs. female response to Teleflora, others)
Influence = retweeting
Women more likely to retweet women, men more likely to retweet men
4. Advertising response and influence vary by demographic
5. GetGlue and Viggle check-ins can be used as a reliable proxy for viewership to
* predict Nielsen viewership weeks in advance
* predict customer lifetime value
* measure time shifting
All at the individual viewer level (vs. household level)
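
As a rough illustration of the text-based (bag of words) idea in the methodology above, here is a minimal sketch that compares shows by the words used in tweets about them. The notes don't say which similarity measure the Social TV Lab used; cosine similarity, the function names and the sample tweets are all my own assumptions for illustration.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Count lowercase whitespace-separated words (no stemming or stop-word removal)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, tweets_by_show):
    """Return the show whose tweet text is closest to the target show's tweet text."""
    target_bag = bag_of_words(tweets_by_show[target])
    others = (show for show in tweets_by_show if show != target)
    return max(
        others,
        key=lambda show: cosine_similarity(target_bag, bag_of_words(tweets_by_show[show])),
    )


# Hypothetical tweet text per show (made up, not data from the talk).
tweets_by_show = {
    "show_a": "love the finale tonight amazing cast",
    "show_b": "amazing finale love this cast",
    "show_c": "big game tonight great touchdown",
}

print(most_similar("show_a", tweets_by_show))
```

A real system would presumably aggregate many tweets per show and weight words (e.g., by how rare they are), but the basic "shows that attract similar words are similar" step looks like this.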

Ubiquitous Satellite Imagery of our Planet
Will Marshall @wsm1 (Planet Labs)

Ultracompact satellites to image the earth on a much more frequent basis to get inside the human decision-making loop so we can help human action.
Redundancy via large # of small satellites with latest technology (vs. older, higher-reliability systems on one satellite)
Recency: shows more deforestation than Google Maps, river movement (vs. OpenStreetMap)
API for the Changing Planet, hackathons early next year

The Big Data Journey: Taking a holistic approach
John Choi (IBM)

[No slides?]
Invention of sliced bread 
Big data [hyped] as the biggest thing since the sliced bread
Think about big data as a journey
1. It's all about discipline and knowing where you are going (vs. enamored with tech)
VC $2.6B investment into big data (IBM, SAP, Oracle, … $3-4B more)
2. Understand that any of these technologies do not live in a silo. The thing that you don't want to have happen is that this thing become a science fair project. At the end of the day, this is going to be part of a broader architecture.
3. This is an investment decision, want to have a return on investment.

How You See Data
Sharmila Shahani-Mulligan @ShahaniMulligan (ClearStory Data)

The Next Era of Data Analysis: next big thing is how you analyze data from many disparate sources and do it quickly.
More data: Internal data + external data
More speed: Fast answers + discovery
Increase speed of access & speed of processing so that iterative insight becomes possible.
More people: Collaboration + context
Needs to become easier for everyone across the business (not just specialists) to see insights as insights are made available, have to make decisions faster.
Data-aware collaboration
Data harmonization
Demo: 6:10-8:30

Can Big Data Save Them?
Jim Kaskade @jimkaskade (Infochimps)

1 of 3 people in US has had a direct experience with cancer in their family
1 in 4 deaths are cancer-related
Jim's mom has chronic leukemia
Just got off the phone with his mom (it's his birthday), and she asked "what is it that you do?"
"We use data to solve really hard problems like cancer"
Cancer is 2nd leading cause of death in children
"The brain trust in this room alone could advance cancer therapy more in a year than the last 3 decades."
Bjorn Brucher
We can help them by predicting individual outcomes, and then proactively applying preventative measures.
Big data starts with the application
Stop building your big data sandboxes, stop building your big data stacks, stop building your big data hadoop clusters without a purpose.
When you start with the business problem, the use case, you have a purpose, you have focus.
50% of big data projects fail (reference?)
"Take that one use case, supercharge it with big data & analytics, we can take & give you the most comprehensive big data solutions, we can put it on the cloud, and for some of you, we can give you answers in less than 30 days"
"What if you can contribute to the cure of cancer?" [abrupt pivot back to initial inspirational theme]

Changing the Face of Technology - Black Girls CODE
Peta Clarke @volunteerbgcny (Black Girls Code - NY), Donna Knutt @donnaknutt (Black Girls Code)

Why coding is important: By 2020, 1.4M computing jobs
Women of color currently make up 3% of computing jobs in US
Goal: teach 1M girls to code by 2040
Thus far: 2 years, 2000 girls, 7 states + Johannesburg, South Africa

Beyond R and Ph.D.s: The Mythology of Data Science Debunked
Douglas Merrill @DouglasMerrill (ZestFinance)

[my favorite talk]
Anything which appears in the press in capital letters, and surrounded by quotes, isn't real.
There is no math solution to anything. Math isn't the answer, it's not even the question.
Math is a part of the solution. Pieces of math have different biases, different things they do well, different things they do badly, just like employees. Hiring one new employee won't transform your company; hiring one new piece of math also won't transform your company.
Normal distribution, bell curve: beautiful, elegant
Almost nothing in the real world, is, in fact, normal.
Power laws don't actually have means.
Joke: How do you tell the difference between an introverted and an extroverted engineer? The extroverted one looks at your shoes instead of his own.
The math that you think you know isn't right. And you have to be aware of that. And being aware of that requires more than just math skills.
Science is inherently about data, so "data scientist" is redundant
However, data is not entirely about science
Math + pragmatism + communication
Prefers "Data artist" to data scientist
Fundamentally, the hard part actually isn't the math, the hard part is finding a way to talk about that math. And, the hard part isn't actually gathering the data, the hard part is talking about that data.
The most famous data artist of our time: Nate Silver.
Data artists are the future.
What the world needs is not more R, what the world needs is more artists (Rtists?)
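
A small sketch to make the "power laws don't actually have means" remark above concrete. Strictly speaking, the mean is infinite only when the tail is heavy enough: for a Pareto distribution with minimum value 1, the mean is alpha / (alpha - 1) when the shape parameter alpha is greater than 1, and infinite otherwise. The shape values (3.0 and 0.8), sample sizes and function names below are my own illustrative choices, not anything from the talk.

```python
import math
import random


def pareto_mean(alpha):
    """Theoretical mean of a Pareto distribution with x_min = 1."""
    return alpha / (alpha - 1.0) if alpha > 1.0 else math.inf


def sample_mean(alpha, n, seed=0):
    """Average of n Pareto(alpha) draws, generated by inverse-transform sampling."""
    rng = random.Random(seed)
    # 1 - rng.random() is uniform on (0, 1]; raising it to -1/alpha gives Pareto(alpha).
    return sum((1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)) / n


for alpha in (3.0, 0.8):
    means = [sample_mean(alpha, n) for n in (10**3, 10**4, 10**5)]
    print(f"alpha={alpha}: theoretical mean={pareto_mean(alpha)}, sample means={means}")
```

With alpha = 3.0 the sample means should settle near 1.5 as the sample grows; with alpha = 0.8 they have nothing to settle on, and a single huge draw can dominate the average, which is the practical danger of summarizing such data with a mean.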

Is Bigger Really Better? Predictive Analytics with Fine-grained Behavior Data
Foster Provost (NYU | Stern)

[co-author of my favorite book on Data Science]
Agrees with some of the critiques made by previous speaker, but rather likes the term "data scientist"
Shares some quotes from Data Science and its relationship to Big Data and Data-Driven Decision Making
Gartner Hype Cycle 2012 puts "Predictive Analytics" at the far right ("Plateau of Productivity")  
[it's still there in Gartner Hype Cycle 2013, and "Big Data" has inched a bit higher into the "Peak of Inflated Expectations"]
More data isn't necessarily better (if it's from the same source, e.g., sociodemographic data)
More data from different sources may help.
Using fine-grained behavior data, learning curves show continued improvement to massive scale.
1M merchants, 3M data points (? look up paper)
But sociodemographic + pseudo social network data still does not necessarily do better
See Pseudo-Social Network Targeting from Consumer Transaction Data (Martens & Provost)
Seem to be very few case studies where you have really strong best practices with traditional data juxtaposed with strong best practices with another sort of data.
We see similar learning curves with different data sets, characterized by massive numbers of individual behaviors, each of which probably contains a small amount of information, and the data items are sparse.
See Predictive Modelling with Big Data: Is Bigger Really Better? (Enric Junqué de Fortuny, David Martens & Foster Provost)
Others have published work on fraud detection (Fawcett & FP, 1997; Cortes et al, 2001), Social Network-based Marketing (Hill, et al, 2006), Online Display-ad Targeting (FP, Dalessandro, et al., 2009; Perlich, et al., 2013)
Rarely see comparisons

Take home message:
The Golden Age of Data Science is at hand.
Firms with larger data assets may have the opportunity to achieve significant competitive advantage.
Whether bigger is better for predictive modeling depends on:
a) the characteristics of the data (e.g., sparse, fine-grained data on consumer behavior)
b) the capability to model such data

The Scientific Method: Cultivating Thoroughly Conscious Ignorance

Stuart Firestein brilliantly captures the positive influence of ignorance as an often unacknowledged guiding principle in the fits and starts that typically characterize the progression of real science. His book, Ignorance: How It Drives Science, grew out of a course on Ignorance he teaches at Columbia University, where he chairs the department of Biological Sciences and runs a neuroscience research lab. The book is replete with clever anecdotes interleaved with thoughtful analyses - by Firestein and other insightful thinkers and doers - regarding the central importance of ignorance in our quests to acquire knowledge about the world.

Each chapter leads off with a short quote, and the one that starts Chapter 1 sets the stage for the entire book:

"It is very difficult to find a black cat in a dark room," warns an old proverb. "Especially when there is no cat."

He proceeds to channel the wisdom of Princeton mathematician Andrew Wiles (who proved Fermat's Last Theorem) regarding the way science advances:

It's groping and probing and poking, and some bumbling and bungling, and then a switch is discovered, often by accident, and the light is lit, and everyone says "Oh, wow, so that's how it looks," and then it's off into the next dark room, looking for the next mysterious black feline.

Firestein is careful to distinguish the "willful stupidity" and "callow indifference to facts and logic" exhibited by those who are "unaware, unenlightened, and surprisingly often occupy elected offices" from a more knowledgeable, perceptive and insightful ignorance. As physicist James Clerk Maxwell describes it, this "thoroughly conscious ignorance is the prelude to every real advance in science."

The author disputes the view of science as a collection of facts, and instead invites the reader to focus on questions rather than answers, to cultivate what poet John Keats calls "negative capability": the ability to dwell in "uncertainty without irritability". This notion is further elaborated by philosopher-scientist Erwin Schrödinger:

In an honest search for knowledge you quite often have to abide by ignorance for an indefinite period.

Ignorance tends to thrive more on the edges than in the centers of traditional scientific circles. Using the analogy of a pebble dropped into a pond, most scientists tend to focus near the site where the pebble is dropped, but the most valuable insights are more likely to be found among the ever-widening ripples as they spread across the pond. This observation about the scientific value of exploring edges reminds me of another inspiring book I reviewed a few years ago, The Power of Pull, wherein authors John Hagel III, John Seely Brown & Lang Davison highlight the business value of exploring edges:

Edges are places that become fertile ground for innovation because they spawn significant new unmet needs and unexploited capabilities and attract people who are risk takers. Edges therefore become significant drivers of knowledge creation and economic growth, challenging and ultimately transforming traditional arrangements and approaches.

On a professional level, given my recent renewal of interest in the practice of data science, I find many of Firestein's insights into ignorance relevant to a productive perspective for a data scientist. Firestein promotes a data-driven rather than hypothesis-driven approach, instructing his students to "get the data, and then we can figure out the hypotheses." Riffing on Rodin, the famous sculptor, Firestein highlights the literal meaning of "dis-cover", which is "to remove a veil that was hiding something already there" (which is the essence of data mining). He also notes that each discovery is ephemeral, as "no datum is safe from the next generation of scientists with the next generation of tools", highlighting both the iterative nature of the data mining process and the central importance of choosing the right metrics and visualizations for analyzing the data.

Professor Firestein also articulates some keen insights about our failing educational system, a professional trajectory from which I recently departed, that resonate with some growing misgivings I was experiencing in academia. He highlights the need to revise both the business model of universities and the pedagogical model, asserting that we need to encourage students to think in terms of questions, not answers. 

W.B. Yeats admonished that "education is not the filling of a pail, but the lighting of a fire." Indeed. Time to get out the matches.


On a personal level, at several points while reading the book I was reminded of two of my favorite "life rules" (often mentioned in preceding posts) articulated by Cherie Carter-Scott in her inspiring book, If Life is a Game, These are the Rules:

Rule Three: There are no mistakes, only lessons.
Growth is a process of experimentation, a series of trials, errors, and occasional victories. The failed experiments are as much a part of the process as the experiments that work.

Rule Four: A lesson is repeated until learned.
Lessons will be repeated to you in various forms until you have learned them. When you have learned them, you can then go on to the next lesson.

Firestein offers an interesting spin on this concept, adding texture to my previous understanding, and helping me feel more comfortable with my own highly variable learning process, as I often feel frustrated with re-encountering lessons many, many times:

I have learned from years of teaching that saying nearly the same thing in different ways is an often effective strategy. Sometimes a person has to hear something a few times or just the right way to get that click of recognition, that "ah-ha moment" of clarity. And even if you completely get it the first time, another explanation always adds texture.

My ignorance is revealed to me on a daily, sometimes hourly, basis (I suspect people with partners and/or children have an unfair advantage in this department). I have written before about the scope and consequences of others being wrong, but for much of my life, I have felt shame about the breadth and depth of my own ignorance (perhaps reflecting the insight that everyone is a mirror). It's helpful to re-dis-cover the wisdom that ignorance can, when consciously cultivated, be a strength.

[The video below is the TED Talk that Stuart Firestein recently gave on The Pursuit of Ignorance.]



PRP, Regenokine & other biologic medicine treatments for joint & tendon problems

Science journalist Jonah Lehrer posted an interesting article last week about aging star athletes' embrace of biologic medicine, "Why Did Kobe Go to Germany? An aging star and the new procedure that could revolutionize sports medicine". The article describes Regenokine, a relatively new procedure for treating joint and tendon problems that sounds similar to the platelet rich plasma (PRP) treatment I underwent for my right elbow nearly 5 years ago. I have enjoyed a nearly full recovery from the pain and limitations of chronic elbow tendinosis that had plagued me on and off for several years prior to treatment, and I enjoyed reading about others' successful treatment experiences and some of the studies about treatment alternatives.

"Biologic medicine" treatments all engage the body in healing itself, typically involving the extraction, manipulation and re-injection of the patient's own blood or other bodily fluid. Regenokine treatment involves withdrawing a small sample of blood from the patient, heating it and then spinning it in a centrifuge to separate the constituent elements; the resulting yellow colored middle layer is then extracted and injected into the patient's problem area (e.g., the knee). PRP involves withdrawing blood and spinning it in a centrifuge, but does not involve heating, and - as the name suggests - the platelet-rich layer is extracted for injection. Bone marrow injections, involving stem cells, use a similar approach.


Unfortunately, the article reports that PRP, Regenokine and other "biologic medicine" treatments face special challenges in securing FDA approval:

The reason Kobe, A-Rod, and other athletes travel to Germany for their biologic treatments involves a vague FDA regulation that mandates that all human tissues (such as blood and bone marrow) can only be "minimally manipulated," or else they are classified as a drug and subject to much stricter governmental regulations. The problem, of course, is figuring out what "minimal" means in the context of biologics. Can the blood be heated to a higher temperature, as with Regenokine? Spun in a centrifuge? Can certain proteins be filtered out? Nobody knows the answer to these questions, and most American doctors are unwilling to risk the ire of regulators.

The article profiles athletes Kobe Bryant and Alex Rodriguez, as well as Regenokine treatment providers Dr. Peter Wehling (Dusseldorf, Germany) and Dr. Chris Renna (Lifespan Medicine, Dallas & Santa Monica) - who are also co-authors of the book End of Pain - and PRP treatment providers Dr. Stephen Sampson (Orthohealing Center & UCLA) and Dr. Allan Mishra (Apex PRP & Stanford), the doctor who treated my elbow.

Lehrer offers a balanced perspective, noting that while a few famous athletes appear to have experienced healing after biologic medicine treatments, there is - as yet - little supporting evidence from rigorous clinical trials, and so these could represent "the latest overhyped medical treatments for desperate athletes". A 2006 article co-authored by Mishra described a pilot study showing the effectiveness of PRP for chronic elbow tendinosis (the problem I was suffering from), and a 2010 article co-authored by Sampson described another pilot study showing the effectiveness of PRP on knee osteoarthritis. However, a 2010 article reported on a Dutch study that showed no significant benefit of PRP over saline injections for chronic Achilles tendinopathy. Another Dutch study, involving a double-blind randomized trial of PRP with 230 patients, has been completed, but it could be another several years before the results appear in a peer-reviewed medical journal. Mishra's blog includes a recent post referencing other studies supporting the effectiveness of PRP.

I don't know of any studies of Regenokine, but a 2008 pilot study of interleukin-1 receptor antagonist as a treatment for knee osteoarthritis demonstrated "statistically significant improvement of KOOS [Knee injury and Osteoarthritis Outcome Score] symptom and sport parameters", and a 2009 study reports that Autologous conditioned serum (Orthokine) is an effective treatment for knee osteoarthritis. According to a December 2011 post about PRP and Regenokine in the Wordpress blog, Knee Surgery Newsletter (which offers no information about the author), Orthokine was the brand name under which Regenokine was previously marketed, and Regenokine and Orthokine are both brand names for interleukin receptor antagonist treatment.

The Lehrer article also highlights doubts - or what should be doubts - about the effectiveness of the traditional alternative to biologic medicine treatment - surgery - describing the results of a 2002 peer-reviewed study appearing in the New England Journal of Medicine, A Controlled Trial of Arthroscopic Surgery for Osteoarthritis of the Knee:

Consider an influential 2002 trial that compared arthroscopic surgery for knee osteoarthritis to a sham surgery, in which people were randomly assigned to have their knee cut open but without any additional treatment. (The surgeon who performed all the operations was the orthopedic specialist for an NBA team.) The data was clear: there was no measurable difference between those who received the real surgery and those who received the fake one.

As I've noted before in the PRP thread here on my blog, I'm not a medical expert, and I don't even follow the medical literature about PRP or other treatments with any regularity (I discovered this article because I follow @jonahlehrer on Twitter). I have enjoyed a complete recovery of functionality and nearly pain-free use of my elbow following PRP therapy. I like to think that there is a causal relationship in my personal experience - especially after the failure of several other treatments I tried - but as noted in Lehrer's article, more evidence is required to support any general conclusions on the effectiveness of the treatment. Meanwhile, I'm happy to see PRP and other biologic treatments gain greater recognition and awareness.

Health, science, knowledge, access and elitism: Lawrence Lessig and science as remix culture

I have been an admirer and supporter of Lawrence Lessig's crusade for copyright reform and promotion of remix culture for many years. In a recent talk at CERN, Lessig applied his arguments for a fairer interpretation of fair use in the arts world to opening up the architectures for knowledge access in the world of science. The Harvard Law School professor made a compelling case for the ethical obligation of scientists [at least those in academia] to provide universal access to the knowledge they discover, and chastised those who practice exclusivity - those who choose elite-nment over enlightenment - as "wrong".

I initially discovered the talk by following a @BoingBoing tweet to a two-paragraph blog post about Lessig on science, copyright and the moral case for open access, which included an embedded 50-minute video of Lessig's presentation at CERN on 18 April 2011 entitled "The Architecture of Access to Scientific Knowledge: Just How Badly We Have Messed This Up".

I rarely take the time to watch any videos, and having seen many of Lessig's talks about copyright reform - live and online - I was preparing to simply retweet the link, and move on. But having been thoroughly irritated by a personal encounter with barriers to knowledge access during the [free] webcast from the otherwise enlightening and engaging Behavioral Informatics for Health event earlier this week, I was motivated to see and hear what Lessig had to show and tell. I was excited to discover that Lessig's talk was far more relevant to health and medicine - and the kind of universal access to crucial information that might help those outside of elite schools and hospitals better achieve positive health outcomes - than I initially anticipated.

Before sharing some of Lessig's insights and observations, I want to share the source of my personal irritation in encountering preventative measures erected to limit access to one of the two journals being showcased at the behavioral informatics event, a special issue on Cyberinfrastructure for Consumer Health from the American Journal of Preventive Medicine. When I investigated options for accessing some of the interesting articles being mentioned during the event, I discovered that AJPM pricing options for individuals include a 12-month subscription to the journal for $277, or the purchase of individual articles for $31.50 each. The special issue being showcased at the event included 27 articles, which translates into a total cost of about $850 for purchasing this one issue of the journal, whose mission is "the promotion of individual and community health".

In contrast, all the articles from the inaugural issue of the other journal being showcased at the event, Translational Behavioral Medicine, are freely available online, a policy much more in alignment with its mission:

TBM is an international peer-reviewed journal that offers continuous, online-first publication. TBM's mission is to engage, inform, and catalyze dialogue between the research, practice, and policy communities about behavioral medicine. We aim to bring actionable science to practitioners and to prompt debate on policy issues that surround implementing the evidence. TBM's vision is to lead the translation of behavioral science findings to improve patient and population outcomes.

I hope to post another blog entry with some notes from the behavioral informatics event, but in this post, I want to continue on with some of Lessig's commentary about science, knowledge, access and elitism. I'll embed a copy of the video below, follow it with some notes and partial transcriptions I made while watching, and finish off with a brief riff on science as a remix culture.

The Architecture of Access to Scientific Knowledge from lessig on Vimeo.

Lessig begins by talking about two motivations for his talk. The first is the late Supreme Court Justice Byron White, who was considered a liberal when appointed to the court by President John Kennedy in 1962, but became progressively more conservative, as evidenced in his authoring of the majority opinion in the 1986 case of Bowers v Hardwick, which upheld laws criminalizing sodomy, and included the following statement:

Against this background, to claim that a right to engage in such conduct [sodomy] is "deeply rooted in this Nation's history and tradition" or "implicit in the concept of ordered liberty" is, at best, facetious.

Lessig calls this the White effect:

To be liberal / progressive is always relative to a moment, and that moment changes, and too many are liberal / progressive no more.

The second, more recent, motivation was a Harvard Gazette article about Gita Gopinath, a macro-economist at Harvard who was born in India. After mentioning that Gopinath, a tenured professor, would like to have more time to read books that are not textbooks, the article concluded with the following sentence:

Still, the shelves in her new office are nearly bare, since, said Gopinath, “Everything I need is on the Internet now.”

Lessig notes:

If you're a member of the knowledge elite, then you have effectively free access to all of this information, but if you're from the rest of the world, not so much.

He goes on to observe:

The thing to recognize is that we built this world, we built this architecture for access. This flows from the deployment of copyright, but here, copyright to benefit publishers, not to enable authors. Not one of these authors gets money from copyright, not one of them wants the distribution of their articles limited, not one of them has a business model that turns upon restricting access to their work, not one of them should support this system.

As a knowledge policy, for the creators of this knowledge, this is crazy.

Lessig tells the story of his third daughter, who was diagnosed with jaundice shortly after her birth, and the concern he felt when the doctor expressed unexpected concern about possible complications. Due to his status as a Harvard professor, he had institutional access to many relevant articles in medical journals. He calculated that purchasing the 20 articles he tracked down would have cost $435 for someone who did not enjoy his level of elite status.

Even those journals which granted free access sometimes engaged in regulating access to parts of articles. For example, a February 2002 article on "Hyperbilirubinemia in the Term Newborn" in American Family Physician was available for free ... except for a crucial missing chart:

Management of Hyperbilirubinemia in Healthy Term Newborns

The rightsholder did not grant rights to reproduce this item in electronic media. For the missing item, see the original print version of this publication.

Rather than architecting systems to maximize access to knowledge, Lessig suggests that "we are architecting access to maximize revenue". He also shares a chart from An Open Letter to All University Presidents and Provosts Concerning Increasingly Expensive Journals by Theodore Bergstrom & R. Preston McAfee on Journal Prices by Publisher and Discipline Type that shows the cost-per-page of purchasing articles from for-profit journals was 5 times higher, on average, than the cost in not-for-profit journals, leading him to wonder whether academia is creating its own RIAA:

Really Important Academic Archive: RIAA for the Academy?

Lessig is co-founder of the Science Commons, a translation of the Creative Commons license to promote open access in the scientific community, with four key principles:

  1. Open access to literature
  2. Access to research tools
  3. Data should be in the public domain
  4. Open cyberinfrastructure

Lessig championed the Public Library of Science (PLoS) as an exemplar of these principles. Personally, I am very excited about the PLoS publication of a landmark study this week on Sharing Data for Public Health Research by Members of an International Online Diabetes Social Network, by Weitzman, et al., based on data from the TuDiabetes online community, and another recent study by Wicks, et al., based on PatientsLikeMe community members with amyotrophic lateral sclerosis (ALS) published - and freely available - in the journal Nature Biotechnology, Accelerated clinical discovery using self-reported patient data collected online and a patient-matching algorithm.

Having recently read a critique of Science 2.0, cataloging the shortcomings and/or failures of several traditional for-profit publishers to effectively capitalize on the Web 2.0 platform, it is encouraging to see some promising progress in sharing knowledge about chronic conditions in the not-for-profit world.

Lessig proceeds to review some of the issues surrounding the use - and misuse - of copyright in the arts, but I have already written about many of his arguments and examples from that domain in my notes from his keynote at the 2009 Seattle Green Festival. I'll simply note that in viewing his examples in this context, I was struck by the revelation that on a very basic level,

science is a remix culture

Traditionally, much of science has been the exclusive domain of professional scientists, who typically go to great lengths to cite prior work that is related to the experiments and results they report in peer-reviewed publications (indeed, some of the peers reviewing work submitted for publication are among those who are - or [feel they] should be - cited). With the rare exceptions of paradigm shifts, most of science is incremental in nature, and each increment represents a remix with a few added ingredients.

There are several promising signs that people without PhDs, MDs and other "terminal" credentials can participate more fully in the scientific discovery and dissemination process. I enumerated several of these efforts in an earlier post on platform thinking, but in the context of health and medicine - and Harvard - I do want to mention Doc Searls' recent post on Patient-driven health care in which he expands the idea of the patient as a platform and mentions efforts by Jon Lebkowsky and others to promote a vendor relationship management (VRM) model in which patients - and the data about their conditions - will be better able to participate in peer-to-peer collaborations with health care and health information technology professionals.

Lessig laments the current system in which authors - and peer reviewers - of scientific publications do much of the work for free, while for-profit publishers derive nearly all of the financial benefits, and do so through restricting access to the knowledge produced by the authors. Given that much of the data used in the experiments reported in professional medical publications comes from patients (the PatientsLikeMe and TuDiabetes studies being particularly notable examples), it makes all the more sense to make the results of these experiments available to all patients ... and at some point, we all are - or will be - patients who might benefit from universal access to this knowledge.

Innovation, Research & Reviewing: Revise & Resubmit vs. Rebut for CSCW 2012

Research is about innovation, and yet many aspects of the research process often seem steeped in tradition. Many conference program committees and journal editorial boards - the traditional gatekeepers in research communities - are composed primarily of people with a long history of contributions and/or other well-established credentials, who typically share a collective understanding of how research ought to be conducted, evaluated and reported. Some gatekeepers are opening up to new possibilities for innovations in the research process, and one such community is the program committee for CSCW 2012, the ACM Conference on Computer Supported Cooperative Work ... or as I (and some other instigators) like to call it, Computer-Supported Cooperative Whatever.

This year, CSCW is introducing a new dimension to the review process for Papers & Notes [deadline: June 3]. In keeping with tradition, researchers and practitioners involved in innovative uses of technology to enable or enhance communication, collaboration, information sharing and coordination are invited to submit 10-page papers and/or 4-page notes describing their work. The CSCW tradition of a double-blind review process will also continue, in which the anonymous submissions are reviewed by at least three anonymous peers (the program committee knows the identities of authors and reviewers, but the authors and reviewers do not know each others' respective identities). These external reviewers assess the submitted paper or note's prospective contributions to the field, and recommend acceptance or rejection of the submission for publication in the proceedings and presentation at the conference. What's new this year is an addition to the traditional straight-up accept or reject recommendation categories: reviewers will be asked to consider whether a submission might fit into a new middle category, revise & resubmit.

CSCW, CHI and other conferences have enhanced their review processes in recent years by offering authors an opportunity to respond with a rebuttal, in which they may clarify aspects of the submission - and its contribution(s) - that were not clear to the reviewers [aside: I recently shared some reflections on reviews, rebuttals and respect based on my experience at CSCW and CHI]. For papers that are not clear accepts (with uniformly high ratings among reviewers) - or clear rejects (uniformly low ratings) - the program committee must make a judgment call on whether the clarifications proposed in a rebuttal would represent a sufficient level of contribution in a revised paper, and whether the paper could be reasonably expected to be revised in the short window of time before the final, camera-ready version of the paper must be submitted for publication. The new process will allocate more time to allow the authors of some borderline submissions the opportunity to actually revise the submission rather than limiting them to only proposing revisions.

As the Papers & Notes Co-Chairs explain in their call for participation:

Papers and Notes will undergo two review cycles. After the first review a submission will receive either "Conditional Accept," "Revise/Resubmit," or "Reject." Authors of papers that are not rejected have about 6 weeks to revise and resubmit them. The revision will be reviewed as the basis for the final decision. This is like a journal process, except that it is limited to one revision with a strict deadline.

The primary contact author will be sent the first round reviews. "Conditional Accepts" only require minor revisions and resubmission for a second quick review. "Revise/Resubmits" will require significant attention in preparing the resubmission for the second review. Authors of Conditional Accepts and Revise/Resubmits will be asked to provide a description of how reviewer comments were addressed. Submissions that are rejected in the first round cannot be revised for CSCW 2012, but authors can begin reworking them for submission elsewhere. Authors need to allocate time for revisions after July 22, when the first round reviews are returned [the deadline for initial submissions is June 3]. Final acceptance decisions will be based on the second submission, even for Conditional Accepts.

Although the new process includes a revision cycle for about half of the submissions, community input and analysis of CSCW 2011 data has allowed us to streamline the process. It should mean less work for most authors, reviewers, and AC members.

The revision cycle enables authors to spend a month to fix the English, integrating missing papers in the literature, redoing an analysis, or adopt terminology familiar to this field, problems that in the past could lead to rejection. It also provides the authors of papers that would have been accepted anyway to fix minor things noted by reviewers.

This new process is designed to increase the number and diversity of papers accepted into the final program. Some members of the community - especially those in academia - may be concerned that increasing the quantity may decrease the [perceived] quality of submissions, i.e., instead of the "top" 20% of papers being accepted, perhaps as many as 30% (or more) may be accepted (and thus the papers and notes that are accepted won't "count" as much). However, if the quality of that top 30% (or more) is improved through the revision and resubmission process, then it is hoped that the quality of the program will not be adversely affected by the larger number of accepted papers presented there ... and will actually be positively affected by the broader range of accepted papers.

I often like to reflect on Ralph Waldo Emerson's observation:

All life is an experiment. The more experiments you make the better.

If research - and innovation - is about experimentation, then it certainly makes sense to experiment with the ways that experiments are assessed by the research communities to which they may contribute new insights and knowledge.

There is a fundamental tension between rigorous validation and innovative exploration. Maintaining high standards is important to ensuring the trustworthiness of science, especially in light of the growing skepticism about science among some segments of the public. But scientists and other innovators who blaze new trails often find it challenging to validate their most far-reaching ideas to the satisfaction of traditional gatekeepers, and so many conferences and journals tend to be filled with more incremental - and more easily validatable - results. This is not necessarily a bad thing, as many far-reaching ideas turn out to be wrong, but I increasingly believe that all studies and models are wrong, but some are useful, and so opening up new or existing channels for reviewing and reporting research will promote greater innovation.

I'm encouraged by the breadth and depth of conversations, conversions and alternatives I've encountered regarding research and its effective dissemination, including First Monday, arXiv and alt.chi. At least one other ACM-sponsored research community - UIST (ACM Symposium on User Interface Software & Technology) - is also considering changes to its review process; Tessa Lau recently wrote about that in a blog post at the Communications of the ACM, Rethinking the Systems Review Process (which, unfortunately, is now behind the ACM paywall ... another issue relevant to disseminating research). The prestigious journal, Nature, recently wrote about the ways social media is influencing scientific research in an article on Peer Review: Trial by Twitter.

I think it is especially important for a conference like CSCW that is dedicated to innovations in communication, collaboration, coordination and information sharing (which [obviously] includes social media) to be experimenting with alternatives, and I look forward to participating in the upcoming journey of discovery. And in the interest of full disclosure, one way I am participating in this journey is as one of the Publicity Co-Chairs for CSCW 2012, but I would be writing about this innovation even if I were not serving in that official capacity.

[Update: Jonathan Grudin, one of the CSCW 2012 Papers & Notes Co-Chairs, has written an excellent overview of the history and motivations of the revise and resubmit process in a Communications of the ACM article on Technology, Conferences and Community: Considering the impact and implications of changes in scholarly communication.]

Reflections on Reviews, Rebuttals and Respect

Having recently served as associate chair for both the CSCW 2011 and CHI 2011 Papers & Notes Committees, I've read a large number of papers, an even larger number of reviews, and a slightly smaller number of rebuttals. In participating in back-to-back committees, a few perspectives and practices that impact the process of scientific peer review have become clearer to me, and I wanted to share a few of those here. I believe all of these boil down to a matter of mutual respect among the participants, and wanted to delve more deeply into some resources that offer guidelines for respectful practices.

I want to start out with a brief review of The Four Agreements, by don Miguel Ruiz, as I believe they provide a strong foundation for how to best approach the review process, as well as other areas of life and work (and I'll include links to earlier elaborations on three of the four agreements):

  1. Be Impeccable With Your Word: Speak with integrity. Say only what you mean. Avoid using the word to speak against yourself or to gossip about others. Use the power of your word in the direction of truth and love.
  2. Don't Take Anything Personally: Nothing others do is because of you. What others say and do is a projection of their own reality, their own dream. When you are immune to the opinions and actions of others, you won't be the victim of needless suffering.
  3. Don't Make Assumptions: Find the courage to ask questions and to express what you really want. Communicate with others as clearly as you can to avoid misunderstandings, sadness and drama. With just this one agreement, you can completely transform your life.
  4. Always Do Your Best: Your best is going to change from moment to moment; it will be different when you are healthy as opposed to sick. Under any circumstance, simply do your best, and you will avoid self-judgment, self-abuse and regret.

I see examples of these agreements being violated throughout all aspects of the review process. Reviewers say hurtful things about authors, their work and/or their papers in their reviews and/or online discussions. Some reviewers appear personally offended that authors would have the audacity to submit a paper the reviewers judge to be unworthy. Many reviews reflect implicit or explicit assumptions the reviewers are making about the paper, the work described by the paper, and/or the authors who have written the paper. Some reviews are so short that I have a hard time believing that the reviewers are really doing their best in fully applying their skills and experience to help us make the best possible decision on a paper (but I acknowledge this is an assumption).

Another framework that I believe is helpful to apply in this context is nonviolent communication (NVC), which is predicated on the assumption that everything we do is an attempt to meet our human needs, that conflict sometimes arises through the miscommunication of those needs, and that further conflict can be avoided by refusing to use coercive or manipulative language that is likely to induce fear, guilt, shame, praise, blame, duty, obligation, punishment, or reward. The Wikipedia entry for nonviolent communication offers four steps (that are very similar to some earlier distinctions I'd written about between data, judgments, feelings and wants):

  • making neutral observations (distinguished from interpretations/evaluations e.g. "I see that you are wearing a hat while standing in this building."),
  • expressing feelings (emotions separate from reasons and interpretation e.g. "I am feeling puzzled"),
  • expressing needs (deep motives e.g. "I have a need to learn about other people's motives for doing what they do") and
  • making requests (clear, concrete, feasible and without an explicit or implicit demand e.g. "Please share with me, if you are willing, your reasons for wearing the hat in this building.")

Drawing on both of these sources for inspiration, ideally, a well-written review would have the following characteristics:

  • Focus on the paper, vs. the underlying work or the authors. All comments address [only] what is written in the paper. They should not address the work described by the paper or the authors who have written the paper. In a blind review process, reviewers typically do not have first-hand knowledge of the work described in the paper beyond what is written; reviewers who do have first-hand knowledge should recuse themselves due to a conflict of interest (i.e., they were co-authors or collaborators on the work). Thus, any comments about the work (vs. what is written about the work) are based on assumptions.
  • Follow the principles of non-violent communication (NVC). In particular, use "I" statements wherever possible, and avoid any direct references to the authors. For example, rather than saying "You don't say how you do X", an NVC phrasing might be something more like "It is not clear to me from the paper how X was done", or rather than saying "Why didn't you do X?", re-phrasing this as "I believe this or a future paper would be strengthened if it included X, or at least a compelling argument as to why X was not done".
  • Be compassionate and generous. Assume that the authors were doing their best in composing the paper, and look for reasons to accept in addition to reasons to reject (the latter usually being more readily identified by people trained in critical thinking). I was particularly inspired by the use of generosity in the directives issued by the CHI 2011 Papers & Notes Chairs at the committee meeting. Perhaps it's the proximity to the holiday season, but I found the use of that term more resonant throughout the meetings than the more traditional (and technical) "reasons to accept" that are often promoted by chairpersons.
  • Reverse the golden rule. The golden rule is "Do unto others as you would have them do unto you". A variation on this theme - which I first encountered in a book about positive psychology called How Full is Your Bucket? - is "Do unto others as they would have you do unto them." Particularly in a multi-disciplinary conference, different norms may be at work. I've had some strong disagreements with reviewers who are used to receiving terse and potentially offensive reviews, who implicitly apply the golden rule and figure if they can take it in reviews of their own papers, so should the authors whose papers they are reviewing. I always try to convince them to break the cycle of violent communication, with varying degrees of success. In a blind review process, of course, reviewers don't know the identities of the authors, and so can't really know how they would "have you do unto them". But I believe it is best to err on the side of nonviolence.

The rebuttal process also offers an opportunity for applying these practices. I won't go into as many details about the rebuttals, but I will say that if there was a category for "best rebuttal" (along the lines of "excellent reviews" and "best paper awards"), I saw two rebuttals among the papers we discussed that were outstanding exemplars of effective rebuttals. These had several factors in common:

  • a heartfelt expression of gratitude for the constructive feedback provided by the reviewers (and the reviews for these submissions were excellent)
  • the correct, gracious and effective identification of misinterpretations by reviewers, and a gentle articulation of the intended interpretation
  • an honest acknowledgment of correctly identified errors or omissions by the reviewers, and an explicit statement of how these would be addressed in a revision (if accepted)

I also witnessed some angry rebuttals, some of which included disparaging remarks about the committee and/or the conference community, none of which had any positive influence on the ultimate decision made on those papers. I won't go into any further details, as I do not believe that would be constructive. However, I would encourage all authors to wait at least 24 hours after they receive their reviews to even start composing their responses, as I believe this will lead to a more constructive engagement.

Due to the desire to respect confidentiality agreements, I won't disclose any specific reviews or rebuttals from the CSCW or CHI conferences as positive or negative examples, but I will conclude with a few rather extreme examples of negativity - which are so extreme they are humorous - in a blog post on Twisted Bacteria about peer review of scientific papers:

  • This paper is desperate. Please reject it completely and then block the author’s email ID so they can’t use the online system in future.
  • The biggest problem with this manuscript, which has nearly sucked the will to live out of me, is the terrible writing style.
  • The writing and data presentation are so bad that I had to leave work and go home early and then spend time to wonder what life is about.
  • The finding is not novel and the solution induces despair.

There are several more examples of violations of The Four Agreements and the principles of nonviolent communication available at Twisted Bacteria, and I'm grateful that the reviews I've seen (and written) in the CSCW and CHI communities do not reflect the extreme expressions found in this selection from the environmental microbiology community.

I hope that highlighting some of the more positive and constructive approaches one might take to peer reviewing (and rebutting) will promote a more mindful, respectful and effective process for all participants.

Virtual Reality, Somatic Cognition, Homuncular Flexibility and Object-Centered Sociality and Learning

Jaron Lanier recently wrote about virtual reality and its potential application to learning, utilizing some evocative terms and offering an educational scenario that reminds me of a seminal 1997 paper that described how a Nobel prize-winning biologist fused with her objects of study. The Saturday Wall Street Journal article gave me a keener appreciation for the potential applications of virtual reality (VR) - immersive computer-generated environments that model real or imaginary worlds - and for the pervasiveness of object-centered sociality, a concept I first encountered via Jyri Engestrom.

Lanier's article is about new frontiers for avatars - "movable representations of ourselves in cyberspace" - and how they can be used to manifest somatic cognition - the mapping of human body motion "into a theater of thought and strategy not usually available to us" in which one's hands (or presumably, other body parts) can solve complicated puzzles more quickly than one's head (or conscious mind). The examples he gives of somatic cognition outside the realm of virtual reality include professional musicians, athletes, surgeons and pilots, and I found myself thinking of a documentary I saw years ago on heavy machinery, and the way that a crane operator who was interviewed described the bewildering array of levers as virtual extensions of his arms and hands.

After describing a software bug in an early VR system that gave his humanoid avatar a gigantic hand, Lanier describes homuncular flexibility as a more general principle: "people can learn to inhabit other bodies not just with oddly shaped limbs [gigantic hands], or limbs attached in unfamiliar places, but even bodies with different numbers of limbs [lobster avatars]". Dean Eckles generalizes this notion even further - in a 2009 blog post reviewing a 2006 article by Lanier on homuncular flexibility (which offers more details about the lobster) - to distal attribution: our propensity for attributing sensory perceptions to internal or external - or proximal or more distant - sources.

However, it is Lanier's reference to an experiment with elementary school children being turned into the things they were studying that I found most interesting [although I have not been able to track down the reference]:

Some [students] were turned into molecules, dancing and squirming to dock with other molecules. In this case the molecule serves the role of the piano, and instead of harmony puzzles, you are learning chemistry. Somatic cognition offers an overwhelming emotional appeal for education, because it leverages vanity. You become the thing you are studying. Your sensory motor loop is modified to incorporate the logic of a science, and you develop body intuition about that logic.

This idea of fusing or becoming one with the object of study is one of the two primary manifestations of object-centered sociality articulated in Karin Knorr Cetina's seminal paper, "Sociality with Objects: Social Relations in Postsocial Knowledge Societies", [Theory, Culture & Society, 1997, Vol. 14(4):1-30]. As I noted in an earlier post on place-centered sociality, the other manifestation of object-centered sociality - sociality (interactions and relationships) through objects, such as online photos, videos or even blog posts - is better known, at least among many of those who study online social media (and mediation). But Lanier's article evokes the manifestation of sociality with objects themselves, reminding me of what I earlier wrote about Knorr Cetina's articulation of how this can promote deeper investigation and learning:

[Knorr Cetina] looks specifically at knowledge objects, and how they are increasingly produced by specialists and experts rather than through a broader form of participatory interpretation. She argues that experts' relationships with knowledge objects can be best characterized by the notion of lack and a corresponding structure of wanting [emphasis hers] because these objects "seem to have the capacity to unfold indefinitely": new results that add to objects of knowledge have the side effect of opening up new questions. This perpetual unfolding gives rise to "a libidinal dimension or dimension of knowledge activities" - an "arousal" and "deep emotional investment" - by the person studying the knowledge object. As an example, she describes the way that biologist Barbara McClintock, who won the Nobel Prize for her discovery of genetic transposition, would totally immerse herself in her study of plant chromosomes, identifying with the chromosomes and imagining how they might see the world - evoking an image (for me) of object-centered empathy more than sociality.

The prospect of empowering future Nobel laureates with virtual reality technology to engage with and virtually embody objects of knowledge at an early age is very exciting. Lanier mentions the Kinect camera for Xbox 360 made by Microsoft (his employer), which will likely put virtual reality technology in the hands (or homes) of millions of people in the near future.

The primary emphasis of Kinect marketing is on fun and games, but based on Lanier's article, and Knorr Cetina's insights into object-centered learning, Kinect might also provide a platform for a new approach to education. In an ideal world, of course, fun and learning would not be such distinct concepts ... perhaps this new technology will help promote a new dimension of convergence in the not-too-distant future.

Creativity, Distractability and Structured vs. Unstructured Procrastination

I have been practicing structured procrastination while allowing a few blog posts to, uh, ferment a bit longer (not to mention other things I want to get done). As evidence, after reading Jonah Lehrer's recent post about unstructured procrastination - Are Distractable People More Creative? - I feel inclined to write about that, rather than finish the other partially composed posts ... not to mention other important items on my todo list. But I'll postpone writing about unstructured procrastination until I write a bit about structured procrastination.

Several years ago, I encountered Stanford Philosophy Professor John Perry's inspiring account of structured procrastination, which offers a more elaborate and erudite rationalization of a practice that I'd previously justified by way of British mathematician and philosopher Bertrand Russell's famous quote:

The time you enjoy wasting is not wasted time.

Perry defines structured procrastination as a practice in which one chooses to postpone working on the most important thing(s) one needs to do by working on other, less important, things. He finds that he can be tremendously productive by this dynamic prioritization, getting all kinds of things done while avoiding the thing(s) he thinks he should really be doing.

I have been intending to write this essay for months. Why am I finally doing it? Because I finally found some uncommitted time? Wrong. I have papers to grade, textbook orders to fill out, an NSF proposal to referee, dissertation drafts to read. I am working on this essay as a way of not doing all of those things. This is the essence of what I call structured procrastination, an amazing strategy I have discovered that converts procrastinators into effective human beings, respected and admired for all that they can accomplish and the good use they make of time. All procrastinators put off things they have to do. Structured procrastination is the art of making this bad trait work for you. The key idea is that procrastinating does not mean doing absolutely nothing. Procrastinators seldom do absolutely nothing; they do marginally useful things, like gardening or sharpening pencils or making a diagram of how they will reorganize their files when they get around to it. Why does the procrastinator do these things? Because they are a way of not doing something more important. If all the procrastinator had left to do was to sharpen some pencils, no force on earth could get him do it. However, the procrastinator can be motivated to do difficult, timely and important tasks, as long as these tasks are a way of not doing something more important.

Structured procrastination means shaping the structure of the tasks one has to do in a way that exploits this fact. The list of tasks one has in mind will be ordered by importance. Tasks that seem most urgent and important are on top. But there are also worthwhile tasks to perform lower down on the list. Doing these tasks becomes a way of not doing the things higher up on the list. With this sort of appropriate task structure, the procrastinator becomes a useful citizen. Indeed, the procrastinator can even acquire, as I have, a reputation for getting a lot done.

Although Perry doesn't describe it this way, having read and written about Dan Pink's book, Drive: The Surprising Truth About What Motivates Us (in the same post - ironically in this context - that I also wrote about David Allen's book, Getting Things Done ... which I still haven't read), I believe that Perry's practice of structured procrastination may be an unconscious prioritization of intrinsically motivating tasks over extrinsically motivated tasks: choosing to do things he wants to do, such as writing the essay, while postponing other tasks that others want him to do, such as grading papers or ordering textbooks. And as Pink points out, through his review of several studies, intrinsic motivations typically win out over extrinsic motivations. [Note that I do not mean to imply that Pink promotes or even condones structured procrastination; I'm quite sure Allen would not.]

Returning to Lehrer's rumination on the costs and benefits of distraction, he defines latent inhibition - the capacity to ignore stimuli that seem irrelevant - and cites a 2003 study showing that decreased latent inhibition is associated with increased creative achievement in high-functioning individuals, i.e., people who are more distractable may also be more creative. However, he points out that the study includes the important caveat that "low latent inhibition only leads to increased creativity when it’s paired with a willingness to analyze our excess of thoughts, to constantly search for the signal amid the noise" [and I'll note that one of my fermenting posts is all about signal vs noise]. Having recently been inspired by Lehrer's Metacognitive Guide to College, I'm glad he is not promoting distractability ... or, at least, not promoting unrestricted or unstructured distractability.

I would define distractability as a form of unstructured procrastination. Whereas structured procrastination is working on - or attending to - things that are important, but not the most important things, unstructured procrastination may involve attending to things that are not important at all (i.e., completely irrelevant). Indeed, this blog post itself may be more of an example of unstructured rather than structured procrastination ... but I'm going to postpone further consideration of that train of thought ... and having indulged my impulse to fire off a quick blog post, I will turn my attention back to other, potentially more important, tasks.

Jonah Lehrer's Metacognitive Guide to College

Jonah Lehrer, the 27-year-old author of How We Decide, gave the Opening Days convocation keynote at Willamette University last Friday. After being introduced by Willamette president M. Lee Pelton as "a humanist disguised as a neuroscientist", Lehrer offered a fun and fascinating whirlwind tour of neuroscience, psychology and sociology, in the context of a 5-point guide to how to succeed in (and through) college. Having attended several convocations both as a student and a faculty member, I would rank his keynote as one of the best I've ever heard, rivaled only by one I heard in 1986, by Theodor Seuss Geisel (aka Dr. Seuss), when he received an honorary doctorate at the University of Hartford (so he really was a "doctor").

Leading off with a story demonstrating the ephemeral nature of many "great truths" (Oliver Wendell Holmes, Sr., discovering the great "truth" that the world smells like turpentine - specifically, "a strong smell of turpentine prevails throughout" - while on a nitrous oxide-induced hallucinogenic journey), Lehrer assured the new students that they will regularly encounter profound truths and discover new ideas ... few of which will have impact lasting beyond 72 hours, and nearly all of which will be forgotten soon after they finish college. The real value of a college education is learning how to think ... and to promote this process, he offered 5 tips.

Be an outsider

InnoCentive is a platform for crowdsourcing research and development that succeeds primarily through the participation of outsiders. Companies post problem descriptions and offer prizes for solutions, and individuals and organizations outside the company submit potential solutions. Lehrer quoted a Harvard Business School study reporting that 60% of the posted problems are solved within 6 months, and that the key to solutions is being on the outside, i.e., being able to look at the problem from an outsider's perspective.

I'm not sure which study he is referencing (I haven't read his book yet), but I did find a related report on an Innovation Network conference:

InnoCentive now boasts 175,000 "solvers" from more than 200 countries around the world. About 90% are individuals, 10% are organizations and 60% have masters degrees or PhDs. Last year, nearly 50% of the "challenges" posted on InnoCentive's web site generated a solution that was put to use.

Academics who polled InnoCentive's winning solvers discovered something "both startling and intuitively obvious," said Spradlin. "What they found was that typically ... the background of the solver who solved the problem" was "no less than six disciplines away" from the subject area in which the problem emerged. "What that means is, if all the Stanford PhDs in your chemistry lab could have solved the problem, they would have solved it already."

Lehrer reported that English poet Samuel Taylor Coleridge used to tell people he attended public lectures on chemistry in London "to improve my stock of metaphors", and encouraged students to take at least one class each semester outside of their field ... and "don't be afraid to be the lonely poet in chem class".

Learn how to relax

Lehrer described a study on people solving compound remote associates problems, for which Lehrer suggested the evocative acronym "CRAP". Another acronym, "RAT" (remote associates test), is more commonly associated with these kinds of problems - often posed on the Sunday Puzzle on NPR - in which three words are presented and the problem is to find a fourth word that relates to all of them (e.g., given the problem "broken, clear, eye", the solution is "glass"). The study revealed that the "flash of insight" or "Aha!" moment that occurs immediately before a solution can be reliably detected via functional magnetic resonance imaging (fMRI) and electroencephalography (EEG), and the alpha wave pattern closely resembles that of someone who has experience in meditation, i.e., someone who is able to achieve states of deep relaxation.

Contrary to the intuition many of us have when faced with a hard problem, which is to focus on the problem as hard as we can (I imagine this is why they are called "hard problems"), the solution in many cases is to simply relax and temporarily turn our attention to other things, and allow the solution to emerge more organically. I was reminded of one of my favorite lines of poetry, by Wallace Stevens:

Perhaps the truth depends on a walk around a lake.

Another observation by Lehrer - the brain knows more than you know, you just have to listen - reminded me of the way yet another poet, David Whyte, describes poetry:

Poetry is the art of overhearing yourself say things you didn't know you knew.

But I digress. Shifting from poetry to technology - and back to Lehrer's speech - Lehrer suggested that one of the most effective ways of listening to what you know is to turn off the gadgets that constantly inundate us with what others are saying ... reminding me of what Sherry Turkle, Kathy Sierra, James Surowiecki, Malcolm Gladwell, James Ogilvy, Dan Oestreich and other great thinkers have said about self-reflection vs. self-expression, and of the recent New York Times article, Digital Devices Deprive Brain of Needed Downtime.

Make friends with lots of different people

Lehrer described the self-similarity principle (or perhaps homophily) as a natural tendency to associate with people who are like us (and avoid people who are not like us), and suggested that students guard against this tendency. A study by sociologist Martin Ruef and his colleagues at Princeton, in which they interviewed 600 entrepreneurs, revealed that the entrepreneurs with the highest informational entropy (i.e., most diverse social networks) were the most successful, and that the propensity to strike up conversation with potentially consequential strangers was a key indicator of this quality. The researchers estimate that entrepreneurs with highly entropic networks were 3 times more innovative than those with low entropy networks (though innovation is a notoriously difficult concept to measure).
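For readers curious about what "entropy" might mean for a social network, here is a minimal sketch in Python. It is only an illustration under my own assumptions: I have not seen how Ruef and his colleagues actually measured it, so the function name network_entropy, the idea of labeling each contact with a single group, and the sample contact lists below are all hypothetical. The sketch computes the standard Shannon entropy of the group labels, which is zero when every contact comes from the same group and grows as contacts are spread more evenly across more groups.

```python
import math
from collections import Counter


def network_entropy(contact_groups):
    """Shannon entropy (in bits) of the group labels in a list of contacts.

    Hypothetical illustration only: each contact is represented by one
    group label (e.g. "classmate"), which is an assumption of this sketch
    and not the measure used in the study described above.
    """
    counts = Counter(contact_groups)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum p * log2(1/p) over the groups, where p is each group's share.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Made-up contact lists: one homogeneous, one spread across many groups.
homogeneous = ["classmate"] * 8
diverse = ["classmate", "classmate", "professor", "neighbor",
           "coworker", "teammate", "artist", "engineer"]

print(network_entropy(homogeneous))
print(network_entropy(diverse))
```

With these made-up lists, the homogeneous network scores lower than the diverse one, which is the sense in which a "highly entropic" network is a more varied one.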

College is a great place to forge new connections with a broad range of people, and so Lehrer encouraged students to take advantage of the opportunity to diversify their social networks ... which will serve them well long after they've forgotten all (or most of) the facts they will have learned while in school.

Don't eat the marshmallow

Another variation on the theme of intent focus vs. relaxation - or, at least, distraction - was illuminated through the story of the marshmallow task, which Lehrer wrote about in a New Yorker article on the secret of self control last May. Stanford psychology professor Walter Mischel conducted experiments with four-year-olds at the Bing Nursery School, including one named Carolyn, to explore delayed gratification:

Carolyn was asked to sit down in the chair and pick a treat from a tray of marshmallows, cookies, and pretzel sticks. Carolyn chose the marshmallow. ... A researcher then made Carolyn an offer: she could either eat one marshmallow right away or, if she was willing to wait while he stepped out for a few minutes, she could have two marshmallows when he returned. He said that if she rang a bell on the desk while he was away he would come running back, and she could eat one marshmallow but would forfeit the second. Then he left the room [for about 15 minutes].

Only 30% of the children were able to delay gratification for the full 15 minutes; the average delay of gratification was about 2 minutes. Thirteen years later, Mischel conducted extensive followup surveys to discover how the 600+ children had fared. The high delayers - those who were able to distract themselves for the full 15 minutes - scored, on average, 200 points higher on the SAT than the low delayers - those who were unable to shift their attention to anything but the marshmallow, and succumbed to temptation within 30 seconds.

Lehrer instructed the students that "your task for the next four years is to learn how to control your attention. You control the spotlight" - use it wisely.


Elaborating on a theme invoked by Dean Darlene Moore during the opening remarks to the event - in which she emphasized the primacy of the journey over the destination - Lehrer invited students to fully appreciate the experience of a college education. Highlighting the importance of embracing wrongology, Lehrer offered a great anecdote:

You get to share your opinion on Hamlet, and write long essays about how Plato, the guy who blew your mind last week, was actually wrong about everything.

In my own experience as a philosophy major years ago (and continuing ever since), education is about learning things, and then unlearning things; discovering a great truth, and then discovering that its opposite is [also] true. I can understand the appeal of fundamentalism, in clinging tenaciously to beliefs no matter what facts may present themselves, especially as fears, uncertainties and doubts are promulgated by those who would deign to decide for us, but I don't think we can learn much when we are not willing to be in the question(s).

Speaking of questions, during the question & answer period following his talk, my favorite question was by a student who asked how Lehrer figures out which questions to ask (or pursue). He answered that he wrote a book about decisions primarily because he is pathologically indecisive, and generally tends to begin with his own frustrations ... mirroring my own tendency toward what I like to call irritation-based research ... or what Eric Raymond, author of The Cathedral and the Bazaar, describes in the context of open source programming:

Every good work of software starts by scratching a developer's personal itch.

In closing, I want to acknowledge that I have not yet read Lehrer's book, How We Decide, but as I noted in my earlier post on the warm welcome we enjoyed throughout Willamette Opening Days, my daughter, Meg, read the book over the summer, and after his speech she told me that many of the examples are covered more extensively in the book, which is next on my stack of "to-reads".

Finally, I want to loop back to some introductory remarks made by President Pelton, in which he quoted E. O. Wilson, the multidisciplinary scientist and sociobiologist who contributed much to our understanding of ant colonies (and other societies and systems):

We are drowning in information, while starving for wisdom. The world henceforth will be run by synthesizers, people able to put together the right information at the right time, think critically about it, and make important choices wisely.

Lehrer, like Wilson, is clearly a great synthesizer, and I hope his convocation keynote - and the subsequent scope of a liberal arts education at Willamette - will help inspire a future generation of synthesizers, critical thinkers and wise decision-makers.

All models, studies and Wikipedia entries are wrong, some are useful

A sequence of encounters with various models, studies and other representations of knowledge lately prompted me to reflect on both the inherent limitations and the potential uses of these knowledge representations ... and the problems that ensue when people don't fully appreciate either their limitations or applications ... or the inherent value of being wrong.

Daniel Hawes, an Economics Ph.D. student at the University of Minnesota, analyzed the Science Secret for Happy Marriages, examining a study correlating comparative attractiveness of spouses and the happiness of marriages. He notes that many reports of the "result" - the prettier a wife in comparison to her husband the happier the marriage - did not note the homogeneity of the population, particularly the early stage of marriage for most subjects in the study, the lack of control for inter-rater variability in measuring attractiveness and happiness, or the potential influences of variables beyond attractiveness and happiness. These limitations were reported in the original study, but not in subsequent re-reports, leading Hawes to reference a very funny PHD Comics parody of The Science News Cycle and conclude with the rather tongue-in-cheek disclaimer:

This blog post was sponsored by B.A.D.M.S (Bloggers against Data and Methods Sections) in honor of everybody who thinks (science) blogs should limit themselves to reporting correlations (and catchy post titles).

A while later, in a blog post about his Hypertext 2010 keynote on Model-Driven Research in Social Computing, former University of Minnesota Computer Science Ph.D. student and current PARC Area Manager and Principal Scientist Ed Chi offered a taxonomy of models - descriptive, explanatory, predictive, prescriptive and generative - and an iterative 4-step methodology for creating and applying models in social computing research - characterization, modeling, prototyping and evaluation. Most relevant in the context of this post, he riffed on an observation attributed to George Box:

all models are wrong, but some are useful

All models - and studies - represent attempts to condense or simplify data, and any such transformations (or re-presentations) are always going to result in some data loss, and so are inherently wrong. But wrong models can still be useful, even - or perhaps particularly - if they simply serve to spark challenges, debate ... and further research. As an example, Ed notes how Malcolm Gladwell's "influentials theory", in which an elite few act as trend setters, was useful in prompting Duncan Watts and his colleagues to investigate further, and create an alternative model in which the connected many are responsible for trends. More on this evolution of models can be found in Clive Thompson's Fast Company article, Is the Tipping Point Toast?

Over the next few weeks, I encountered numerous other examples of wrongness, limitations, challenges and debate:

My most significant recent encounter with wrongness, limitations and debate was via Susannah Fox, Associate Director at the Pew Internet & American Life Project and a leading voice in the Health 2.0 community, who offered a Health Geek Tip: Abstracts are ads. Read full studies when you can. She describes several examples of medical studies whose titles or abstracts may lead some people - medical experts and non-experts alike - to make incorrect assumptions and draw unwarranted conclusions.

In one case, “a prime example of the problem with some TV physician-'journalists'”, publisher Gary Schwitzer criticized Dr. Sanjay Gupta's proclamation that an American Society of Clinical Oncology study showed that "adding the drug Avastin to standard chemotherapy 'can slow the spread of [ovarian] cancer pretty dramatically'" as a dramatically unwarranted claim not supported by the study. I won't go into further details about this example, except to note with some irony that I had mentioned Dr. Gupta in my previous post about The "Boopsie Effect": Gender, Sexiness, Intelligence and Competence, in which he had complained that being named one of People Magazine's sexiest men had undermined his credibility ... and it appears that several people quoted in Schwitzer's blog post as well as in the comments are questioning Dr. Gupta's credibility, though I don't see any evidence that these doubts are related to his appearance.

My favorite example, even richer in irony, is what Susannah initially referred to as "an intriguing abstract that begs for further study: Accuracy of cancer information on the Internet: A comparison of a Wiki with a professionally maintained database". Another Health 2.0 champion, Gilles Frydman, tweeted a couple of questions about the study, regarding which types of cancers were covered and which version of the professionally maintained database was used. I've posted a considerable amount of cancer information on the Internet myself (a series of 19 blog posts about my wife's anal cancer), and I've long been fascinated with the culture and curation of Wikipedia, so I decided to investigate further.

The original pointer to the abstract came from a Washington Post blog post about "Wikipedia cancer info. passes muster", based on a study that was presented at the American Society of Clinical Oncology (ASCO). The post includes an interview with one of the study's authors, Yaacov Lawrence. I called Dr. Lawrence, and he was kind enough to fill me in on some of the details, which I then shared in a comment on Susannah's post. The study in question was presented as a poster - not a peer-reviewed journal publication - and represents an early, and rather limited, investigation into the comparative accuracy of Wikipedia and the professionally maintained database. At the end of our conversation, I promised to send him some references to other studies of the accuracy of Wikipedia, and suggested that the Health 2.0 community may be a good source of prospective participants in future studies.

But here's the best part: while searching for references, in the Wikipedia entry on the Reliability of Wikipedia, under the section on Science and medicine peer reviewed data, I found the following paragraph:

In 2010 researchers at Kimmel Cancer Center, Thomas Jefferson University, compared 10 types of cancer to data from the National Cancer Institute's Physician Data Query and concluded "the Wiki resource had similar accuracy and depth to the professionally edited database" and that "sub-analysis comparing common to uncommon cancers demonstrated no difference between the two", but that ease of readability was an issue.

And what is the reference cited for this paragraph? The abstract for the poster presented at the meeting:

Rajagopalan et al (2010). "Accuracy of cancer information on the Internet: A comparison of a Wiki with a professionally maintained database.". Journal of Clinical Oncology 28:7s, 2010. Retrieved 2010-06-05.

So it appears we have yet another example of a limited study - that was not peer-reviewed - being used to substantiate a broader claim on the accuracy of Wikipedia articles on "Science and medicine peer reviewed data" ... in a Wikipedia article on the topic of Reliability of Wikipedia. Perhaps someone will eventually edit the entry to clarify the status of the study. In any case, I find this all rather ironic.

As with the other examples of "wrong" models and limited studies, I believe that this study has already been useful in sparking discussion and debate within the Health 2.0 community, and I'm hoping that some of the feedback from the Health 2.0 community - and perhaps other researchers who have more experience in comparative studies of Wikipedia accuracy - will lead to more research in this promising area.

[Update, 2010-09-01: I just read and highly recommend a relevant and somewhat irreverent article by Dave Mosher, The Ten Commandments of Science Journalism.]

[Update, 2011-03-16: I just read and highly recommend another relevant article on wrongness and medicine: Lies, Damned Lies, and Medical Science, by David H. Freedman in the November 2010 edition of The Atlantic.]

[Update, 2011-04-21: Another relevant and disturbing post: Lies, Damn Lies and Pharma Social Media Statistics on Dose of Digital by Jonathan Richman.]

[Update, 2012-02-23: John P. A. Ioannidis offers an explanation for Why Most Published Research Findings are False in a 2005 PLoS Medicine article.]