October 2013

The Scientific Method: Cultivating Thoroughly Conscious Ignorance

Stuart Firestein brilliantly captures the positive influence of ignorance as an often unacknowledged guiding principle in the fits and starts that typically characterize the progression of real science. His book, Ignorance: How It Drives Science, grew out of a course on Ignorance he teaches at Columbia University, where he chairs the department of Biological Sciences and runs a neuroscience research lab. The book is replete with clever anecdotes interleaved with thoughtful analyses - by Firestein and other insightful thinkers and doers - regarding the central importance of ignorance in our quests to acquire knowledge about the world.

Each chapter leads off with a short quote, and the one that starts Chapter 1 sets the stage for the entire book:

"It is very difficult to find a black cat in a dark room," warns an old proverb. "Especially when there is no cat."

Firestein proceeds to channel the wisdom of Princeton mathematician Andrew Wiles (who proved Fermat's Last Theorem) regarding the way science advances:

It's groping and probing and poking, and some bumbling and bungling, and then a switch is discovered, often by accident, and the light is lit, and everyone says "Oh, wow, so that's how it looks," and then it's off into the next dark room, looking for the next mysterious black feline.

Firestein is careful to distinguish the "willful stupidity" and "callow indifference to facts and logic" exhibited by those who are "unaware, unenlightened, and surprisingly often occupy elected offices" from a more knowledgeable, perceptive and insightful ignorance. As physicist James Clerk Maxwell describes it, this "thoroughly conscious ignorance is the prelude to every real advance in science."

The author disputes the view of science as a collection of facts, and instead invites the reader to focus on questions rather than answers, to cultivate what poet John Keats calls "negative capability": the ability to dwell in "uncertainty without irritability". This notion is further elaborated by philosopher-scientist Erwin Schrödinger:

In an honest search for knowledge you quite often have to abide by ignorance for an indefinite period.

Ignorance tends to thrive more on the edges than in the centers of traditional scientific circles. Using the analogy of a pebble dropped into a pond, most scientists tend to focus near the site where the pebble is dropped, but the most valuable insights are more likely to be found among the ever-widening ripples as they spread across the pond. This observation about the scientific value of exploring edges reminds me of another inspiring book I reviewed a few years ago, The Power of Pull, wherein authors John Hagel III, John Seely Brown & Lang Davison highlight the business value of exploring edges:

Edges are places that become fertile ground for innovation because they spawn significant new unmet needs and unexploited capabilities and attract people who are risk takers. Edges therefore become significant drivers of knowledge creation and economic growth, challenging and ultimately transforming traditional arrangements and approaches.

On a professional level, given my recent renewal of interest in the practice of data science, I find many of the book's insights into ignorance relevant to a productive perspective for a data scientist. Firestein promotes a data-driven rather than hypothesis-driven approach, instructing his students to "get the data, and then we can figure out the hypotheses." Riffing on Rodin, the famous sculptor, Firestein highlights the literal meaning of "dis-cover", which is "to remove a veil that was hiding something already there" (which is the essence of data mining). He also notes that each discovery is ephemeral, as "no datum is safe from the next generation of scientists with the next generation of tools", highlighting both the iterative nature of the data mining process and the central importance of choosing the right metrics and visualizations for analyzing the data.

Professor Firestein also articulates some keen insights about our failing educational system that resonate with some growing misgivings I was experiencing in academia, a professional trajectory from which I recently departed. He highlights the need to revise both the business model of universities and the pedagogical model, asserting that we need to encourage students to think in terms of questions, not answers.

W.B. Yeats admonished that "education is not the filling of a pail, but the lighting of a fire." Indeed. Time to get out the matches.

On a personal level, at several points while reading the book I was reminded of two of my favorite "life rules" (often mentioned in preceding posts) articulated by Cherie Carter-Scott in her inspiring book, If Life is a Game, These are the Rules:

Rule Three: There are no mistakes, only lessons.
Growth is a process of experimentation, a series of trials, errors, and occasional victories. The failed experiments are as much a part of the process as the experiments that work.

Rule Four: A lesson is repeated until learned.
Lessons will be repeated to you in various forms until you have learned them. When you have learned them, you can then go on to the next lesson.

Firestein offers an interesting spin on this concept, adding texture to my previous understanding, and helping me feel more comfortable with my own highly variable learning process, as I often feel frustrated with re-encountering lessons many, many times:

I have learned from years of teaching that saying nearly the same thing in different ways is an often effective strategy. Sometimes a person has to hear something a few times or just the right way to get that click of recognition, that "ah-ha moment" of clarity. And even if you completely get it the first time, another explanation always adds texture.

My ignorance is revealed to me on a daily, sometimes hourly, basis (I suspect people with partners and/or children have an unfair advantage in this department). I have written before about the scope and consequences of others being wrong, but for much of my life, I have felt shame about the breadth and depth of my own ignorance (perhaps reflecting the insight that everyone is a mirror). It's helpful to re-dis-cover the wisdom that ignorance can, when consciously cultivated, be a strength.

[Stuart Firestein recently gave a TED Talk on The Pursuit of Ignorance.]


An Excellent Primer on Data Science and Data-Analytic Thinking and Doing

O'Reilly Media is my primary resource for all things Data Science, and the new O'Reilly book on Data Science for Business by Foster Provost and Tom Fawcett ranks near the top of my list of their relevant assets. The book is designed primarily to help businesspeople understand the fundamental principles of data science, highlighting the processes and tools often used in the craft of mining data to support better business decisions. Among the many gems that resonated with me are the emphasis on the exploratory nature of data science - more akin to research and development than engineering - and the importance of thinking carefully and critically ("data-analytically") about the data, the tools and the overall process.

The book references and elaborates on the Cross-Industry Standard Process for Data Mining (CRISP-DM) model to highlight the iterative process typically required to converge on a deployable data science solution. The model includes loops within loops to account for the way that critically analyzing data models often reveals additional data preparation steps that are needed to clean or manipulate the data to support the effective use of data mining tools, and how the evaluation of model performance often reveals issues that require additional clarification from the business owners. The authors note that it is not uncommon for the definition of the problem to change in response to what can actually be done with the available data, and that it is often worthwhile to consider investing in acquiring additional data in order to enable better modeling. Valuing data - and data scientists - as important assets is a recurring theme throughout the book.

As a practicing data scientist, I find the book's emphasis on the expected value framework - associating costs and benefits with different performance metrics - to be a helpful guide in ensuring that the right questions are being asked, and that the results achieved are relevant to the business problems that motivate most data science projects. And as someone whose practice of data science has recently resumed after a hiatus, I found the book very useful as a refresher on some of the tools and techniques of data analysis and data mining ... and as a reminder of potential pitfalls such as overfitting models to training data, not appropriately taking into account null hypotheses and confidence intervals, and the problem of multiple comparisons. I've been using the scikit-learn package for machine learning in Python in my recent data modeling work, and some of the questions and issues raised in this book have prompted me to reconsider some of the default parameter values I've been using.
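To make the expected value idea a bit more concrete, here is a minimal sketch of how I think about it, not the book's own code. It assumes a binary classifier whose results are summarized as confusion-matrix counts, and a cost-benefit value for each cell. The function name, the counts and the dollar values are all hypothetical, chosen only for illustration.

```python
def expected_profit(confusion, cost_benefit):
    """Expected profit per instance.

    Each confusion-matrix cell contributes its observed rate
    (count / total) multiplied by the cost or benefit assigned to it.
    """
    total = sum(confusion.values())
    if total == 0:
        raise ValueError("confusion matrix has no instances")
    return sum(
        (count / total) * cost_benefit[cell]
        for cell, count in confusion.items()
    )


# Hypothetical counts from evaluating a classifier on 100 held-out customers.
confusion = {"TP": 30, "FP": 10, "FN": 5, "TN": 55}

# Hypothetical values: a targeted customer who responds is worth 99,
# a wasted offer costs 1, and customers we do not target cost nothing.
cost_benefit = {"TP": 99.0, "FP": -1.0, "FN": 0.0, "TN": 0.0}

profit = expected_profit(confusion, cost_benefit)
print(f"Expected profit per customer: {profit:.2f}")
```

Two models with similar accuracy can produce quite different expected profits under a calculation like this, which is why attaching costs and benefits to the outcomes helps keep the evaluation tied to the business problem.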

The book includes a nice mix of simplified and real-world examples to motivate and clarify many of the common problems and techniques encountered in data science. It also offers appropriately simplified descriptions and equations for the mathematics that underlie some of the key concepts and tools of data science, including one of the clearest definitions of Bayes' rule and its application in constructing Naive Bayes classifiers I've seen. The figures add considerable clarity to the topics covered throughout the book. I particularly like the chapter highlighting the different visualizations - profit curves, lift curves, cumulative response curves and receiver operating characteristic (ROC) curves - that can be used to help compare and effectively communicate the performance of models. [Side note: it was through my discovery of Tom Fawcett's excellent introduction to ROC analysis that I first encountered the Data Science for Business book. In the interest of full disclosure, I should also note that Tom is a friend and former grad school colleague (and fellow homebrewer) from my UMass days].
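For readers who have not met ROC curves before, the following is a rough sketch of how the points on such a curve can be computed from a model's scores. It is my own illustration in plain Python rather than code from the book or from any library, and the function names, scores and labels are made up for the example. It sweeps a threshold from the highest score to the lowest and records the false positive rate and true positive rate at each distinct score.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    `labels` holds 1 for positive examples and 0 for negative ones.
    Examples that share a score are counted together, so ties
    produce a single point instead of an arbitrary staircase.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    # Rank examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = []
    tp = fp = 0
    previous_score = None
    for score, label in ranked:
        if score != previous_score:
            # Emit the point reached before lowering the threshold further.
            points.append((fp / negatives, tp / positives))
            previous_score = score
        if label == 1:
            tp += 1
        else:
            fp += 1
    points.append((fp / negatives, tp / positives))  # threshold below all scores
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up scores and true labels for six examples.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
auc = area_under_curve(points)
print(points)
print(f"AUC: {auc:.3f}")
```

In practice I would reach for the `roc_curve` function in scikit-learn's `sklearn.metrics` module rather than rolling my own, but writing it out by hand makes it clearer that the curve depends only on how the model ranks the examples, not on any single threshold.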

The penultimate chapter of the book is on Data Science and Business Strategy, in which the authors elaborate on the importance of making strategic investments in data, data scientists and a culture that enables data science and data scientists to thrive. They note the importance of diversity in the data science team, the variance in individual data scientist capabilities - especially with respect to innate creativity, analytical acumen, business sense and perseverance - and the tendency toward replicability of successes in solving data science problems, for both individuals and teams. They also emphasize the importance of attracting a critical mass of data scientists - to support, augment and challenge each other - and progressively systematizing and refining various processes as the data science capability of a team (and firm) matures ... two aspects whose value I can personally attest to based on my own re-immersion in a data science team.