Academia

The Scientific Method: Cultivating Thoroughly Conscious Ignorance

Stuart Firestein brilliantly captures the positive influence of ignorance as an often unacknowledged guiding principle in the fits and starts that typically characterize the progression of real science. His book, Ignorance: How It Drives Science, grew out of a course on Ignorance he teaches at Columbia University, where he chairs the department of Biological Sciences and runs a neuroscience research lab. The book is replete with clever anecdotes interleaved with thoughtful analyses - by Firestein and other insightful thinkers and doers - regarding the central importance of ignorance in our quests to acquire knowledge about the world.

Each chapter leads off with a short quote, and the one that starts Chapter 1 sets the stage for the entire book:

"It is very difficult to find a black cat in a dark room," warns an old proverb. "Especially when there is no cat."

He proceeds to channel the wisdom of Princeton mathematician Andrew Wiles (who proved Fermat's Last Theorem) regarding the way science advances:

It's groping and probing and poking, and some bumbling and bungling, and then a switch is discovered, often by accident, and the light is lit, and everyone says "Oh, wow, so that's how it looks," and then it's off into the next dark room, looking for the next mysterious black feline.

Firestein is careful to distinguish the "willful stupidity" and "callow indifference to facts and logic" exhibited by those who are "unaware, unenlightened, and surprisingly often occupy elected offices" from a more knowledgeable, perceptive and insightful ignorance. As physicist James Clerk Maxwell describes it, this "thoroughly conscious ignorance is the prelude to every real advance in science."

The author disputes the view of science as a collection of facts, and instead invites the reader to focus on questions rather than answers, to cultivate what poet John Keats calls "negative capability": the ability to dwell in "uncertainty without irritability". This notion is further elaborated by philosopher-scientist Erwin Schrödinger:

In an honest search for knowledge you quite often have to abide by ignorance for an indefinite period.

Ignorance tends to thrive more on the edges than in the centers of traditional scientific circles. Using the analogy of a pebble dropped into a pond, most scientists tend to focus near the site where the pebble is dropped, but the most valuable insights are more likely to be found among the ever-widening ripples as they spread across the pond. This observation about the scientific value of exploring edges reminds me of another inspiring book I reviewed a few years ago, The Power of Pull, wherein authors John Hagel III, John Seely Brown & Lang Davison highlight the business value of exploring edges:

Edges are places that become fertile ground for innovation because they spawn significant new unmet needs and unexploited capabilities and attract people who are risk takers. Edges therefore become significant drivers of knowledge creation and economic growth, challenging and ultimately transforming traditional arrangements and approaches.

On a professional level, given my recent renewal of interest in the practice of data science, I find many of the book's insights into ignorance relevant to a productive perspective for a data scientist. Firestein promotes a data-driven rather than hypothesis-driven approach, instructing his students to "get the data, and then we can figure out the hypotheses." Riffing on Rodin, the famous sculptor, Firestein highlights the literal meaning of "dis-cover", which is "to remove a veil that was hiding something already there" (which is the essence of data mining). He also notes that each discovery is ephemeral, as "no datum is safe from the next generation of scientists with the next generation of tools", highlighting both the iterative nature of the data mining process and the central importance of choosing the right metrics and visualizations for analyzing the data.

Professor Firestein also articulates some keen insights about our failing educational system, a professional trajectory from which I recently departed, that resonate with some growing misgivings I was experiencing in academia. He highlights the need to revise both the business model of universities and the pedagogical model, asserting that we need to encourage students to think in terms of questions, not answers. 

W.B. Yeats admonished that "education is not the filling of a pail, but the lighting of a fire." Indeed. Time to get out the matches.


On a personal level, at several points while reading the book I was reminded of two of my favorite "life rules" (often mentioned in preceding posts) articulated by Cherie Carter-Scott in her inspiring book, If Life is a Game, These are the Rules:

Rule Three: There are no mistakes, only lessons.
Growth is a process of experimentation, a series of trials, errors, and occasional victories. The failed experiments are as much a part of the process as the experiments that work.

Rule Four: A lesson is repeated until learned.
Lessons will be repeated to you in various forms until you have learned them. When you have learned them, you can then go on to the next lesson.

Firestein offers an interesting spin on this concept, adding texture to my previous understanding, and helping me feel more comfortable with my own highly variable learning process, as I often feel frustrated with re-encountering lessons many, many times:

I have learned from years of teaching that saying nearly the same thing in different ways is an often effective strategy. Sometimes a person has to hear something a few times or just the right way to get that click of recognition, that "ah-ha moment" of clarity. And even if you completely get it the first time, another explanation always adds texture.

My ignorance is revealed to me on a daily, sometimes hourly, basis (I suspect people with partners and/or children have an unfair advantage in this department). I have written before about the scope and consequences of others being wrong, but for much of my life, I have felt shame about the breadth and depth of my own ignorance (perhaps reflecting the insight that everyone is a mirror). It's helpful to re-dis-cover the wisdom that ignorance can, when consciously cultivated, be a strength.

[The video below is the TED Talk that Stuart Firestein recently gave on The Pursuit of Ignorance.]

 

 


Valuable Advice on Preparing for Technical Interviews ... and Careers

The cover of Gayle Laakmann McDowell's book, Cracking the Coding Interview, and links to her Career Cup web site and Technology Woman blog are included in the slides I use on the first day of every senior (400-level) computer science course I have taught over the last two years. These are some of the most valuable resources I have found for preparing for interviews for software engineering - as well as technical program manager, product manager or project manager - positions. I recently discovered she has another book, The Google Resume, that offers guidance on how to prepare for a career in the technology industry, so I've added that reference to my standard introductory slides.

While my Computing and Software Systems faculty colleagues and I strive to prepare students with the knowledge and skills they will need to succeed in their careers, the technical interview process can prove to be an extremely daunting barrier to entry. The resources Gayle has made available - based on her extensive interviewing experience while a software engineer at Google, Microsoft and Apple - can help students (and others) break through those barriers. The updated edition of her earlier book focuses on how to prepare for interviews for technical positions, and her latest book complements this by offering guidance - to students and others who are looking to change jobs or fields - on how to prepare for careers in the computer technology world.


I have been looking for an opportunity to invite Gayle to the University of Washington Bothell to present her insights and experiences directly to our computer science students since I started teaching there last fall, and was delighted when she was able to visit us last week. Given the standing room only crowd, I was happy to see that others appreciated the opportunity to benefit from some of her wisdom. I will include fragments of this wisdom in my notes below, but for the full story, I recommend perusing her slides (embedded below) or watching a video of a similar talk she gave in May (also embedded further below), and for anyone serious about preparing for tech interviews and careers, I recommend reading her books.

Gayle emphasized the importance of crafting a crisp resume. Hiring managers typically spend no more than 15-30 seconds per resume to make a snap judgment about the qualifications of a candidate. A junior-level software engineer should be able to fit everything on one page, use verbs emphasizing accomplishments (vs. activities or responsibilities), and quantify accomplishments wherever possible. Here are links to some of the relevant resources available at her different web sites:

One important element of Gayle's advice [on Slide 13] that aligns with my past experience - and ongoing bias - in hiring researchers, designers, software engineers and other computing professionals is the importance of working on special projects (or, as Gayle puts it, "Build something!"). While graduates of computer science programs are in high demand, I have always looked for people who have done something noteworthy and relevant, above and beyond the traditional curriculum, and it appears that this is a common theme in filtering prospective candidates in many technology companies. This is consistent with advice given in another invited talk at UWB last year by Jake Homan on the benefits of contributing to open source projects, and is one of the motivations behind the UWB CSS curriculum requiring a capstone project for all our computer science and software engineering majors.

Gayle spoke of "the CLRS book" during her talk at UWB and her earlier talk at TheEasy, a reference to the classic textbook, Introduction to Algorithms, by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. She said that entry-level software engineer applicants typically won't need to know data structures and algorithms at the depth or breadth presented in that book, and she offers a cheat sheet / overview of the basics on Slides 23-40, and an elaboration in Chapters 8 & 9 of her CtCI book. However, for those who are interested in delving more deeply into the topic, an online course based on the textbook is now part of the MIT OpenCourseWare project, and includes video & audio lectures, selected lecture notes, assignments, exams and solutions.

One potential pitfall for candidates who prepare thoroughly for technical interviews is that they may get an interview question that they have already seen (and perhaps studied). She recommended that candidates admit to having seen a question before, equating not doing so with cheating on an exam, and avoid simply reciting solutions from memory, because simple slip-ups are both common and easy to catch.

Gayle stressed that there is no correlation between how well a candidate thinks he or she did in an interview and how well their interviewers thought they did. In addition to natural biases, the candidate evaluation process is always relative: candidates' responses to questions are assessed in the context of the responses of other candidates for the same position. So even if a candidate thinks he or she did well on a question, it may not be as well as other candidates, and even if a candidate thinks he or she totally blew a question, it may not have been blown as badly as other candidates blew the question.

Another important factor to bear in mind is that most of the big technology companies tend to be very conservative in making offers; they would generally prefer to err on the side of false negatives rather than false positives. When they have a candidate who seems pretty good, but they don't feel entirely confident about the candidate's strength, they would rather reject someone who may have turned out great than risk hiring someone who does not turn out well, because they have so many [other] strong candidates. Of course, different companies have different evaluation and ranking schemes, and many of these details can be found in her CtCI book.

Gayle visits the Seattle area on a semi-regular basis, so I'm hoping I will be able to entice her to return each fall to give a live presentation to our students. However, for the benefit of those who are not able to see her present live, here is a video of her Cracking the Coding Interview presentation at this year's Canadian University Software Engineering Conference (CUSEC 2012) [which was also the site of another great presentation I blogged about a few months ago, Bret Victor's Inventing on Principle].

Finally, I want to round things out on a lighter note, with a related video that I also include in my standard introductory slides, Vj Vijai's Hacking the Technical Interview talk at Ignite Seattle in 2008:


def main() in Python considered harmful

I recently graded the first Python programming assignments in the course I'm teaching on Social and Computational Intelligence in the Computing and Software Systems program at University of Washington Bothell. Most of the students are learning Python as a second (or third) language, approaching it from the perspective of C++ and Java programming, the languages we use in nearly all our other courses. Both of those languages require the definition of a main() function in any source file that is intended to be run as an executable program, and so many of the submissions include the definition of a main() function in their Python scripts.

In reviewing some recurring issues from the first programming assignment during class, I highlighted this practice, and suggested it was unPythonistic (a la Code like a Pythonista). I recommended that the students discontinue this practice in future programming assignments, as unlike in C++ and Java, a function named main has no special meaning to the Python interpreter. Several students asked why they should refrain from this practice - i.e., what harm is there in defining a main() function? - and one sent me several examples of web sites with Python code including main() functions as evidence of its widespread use.

In my experience, the greatest benefit to teaching is learning, and the students in my classes regularly offer me opportunities to move out of my comfort zone and into my growth zone (and occasionally into my panic zone). I didn't have a good answer for why def main() in Python was a bad practice during that teachable moment ... but after lingering in the growth zone for a while, I think I do now.

The potential problem with this practice is that any function defined at the top level of a Python module becomes part of the namespace for that module, and if the function is imported from that module into the current namespace, it will replace any function previously associated with the function name. This may lead to unanticipated consequences if it is combined with a practice of using wildcards when importing, e.g., from module import * (though it should be noted that wildcard imports are also considered harmful by Pythonistas).

I wrote a couple of simple Python modules - main1.py and main2.py - to illustrate the problem:

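# main1.py
import sys

def main():
    # print() with a single argument behaves the same on Python 2.7 and Python 3
    print('Executing main() in main1.py')
    print('__name__: {}; sys.argv[0]: {}\n'.format(__name__, sys.argv[0]))

# Run main() only when this file is executed as a script, not when it is imported
if __name__ == '__main__':
    main()

# main2.py
import sys

def main():
    # print() with a single argument behaves the same on Python 2.7 and Python 3
    print('Executing main() in main2.py')
    print('__name__: {}; sys.argv[0]: {}\n'.format(__name__, sys.argv[0]))

# Run main() only when this file is executed as a script, not when it is imported
if __name__ == '__main__':
    main()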

The main() functions are identical except that one contains the string 'main1.py' whereas the other contains the string 'main2.py'. If either of these modules is executed from the command line, it executes its own main() function, printing out the hard-coded string and the values of __name__ and sys.argv[0] (the latter of which will only have a non-empty value when the module is executed from the command line).

$ python main1.py
Executing main() in main1.py
__name__: __main__; sys.argv[0]: main1.py

$ python main2.py
Executing main() in main2.py
__name__: __main__; sys.argv[0]: main2.py

When these modules are imported into the Python interpreter using wildcards, the effect of invoking the main() function depends on whichever module was imported most recently, since each wildcard import rebinds the name main.

$ python
Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None, '__package__': None}
>>> from main1 import *
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__package__': None, 'sys': <module 'sys' (built-in)>, '__name__': '__main__', 'main': <function main at 0x1004aa398>, '__doc__': None}
>>> main()
Executing main() in main1.py
__name__: main1; sys.argv[0]:

>>> from main2 import *
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__package__': None, 'sys': <module 'sys' (built-in)>, '__name__': '__main__', 'main': <function main at 0x1004aa140>, '__doc__': None}
>>> main()
Executing main() in main2.py
__name__: main2; sys.argv[0]:

>>> exit()
$

Now, this may all be much ado about little, especially given the aforementioned caveat about the potential harm of using wildcards in import statements. I suppose if one were to execute the more Pythonistic selective imports, i.e., from main1 import main and from main2 import main, at least the prospect of the main() function being overridden might be more apparent. But people learning a programming language for the first time - er, or teaching a programming language for the second time - often use shortcuts (such as wildcard imports), and so I offer all this as a plausible rationale for refraining from def main() in Python.
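To make that contrast concrete, here is a minimal sketch (not part of the experiment above) that compares wildcard imports with plain qualified imports. It assumes Python 3, and the in-memory modules demo_main1 and demo_main2, along with the make_module helper, are hypothetical names introduced only for illustration, so that no files need to be written to disk.

```python
import sys
import types


def make_module(name):
    """Build a throwaway in-memory module that defines its own main()."""
    module = types.ModuleType(name)
    source = "def main():\n    return 'main() from ' + __name__\n"
    exec(source, module.__dict__)
    sys.modules[name] = module  # lets ordinary import statements find it
    return module


make_module("demo_main1")
make_module("demo_main2")

# Wildcard imports into one shared namespace: the later import rebinds 'main'.
namespace = {}
exec("from demo_main1 import *", namespace)
first = namespace["main"]()
exec("from demo_main2 import *", namespace)
second = namespace["main"]()

# Qualified imports keep both functions reachable under their module names.
import demo_main1
import demo_main2

qualified = (demo_main1.main(), demo_main2.main())

print(first)      # main() from demo_main1
print(second)     # main() from demo_main2
print(qualified)  # ('main() from demo_main1', 'main() from demo_main2')
```

With qualified imports, nothing is silently replaced: each main() stays attached to the module that defines it, which is one reason the name collision matters mostly when wildcard imports are in play.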

As part of my practice of leaning into discomfort and staying in the growth zone, I welcome any relevant insights or experiences that others may want to share.


The Independent Project: An Inspiring Experiment in Student-Designed Learning

The Independent Project is an experimental school within a school, designed and implemented by a group of students at Monument Mountain Regional High School in Great Barrington, MA. I recently wrote about Carl Rogers' ideas regarding student-centered learning, in which a professor (who in this case was also a therapist) plays the role of facilitator, while graduate or undergraduate college students design and implement the course. The Independent Project takes this a step further, empowering a small group of high school students to design and implement an entire curriculum with far less direct involvement of faculty.

I first read about the project in Let Kids Rule the School, a March 2011 NYTimes op-ed piece by Susan Engel, a psychology professor at nearby Williams College, in which she describes a participatory educational framework with minimal input by or supervision from any faculty or staff:

Their guidance counselor was their adviser, consulting with them when the group flagged in energy or encountered an obstacle. Though they sought advice from English, math and science teachers, they were responsible for monitoring one another’s work and giving one another feedback. There were no grades, but at the end of the semester, the students wrote evaluations of their classmates.

During a two-day workshop at the University of Washington Bothell this week on Reinventing University-Level Learning for the 21st Century, I mentioned The Independent Project during a breakout session and wondered whether and how this experiment might be carried out in the context of a university. I started composing a follow-up email with an annotated link ... and as the annotations grew longer I decided I'd move it over to a blog post (and expand it further).

Since my last visit to The Independent Project web site, a few things have been added:

  • a comprehensive 16-page white paper [PDF] offering more details about exactly how they proposed, sought approval and then designed and implemented the program
  • a map showing other schools that are purportedly experimenting with the program (surprisingly, only two: one in Ireland, another in Mexico)
  • a blog with entries spanning February through March 2012, suggesting they are in the midst of a second trial of the program (a post on logistics highlights the revisions from the first trial)

There is also an extremely well-produced and inspiring 15-minute video, which I will embed below, and then follow up with a few highlights.

The video starts out with a quote often - though questionably - attributed to Antoine de Saint-Exupéry (author of The Little Prince), while the voice-over ironically talks of the project as "rocking the boat":

If you want to build a ship, don't drum up the men to gather wood, divide the work and give orders. Instead, teach them to yearn for the vast and endless sea.

Another relevant and inspiring quote, by educational reformer Ted Sizer, is shown further on:

Inspiration, hunger: these are the qualities that drive good schools. The best we educational planners can do is to create the most likely conditions for them to flourish, and then get out of their way.

Throughout the video, a number of recurring themes are emphasized by students, faculty and administrators at the school (and the school within the school): independence, exploration, questions, initiative, creativity, curiosity, caring, sharing, confidence, mutual support, engagement, freedom, responsibility, excitement, agency, a thirst for knowledge.

According to Sam Levin, the Monument student who instigated the idea and wrote the white paper mentioned above, students would start each week converging on a set of questions and allocate responsibility for answering those questions among each other, then spend the rest of the week splitting their time between individual endeavors during the mornings and working on collective endeavors in the afternoons. They would gather together to teach each other what they learned each Friday.

Among the questions they asked were:

  • How do plant cells differ from the top of the mountain to the bottom of the mountain?
  • Why do we cry?
  • How does music affect the brain?
  • How do mice react to aromatherapy?
  • What causes innovation?
  • Why do we have art?
  • How does Pixar make a film?
  • How can we clean the Housatonic River?
  • What are the dimensions?
  • Are some infinities bigger than others?
  • Is there science behind old wives' tales?

Among the books they read were:

  • As I Lay Dying, by William Faulkner
  • Charlotte's Web, by E. B. White
  • Sirens of Titan, by Kurt Vonnegut
  • Tales of Weird, by Ralph Steadman
  • The House on Mango Street, by Sandra Cisneros
  • Travels in the Scriptorium, by Paul Auster
  • The Importance of Being Earnest, by Oscar Wilde
  • Flatland, by Edwin A. Abbott

Among the individual endeavors they worked on were:

  • a novel
  • a play
  • a compilation of short stories
  • a study of women's trauma and recovery
  • a short film
  • practicing culinary arts
  • learning the piano

Toward the end of the video, Levin summarizes the experience:

We learned how to learn, we learned how to teach, and we learned how to work.

We learned how to learn in the sense that we learned how to ask questions, and explore the answers using different methods like the scientific method or different resources.

We learned how to teach in the sense that we learned how to take what we learned and share it with other people, not just because we had to, because we had a presentation, but because it was our responsibility to make sure everyone else in the group benefited from our work.

We learned how to work in the sense that we learned how to use different resources, and go to different people, and use different methods, and push each other, and be pushed, and criticize and be criticized, to produce the best work, and learn as much as possible.

Susan Engel, who is also interviewed in the video, notes that "the potential for this is right there within the walls of every single school". I'm disappointed, though hardly surprised, that the program does not appear to have gone viral. While the potential for such student-designed learning may be present in any school, I suspect most students would not be sufficiently motivated to expend the energy to design and implement their own curriculum.

Even if a sufficient number of students were sufficiently motivated, there would still be the problem(s) of scale. One of the things I'm re-learning about academia - which is also being learned by participants in the Occupy movement - is that self-governance requires a considerable investment of time and energy ... and it's very difficult to avoid the imposition of structure (aka bureaucracy) as the size of the self-governing body expands.

With the growing array of online resources freely available for learning, including videos from the Khan Academy, massive open online courses from traditional universities as well as startups, and the Peeragogy Handbook Project (to name just a few), the potential for learning increasingly extends far beyond the walls of any single school. Indeed, this expanding potential, and the growing uncertainty about the role universities might play, is the motivation behind the UWB workshop this week.

One of the outcomes of the workshop was a commitment - at least among some of the participants - to undertake more experiments in our cultivation of learning opportunities ... some of which I'm hoping will incorporate the insights and experiences from The Independent Project.

Update: Sam Levin, who is now at University of Oxford, shared some updates with me during a Skype call, which I'll summarize below.

  • Sam has also helped instigate other experimental projects involving high school students, such as Project Sprout and The Future Project.
  • The Independent Project has been contacted by approximately 200 schools from around the world, many of which are implementing their own variations.
  • During the first experiment (in Fall 2010), the participating students were awarded up to 6.5 "public school credits" which counted toward graduation but did not count toward any of the individual distribution requirements (e.g., English or Math), so some students needed to double up in preceding or succeeding years to ensure they met all the requirements.
  • The second experiment is going on now (Spring 2012), and starting in the Fall of 2012, the school will make The Independent Project a permanent part of their program, offer it for a full year (vs. a single semester), and allow students to apply credits toward individual requirements.

Much to my surprise, Sam is not aware of any researchers who are evaluating the current or past instantiations of The Independent Project at MMRHS or any other schools. He suspects this may be due, in part, to the difficulty of assessment (in general, and specifically in the case of a program where students determine their own assessment paradigm). I may be wrong, but this seems like a HUGE opportunity for researchers interested in educational innovation.


Hadoop, Apache and the Benefits of Contributing to Open Source Projects

Jake Homan, a Senior Software Engineer at LinkedIn and UW Bothell CSS graduate, gave a recent guest lecture at UWB on Apache Hadoop: Petabytes and Terawatts, offering an overview and applications of Hadoop as well as related distributed computing tools developed within the Apache Software Foundation. The presentation offered a great balance of breadth and depth that was very well suited to the audience, primarily composed of senior undergraduate and Master's-level computer science students (and a few faculty). One of the most valuable insights shared by Jake was the enormous value that contributing to open source software projects can offer CS students - and others interested in software engineering career opportunities - to develop and demonstrate both their technical skills and their ability to work and play well with others.

Jake explained that Hadoop has two primary components: a distributed file system and a framework to support distributed computation. The Hadoop Distributed File System (HDFS) divides files into 128 MB blocks, makes 2 copies - yielding 3 replicas - of all the blocks, and then distributes the blocks on different DataNodes (computers). A NameNode manages the DataNodes and, among other tasks, regenerates the file blocks stored on a DataNode when that DataNode dies - and given enough DataNodes and enough time, a DataNode is sure to die - to ensure that 3 replicas of every file block are always available.
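The arithmetic behind that description can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the function names, the node names, and the round-robin placement rule are mine, introduced for illustration, and the placement logic in a real HDFS cluster is more sophisticated than this.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size mentioned in the talk
REPLICAS = 3                    # the original block plus 2 copies


def block_count(file_size_bytes):
    """Number of blocks needed to hold a file (integer ceiling division)."""
    return -(-file_size_bytes // BLOCK_SIZE)


def place_replicas(num_blocks, data_nodes, replicas=REPLICAS):
    """Toy round-robin placement: each block lands on `replicas` distinct nodes."""
    if len(data_nodes) < replicas:
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            data_nodes[(block + offset) % len(data_nodes)]
            for offset in range(replicas)
        ]
    return placement


one_gb = 1024 ** 3
blocks = block_count(one_gb)
layout = place_replicas(blocks, ["node-a", "node-b", "node-c", "node-d"])

print(blocks)             # 8
print(layout[0])          # ['node-a', 'node-b', 'node-c']
print(blocks * REPLICAS)  # 24 block replicas stored across the cluster
```

Under these assumptions, a 1 GB file occupies 8 blocks and 24 block replicas in total, which is why losing any single DataNode still leaves at least two replicas of every block from which the third can be regenerated.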

Hadoop provides a Java implementation of the MapReduce framework to support distributed computation. Using the prototypical example of a word count program - which Jake described as the "hello, world" program for distributed computing - he showed how to break down a computation into a Mapper and a Reducer. Generally speaking, a Mapper takes a <key, value> pair and generates zero or more <key, value> pairs; a Reducer takes all the values of one key and generates zero or more <key, value> pairs.

Applying this framework to the problem of counting words in a text (or collection of texts), a Hadoop program might start by splitting the text into lines or sentences where the keys represent the sequence positions of lines or sentences and the values represent the segments of text, e.g.,

<0, "Four score and seven years ago ...">
...

Hadoop would distribute these <key, value> pairs across DataNodes, where a TaskTracker on each DataNode would use a Mapper to split its line or sentence into a sequence of words and counts (where all counts are initially 1), yielding

<"Four", 1>
<"score", 1>
<"and", 1>
<"seven", 1>
...

During the Reduce phase, the outputs of Mappers are aggregated and sorted by key, yielding <key, list-of-values> pairs:

<"a", [1, 1, 1, 1, 1, 1, 1]>
<"above", [1]>
<"add", [1]>
...

These are then reduced [again] to <key, value> pairs, yielding the final sequence of word and frequency counts:

<"a", 7>
<"above", 1>
<"add", 1>
...
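The walkthrough above can be mimicked in a few lines of ordinary Python. This is a single-process sketch of the idea, not Hadoop's Java API: the names mapper, shuffle, reducer and word_count are assumptions chosen for illustration, and nothing here is distributed across machines.

```python
from collections import defaultdict


def mapper(key, line):
    """Map one <position, line> pair to zero or more <word, 1> pairs."""
    for word in line.split():
        yield (word, 1)


def shuffle(pairs):
    """Group mapper output by key and sort it, giving <key, list-of-values>."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reducer(word, counts):
    """Reduce all the values for one key to a single <word, total> pair."""
    yield (word, sum(counts))


def word_count(lines):
    mapped = [pair
              for key, line in enumerate(lines)
              for pair in mapper(key, line)]
    return [result
            for word, counts in shuffle(mapped)
            for result in reducer(word, counts)]


counts = word_count(["four score and seven", "score and score"])
print(counts)  # [('and', 2), ('four', 1), ('score', 3), ('seven', 1)]
```

Because the mapper and reducer only ever see one key at a time, the same two functions could in principle be run on many machines at once, which is the part of the work that Hadoop takes on.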

Distributed systems are increasingly the norm rather than the exception in companies providing any kind of web services - or involving any other kind of non-trivial computation - and so knowledge and experience in working with distributed systems is an increasingly important component of computer science education. However, even with knowledge of distributed systems, writing programs that can take advantage of distributed system architecture is still difficult and error-prone.

Jake said that if programmers can learn to think in terms of MapReduce, they can use Hadoop to manage many of the logistical and coordination aspects of distributed system programming; if programmers want to think or work with relational databases (SQL), they can use Hive; and if they want to think or work with higher level scripting languages, they can use Pig. Both of these are among the many Apache tools that can be layered on top of Hadoop. [I wrote about several of these tools in a post last August on Hadoop Day in Seattle: Hadoop, Cascading, Hive and Pig.]

One of the most useful pieces of knowledge that Jake shared during his presentation concerned the often underappreciated second-order benefits of contributing to open source projects, i.e., above and beyond the intrinsic value of improving software tools which, in many cases, programmers are using themselves. The first question he asks a software engineer candidate is "Have you done open source?" Open source software projects typically make all the code and the online conversations about the code publicly available, so Jake can do some background investigation to learn about both the open source code the candidate has written and the way the candidate has interacted with other contributors and stakeholders (e.g., the way a candidate has responded to bug reports or feature requests). The candidacy of any software engineer who has not contributed to any open source software projects may be considerably diminished by a deficit in this area.

Getting involved in an open source project can be intimidating, so Jake shared a link to the Apache Software Foundation list of ASF newbie issues that would be appropriately scoped projects for someone who wants to test the waters. I have not contributed directly to any Apache project - yet - but I did engage in some civic hacktivism at Data Camp Seattle in February, and some random hacks of kindness at RHOK 3 in June. I would like to organize an appropriately and inspiringly themed open source hackathon at UWB for students, faculty and other interested parties sometime in the near future ... but it will have to wait until after the fall quarter, as the three classes I'm teaching now are consuming nearly all my time and energy. I'm glad I at least took an hour off last week for Jake's engaging and educational presentation.


Continuing Education: Senior Lecturer at the University of Washington, Bothell

I recently embarked on the next stage of my re-engagement with academia, as a Senior Lecturer in the Computer & Software Systems program at the University of Washington, Bothell. Like the Tacoma campus, where I taught last winter and spring, the Bothell campus cultivates a small college culture within a large university system: classes are relatively small (with a maximum of 30-45 students in each) and there is a strong student-centered orientation among all the faculty and staff. The faculty - tenure track and non-tenure track - are actively engaged in research and other scholarly activities, but excellence in teaching is an essential attribute among all faculty.

During my first quarter, I am teaching courses on the Fundamentals of Computing (the introductory course for the CSS major) and Operating Systems (a senior-level core course in the major). I'm excited about teaching these courses for a number of reasons, not least of which is that these are the same courses I taught during my first full-time semester of teaching at the University of Hartford in 1985. Some content has changed, but many of the basic concepts have persisted over the intervening years. I'll be teaching courses on human-computer interaction, network design and web programming in the winter and spring quarters.

I don't anticipate much time for research during the next few quarters, as all of these courses will require new preparations on one or more dimensions. However, I do anticipate engaging some of my entrepreneurial energy. Although the Bothell campus is 20 years old, in the academic world this still qualifies as a "startup". The campus has ambitious growth plans to double in size over the next 5 years, and I'm looking forward to new opportunities for instigating, connecting and evangelizing in this new educational setting.

I also don't anticipate much time for blogging during this period; this post is already late (classes started last week), and I won't add much more to it. I do want to express my sincere gratitude for all the support I enjoyed from the faculty, staff and students at UW Tacoma throughout my initial re-engagement with academia last year. I am similarly grateful for the warm welcome I have received from the faculty, staff and students at UWB and CSS, and I look forward to my continuing education - as both a producer and a consumer - at the University of Washington.


Health, science, knowledge, access and elitism: Lawrence Lessig and science as remix culture

I have been an admirer and supporter of Lawrence Lessig's crusade for copyright reform and promotion of remix culture for many years. In a recent talk at CERN, Lessig applied his arguments for a fairer interpretation of fair use in the arts world to opening up the architectures for knowledge access in the world of science. The Harvard Law School professor made a compelling case for the ethical obligation of scientists [at least those in academia] to provide universal access to the knowledge they discover, and chastised those who practice exclusivity - those who choose elite-nment over enlightenment - as "wrong".

I initially discovered the talk by following a @BoingBoing tweet to a two-paragraph blog post about Lessig on science, copyright and the moral case for open access, which included an embedded 50-minute video of Lessig's presentation at CERN on 18 April 2011 entitled "The Architecture of Access to Scientific Knowledge: Just How Badly We Have Messed This Up".

I rarely take the time to watch any videos, and having seen many of Lessig's talks about copyright reform - live and online - I was preparing to simply retweet the link, and move on. But having been thoroughly irritated by a personal encounter with barriers to knowledge access during the [free] webcast from the otherwise enlightening and engaging Behavioral Informatics for Health event earlier this week, I was motivated to see and hear what Lessig had to show and tell. I was excited to discover that Lessig's talk was far more relevant to health and medicine - and the kind of universal access to crucial information that might help those outside of elite schools and hospitals better achieve positive health outcomes - than I initially anticipated.

Before sharing some of Lessig's insights and observations, I want to share the source of my personal irritation in encountering preventative measures erected to limit access to one of the two journals being showcased at the behavioral informatics event, a special issue on Cyberinfrastructure for Consumer Health from the American Journal of Preventive Medicine. When I investigated options for accessing some of the interesting articles being mentioned during the event, I discovered that

THIS SITE DOES NOT SUPPORT INSTITUTIONAL ACCESS

AJPM pricing options for individuals include a 12-month subscription to the journal for $277, or the purchase of individual articles for $31.50 each. The special issue being showcased at the event included 27 articles, which translates into a total cost of roughly $850 for purchasing this one issue of the journal, whose mission is "the promotion of individual and community health".

In contrast, all the articles from the inaugural issue of the other journal being showcased at the event, Translational Behavioral Medicine, are freely available online, a policy much more in alignment with its mission:

TBM is an international peer-reviewed journal that offers continuous, online-first publication. TBM's mission is to engage, inform, and catalyze dialogue between the research, practice, and policy communities about behavioral medicine. We aim to bring actionable science to practitioners and to prompt debate on policy issues that surround implementing the evidence. TBM's vision is to lead the translation of behavioral science findings to improve patient and population outcomes.

I hope to post another blog entry with some notes from the behavioral informatics event, but in this post, I want to continue on with some of Lessig's commentary about science, knowledge, access and elitism. I'll embed a copy of the video below, follow it with some notes and partial transcriptions I made while watching, and finish off with a brief riff on science as a remix culture.

The Architecture of Access to Scientific Knowledge from lessig on Vimeo.

Lessig begins by talking about two motivations for his talk. The first is the late Supreme Court Justice Byron White, who was considered a liberal when appointed to the court by President John Kennedy in 1962, but became progressively more conservative, as evidenced in his authoring of the majority opinion in the 1986 case of Bowers v Hardwick, which upheld laws criminalizing sodomy, and included the following statement:

Against this background, to claim that a right to engage in such conduct [sodomy] is "deeply rooted in this Nation's history and tradition" or "implicit in the concept of ordered liberty" is, at best, facetious.

Lessig calls this the White effect:

To be liberal / progressive is always relative to a moment, and that moment changes, and too many are liberal / progressive no more.

The second, more recent, motivation was a Harvard Gazette article about Gita Gopinath, a macro-economist at Harvard who was born in India. After mentioning that Gopinath, a tenured professor, would like to have more time to read books that are not textbooks, the article concluded with the following sentence:

Still, the shelves in her new office are nearly bare, since, said Gopinath, “Everything I need is on the Internet now.”

Lessig notes:

If you're a member of the knowledge elite, then you have effectively free access to all of this information, but if you're from the rest of the world, not so much.

He goes on to observe:

The thing to recognize is that we built this world, we built this architecture for access. This flows from the deployment of copyright, but here, copyright to benefit publishers, not to enable authors. Not one of these authors gets money from copyright, not one of them wants the distribution of their articles limited, not one of them has a business model that turns upon restricting access to their work, not one of them should support this system.

As a knowledge policy, for the creators of this knowledge, this is crazy.

Lessig tells the story of his third daughter, who was diagnosed with jaundice shortly after her birth, and the concern he felt when the doctor expressed unexpected concern about possible complications. Due to his status as a Harvard professor, he had institutional access to many relevant articles in medical journals. When he calculated the cost of purchasing the 20 articles he tracked down, he found it would have come to $435 for someone who did not enjoy his level of elite status.

Even those journals which granted free access sometimes engaged in regulating access to parts of articles. For example, a February 2002 article on "Hyperbilirubinemia in the Term Newborn" in American Family Physician was available for free ... except for a crucial missing chart:

TABLE 4
Management of Hyperbilirubinemia in Healthy Term Newborns

The rightsholder did not grant rights to reproduce this item in electronic media. For the missing item, see the original print version of this publication.

Rather than architecting systems to maximize access to knowledge, Lessig suggests that "we are architecting access to maximize revenue". He also shares a chart from An Open Letter to All University Presidents and Provosts Concerning Increasingly Expensive Journals by Theodore Bergstrom & R. Preston McAfee on Journal Prices by Publisher and Discipline Type that shows the cost-per-page of purchasing articles from for-profit journals was 5 times higher, on average, than the cost in not-for-profit journals, leading him to wonder whether academia is creating its own RIAA:

Really Important Academic Archive: RIAA for the Academy?

Lessig is co-founder of the Science Commons, a translation of the Creative Commons license to promote open access in the scientific community, with four key principles:

  1. Open access to literature
  2. Access to research tools
  3. Data should be in the public domain
  4. Open cyberinfrastructure

Lessig championed the Public Library of Science (PLoS) as an exemplar of these principles. Personally, I am very excited about the PLoS publication of a landmark study this week on Sharing Data for Public Health Research by Members of an International Online Diabetes Social Network, by Weitzman, et al., based on data from the TuDiabetes online community, and another recent study by Wicks, et al., based on PatientsLikeMe community members with amyotrophic lateral sclerosis (ALS) published - and freely available - in the journal Nature Biotechnology, Accelerated clinical discovery using self-reported patient data collected online and a patient-matching algorithm.

Having recently read a critique of Science 2.0, cataloging the shortcomings and/or failures of several traditional for-profit publishers to effectively capitalize on the Web 2.0 platform, it is encouraging to see some promising progress in sharing knowledge about chronic conditions in the not-for-profit world.

Lessig proceeds to review some of the issues surrounding the use - and misuse - of copyright in the arts, but I have already written about many of his arguments and examples from that domain in my notes from his keynote at the 2009 Seattle Green Festival. I'll simply note that in viewing his examples in this context, I was struck by the revelation that on a very basic level,

science is a remix culture

Traditionally, much of science has been the exclusive domain of professional scientists, who typically go to great lengths to cite prior work that is related to the experiments and results they report in peer-reviewed publications (indeed, some of the peers reviewing work submitted for publication are among those who are - or [feel they] should be - cited). With the rare exceptions of paradigm shifts, most of science is incremental in nature, and each increment represents a remix with a few added ingredients.

There are several promising signs that people without PhDs, MDs and other "terminal" credentials can participate more fully in the scientific discovery and dissemination process. I enumerated several of these efforts in an earlier post on platform thinking, but in the context of health and medicine - and Harvard - I do want to mention Doc Searls' recent post on Patient-driven health care in which he expands the idea of the patient as a platform and mentions efforts by Jon Lebkowsky and others to promote a vendor relationship management (VRM) model in which patients - and the data about their conditions - will be better able to participate in peer-to-peer collaborations with health care and health information technology professionals.

Lessig laments the current system in which authors - and peer reviewers - of scientific publications do much of the work for free, while for-profit publishers derive nearly all of the financial benefits, and do so through restricting access to the knowledge produced by the authors. Given that much of the data used in the experiments reported in professional medical publications comes from patients (the PatientsLikeMe and TuDiabetes studies being particularly notable examples), it makes all the more sense to make the results of these experiments available to all patients ... and at some point, we all are - or will be - patients who might benefit from universal access to this knowledge.


Innovation, Research & Reviewing: Revise & Resubmit vs. Rebut for CSCW 2012

Research is about innovation, and yet many aspects of the research process often seem steeped in tradition. Many conference program committees and journal editorial boards - the traditional gatekeepers in research communities - are composed primarily of people with a long history of contributions and/or other well-established credentials, who typically share a collective understanding of how research ought to be conducted, evaluated and reported. Some gatekeepers are opening up to new possibilities for innovations in the research process, and one such community is the program committee for CSCW 2012, the ACM Conference on Computer Supported Cooperative Work ... or as I (and some other instigators) like to call it, Computer-Supported Cooperative Whatever.

This year, CSCW is introducing a new dimension to the review process for Papers & Notes [deadline: June 3]. In keeping with tradition, researchers and practitioners involved in innovative uses of technology to enable or enhance communication, collaboration, information sharing and coordination are invited to submit 10-page papers and/or 4-page notes describing their work. The CSCW tradition of a double-blind review process will also continue, in which the anonymous submissions are reviewed by at least three anonymous peers (the program committee knows the identities of authors and reviewers, but the authors and reviewers do not know each other's respective identities). These external reviewers assess the submitted paper or note's prospective contributions to the field, and recommend acceptance or rejection of the submission for publication in the proceedings and presentation at the conference. What's new this year is an addition to the traditional straight-up accept or reject recommendation categories: reviewers will be asked to consider whether a submission might fit into a new middle category, revise & resubmit.

CSCW, CHI and other conferences have enhanced their review processes in recent years by offering authors an opportunity to respond with a rebuttal, in which they may clarify aspects of the submission - and its contribution(s) - that were not clear to the reviewers [aside: I recently shared some reflections on reviews, rebuttals and respect based on my experience at CSCW and CHI]. For papers that are not clear accepts (with uniformly high ratings among reviewers) - or clear rejects (uniformly low ratings) - the program committee must make a judgment call on whether the clarifications proposed in a rebuttal would represent a sufficient level of contribution in a revised paper, and whether the paper could be reasonably expected to be revised in the short window of time before the final, camera-ready version of the paper must be submitted for publication. The new process will allocate more time to allow the authors of some borderline submissions the opportunity to actually revise the submission rather than limiting them to only proposing revisions.

As the Papers & Notes Co-Chairs explain in their call for participation:

Papers and Notes will undergo two review cycles. After the first review a submission will receive either "Conditional Accept," "Revise/Resubmit," or "Reject." Authors of papers that are not rejected have about 6 weeks to revise and resubmit them. The revision will be reviewed as the basis for the final decision. This is like a journal process, except that it is limited to one revision with a strict deadline.

The primary contact author will be sent the first round reviews. "Conditional Accepts" only require minor revisions and resubmission for a second quick review. "Revise/Resubmits" will require significant attention in preparing the resubmission for the second review. Authors of Conditional Accepts and Revise/Resubmits will be asked to provide a description of how reviewer comments were addressed. Submissions that are rejected in the first round cannot be revised for CSCW 2012, but authors can begin reworking them for submission elsewhere. Authors need to allocate time for revisions after July 22, when the first round reviews are returned [the deadline for initial submissions is June 3]. Final acceptance decisions will be based on the second submission, even for Conditional Accepts.

Although the new process includes a revision cycle for about half of the submissions, community input and analysis of CSCW 2011 data has allowed us to streamline the process. It should mean less work for most authors, reviewers, and AC members.

The revision cycle enables authors to spend a month to fix the English, integrating missing papers in the literature, redoing an analysis, or adopt terminology familiar to this field, problems that in the past could lead to rejection. It also provides the authors of papers that would have been accepted anyway to fix minor things noted by reviewers.

This new process is designed to increase the number and diversity of papers accepted into the final program. Some members of the community - especially those in academia - may be concerned that increasing the quantity may decrease the [perceived] quality of submissions, i.e., instead of the "top" 20% of papers being accepted, perhaps as many as 30% (or more) may be accepted (and thus the papers and notes that are accepted won't "count" as much). However, if the quality of that top 30% (or more) is improved through the revision and resubmission process, then it is hoped that the quality of the program will not be adversely affected by the larger number of accepted papers presented there ... and will actually be positively affected by the broader range of accepted papers.

I often like to reflect on Ralph Waldo Emerson's observation:

All life is an experiment. The more experiments you make the better.

If research - and innovation - is about experimentation, then it certainly makes sense to experiment with the ways that experiments are assessed by the research communities to which they may contribute new insights and knowledge.

There is a fundamental tension between rigorous validation and innovative exploration. Maintaining high standards is important to ensuring the trustworthiness of science, especially in light of the growing skepticism about science among some segments of the public. But scientists and other innovators who blaze new trails often find it challenging to validate their most far-reaching ideas to the satisfaction of traditional gatekeepers, and so many conferences and journals tend to be filled with more incremental - and more easily validatable - results. This is not necessarily a bad thing, as many far-reaching ideas turn out to be wrong, but I increasingly believe that all studies and models are wrong, but some are useful, and so opening up new or existing channels for reviewing and reporting research will promote greater innovation.

I'm encouraged by the breadth and depth of conversations, conversions and alternatives I've encountered regarding research and its effective dissemination, including First Monday, arXiv and alt.chi. At least one other ACM-sponsored research community - UIST (ACM Symposium on User Interface Software & Technology) - is also considering changes to its review process; Tessa Lau recently wrote about that in a blog post at the Communications of the ACM, Rethinking the Systems Review Process (which, unfortunately, is now behind the ACM paywall ... another issue relevant to disseminating research). The prestigious journal, Nature, recently wrote about the ways social media is influencing scientific research in an article on Peer Review: Trial by Twitter.

I think it is especially important for a conference like CSCW that is dedicated to innovations in communication, collaboration, coordination and information sharing (which [obviously] includes social media) to be experimenting with alternatives, and I look forward to participating in the upcoming journey of discovery. And in the interest of full disclosure, one way I am participating in this journey is as one of the Publicity Co-Chairs for CSCW 2012, but I would be writing about this innovation even if I were not serving in that official capacity.

[Update: Jonathan Grudin, one of the CSCW 2012 Papers & Notes Co-Chairs, has written an excellent overview of the history and motivations of the revise and resubmit process in a Communications of the ACM article on Technology, Conferences and Community: Considering the impact and implications of changes in scholarly communication.]


Academia Redux: Joining the Institute of Technology at the University of Washington, Tacoma

This past Monday, I returned to the classroom after a hiatus of over two decades. While I have given occasional guest lectures and other presentations in academic settings in the intervening period, for the next six months, I will be engaging with students in classrooms at least twice a week in my new role as a Lecturer in the Computing and Software Systems program at the Institute of Technology at the University of Washington, Tacoma.

I'm excited about the opportunity to interact more regularly with students again. I don't much care for the title, "Lecturer", as it implies a predominantly one-way style of communication, and I see education as more of a conversation, a cooperative endeavor in which I hope to learn at least as much as the students do. And given that the two courses I'm teaching this quarter are outside of my primary areas of expertise, I fully anticipate that this will be a quarter filled with teachable moments for all participants.

Having recently written about narrative psychology and the stories we make up about ourselves, I've been reflecting on my own life story, and what this latest chapter represents. 21 years ago, I resigned my position as Assistant Professor of Computer Science at the University of Hartford in order to work full time on a Ph.D. at the University of Massachusetts, with the initial intention of returning to teaching with my union card in hand. However, after completing my thesis in Artificial Intelligence, I was interested in trying something completely different, and followed a path into industry research and development that involved a blend of Ubiquitous Computing, Human-Computer Interaction and Computer-Supported Cooperative Work (or Whatever). I've always imagined myself returning to academia at some point, and I'm grateful to have the opportunity to explore whether this is the appropriate time and place for a renewal of my passion for teaching.

My new colleagues at the Institute of Technology have been enormously supportive as I learn or re-learn both the content of the courses and how best to facilitate the learning of that content by the students. I'm impressed with the technologies that are available for promoting interaction in the classroom and hands-on experience in the labs, and am taking as much advantage of best practices developed by my colleagues as possible. Practicing Brene Brown's prescription for wholeheartedness and connection through courage, vulnerability and authenticity, I have been very open with the students, and they have also been generally patient and supportive as I do my best to get up to speed on multiple dimensions simultaneously. I know that several of these students know more than I know about some of the material we're covering in both courses, and I look forward to their continued contributions in this cooperative learning experience.

As part of my commitment to always do my best, I will postpone the inclination to write more about this transition at the moment, and turn my attention back to preparing for my second week of classes. After just one week, I can better understand the relative infrequency with which my Twitter friends from academia post status updates, and I expect my own social media use to continue at significantly reduced levels for much of the quarter.

In carrying on a tradition in past "transition" blog posts, I want to re-share one of the most valuable resources I've encountered - and regularly revisit - for making decisions about significant life events, the No-Lose Decision Model from Susan Jeffers' book, Feel the Fear and Do It Anyway:

Before you make a decision:

  1. Focus immediately on the no-lose model (whichever path you choose will provide learning opportunities … even if it’s learning what you don’t like)
  2. Do your homework (talk to as many people as will listen … both to help clarify your own intention and to get alternative perspectives)
  3. Establish your priorities (which pathway is more in line with your overall goals in life – at the present time)
  4. Trust your impulses (your body gives you good clues about which way to go)
  5. Lighten up (it really doesn’t matter – it’s all part of a lifelong learning process)

After making a decision:

  1. Throw away the picture (if you focus on what you expected, you may miss the unexpected opportunities that arise along the new path you’ve chosen)
  2. Accept total responsibility for your decision (don’t give away your power)
  3. Don’t protect, correct (commit yourself to any decision you make and give it all you got … but if it doesn’t work out, change it!)

I more recently re-encountered some corroborating wisdom from Dan Gilbert's book, Stumbling on Happiness - another source of insights I revisit periodically, especially on the cusp of important decisions - as articulated by Ze Frank: