
Re-rethinking Recommendation Engines: Psychology and the Influence of False Negatives

Alex Iskold posted an interesting article on Rethinking Recommendation Engines on ReadWriteWeb yesterday. I like (and recommend) his crisp and clear delineation of the different types or sources of recommendations - personalized (based on your own past behavior), social (based on the past behavior of others who are similar to you) and item-based (based on properties of the recommendable items themselves) - and his emphasis on incorporating psychological principles, not just technological ones, into the design of effective recommendation engines. [I also like (and recommend) Rick MacManus' associated recommendations on 10 Recommended Recommendation Engines, but that opinion may be biased by MyStrands' prominent placement in that list.] However, I take issue with - or at least re-rethink - some of Alex's contentions regarding the road to successful recommender systems being paved with false negatives.
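To make the three sources concrete, here is a minimal sketch - entirely my own invention, not from Alex's article - contrasting personalized, social and item-based recommendation on a toy ratings dictionary. All names, data and similarity measures here are hypothetical simplifications.

```python
# Toy ratings data: user -> {item: rating on a 1..5 scale}. Invented for illustration.
RATINGS = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 4},
    "cat": {"C": 5, "D": 2, "A": 1},
}

def personalized(user):
    """Personalized: rank items by the user's own past ratings."""
    return sorted(RATINGS[user], key=RATINGS[user].get, reverse=True)

def social(user):
    """Social: recommend unseen items liked by the most similar other user."""
    def sim(u, v):
        # Crude similarity: negative sum of rating gaps on shared items.
        shared = set(RATINGS[u]) & set(RATINGS[v])
        if not shared:
            return float("-inf")
        return -sum(abs(RATINGS[u][i] - RATINGS[v][i]) for i in shared)
    neighbor = max((v for v in RATINGS if v != user), key=lambda v: sim(user, v))
    return [i for i, r in RATINGS[neighbor].items()
            if i not in RATINGS[user] and r >= 4]

def item_based(user, features):
    """Item-based: recommend unseen items sharing features with highly rated items."""
    liked_feats = {f for i, r in RATINGS[user].items() if r >= 4 for f in features[i]}
    return [i for i in features
            if i not in RATINGS[user] and features[i] & liked_feats]
```

Real engines replace the crude similarity here with correlation or latent-factor models, but the three-way split of evidence sources is the same.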

First, I want to agree with Alex (and Gavin Potter, the Guy in the Garage whom Alex references) about the importance of psychology in technology design and in general ("Enhancing formulas with a bit of human psychology is a really good idea"), and about the value of recognizing and capitalizing on human inertia. However, his characterization of inertia - the tendency of our ratings to be heavily influenced (or primed) by other recent ratings - seems more characteristic of a primacy or recency effect than of inertia (as I understand these concepts). That said, I do think that inertia plays an important role in the adoption and use (or non-adoption / non-use) of any technology - people do not tend to change much, or even expend much effort, unless or until sufficient incentive is provided.

So I think the inertia problem, with respect to recommendation engines, is more one of motivating users to rate things ... and I actually think the Netflix ratings system for movies (which provides the basis for much of the article) is an outstanding example. It doesn't require much effort (you are automatically prompted for a rating whenever you log in to the site after having sent a DVD back), and the more you rate, the better the recommendations you receive - an intrinsic rather than extrinsic motivation - which helps explain why the system has motivated millions of its users to contribute an estimated 2 billion ratings. [Aside: I see that ReadWriteWeb is offering an extrinsic incentive for comments and trackbacks - a chance to win an Amazon gift certificate - but I was already planning on adding a trackback for intrinsic reasons.] In any case, however one labels these psychological influences - inertia, priming and/or recency - they are important to incorporate into the design of recommendation engines, and the systems that use them.

Further along in the article, Alex distinguishes false positives - recommendations for things that (it later turns out) we do not like - from false negatives - recommendations against things that (it would later, or perhaps likely, turn out) we do like - and correctly recommends leveraging false negatives more effectively in the design of recommendation engines. [And just to round things out, in case it isn't obvious: true positives are recommendations for things that we will / do like, and true negatives are recommendations against things that we will not / do not like ... and thanks to Eric for helping me set the record straight with respect to "do likes" and "don't likes" in my description of false negatives (!)]
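The four outcome classes can be summarized in a tiny sketch of my own (the function name and framing are illustrative, not Alex's): a recommender either recommends an item (positive) or suppresses it (negative), and each decision is scored against whether the user actually likes the item.

```python
def classify(recommended: bool, liked: bool) -> str:
    """Map a recommender's decision and the user's actual taste to an outcome class."""
    if recommended and liked:
        return "true positive"    # recommended it; the user likes it
    if recommended and not liked:
        return "false positive"   # recommended it; the user does not like it
    if not recommended and liked:
        return "false negative"   # suppressed it; the user would have liked it
    return "true negative"        # suppressed it; the user dislikes it anyway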

Unfortunately, he extends this thread to some propositions that lie beyond my comfort zone:

We do not need recommendations, because we are already over subscribed. We need noise filters. An algorithm that says: 'hey, you are definitely not going to like that' and hide it. ... If the machines can do the work of aggressively throwing information out for us, then we can deal with the rest on our own.

Now, on the one hand, I am sympathetic to the problem of information overload. However, as I noted in my notes from CSCW 2006, Paul Dourish pointed out that this is not a new problem:

One of the diseases of this age is the multiplicity of books; they doth so overcharge the world that it is not able to digest the abundance of idle matter that is every day hatched and brought forth into the world.

-- Barnaby Rich (1540-1617), writing in 1613 (!); quoted by Derek de Solla Price in his 1963 book "Little Science, Big Science."

I'm also reminded of James Carse's observation about evil in his marvelous (and highly recommended) book, Finite and Infinite Games:

Evil is never intended as evil. Indeed, the contradiction inherent in all evil is that it originates in the desire to eliminate evil. "The only good Indian is a dead Indian."

I think that too aggressively filtering out [presumed] false negatives can render us more easily manipulated by technology ... and the people and organizations who control technology. Although there is considerable debate about what Web 2.0 is, one of its key ingredients is surely the provisioning of architectures of participation, in contrast to the "command and control" paradigm of earlier technologies (and eras). One of the beneficial side effects of the growth of Web 2.0 - for me - has been enhanced opportunities for serendipity, and allowing more false negatives is likely to yield fewer instances of serendipity. Furthermore, I believe increasing the probability - or acceptability - of false negatives may have the unfortunate consequence of moving further up the head of the long tail ... and/or further down toward the lowest common denominator(s). Book burning lies at or near the extreme end of the "acceptance of false negatives" spectrum, though I do not mean to imply that any of these consequences are intended or desired by the article or author.

In earlier chapters of my career, when I was more focused on natural language processing and automatic speech recognition, I became familiar with the concept of the Equal Error Rate (EER): the operating point at which the rate of false positives (the False Acceptance Rate, or FAR) equals the rate of false negatives (the False Rejection Rate, or FRR). The documentation for the BioID biometrics system SDK from HumanScan provides a nice articulation of these concepts, including the graph below:

[Graph: FAR and FRR curves, crossing at the EER]

Perhaps the solution to the tension between false positives and false negatives in recommender systems is to incorporate some kind of control enabling the user to specify an acceptable balance or threshold (which might default to the EER) ... although that would also require resolving the tension between user inertia and user input. But that simply provides additional corroboration for Alex's primary argument: that we need to incorporate more psychology into our designs of good - or better - recommender system technologies.
