The article, written by Leonard Mlodinow - who teaches randomness at Caltech and is author of "The Drunkard's Walk: How Randomness Rules Our Lives" - was prompted, in part, by two recent articles in the Journal of Wine Economics, by Robert Hodgson, a retired professor who taught statistics at Humboldt State University and has been proprietor of Fieldbrook Winery since 1976. Hodgson noticed that the same wine would receive widely different ratings in different competitions, ranging from "undrinkable" to double gold medals, and decided to investigate. He took a course on wine judging and joined the advisory board of the California State Fair Commercial Wine Competition, and was eventually given permission to conduct controlled experiments to rate the raters.
In his first study [An Examination of Judge Reliability at a Major U.S. Wine Competition (Journal of Wine Economics, Vol. 3, No. 2, 105-113)] each year, for four years, Mr. Hodgson served actual panels of California State Fair Wine Competition judges—some 70 judges each year—about 100 wines over a two-day period. He employed the same blind tasting process as the actual competition. In Mr. Hodgson's study, however, every wine was presented to each judge three different times, each time drawn from the same bottle.
The results astonished Mr. Hodgson. The judges' wine ratings typically varied by ±4 points on a standard ratings scale running from 80 to 100. A wine rated 91 on one tasting would often be rated an 87 or 95 on the next. Some of the judges did much worse, and only about one in 10 regularly rated the same wine within a range of ±2 points.
This September, Mr. Hodgson dropped his other bombshell [An Analysis of the Concordance Among 13 U.S. Wine Competitions (Journal of Wine Economics, Vol. 4, No. 1, 1-9)]. This time, from a private newsletter called The California Grapevine, he obtained the complete records of wine competitions, listing not only which wines won medals, but which did not. Mr. Hodgson told me that when he started playing with the data he "noticed that the probability that a wine which won a gold medal in one competition would win nothing in others was high." The medals seemed to be spread around at random, with each wine having about a 9% chance of winning a gold medal in any given competition.
To test that idea, Mr. Hodgson restricted his attention to wines entering a certain number of competitions, say five. Then he made a bar graph of the number of wines winning 0, 1, 2, etc. gold medals in those competitions. The graph was nearly identical to the one you'd get if you simply made five flips of a coin weighted to land on heads with a probability of 9%. The distribution of medals, he wrote, "mirrors what might be expected should a gold medal be awarded by chance alone."
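For readers who want to see what that weighted coin flip predicts, here is a minimal sketch of the chance-alone baseline. It is not Hodgson's actual analysis or data: it simply assumes each competition is an independent trial with the 9% gold-medal probability quoted above, and the function name `gold_medal_distribution` is my own, made up for illustration.

```python
from math import comb


def gold_medal_distribution(n_competitions: int, p_gold: float) -> list[float]:
    """Return P(k gold medals) for k = 0..n under a binomial, chance-alone model.

    Assumption (for illustration only): every competition is an independent
    trial with the same probability p_gold of awarding a gold medal.
    """
    return [
        comb(n_competitions, k) * p_gold**k * (1 - p_gold) ** (n_competitions - k)
        for k in range(n_competitions + 1)
    ]


# Five competitions, 9% chance of gold in each (figures quoted in the article).
probs = gold_medal_distribution(5, 0.09)
for k, p in enumerate(probs):
    print(f"{k} gold medals: {p:.4f}")
```

Under these assumptions, roughly 62% of wines entered in five competitions would win no gold medals, about 31% would win exactly one, and about 6% would win two. That is the shape of bar graph a purely random process would produce.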
Mlodinow cites several other interesting studies in the article. One reportedly showed that "even flavor-trained professionals cannot reliably identify more than three or four components in a mixture, although wine critics regularly report tasting six or more" [Influence of training and experience on the perception of multicomponent odor mixtures, by Andrew Livermore and David G. Laing, Journal of Experimental Psychology: Human Perception and Performance, Vol 22(2), Apr 1996, 267-277]. Another reportedly showed that simply manipulating the color of a wine influences experts' perception of sweetness [The Influence of Color on Discrimination of Sweetness in Dry Table-Wine, by Rose M. Pangborn, Harold W. Berg and Brenda Hansen, The American Journal of Psychology, Vol. 76, No. 3 (Sep., 1963), pp. 492-495]. A third reportedly showed that the style of bottle from which the wine is poured also has a significant effect on experts' perception of the wine [La dégustation: Etude des représentations des objets chimiques dans le champ de la conscience, by Frédéric Brochet, Academie Amorin, Coup de Cœur 2001]. The third study, and another variation of the second one (also conducted by Brochet), were also reported by Jonah Lehrer in a blog post on The Subjectivity of Wine, in which many of the commenters raise questions about the methods and results of those studies, although I don't think any of them would question the inherent subjectivity of judging wine.
One of my favorite parts of the article is a comparison between two descriptions of the same wine - a Silverado Limited Reserve Cabernet Sauvignon 2005 - offered by two different prominent sources of wine ratings. The Wine News describes it as "Dusty, chalky scents followed by mint, plum, tobacco and leather. Tasty cherry with smoky oak accents…" whereas The Wine Advocate describes it as having "promising aromas of lavender, roasted herbs, blueberries, and black currants."
Mlodinow notes that there is no intersection between the 8 flavors and scents listed in the first description and the 4 flavors and scents in the second description.
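The claim is easy to check by treating each description as a set of descriptors. This is only an illustrative sketch: the way I have split the two quoted descriptions into individual terms (for example, reducing "Tasty cherry" to "cherry") is my own reading, and the variable names are made up.

```python
# Descriptors as I read them from the two quoted reviews (my own segmentation).
wine_news = {
    "dusty", "chalky", "mint", "plum",
    "tobacco", "leather", "cherry", "smoky oak",
}
wine_advocate = {"lavender", "roasted herbs", "blueberries", "black currants"}

# Set intersection: descriptors that appear in both reviews.
shared = wine_news & wine_advocate

print(len(wine_news), len(wine_advocate), len(shared))
```

With this segmentation the two sets have 8 and 4 members and share none, which matches Mlodinow's count.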
Mlodinow emailed Robert Parker to invite him to participate in a controlled blind tasting. Unfortunately, Parker declined, but he did point Mlodinow to a recent Executive Wine Seminars tasting of 15 of Parker's top-rated 2005 Bordeaux wines - all in the 95-100 point range - in which Parker's second ratings of 12 of the 15 wines were reportedly within 2-3 points of his initial ratings (though all were in a rather narrow range to begin with). This is somewhat disappointing, as I have found Parker to be the most reliable judge of wines after years of exploring different sources (including Stephen Tanzer, the Wine Spectator, and a variety of different online and offline wine stores, though I have found Richard Gagnon of the Brattleboro Food Co-op and Nabil and Karen at Seattle Wine Company to be pretty reliable sources, too).
However, there was no mention in Mlodinow's article (or on the EWS web site) of the intra-subject consistency of flavor and scent descriptions in the blind tasting, and over the years, I have tended to focus more on the descriptors than the numerical ratings. I have seen many cases of widely differing descriptions of the same wine from different sources, and so for me, the trick has been to find a source whose descriptions most consistently align with my own perceptions. In fact, one of the appealing aspects of the personal wine recommendations I've received from Richard, Nabil and Karen is their conversational nature: we talk about the various wines and focus more on the descriptors than on any numerical ratings, allowing us to converge on the wines that I typically find most enjoyable.
In any case, all of the various sources of ratings and reviews are secondary sources; the best way to judge which wines I enjoy is to taste them myself, ideally at a wine store or at a larger-scale tasting event such as the Zinfandel Festival. Although, like the experts, I am also subject to the influences and biases listed in these reports (e.g., the color of the wine as it is poured into a glass, the descriptors mentioned by the winemaker or wine shop proprietor, the bottles from which the wines are poured and probably a host of other factors), I'd generally rather be at the whim of direct influences on my first-hand experience than at the whim of indirect influences of second-hand reports by experts.