Beer and Nappies -- A Data Mining Urban Legend


You probably haven't heard of this UL, as it's not well known outside Data Mining circles.

There is a story that a large supermarket chain, usually Wal-Mart, did an analysis of customers' buying habits and found a statistically significant correlation between purchases of beer and purchases of nappies (diapers in the US). It was theorized that the reason for this was that fathers were stopping off at Wal-Mart to buy nappies for their babies, and since they could no longer go down to the pub as often, would buy beer as well. As a result of this finding, the supermarket chain is alleged to have put the nappies next to the beer, resulting in increased sales of both.

The Original Version

I mentioned this story to someone who orks bovines, and who is otherwise a reliable source of information, and he said that it was found when Wal-Mart analysed their customers' purchases.

Remaining skeptical, I did a web search on 'Wal Mart beer diapers', which turned up two items that mentioned the association. One of them mentioned Wal-Mart in the same paragraph but didn't actually say that the analysis was done for them. In both items, the story is told apocryphally:

  1. Data Mining by Doug Alexander
  2. The Knowledge Banks

Doug Alexander has this to say:
Market basket analysis - Understand what products or services are commonly purchased together; e.g., beer and diapers.

The Knowledge Banks has this to say:
"As classically defined, a data warehouse involves pulling data from defined sources and integrating that data so that it is no longer simply operational but strategic information. One reason for such a narrow interpretation is that until recently, data warehousing was used almost exclusively in the business world, most notably in retail outlets like Wal-Mart. Corporate managers with an eye on the bottom line utilize the technology to detect consumer buying habits and formulate strategic marketing plans accordingly. For example, by scanning each sale into a data warehouse, grocery stores have determined that men in their 20s who purchase beer on Fridays after work are also likely to buy a pack of diapers. Thus, a display of Pampers or another brand might be set up in the beer aisle, or merchants will put one (but not both) of the products on sale on Friday evenings."

I have a reference in print for this story which clearly links the story to Wal-Mart: Sean Kelly's Data Warehousing: The Route to Mass Customization, a 1996 release from John Wiley & Sons, New York.
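
For anyone who hasn't met market basket analysis, the "commonly purchased together" idea in the quotes above is usually expressed as three numbers for a rule such as beer -> nappies: support (how often both appear in a basket), confidence (how often nappies appear given beer) and lift (confidence divided by how often nappies appear anyway). The following is a rough sketch of my own, not taken from any of the sources; the baskets are invented for illustration and the function name `association` is hypothetical.

```python
# Invented baskets for illustration only; NOT data from Wal-Mart or any source.
baskets = [
    {"beer", "nappies", "crisps"},
    {"beer", "nappies"},
    {"beer", "crisps"},
    {"nappies", "milk"},
    {"milk", "bread"},
    {"beer", "nappies", "milk"},
    {"bread", "crisps"},
    {"beer", "bread"},
]


def association(baskets, a, b):
    """Return (support, confidence, lift) for the rule a -> b.

    Assumes both items appear in at least one basket (hypothetical helper,
    no guard against division by zero).
    """
    n = len(baskets)
    n_a = sum(1 for basket in baskets if a in basket)
    n_b = sum(1 for basket in baskets if b in basket)
    n_ab = sum(1 for basket in baskets if a in basket and b in basket)

    support = n_ab / n             # share of baskets containing both items
    confidence = n_ab / n_a        # share of baskets with a that also have b
    lift = confidence / (n_b / n)  # confidence relative to b's base rate
    return support, confidence, lift


support, confidence, lift = association(baskets, "beer", "nappies")
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 means the two items turn up together more often than their individual popularity would predict. It says nothing about why, and eight made-up baskets are nowhere near enough to call anything statistically significant.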

Some other sightings

These three web sites are a sample of those on the web which vector the legend. They are uncritical.

The stories are embellished with plausible sounding factoids, e.g. increased sales as a percentage:

The second quote does use the word "legendary", but this seems to be used in the sense "renowned" rather than implying any skepticism.

It begins to sound dubious

One web site, The Land Mines of Data Mining, is skeptical and has this to say:

"Thus, one infamous "fact" that a store turned up through data mining a few years ago was a correlation between sales of beer and sales of diapers. While this amusing result was widely reported, nobody could figure out why people seemed to buy beer and diapers at the same time. What coherent marketing strategy can emerge from such factlets?

... and goes on to say ...

The fact is that data mining itself is in the diaper stage."

A data mining cow orker writes:
"basically people were reviewing data, possibly for Wal-Mart, and speculated that it would be interesting to find a link between 'Beer and Nappies'. From there on the myth was created."

Another data mining cow orker writes:
Lynette Dyer (of Cogit) gave a very good presentation that played down much of the hype around data mining, emphasising the need for a clear business value proposition. She also debunked the 'beer and nappies' example, which she said was invented by Tom XXXX as a joke when he and others were doing some analysis for Wal-Mart.

The Mutated Version

Spotted on Sydney Morning Herald - Computers:
"Sometimes the data can throw up surprises: mining of databases held by 7-Eleven stores in the US revealed a link between purchases of beer and nappies. When they were moved together, sales of both increased, says Williams."

Conclusion: Fb

One of the key indicators of an urban legend is that the same legend is associated with many people or organizations, e.g. the well-known UL about Rod Stewart spread to Marc Almond. It appears that this one has spread from Wal-Mart to 7-Eleven.

That, plus two data miners I know well vouching for its lack of voracity, suggests that this UL is probably false.

Postscript: Now that Wal-Mart has taken over ASDA, I'll be able to check the relative location of beer and nappies there personally.

