A silly BuzzFeed list teaches a lesson on Wikipedia vandalism

I try not to encourage the link-bait over at BuzzFeed (even just for fun), but one “listicle” posted last Thursday got lots of attention among the UK people I follow. It is titled “12 Spectacular Acts Of Wikipedia Vandalism”, and I have to admit it contains some pretty funny stuff. The list includes Ernest Hemingway credited as the author of a children’s book and the First Law of Thermodynamics reinterpreted through Fight Club.

People love to point out weird things that make it into Wikipedia. When it’s not being played for humor (as it is here), the purpose is often to call Wikipedia’s accuracy into question. But these posts usually leave out one thing: how long does vandalism like this actually stay on Wikipedia? Some articles are edited constantly, and if a piece of vandalism is removed almost immediately, how much damage can it really do?

So, just for fun, I decided to track down each piece of vandalism BuzzFeed highlighted and find out. Just how likely were you to stumble onto one of them unaware?

This is certainly not any sort of scientific sampling of vandalism on Wikipedia, so I won’t belabor the significance of these numbers. I just wanted to take a look.

But how do you find a piece of vandalism after it’s been removed? Well, as I’ve said repeatedly here on the blog and on podcasts, the vast majority of actions on Wikipedia are publicly recorded for anyone to examine. You can search back through the history of each article to see how it looked on any given day. To make this easier, there is a tool called WikiBlame that searches through an article’s old versions for a particular piece of text.
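Searching old versions can be quick even for articles with thousands of revisions. The sketch below is only an illustration of one way to do it, not a description of how WikiBlame itself works: a binary search over a hypothetical in-memory list of revision texts, oldest first. It assumes that once the text was added it stayed present up to the newest revision in the list, so you would pass a window of history ending at a version known to contain the vandalism. `first_revision_containing` is a hypothetical helper name.

```python
from typing import Optional, Sequence


def first_revision_containing(revisions: Sequence[str], needle: str) -> Optional[int]:
    """Return the index of the oldest revision whose text contains `needle`.

    Assumes `revisions` is ordered oldest to newest, that the newest entry
    contains `needle`, and that presence is monotonic: once added, the text
    stays in every later revision in the list. Returns None if the newest
    revision lacks the text (or the list is empty).
    """
    lo, hi = 0, len(revisions) - 1
    if hi < 0 or needle not in revisions[hi]:
        return None
    # Invariant: revisions[hi] contains the needle; nothing before lo does.
    while lo < hi:
        mid = (lo + hi) // 2
        if needle in revisions[mid]:
            hi = mid  # already present here, so it was added at mid or earlier
        else:
            lo = mid + 1  # not yet added, so look at later revisions
    return lo


# Hypothetical history: the vandalism appears in the third revision (index 2).
history = ["intro", "intro more", "intro more VANDAL", "intro more VANDAL end"]
print(first_revision_containing(history, "VANDAL"))  # 2
```

Each step halves the remaining range, so a history of about a thousand revisions needs only around ten text checks rather than a thousand.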

Using WikiBlame and clues from the BuzzFeed article, I was able to find the exact piece of vandalism in all twelve cases. Here’s a table showing each article, when it was vandalized, when the vandalism was removed, and how long it was visible:

Article                 Vandalized           Fixed                Elapsed
#1 Jack Pickles         Oct 28, 2012 9:37    Oct 28, 2012 11:03   1h 26m
#2 Thermodynamics       Nov 21, 2013 11:31   Nov 21, 2013 11:31   < 1 minute
#3 Batman               Jun 14, 2007 16:55   Jun 14, 2007 16:55   < 1 minute
#4 Gin & Juice (1)      Nov 2, 2011 16:29    Nov 2, 2011 16:37    8m
#4 Gin & Juice (2)      Nov 2, 2011 23:31    Nov 2, 2011 23:46    15m
#5 Kay Burley           Oct 5, 2012 5:59     Oct 5, 2012 6:11     12m
#6 Karen Gillan         Aug 20, 2013 11:10   Aug 20, 2013 11:37   27m
#7 Spot the Dog         Jul 3, 2012 0:16     Jul 3, 2012 0:26     10m
#8 Ron Atkinson         Jun 22, 2006 11:41   Jun 22, 2006 12:45   1h 4m
#9 Mixed Martial Arts   Apr 5, 2011 15:47    Apr 5, 2011 16:03    16m
#10 Cow Tipping         Sep 26, 2002 15:14   Sep 27, 2002 13:25   22h 11m
#11 North London Gangs  Nov 4, 2012 15:32    Nov 12, 2012 18:08   8d 2h 36m
#12 So Solid Crew       Jul 12, 2011 4:50    Jan 5, 2012 9:01     177d 4h 11m
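The Elapsed column is plain timestamp arithmetic. Here is a minimal Python sketch that reproduces it, assuming all the timestamps in the table are in the same time zone (the table doesn’t say which) and are written in the “Jul 12, 2011 4:50” style used above. The helper name `elapsed` is hypothetical.

```python
from datetime import datetime

# Matches timestamps written like "Jul 12, 2011 4:50" (English month abbreviations).
FMT = "%b %d, %Y %H:%M"


def elapsed(vandalized: str, fixed: str) -> str:
    """Format the time between two table timestamps as e.g. '8d 2h 36m'."""
    delta = datetime.strptime(fixed, FMT) - datetime.strptime(vandalized, FMT)
    hours, rem = divmod(delta.seconds, 3600)  # .seconds excludes whole days
    minutes = rem // 60
    parts = []
    if delta.days:
        parts.append(f"{delta.days}d")
    if hours:
        parts.append(f"{hours}h")
    if minutes or not parts:
        parts.append(f"{minutes}m")
    return " ".join(parts)


print(elapsed("Jul 12, 2011 4:50", "Jan 5, 2012 9:01"))  # 177d 4h 11m
```

Because the timestamps only have minute resolution, a fix made in the same minute comes out as `0m`, which the table reports as “< 1 minute”.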

As you can see, the Gin and Juice article was vandalized in the same way twice, so I’ve listed the two incidents separately in the table.

Counting both Gin & Juice incidents, the median time these pieces of vandalism lasted before being removed was 16 minutes. Nine of the twelve were removed the same day, and two (numbers two and three in the list) were removed by automated bots within the very same minute they were added!
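With both Gin & Juice rows counted there are 13 data points, so the median is simply the 7th value once they are sorted. A quick check in Python, with the durations copied from the table above and the two same-minute fixes entered as 0 minutes (entering them as 1 minute gives the same median):

```python
from statistics import median

# Minutes each piece of vandalism was visible, taken from the table above.
elapsed_minutes = {
    "Jack Pickles": 86,
    "Thermodynamics": 0,
    "Batman": 0,
    "Gin & Juice (1)": 8,
    "Gin & Juice (2)": 15,
    "Kay Burley": 12,
    "Karen Gillan": 27,
    "Spot the Dog": 10,
    "Ron Atkinson": 64,
    "Mixed Martial Arts": 16,
    "Cow Tipping": 22 * 60 + 11,
    "North London Gangs": 8 * 1440 + 2 * 60 + 36,
    "So Solid Crew": 177 * 1440 + 4 * 60 + 11,
}

print(median(elapsed_minutes.values()))  # 16
```

The median is a better summary here than the mean, which the months-long So Solid Crew case would drag up to several days.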

There are outliers, of course. The last three in the list stayed up into the following day or longer, and the last one (So Solid Crew) lasted for several months. But the cow tipping vandalism dates from the prehistoric era of Wikipedia, less than two years after it was founded. And the So Solid Crew list wasn’t a single piece of vandalism but a long series of edits over many months. Some were removed as the vandalism went on; the time I give above is when the entire list was finally removed.

Broadly in line with my results, a study of randomly sampled Wikipedia vandalism from 2009 found that the median time to removal was 4 minutes, and also that in extreme cases some vandalism lasts months or even years.

So if most of these pieces of vandalism lasted only minutes, how many readers does that translate into? How many people actually saw the vandalized articles? As I’ve written about here before, Wikipedia is transparent about that too!

Anyone can use the Wikipedia article traffic statistics tool to dig through the traffic of any Wikipedia article going back to around the beginning of 2008. That means we’ll have to guess a bit at the figures for numbers three, eight and ten in the list (vandalized in 2007, 2006 and 2002), but we can take a stab at it:

Article                 Views that day (or period)    Est. viewers before fix
#1 Jack Pickles         477                           29
#2 Thermodynamics       1,882                         2
#3 Batman               7,395 *                       6
#4 Gin & Juice (1)      285                           2
#4 Gin & Juice (2)      285                           3
#5 Kay Burley           25,155                        210
#6 Karen Gillan         4,282                         81
#7 Spot the Dog         1,761                         13
#8 Ron Atkinson         127 *                         6
#9 Mixed Martial Arts   9,592                         107
#10 Cow Tipping         662 *                         612
#11 North London Gangs  7,598                         5,883
#12 So Solid Crew       89,098                        85,241

* Daily views estimated from the article’s December 2007 traffic, the earliest figures available.

I’m assuming here that each day’s views are spread evenly across the whole day, since the statistics tool doesn’t break things down any finer than a day. I then multiplied the day’s views by the fraction of the day the vandalism was live (its duration divided by the length of a day) to estimate viewers. It’s only a ballpark guess, as real page traffic is bursty and uneven.

But with those caveats, the median number of viewers who saw one of these vandalized pages was 29. Some were likely seen by as few as two people, if anyone at all: the bot fixes were in place in under a minute, possibly within seconds.
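The estimate described above works out to viewers ≈ daily views × minutes live ÷ 1,440. Here is a sketch in Python, assuming that partial viewers are rounded up and that the sub-minute fixes count as one minute; with those two assumptions it reproduces the table’s figures for rows #1 through #10 (the starred rows use the December 2007 estimates). `estimate_viewers` is a hypothetical helper name.

```python
from statistics import median

MINUTES_PER_DAY = 24 * 60


def estimate_viewers(daily_views: int, minutes_live: int) -> int:
    """Share of a day's views that fell inside the vandalism window.

    Assumes views are spread evenly over the day and rounds up, so any
    nonzero window counts at least one viewer.
    """
    # Integer ceiling of daily_views * minutes_live / MINUTES_PER_DAY.
    return (daily_views * minutes_live + MINUTES_PER_DAY - 1) // MINUTES_PER_DAY


# (daily views, minutes live) for rows #1-#10; "< 1 minute" entered as 1.
single_day_rows = {
    "Jack Pickles": (477, 86),
    "Thermodynamics": (1882, 1),
    "Batman": (7395, 1),
    "Gin & Juice (1)": (285, 8),
    "Gin & Juice (2)": (285, 15),
    "Kay Burley": (25155, 12),
    "Karen Gillan": (4282, 27),
    "Spot the Dog": (1761, 10),
    "Ron Atkinson": (127, 64),
    "Mixed Martial Arts": (9592, 16),
    "Cow Tipping": (662, 1331),
}
estimates = {name: estimate_viewers(v, m) for name, (v, m) in single_day_rows.items()}

# The two multi-day rows are copied from the table rather than recomputed.
estimates["North London Gangs"] = 5883
estimates["So Solid Crew"] = 85241

print(median(estimates.values()))  # 29
```

The two multi-day rows are based on a traffic total for a longer period rather than a single day, which is why the sketch takes their figures as given instead of running them through the per-day formula.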

Well, there you go: even a BuzzFeed comedy listicle can teach you something. Next time someone grouses to you about how much Wikipedia gets vandalized, you can tell them that most vandalism is removed within minutes and is seen by barely anyone.

If you are interested in more serious studies of Wikipedia vandalism, I suggest you start with Wikipedia’s Counter-Vandalism Unit or this University of Minnesota study.