All models are wrong.: 2012

Sunday 30 December 2012

From the archives: goalfest in the Premier League

Yesterday evening saw one of the highest-scoring English Premier League matches of all time. Arsenal's 7-3 drubbing of Newcastle joins just three other 10-goal games, although Portsmouth's 7-4 victory over Reading in 2007 remains the outright record holder.

This was just one part, however, of a Saturday chock-full of goals, with a total of 35 scored across the eight matches played. This put me in mind of an article I wrote for Significance early in 2011 about an even more extraordinary day of football. Back then, Arsenal and Newcastle were at it again, with the latter's stunning four-goal comeback contributing to a whopping 41 scored across that day's eight Premier League games. In the article I used this as an excuse to show off the Poisson distribution, demonstrating how goals scored in football matches can be modeled surprisingly well by what is ultimately a (fairly simple) mathematical formula.

The remaining two matches of that particular weekend of football only produced two more goals, bringing the total for a complete 10-match 'round' of Premier League fixtures to 43. Based on the Poisson distribution (and assuming an average of 2.6 goals per game, so 26 per round) I estimated there was a roughly 1 in 720 chance of seeing at least that many goals in a set of 10 games. This weekend's football was almost - but not quite - as remarkable, with six goals today bringing us to a total of 41 across 10 matches. Based on the same theory, this works out to a 1 in 250 occurrence.
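For anyone wanting to check those odds, here is a minimal sketch of the calculation. It assumes the total number of goals in a round is itself Poisson with mean 2.6 x 10 = 26 (which follows if each match is an independent Poisson with mean 2.6); the function name `poisson_tail` is mine, and the original article may have done the sums slightly differently.

```python
import math

def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    # Add up the probability of every total strictly below k,
    # then take the complement.
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below

GOALS_PER_GAME = 2.6  # average assumed in the post
GAMES = 10            # one full round of fixtures
lam = GOALS_PER_GAME * GAMES  # expected goals per round (assumption: 26)

for total in (43, 41):
    p = poisson_tail(total, lam)
    print(f"P(at least {total} goals) = {p:.5f}, about 1 in {1 / p:.0f}")
```

The answers should land close to the 1 in 720 and 1 in 250 quoted above, give or take a little rounding.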

Thursday 20 December 2012

Christmas games theory

Through a coincidence almost as implausible as the Champions League draw, today also sees a new article of mine go up on Significance. The idea was simple: find interesting stats/maths things to say about popular Christmas-time board games. Not making the cut was a discussion of the inevitable unfairness of dice (and an excuse to talk about Awesome Dice Blog's 20,000 roll experiment to compare two manufacturers), and how to use the Markov chain nature of a game of Snakes and Ladders to estimate how long it will take you to finish (40 rolls in the MB version, apparently).

What are the chances: Champions League draw exact repeat of rehearsal

The draw for the UEFA Champions League knockout stage took place earlier today, but it wasn't just the prospect of some truly mouthwatering ties making the headlines.

Observant eyes spotted a seemingly spectacular coincidence: today's draw was a near-identical repeat of that produced during Wednesday's rehearsal. While the order the ties came out was different, every single match featured the same pair of teams. So what were the chances of that?

It seems almost unbelievably unlikely, and an 'ESPN statistician' apparently gets an answer of roughly 1 in 2 million. At face value this makes sense: there are about 2 million possible ways of pairing 16 teams into 8 matches (the first team has 15 opponents to 'choose' from, then once that match is decided the next team has 13 opponents to choose from, and so on, giving 15 x 13 x 11 x 9 x 7 x 5 x 3 = 2,027,025 possibilities). Unfortunately, this overlooks a large number of factors that drastically reduce the number of possible draws.

First of all, those teams who qualified as group winners in the previous stage of the competition can only be drawn against teams who qualified as group runners-up. With 8 teams in each 'half' of the draw (so to speak), this immediately drops us down to 8 factorial, or 40,320 possible draws. On top of this, however, no team can be drawn against the other team who qualified from their group, or even a team from the same football association. With 2 Spanish, 1 English, and 1 Italian team in each half of the draw, the number of 'valid' draws becomes even smaller.

The upshot of all of this (and an admittedly lazy brute-force approach) is that there were just 5,463 possible draws that could have been made that satisfied all of these rules, giving chances of two identical draws in a row of about 0.02%. That's still pretty staggering, but nowhere near the 1 in 2 million we started off with.
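For the curious, the lazy brute-force approach can be sketched roughly as follows. Note that the list of clubs, groups and associations is my own reconstruction of the 2012-13 last 16 rather than something stated in this post, so treat it as an assumption; the names `allowed` and `count_valid_draws` are likewise just for illustration. The idea is to try every ordering of the runners-up against a fixed list of group winners and keep only the draws that break neither rule.

```python
from itertools import permutations
from math import factorial, prod

# (club, group, association). ASSUMPTION: recalled from the 2012-13
# last 16, not taken from the post itself.
WINNERS = [
    ("Paris Saint-Germain", "A", "FRA"), ("Schalke", "B", "GER"),
    ("Malaga", "C", "ESP"), ("Borussia Dortmund", "D", "GER"),
    ("Juventus", "E", "ITA"), ("Bayern Munich", "F", "GER"),
    ("Barcelona", "G", "ESP"), ("Manchester United", "H", "ENG"),
]
RUNNERS_UP = [
    ("Porto", "A", "POR"), ("Arsenal", "B", "ENG"),
    ("AC Milan", "C", "ITA"), ("Real Madrid", "D", "ESP"),
    ("Shakhtar Donetsk", "E", "UKR"), ("Valencia", "F", "ESP"),
    ("Celtic", "G", "SCO"), ("Galatasaray", "H", "TUR"),
]

def allowed(winner, runner_up):
    """A tie is valid if the clubs differ in both group and association."""
    return winner[1] != runner_up[1] and winner[2] != runner_up[2]

def count_valid_draws(winners, runners_up):
    """Count orderings of runners-up in which every tie is allowed."""
    return sum(
        all(allowed(w, r) for w, r in zip(winners, order))
        for order in permutations(runners_up)
    )

naive = prod(range(15, 1, -2))          # 15 x 13 x 11 x 9 x 7 x 5 x 3
winners_vs_runners_up = factorial(8)    # any winner against any runner-up
valid = count_valid_draws(WINNERS, RUNNERS_UP)

print(f"Any pairing of 16 teams: {naive:,}")
print(f"Winners v runners-up:    {winners_vs_runners_up:,}")
print(f"Valid draws:             {valid:,}")
print(f"Chance of a repeat if all equally likely: {100 / valid:.3f}%")
```

With only 40,320 orderings to check there is no need for anything cleverer. The final line treats every valid draw as equally likely, which is a simplification.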

Update: It has been pointed out to me that while there are 5,463 possible draws, due to the mechanics of the draw process itself (the specifics of which I was unaware at the time of writing) not all draws were equally likely. However, the different draws do still have very similar probabilities, and there is nothing obviously special about this particular combination of fixtures. More on this to come.

Monday 3 December 2012

Book review: How to study for a mathematics degree

My first foray into book reviews (well, since primary school, at any rate).

Tuesday 6 November 2012

Winning the White House (the easy way)

In a bid to create a more easily accessible list of my articles for the Significance website, I'm going to start updating this thing with links to them (maybe with some bonus content if there is anything I couldn't fit in).

It's (US) election day, so here's an article about how to win the Presidency with as small a share of the popular vote as possible. In the process of doing this, I tried a few other things that I didn't think quite made the cut.

After dealing with some trivial(ish) cases, I base my calculations in the article on turnout at the 2008 election. However, I thought I'd see what happens if I assume every member of the electorate voted. Unsurprisingly, not much changes: the minimum popular vote share required increases only a little to 22.3%.

I also looked at the analogous problem for becoming Prime Minister (assuming I had somehow persuaded a major political party to appoint me their leader). The maths is a lot more straightforward for the UK, as we just have one parliamentary seat per constituency (so none of this electoral college business). With 650 seats available, I'd simply have to win the 326 constituencies with the lowest turnout to form a (very) minority government. Admittedly, our politics are rather more complex than the two-party system stateside, but if we continue to assume 50% of the vote is required to win a constituency, then at the last general election a party could have won with just 22.1% of the national vote (the Liberal Democrats, by comparison, received 23%).
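The UK calculation is simple enough to sketch in a few lines. The turnout figures below are made-up toy numbers (I am not reproducing the real constituency data here), the function name `min_winning_share` is hypothetical, and I have assumed a bare majority (half the votes plus one) is what it takes to win a seat.

```python
def min_winning_share(turnouts):
    """Smallest national vote share that still wins a majority of seats.

    turnouts: votes cast in each constituency (one seat each).
    ASSUMPTION: a seat is won with a bare majority of its votes.
    """
    seats_needed = len(turnouts) // 2 + 1          # e.g. 326 of 650
    cheapest = sorted(turnouts)[:seats_needed]     # lowest-turnout seats
    votes_needed = sum(t // 2 + 1 for t in cheapest)
    return votes_needed / sum(turnouts)

# Toy example with five constituencies (hypothetical numbers).
toy_turnouts = [100, 200, 300, 400, 500]
share = min_winning_share(toy_turnouts)
print(f"Minimum winning share: {share:.1%}")
```

In the toy example the three smallest seats cost 51 + 101 + 151 = 303 of the 1,500 votes cast. Feeding in real turnout figures for all 650 constituencies is the same calculation on a bigger list.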

Tuesday 19 June 2012

Eurovision 2012: split jury/televotes are in!

After several months of inactivity on here (mainly owing to the combined powers of a PhD thesis and steadily worsening eyesight), submission of the former and surgery to remedy the latter mean I suddenly find myself with considerable time on my hands. Fortunately, I have found the perfect excuse to get back into the blogging boat (that's a thing, right?): the Eurovision Song Contest.

It may not seem the most topical of subjects, with this year's competition a relatively distant memory (if you don't recall, try and remember when it wasn't raining), but the long-awaited split jury/televote results are finally here, allowing for one last bit of fun. In case you're unfamiliar, the points awarded by countries in the Eurovision Song Contest are determined by an equal weighting of votes from the public (the televote) and a jury of music 'experts' (the jury vote). Introduced in 2009, the idea was to try and water down the alleged 'political' voting that many (particularly on these shores) thought was spoiling the contest, with juries theoretically providing a much more objective assessment of the songs. Whether this works in practice is for another time, but for now we can at least have some fun seeing how various countries performed under each system. This year's highlights include:

There can be no argument over Sweden's triumph, with Loreen finishing on top in both votes. This makes it three times out of four that the public have chosen the same winner as the juries, the only exception being Azerbaijan's win in 2011, where the jury placed them a lowly second (behind Italy). In terms of who actually wins, then, the jury vote has yet to make a difference.

That said, Russia pushed Sweden extremely close in the televote, with the grannies' 332 points only just behind Loreen's 343. As many expected, Russia fared far worse with the juries, finishing in a fairly unremarkable 11th place. (How much of this was the 'mother Russia' vote, and how much was down to adorable grandmothers and a spinning pastry oven is, however, difficult to judge.)

The biggest winner under the televote was Turkey, whose 4th place with the public was a whopping 18 places higher than their position according to the juries. (It seems transforming into a boat gets you votes, and quite rightly too.)

Other big televote winners were Ireland (10th with the public, 25th with the juries) and Romania (7th with the public, 20th with the juries). Next year then, I would recommend Ireland teach Jedward how to sing and play the bagpipes, and victory could be theirs.

At the other end of the scale, four countries (Italy, Spain, France and Ukraine) all finished 13 places worse in the televote than with the juries. France in particular owe a lot to the jury voters: had it been purely down to the public they would have finished on the (in)famous nul points for the first time in their history. (Perhaps they would have fared better if they could have used their delightfully bizarre official video.)

As for our very own Engelbert Humperdinck, who finished 25th out of 26 on the night, these results don't offer much comfort. Whilst many (as usual) blamed our poor placing on Eastern European bloc voting, these new data suggest that maybe it just wasn't a very good song. The Hump finished 21st with the public, but with the juries? Dead last.

There may well be more gems lurking, but these were my personal highlights. I'll leave you with what is becoming something of a tradition from these parts, with a map of Europe summarizing how countries in the final compared on the two different votes. It's worth bearing in mind that this is something of a simplification, but it gives (I hope) a reasonable impression of this year's competition.
