Wednesday, June 17, 2015

Is the Nash equilibrium the only way we all get laid?

Hollywood is not known for its accurate portrayal of mathematical concepts. This isn't really a bad thing, because let's face it: I'm paying $15 a ticket to shovel popcorn into my face, not use my brain. But now we're coming up on a month since the untimely demise of John Nash, pretty much inarguably one of the greatest mathematicians of our time (and my first high school celebrity crush). His death has thrust game theory into the spotlight (e.g. this retrospective or this silly video about game theory in the Greece situation from the NY Times and this collection of posts from the Economist). So in honor of the man and his greatness and the surge in public interest in some of his work, let's take a moment to look at one of my personal favorite deviations from mathematical accuracy in modern film: the idea that the concept of "Nash equilibrium" is in any way related to some overgrown frat boys commodifying women and snubbing hot blondes.

Haha, you think my thesis is about what?


Game Theory & Nash Equilibria

Here are the definitions we'll be working with (we're trying to avoid technical stuff and anything that requires a math background here, so we're going for the "convincing waving of hands" method of mathematical discourse). As far as we're concerned, game theory is the study of "players" in a "game" making up strategies in order to "win" said game. All that stuff is in quotes because while we can pretty much put any kind of strategic decision-making scenario into a game framework, we aren't necessarily only talking about games. We might be talking about economics, or psychology, or picking up chicks in bars. The specific branch of games we're going to care about is non-cooperative games: ones in which the players act independently. This basically means that they're not forced to cooperate---though maybe they'll choose to do so anyway. There are no rules forcing players to act a certain way on the basis of the other player's actions. A strategy for a player in such a game is some set of rules telling us what that player's going to do.

We'll figure out where this whole Nash equilibrium thing comes in with an example. Let's take penalty kicks in soccer. To make this example useful, we'll assume some unrealistic stuff about soccer. For one thing, we'll ignore any influence of home field advantage, weather, other players, whatever. All that matters is the kicker and the goalkeeper. We'll also assume that the kicker isn't going to miss the goal.
Let's pretend this can't happen. Jason Puncheon would sure love to.
Furthermore, we'll assume that the goalkeeper can't instantly react to what the kicker is doing at the second when she starts kicking, so the goalkeeper's decision about which way to dive is going to have to be independent of that. This last one is pretty reasonable, at least.

OK, so the first question here is: what are possible strategies for the kicker, and what are possible strategies for the goalkeeper? For example, the kicker might decide that she'll always kick to the upper right. But if the goalkeeper somehow finds out this strategy (because she's reviewed a bunch of videos, or there's a mole on the kicking team), the kicker is outta luck because the goalkeeper will know exactly which way to dive. So having a deterministic strategy is out for the kicker. This means some randomness has to be involved. But extending our previous logic further: suppose the kicker still really likes the upper right corner, and her strategy says that with a 90% probability, she's going to kick up there, and with a 10% probability she'll do something else. The goalkeeper can still get an advantage if she finds out about this strategy, because she knows that diving toward the upper right will work 90% of the time. So it seems like the only thing that the kicker can do which doesn't become a worse idea if the goalkeeper finds out is to kick to a uniformly random part of the goal. That means that she's equally likely to pick any part of the goal and kick toward it (and since in math-soccer, intention translates perfectly into action, that's where the ball's gonna go). The kicker, in other words, has to show no preference in how she chooses where to place her kick.

How about the goalkeeper? By the same reasoning as above, the goalkeeper shouldn't have a deterministic strategy, either (if she always dives to the upper right, an informed kicker will just kick to the left). Same goes for showing a preference for diving in a certain direction. So the only strategy for the goalkeeper which doesn't suffer from discovery is to dive to a uniformly random part of the goal.

The underlying idea here is this: when a player tries on a strategy, she asks herself, "Does my opponent benefit from knowing that this was my strategy?" If the two players both have strategies such that the answer to that question is no, then that pair of strategies is what's known as a Nash equilibrium. In our version of math-soccer, the only Nash equilibrium is where the kicker kicks to a uniformly random part of the goal, and the goalkeeper dives to a uniformly random part of the goal.
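
Here's a quick way to check that claim in a stripped-down toy version of the game (my simplification, not anything from the movie or from Nash's papers): pretend the kicker only picks left or right, and the ball goes in unless the goalkeeper guesses the same side. In Matlab/Octave:

% toy penalty kick: the kicker picks L or R, the keeper picks L or R.
% The kicker scores (payoff 1) unless the keeper dives the same way (payoff 0).
A = [0 1;      % rows: kicker kicks L, R;  columns: keeper dives L, R
     1 0];     % the kicker's payoffs; the keeper's payoffs are 1 - A

q = [0.5; 0.5];                 % keeper dives L or R with equal probability
kicker_options = A * q          % expected payoff of kicking L vs. R: both 0.5

p = [0.5 0.5];                  % kicker kicks L or R with equal probability
keeper_options = p * (1 - A)    % expected payoff of diving L vs. R: both 0.5

Against a uniformly mixing opponent, every option a player has pays off exactly the same, so nobody gains anything by switching, even after finding out the other player's strategy ahead of time. That's the equilibrium.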

Fun real-life fact: this kind of analysis, though more complex, works for real-life soccer, too! A bunch of people have worked on it, and the data shows that players do generally play the Nash equilibrium strategy.



The Movie Version
So did that scene in A Beautiful Mind describe a Nash equilibrium in any way? First, a refresher:


A personal note: Before we go on, I will freely admit that when I last saw this film in its entirety, I was in high school and it was one of my favorite movies ever. I didn't really learn what a Nash equilibrium was until I was in grad school, and the revelation that no mathematical epiphany was ever followed by the words "that's the only way we all get laid" hasn't impacted my enduring love for this movie (though it might have impacted my enthusiasm for my career).

OK, so that said, the most coherent summary I can come up with for Russell-Crowe-Nash's sudden insight is this. The work of 18th century economist/philosopher Adam Smith, according to this movie's cadre of econ-bros, can be summarized as "every man for himself." (I am not an economist, but a cursory reading of Adam Smith's wiki article shows this to be vaguely relevant, though overly simplified. We'll just take it as a given, because we've got other fish to fry.) Gladiator Nash notes---assuming an apparently obvious hotness hierarchy which puts blondes above brunettes---that the every man for himself strategy would result in zero babes for each dude, whereas a cooperative strategy would result in one medium-hot babe for each dude, plus a bonus karma bitch slap for the blonde who was obviously just trolling the bar with her ugly friends to boost her own ego.

The movie follows with John Nash slaving away for presumably an entire season at what will become his seminal work, implying that this OK Cupid algorithm he just came up with has something to do with the idea of what will eventually come to be called the Nash equilibrium (which got him his degree, and eventually a Nobel prize). But is the strategy described in the bar an example of a Nash equilibrium?

First of all, we should figure out who the players here are. Since the woman-shaped meat sacks on display obviously have no desires of their own, the only rational agents in this set-up are the men. We'll assume it's true that if they all go for the blonde, nobody ends up with anybody. Suppose that the guys follow the "go for the brunettes" strategy, instead. If I'm Master & Commander Nash, and I know that all of the other guys are going for the brunettes, would I change my strategy? Well, if I consider the blonde a better payoff than the brunettes, then yes---I'll know that the blonde is lonely and desperate for my chauvinistic charms, so I'll just go for her! So the thing described in the movie isn't a Nash equilibrium at all. (Though some efforts have been made to figure out what exactly would be a Nash equilibrium in this game.)
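
If you want to see that failure in the same deviation-checking terms we used for penalty kicks, here's a tiny sketch with made-up payoff numbers (the 2/1/0 values are mine, purely for illustration):

% payoff for one guy, given his choice and how many OTHER guys go for the blonde
% (goes_blonde: 1 = he goes for the blonde, 0 = he goes for a brunette);
% the payoff values 2, 1, and 0 are made up for illustration
payoff = @(goes_blonde, rivals_at_blonde) ...
    goes_blonde * (rivals_at_blonde == 0) * 2 + (1 - goes_blonde) * 1;

% the movie's candidate "equilibrium": everyone goes for a brunette
sticking_with_brunette = payoff(0, 0)    % = 1
switching_to_the_blonde = payoff(1, 0)   % = 2, so switching is strictly better

Since a player can do strictly better by changing only his own strategy while everyone else stays put, "everybody goes for a brunette" flunks the Nash test.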

Of course, maybe the whole thing was a clever ruse to fool his less clever friends. Russell Crowe did get the blonde in the end.
Wiiiiiink.


Thursday, May 21, 2015

The Unbearable Memorylessness of Being

Life is hard for pigeons. In ideal conditions, they can live about 35 years. But in the wild, pigeons die all the dang time. Between being eaten by hawks (or tortoises or pelicans or people or freaking catfish), hit by cars, getting sealed up in air ducts, and other grisly fates that are likely to make the entire pigeon population turn on us in Hitchcockian retribution in short order, the average lifespan of a pigeon in the wild is about 6 years.

 
I think they're planning something.

Come, friend. Take a walk with me. Look at that random pigeon over there. How long do you expect it to live?

You know nothing of the pigeon's current age or lifestyle or the fact that he engages in maddening debauchery and pizza-eating binges on the regular. You are also not a fortune telling witch. Having nothing else to go on, you assume this is just an average pigeon, and you guess 6 years.

Quite right, my chum. I applaud your reasonable guesswork. But lo!---upon closer inspection, I realize that I do know this pigeon. His name is David Hasselhoff, no relation, and he's 4 years old. I tell you this, and ask you again. How long do you expect this bird to live? The Knight Rider cocks his head, eager to hear what you have to say.

Should this new information influence your response? On the one hand, if The Hoff was expected to live for 6 years and has already wasted 4 of them on pizza and hookers, it's reasonable to guess he's only got 2 years of wanton revelry left in him. So if you were to give that as your answer, I would not disparage you. But on the other hand---and it's always the other hand that holds the correct answer---the answer is 6. And here's why, amiga.

Let's assume that in any given month, a pigeon dies with probability 1/72 (where a probability of 0 means "definitely not happening" and a probability of 1 means "definitely happening"). We're choosing this number because there's a handy fact from probability theory that if an experiment has some probability p>0 of success and 1-p of failure each time (independently of previous experiments) and you repeat this experiment until your first success, it will take 1/p repetitions on average. In our case, our convenient choice of 1/72 tells us that if a pigeon will die with this independent probability of 1/72 each month, it should take 72 months (6 years) on average for the bird to die. By the way, this means that the time until the bird's death is something called a geometric random variable (more on random variables later).
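
If you don't feel like taking that handy fact on faith, here's a quick sanity check in Matlab/Octave (the 100,000 simulated pigeons are mine, not official pigeon data):

% simulate the month of death for a bunch of pigeons, each of which dies in
% any given month with probability 1/72, and check that the average is near 72
p = 1/72;
num_pigeons = 100000;
months = zeros(1, num_pigeons);
for i = 1:num_pigeons
    m = 1;
    while rand() > p        % the pigeon survives this month with probability 1 - p
        m = m + 1;
    end
    months(i) = m;
end
mean(months)                % should come out close to 72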

We're making some simplifications here where we assume David Hasselhoff isn't learning anything from his hard times on the streets* (which could make him less likely to die in a future month) and he's not getting any more decrepit (which could make him more likely to die in a future month), but these are pretty reasonable: the first assumption probably makes a negligible change (though pigeons are pretty smart, so I dunno), and the second is consistent with the fact that the time until natural death is so much longer. We're also only interested in the month in which he dies, not the exact moment. If we were willing to consider time as a continuous thing, the situation would work the same way, but stuff would be harder to compute. More on this later, too.

Okay, so in our case the experiment is seeing whether the bird is alive or dead after a given month, the "success" is death, and the "failure" is life because that's the kind of people we are. The situation is that 48 months (4 years) have passed without a success. So here we are, asking how long it should take from now to see a success in this ghoulish bird experiment. We know that the next month will result in death with probability 1/72, and the month after that, and the month after that, and so on. So we're once again in the situation of looking at a geometric random variable ("time until Hoff-death") with probability of success equal to 1/72 at each month, which our handy fact tells us translates to having to wait 72 (more) months on average until the Hoff is successfully dead. The stuff that's already happened is irrelevant. That's because time-until-pigeon-demise is memoryless.
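
And here's the memoryless part in simulation form: keep only the simulated pigeons that, like The Hoff, made it through their first 48 months, and look at how much longer they lasted (the one-liner below is just a standard inversion trick for generating geometric random numbers):

% among simulated pigeons that survive their first 48 months, the average number
% of ADDITIONAL months they live is still about 72
p = 1/72;
months = ceil(log(rand(1, 100000)) / log(1 - p));   % 100,000 geometric lifetimes
survivors = months(months > 48);                     % still alive after month 48
mean(survivors - 48)                                 % should also come out near 72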



The Memoryless Property 

This is a pretty cool property for a thing to have, in an X-Men Origins Wolverine II: Memorylessness kind of way. In the pigeon context, the property says:

The probability that the pigeon lives for at least 6 more years given that he's already lived for 4 years is exactly the same as the probability that the pigeon lives for at least 6 years.

Here's how this looks in math terms. Let's let X be the age, in years, at which a pigeon dies. X is a random variable: i.e. a variable which takes on a value to be determined by some experiment. In your case, the experiment consists of you watching a pigeon and noting down its age when it dies, you grim bastard. The notation P(X ≥ 10) is going to stand for the quantity "the probability that X is at least 10." Expanding on this a little, we have the notation P(X ≥ 10 | X ≥ 4), which is "the probability that X is at least 10 given that X is known to be at least 4." Therefore the statement above, in this notation, looks like

P(X ≥ 10 | X ≥ 4) = P(X ≥ 6).

Stating this in a pigeonless context, a random variable X that takes on nonnegative values (i.e. only values that are at least zero) has the memoryless property precisely if no matter what nonnegative numbers s and t we pick, we have that the probability that X will be bigger than s+t---given that it's known to be bigger than t---is the same as the probability that X is bigger than s (given no prior information). In our new notation: 

P(X > s+t | X > t) = P(X > s) for any choice of s and t. 

It's kind of important to note that in order to talk about things without too much complicated stuff, we didn't treat our random variables as continuous (by just looking at the discrete years or months as options for when The Hoff could die instead of considering the full spectrum of time). In the discrete case we were considering, all random variables that are memoryless are geometric random variables. In the continuous setting, they're called exponential random variables, and computing this stuff requires a bit of calculus.
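
If you want to see why the discrete claim holds, it's a two-line check. For a geometric random variable X with success probability p (death probability 1/72 per month, in pigeon terms), getting past t steps with no success just means failing t times in a row, so P(X > t) = (1-p)^t. Then

P(X > s+t | X > t) = P(X > s+t)/P(X > t) = (1-p)^(s+t)/(1-p)^t = (1-p)^s = P(X > s),

which is exactly the memoryless property.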

The Memorylessness of Things

There are lots of memoryless things, besides drunk David Hasselhoff. The time until a person who plays the lottery every month finally hits the jackpot, the distance between consecutive roadkill schmears on a highway, the time until your phone rings, the barometric formula (which measures change in air pressure with different altitude changes), how many randomly chosen people you have to interview until you find one that would make an adequate sacrifice to the pigeon horde, or how many Game of Thrones episodes we have to wait until we see one where Melisandre doesn't try to seduce anyone are all examples.




Let's look at home runs in baseball. David Ortiz's home run percentage is around 5.2% =13/250. Making the (not entirely unreasonable) assumption that each time he hits the ball, he's got an independent probability of 13/250 of hitting a home run, this tells us that on average he hits a home run every 250/13 ≈ 19 times he's at bat. So if Big Papi has gone 18 at bats without a home run, it's natural to feel that he's "due." But if he's really got a 13/250 chance of hitting a home run every time, his recent lack of homers doesn't make him any more likely to get one now.

A more intuitive example is coin flips. With a fair coin, you'll flip heads with probability 1/2 and tails with probability 1/2, every time, regardless of the past. The average number of flips until you see a head for the first time is therefore 2. That is, if you've gone 1, 2, 10, or 100,000 flips without ever having seen a head, you still have to wait on average 2 more flips before you see one. Of course, if you've actually gotten 100,000 tails in a row, there's a pretty good chance that someone's screwing with you, because the probability of this with a fair coin is around 10^-30103. (Fun fact: for a frame of reference on how minuscule that is, there are about 10^80 atoms in the known, observable universe, so the probability of flipping 100,000 tails in a row is about 10^30023 times less likely than picking a particular one of the universe's atoms out at random. I don't know what the name for the number 10^30023 is, but I did just learn that 10^3003 is called a millillion and 10^3000003 is a milli-millillion, and that seems like a good thing to know.) Then again, you just flipped a coin 100,000 times in a row, so you're probably beyond caring about atoms and millillions.




* But would a bird that's learned nothing go on to reboot its own bizarre public image to make a preemptively successful self-parodying reality show about himself or this pretty much fully perfect music video? Unlikely.

 

Friday, April 24, 2015

Benford and the IRS: A Love Story

If you're gonna cheat on your taxes, do it right.

I'm not saying you should cheat on your taxes or anything. I'm not. I'm just saying that there's a dumb way to do it and there's a slightly less dumb way to do it, and I'm gonna tell you this slightly less dumb way to do it, and then you're not gonna do it. 

It's all based on this one fact. Take just about any real life data set (a table of lengths of rivers, or the population of the largest US cities, or the street addresses of everyone who has eaten an apple in the last week)---at least one that's pretty large, and where the figures within aren't artificially constrained within a small range. Look at the leading digit of each of the numbers that appears in the data. You'd probably guess that among the 9 possible leading digits, you'd see 1 appearing roughly 1/9 ≈ 11.1% of the time, just like every other digit. But in practice, 1 is a leading digit something like 30% of the time! What? Yeah. This is a consequence of a little thing called Benford's Law, which says that the actual appearance of leading digits should look something like this in real life data sets:
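
Leading digit:   1       2       3       4      5      6      7      8      9
Frequency:       30.1%   17.6%   12.5%   9.7%   7.9%   6.7%   5.8%   5.1%   4.6%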

(Cultural note: The exact numbers above are given by the difference between the logs, so the law says that 1 should appear as a leading digit approximately log10(2)-log10(1) ≈ 30.1% of the time, 2 should appear approximately log10(3)-log10(2) ≈ 17.6% of the time, etc. We won't discuss this any further, so if you want more rigorous math info, check out some other sources.)

So maybe you see where I'm going with this. If you're gonna cook the books by making up random-seeming numbers, you'd probably end up having leading 1's showing up about as often as leading 2's, 7's, 9's, etc. because humans suck at randomness. But a clever IRS agent who knows about Benford's law 
(and you better believe those pasty fluorescent-lighting-lit-basement dwellers know every trick in the book, learned after years of what I assume is indentured servitude in a that-scene-in-the-Matrix-where-they're-running-down-the-hallway-with-the-Keymaker type of government black site)

Copyright: The Matrix people?

would be able to spot that in his sleep, and slap you with a CP06 notice and Form 14950 request for your improperly filed 1095-A and Form 8962 right the dang away. Because maybe then the debt of his ancestors will be paid. Maybe then he'll finally see sunlight again. 

Copyright: Agent Smith?! I don't know!

Benford distribution in real life

In practice, data isn't going to follow the Benford distribution perfectly. One type of data which will be pretty far off from the Benford prediction is data that is constrained to a single order of magnitude ("orders of magnitude" are basically powers of ten---so 1,456 and 2,984 would have the same order of magnitude, whereas 1,456 is of a smaller order of magnitude than 10,233). If we were to write out all the numbers from 10 to 99 (which are all in a single order of magnitude), we'd have each digit from 1 to 9 appearing exactly 10 times as the leading digit. This is not very interesting. But let's look at what happens when we have, say, the first 100 powers of 2 (and while we're at it, let's also multiply those numbers by 3 or divide them by π or whatever other scaling you want to do):
The first 100 powers of two range from 2 up to about 1.3x10^30 (that's 2^100), which means we cover 31 different orders of magnitude, and clearly this seems to fit much better. 
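
If you want to reproduce that comparison yourself, here's a rough sketch in Matlab/Octave (same software as the simulation further down the page):

% leading digits of 2^1 through 2^100, lined up against the Benford percentages
x = 2 .^ (1:100);                           % the first 100 powers of 2 (exact in double precision)
lead = floor(x ./ 10 .^ floor(log10(x)));   % strip each number down to its leading digit
observed = histc(lead, 1:9) / 100;          % observed frequency of each leading digit
benford = log10(1 + 1 ./ (1:9));            % Benford's predicted frequencies
[(1:9)' observed' benford']                 % digit, observed, predicted

Multiply x by 3 or divide it by π and the counts barely move, which is the scale-invariance business the next paragraph gets into.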

A convincing argument for why we'd expect data to follow this distribution is that if there is some kind of distribution that real-life data sets follow, then this distribution shouldn't depend on what units of measurement we use. So if we're looking at some data that's measured in kilometers, and it follows this magic distribution, it should still roughly follow the magic distribution if we change to miles instead. The uniform distribution (i.e. where each digit is equally likely to be a leading digit) doesn't have this property! But as evidenced by this power-of-2 example, the Benford distribution seems to. It's actually the case that the Benford distribution is the only scale-invariant leading-digit distribution out there. 

Just how closely does natural data follow the distribution? Well, 1 definitely seems to appear more than most other digits, for most data sets, even if the Benford distribution isn't perfectly followed. Here are a handful of examples. The first chart below gives the leading digit in some data about the 295 most populated US cities. The second chart gives the leading digit in various data on a list of the 176 longest rivers in the world.

The gray bars represent the numbers dictated by the Benford distribution, and the lines represent actual data. Some are decently matched to the Benford distribution (like the drainage area and average discharge for rivers) and others are pretty far off (like the length of the rivers and the population of US cities).

The worst fit is undoubtedly the length of rivers in miles, which looks wildly different from the length of rivers in kilometers. But we could have guessed that the Benford distribution would be a bad fit here by looking at the data. The range is 625 miles to 4,132 miles. This means that all of the 1's that appear as leading digits are from rivers that have length 1,000 miles to 1,999 miles. So the problem is that our data isn't spread over enough orders of magnitude for Benford's law to be applicable.

The first person to make a note of this surprising law was an astronomer named Simon Newcomb. Back in the late 1800s, people used log tables to do computations, which were basically long books with pages and pages of values of different logarithms on them.

A log table. Photo credit: agr.
Newcomb noted that the pages which had numbers starting with 1 were way more worn out than the later pages. So basically we can chalk this discovery up to the fact that new versions of log tables weren't coming out at an iPhone rate (I guess because logs haven't changed much over the years) and everyone had crappy old books full of worn out pages. I guess people weren't that impressed with Newcomb and his old book because it isn't called Newcomb's law. A guy named Frank Benford published some statistics about a mind-boggling number of different data points about half a century later and people named the law after him. Which is fair enough because there were 20,229 observations in the paper.

Benford and fraud in practice

Before you go off testing too many data sets, it should be noted that there have been cases where someone was accused of faking data because it didn't fit the Benford distribution, but it turned out that the data was not actually fake. One such embarrassing misapplication of Benford's law was actually done by the US Secret Service. It's important to note that just because data fails to satisfy Benford's law doesn't mean that it's falsified. Benford's law doesn't say that there cannot exist a data set that's uniform, for example.

There are people who actually use Benford's law to analyze whether tax documents are likely to be faked or not. An accounting consultant analyzed Bill Clinton's tax returns, and Clinton came out looking pretty good. So either he didn't cheat on his taxes, or he cheated on them pretty well.

So if you try to use Benford's law in your daily life, be a Bill Clinton, not a Secret Service.

Sunday, April 19, 2015

The Unintuitive Nature of Randomness

Humans suck at randomness. I mean we really, really don't have a good intuition for what randomness looks like. What follows is an example that illustrates this total intuition blind spot.

Whenever I teach probability in a math class, I like to start with this experiment (partly because I want to show that intuitive BS arguments are likely to be totally false in this class, and partly because this experiment is neat and memorable and I'm pretty into what my students think of me). I split the class into, say, twenty groups. Ten of the groups are told to flip an actual coin and record the results, while the other ten groups are told to make up a plausible-seeming coin flip sequence and write it down. I leave the room for a few minutes, and when I come back I figure out which results are real and which are made up just by looking at the sequences. In practice, I end up being correct on about 93% of the sequences, which is a number I just made up (no but it's accurate). Sorcery! I only look at each of the sequences for a few seconds, and I'm sadly not a mental arithmetic savant or anything. So how does it work?

The answer is that I'm using the aforementioned central tenet of human probability-intuition: we don't have it. My one check is this. If there is a run of length 5 or 6 or longer (by a run, I mean that the same result happened in consecutive flips, so for instance a sequence starting T H H T T T ... has a run of one tail, followed by a run of two heads, then a run of three tails, and so on), then I guess that this sequence was generated by actual coin flips. If the longest run is of length 4 or less, then I guess that this sequence is made up. Here's why this works most of the time (we'll talk about where these numbers come from later):

  1. The actual probability of having a run of length at least 5 in a sequence of 100 coin flips is around 97%. The probability of having a run of length at least 6 is a bit over 80%.
  2. Most people not only don't know this first fact but assume that long runs are actually pretty unlikely, so most of the time the made up sequences don't have any runs longer than length 3. The probability that a sequence of 100 coin flips has longest run length 3 or less is around 0.03%.
Let's look at an example. Here are a pair of sequences of coin flips. Each one represents a coin that was flipped 50 times in a row (because 100 didn't fit nicely on the page), with the results recorded each time. Does one look more "random" to you than the other?


or



Both look like something that's reasonable to expect as an outcome, but one of these was generated by an actual series of random flips (simulated by a random number generator, anyway) and the other one is made up by hand to "look" random. From the facts outlined above, a good guess is that the first one is the made up one. The first sequence has longest run length 3, which has about a 2% probability of happening, and so is pretty dang unlikely. The second sequence has longest run length 5, and the probability of a sequence of 50 coin flips having a run of length 5 or longer is about 82%. Not definitive, but likely.

The first thing that goes wrong at this point in the class is that someone takes this to mean that sequence 1 is less likely to come up than sequence 2. In fact, both sequences are equally likely. What? But you just said---NOPE. Think about it this way. Each time you flip a coin you have a 1/2 chance of getting heads, and 1/2 chance of getting tails, and these events are independent of each other (there are mathy definitions for what we actually mean by event and independence but our colloquial understanding of these terms will do here). So the probability of getting, say, a head followed by another head is (1/2)(1/2) = 1/4. The probability of getting a head followed by a tail is also (1/2)(1/2) = 1/4. OK, you get the idea: any particular sequence of flips is exactly as likely as any other sequence of the same length.

But this seems kind of weird. Suppose that a friend invited you to play a game in which you must pay him $1 every time a coin comes up heads, and you'll get paid $1 every time a coin comes up tails (which certainly seems like, and is, a fair game). If the first ten coin flips are



then this seems legit. Heads and tails have both come up five times, so the game is tied up, and that seems like a reasonable outcome for a fair game. Now suppose that the first ten coin flips are


You'd probably be suspicious as heck because you have yet to make any money and your friend has already won $10 of yours. (If you're not convinced yet: imagine you get another ten heads in a row. Your friend is obviously cheating somehow. Another ten heads, for a total of thirty in a row, and you've cut all ties with this former friend of yours and don't regret it one bit because he's a filthy lying scumbag.) Well, the probability of getting 10 heads in a row is (1/2)^10 ≈ 0.00098, so you're right about the fact that he's probably a scumbag. The reason that the outcome that gave five heads and five tails seems less dubious is that there are actually 252 different ways to flip five heads and five tails (because 252 = "10 choose 5," but we won't go into that). There is only one outcome that gives exactly ten heads. So while any particular outcome that consisted of five heads and five tails is only going to occur with minuscule probability (1/2)^10, one of these 252 outcomes will occur with probability 252(1/2)^10 ≈ 0.25, or 252 times more often than the all-heads scenario.

So that's a quick example that helps illustrate the fact that human beings just aren't good at "sensing" randomness, and all the education in the world doesn't change our crappy intuition, even if we should know better. In the immortal words of Ygritte and Seth Meyers, when it comes to randomness:



Bonus math stuff

Skip this part if you've had enough.

If you're wondering where the probabilities referenced above come from, here's a first, easier-to-answer question that can improve our misguided intuition. If you repeat the flipping-100-coins experiment many times (like 1,000,000 times), how many runs of length 5 should we get on average? So, we have 100 flips, and a run of 5-in-a-row can start at flip 1, flip 2, flip 3, ... , all the way out to flip 96 (if we started any later than that, we'd run out of coin flips). So there are 96 places for a length 5 run to start. As we discussed above, any particular set of 5 flips is going to be all-heads with probability (1/2)^5 and all-tails with probability (1/2)^5, for a total probability of 
(1/2)^5 + (1/2)^5 = (1/2)^4 = 1/16. 
If we want to figure out the average number of length 5 runs per experiment, we can simply add up the average number of length 5 runs contributed by each of the 96 sets of 5 consecutive coin flips (this is linearity of expectation doing the work, if you want the official name). On average, each one of these is a length 5 run 1/16 of the time, so in total we have 96/16 = 6 length 5 runs. Note that the way we're counting them, a stretch of six identical flips in a row, like

H H H H H H

actually has two length 5 runs, so these aren't necessarily separate runs. But still, if on average there are six in an experiment, intuition says that it should be pretty likely that we find at least one. 

Okay, so that's intuition-building. But we did say that our intuition is not to be trusted, so you really shouldn't be satisfied right now. To get the actual numbers claimed above, we can do some clever but pretty complicated math, or we can simply run a simulation. Here's an example of what a simulation looks like, in a piece of math software called Matlab (there's a free Matlab equivalent called Octave that has most of the same functionality). If you have access to this program, you can run this function as written. If you have some facility with some other programming language, you can write a program like this one. If you have no idea what I'm talking about, just ignore what follows.

function probability = runCounter(run_time,num_flips,n)
% runCounter outputs the proportion of simulations among the run_time
% simulations tested in which there was a run of length n or greater among
% the num_flips flips of a coin.


longest_runs = 0;              % running total of the longest run in each simulation (not used in the output)
run_of_n = zeros(1,run_time);  % run_of_n(i) will be 1 if simulation i has a run of length n or more
    
for i=1:run_time
    longest = 1;
    counter = 1;
    flip = 2;               % impossible "previous flip" value, so the first real flip can't extend a run

    for j=1:num_flips
        last_flip = flip;
        flip = randi(2)-1;

        if flip == last_flip 
            counter = counter + 1;
            longest = max(longest,counter);
        else
            counter = 1;
        end
    end
    
    if longest >= n     
        run_of_n(i) = 1;
    end
    
    longest_runs = longest_runs + longest;
end

probability = sum(run_of_n)/run_time;

end
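
If you want to check the numbers quoted earlier against this simulation, calls along these lines (any large run_time will do) should land close to the claimed values:

runCounter(100000, 100, 5)       % run of length 5 or more in 100 flips: about 0.97
runCounter(100000, 100, 6)       % run of length 6 or more in 100 flips: a bit over 0.80
1 - runCounter(100000, 100, 4)   % longest run is 3 or less in 100 flips: about 0.0003
1 - runCounter(100000, 50, 4)    % longest run is 3 or less in 50 flips: about 0.02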

Tuesday, April 14, 2015

The Four Color Problem: the Illustrious History

Remember being a kid being dragged to a restaurant with grownups? The one consolation was when they'd hand you a paper placemat with the outline of the US or a parrot or something and that set of four crayons. Why was it always only four crayons? Come on, Friendly's, more than half a billion in revenues and you couldn't give us one more dang crayon? Dang! Well, turns out you really didn't need to complain so much, at least if you were going for efficiency in your coloring practice. Four crayons isn't a whole lot, but there's a pretty deep math theorem that says it's enough.

Theorem: Four crayons suffice.

But okay, we should probably be clearer about what that means. Let's think of the uncolored picture on the placemat like a map. You can use whatever color you want in each region, but you don't want two regions that border each other to get the same color. So let's restate the theorem with this constraint.

Theorem: Suppose you have a picture, consisting of any number of regions, that you want to color so that no two bordering regions get the same color. Then four crayons suffice.

And if that isn't applied math, I don't know what is. If you doubt the direct applicability of this theorem to your life, just buy yourself a grown-up coloring book and four colored pencils. It should be noted that though you'll never need more than four colors, it's not always easy to actually find a way to color a picture with only four. There was even a famous April Fools' joke gone awry in which Martin Gardner published a picture that he claimed required five colors to color, and a bunch of mathematicians actually believed this meant that the theorem had been disproven. Here's Gardner's picture. See if you can color it using only four colors.

Fig. 1. This picture requires five colors to color, haha jk lol April Fool's it only takes four.


By the way, can we think of any uncomplicated examples where we need four? If you draw a few simple pictures without thinking about it too hard, you may not end up with anything that requires more than three colors. To illustrate where three don't suffice, here's a picture of a bird. 
Fig. 2. This is a bird, okay?
You might be wondering if this is really a picture of a bird and if so, what is with that middle thing that has to be its wing, and can a bird with a wing that looks like a potato fly or is this one of those flightless things like a dodo or a penguin, and my answer to you is that we're talking about MATHEMATICS here, not art or zoology, so focus. Here is an example of how we can color this bird using only four colors, which is all this tuber-winged air skunk is good for right now.
Fig. 3. Now it's obviously a bird, right?
Convince yourself that you couldn't get away with three colors here. (The problem is with the "wing" in the middle. It's touching three other regions, all of which touch each other, so coloring those four regions with any three colors would necessarily result in two bordering regions sharing a color. I don't know why the mouth-triangle and the eye are red, okay, so stop asking. Maybe the bird is a demon and it bathes its flesh-tearing beak in the blood of the vanquished in sacrifice to its unholy idol. Look, you're getting really distracted here. We have to get back to the math.)

Okay, so four colors are sometimes necessary, and the theorem---clearly true because it is in bold---tells us that we'll never need more than that. The theorem is so easy to state that you might think this problem is not very substantive. To address that, let's talk a little about this problem's lengthy and colorful (ha, ha) history, which should quickly give the idea that proving that theorem is going to be Really Dang Hard.


Fig. 4. How it all went down.

It's 1852 and Francis Guthrie, a future-mathematician-slash-botanist in London, is coloring a map of the counties of England, because math grad school leaves you with a lot of free time. Hold up, he thinks. He's in England though so it comes out, Hark! Verily! What if like, one day, there were chain restaurants that distracted small children from their ennui with sets of crayons? How many crayons shall suffice? Based on his map, I guess, the answer seemed to be four, and he couldn't come up with a map that needed more than that, and lo!---a conjecture was born. A couple of years later, he (or maybe his brother? or maybe someone else entirely with the same initials as him?) published the problem in The Athenaeum, the New Yorker of 19th century London, but fancier because there are three vowels in a row in its name.  But nobody cares what a grad student thinks, and what kind of self-respecting fancy pants needs to concern himself with such paltry questions? (Fancy people have lots of crayons.) So it wasn't until Guthrie's super-famous advisor Augustus De Morgan took an interest in the problem that people really got into it.

De Morgan wrote to this guy, Sir Hamilton, and told him about the problem. Sir Hamilton wrote back,
I am not likely to attempt your quaternion of colour very soon.

Because that is how people talked and it was awesome. So after sitting on it for a few more years, De Morgan published the same problem in the same dang magazine in 1860, and that got attention. Francis was probably pissed that his advisor stole his glory, though.

So now all these high-powered mathematicians are into it, but nothing happens for a while. And then, 19 years after De Morgan's publication, this barrister Alfred Kempe comes out of nowhere with a proof that impresses the pants off all these hoity-toities. Barrister is British for "lawyer in a wig" I'm guessing. Everyone is impressed as heck. Francis Guthrie was a barrister for a while too, by the way. They were both trained mathematicians as well, but the wig game isn't as strong in that industry.

The next year, another guy comes on the scene with another proof. This guy, Peter Guthrie Tait, is actually a physicist. He also says something like,

The four color conjecture is true if and only if no snarks are planar.

Nobody knows what he's talking about, so they just nod. By the way, Tait studied the ozone, which is kinda neat, and quaternions, which are weird kinds of numbers. He also hung out with Sir Hamilton from earlier and was really into golf.

Things are pretty good for a while after that. But then in 1890, Percy Heawood finds a flaw in Kempe's proof, and then in 1891, Julius Petersen finds a flaw in Tait's proof, and we're left totally proofless. Heawood is a gloriously mustachioed but tiny little dude who wears a cape, carries a ratty old handbag, and always brings his little dog to his lectures. Also, he goes by Pussy, so.

Pussy salvages Kempe's proof somewhat and uses it to prove the Five Color Theorem by using something called Kempe chains. The Five Color Theorem says that five colors suffice to color any picture (with the same constraints we made earlier). The argument is pretty short and sweet and is taught in undergrad math classes now. But back in 1891, we're still nowhere with the Four Color Problem, and over the next 64 years pretty much all that happens is that this guy Hugo Hadwiger comes up with a problem that's a vast generalization of this one, and it's still unsolved, and it just serves to depress everyone further.

Meanwhile, in Germany, Heinrich Heesch is doing his thing at his university, and then the 1930's come along and the Nazis say that you can't work in a university unless you become a member of the Nazi party. (Hitler tries to do a lot of crazy stuff at this point, and we'll just fast forward.) Heesch decides on a firm no thanks to that, and quits. He moves into his parents' basement in 1935, where he stays until 1948 (which seems longer than strictly necessary, given that WWII ends in 1945). When he emerges, he's awesome. This is basically a mathematical superhero origins story. In 1955, he starts working on using computers to prove stuff. Which we should take a second to appreciate, because this is what computers looked like at the time.

Fig. 5. What the hell is this?
This starts an arms race throughout the 1960's and 1970's for who can use computers to solve the Four Color Problem, and basically everyone is using Heesch's methods. Heesch is close, maybe, but it's a couple of mathematicians at the University of Illinois, Appel and Haken, who finally beat him to it in 1976. Their paper is long and basically unreadable, and they themselves admit it:
...50 pages containing text and diagrams, 85 pages filled with almost 2500 additional diagrams, and 400 microfiche pages that contain further diagrams and thousands of individual verifications of claims made in the 24 lemmas in the main section of text. In addition ... certain facts have been verified with the use of about 1200 hours of computer time and would be extremely time-consuming to verify by hand. ... few mathematicians have read [the paper] in any detail. 
The upshot is that not a whole lot of people actually understand the proof, and to this day there are those who think that using a computer to prove something is controversial. Some mathematicians are still looking for a non-computer proof, but nobody has succeeded.