When I’ve recommended a book to more than a dozen people and
bought a few copies as gifts, I like to distill my notes one more iteration. So I’ll jot
down here what I liked most about Algorithms to Live By by Brian Christian
and Tom Griffiths. Griffiths is a
psychologist and cognitive
scientist at Berkeley and Christian is a science
writer. The book rolls out a series of optimization
algorithms discovered by mathematicians and computer scientists but presented at the
human scale. These solutions help when your system crashed and they also help on your
bookshelves, in your communication with friends, when parking, and in your refrigerator.
Marriage Problem – Look and Leap
This is called the Secretary Problem but I think marriage is more
interesting. If you’re looking for “the
one,” exactly how long should you play the field? Assume you can’t go back and propose to an
old girlfriend. First decide what age
you’d like to marry, count the years to then and shop around for 37% of that
time. During that period note the “best”
candidate but don’t marry her but after that 37% time period marry anyone you come across who's better than that. This will maximize your
chances. Of course it’s a bit more
complicated than that… But let’s say
you could determine “best” in a reasonable way, what is your chance of marring
the “best” of them all this way? 37%.
Oh so now you say you can
go back to an earlier date and propose?
And there’s a 50% chance she’ll accept?
Then look for 61% of your time then leap, simple as that. Oh, marrying just for the money, and you can measure
your date’s net worth easily? Then set a
threshold at 95th percentile, marry the first who’s worth more. But that threshold will fall as you exhaust
your pool, the tables are in the book.
By the way, best chance of getting the
wealthiest this way: 58%. See how fun it is?
Setting a Home Price
Sell a house like this, if you know the high and low
expected bids and can calculate the cost of waiting. You can calculate your threshold price, which
you apply immediately and never change. If the range is $400k to $500k, it’s a slow
market and waiting costs $10k an offer, hold out for $455,279. The graph is in the book.
Parking
Parking a car and know the occupancy rate? With a 99% of spaces taken, you should start
looking ¼ mile from your destination. If
it’s 85% full you can drive within a half block.
Getting Caught
Oh you’re a burglar and if you get caught you lose
everything? You want to know how many
burglaries to do? Just take the chance
you get away and divide it by the chance you’ll get caught, burglarize that
many times, then quit.
Explore vs Exploit
How long should you shop around for new friends or
restaurants (explore) and when should you stick with your favorites
(exploit). That depends on your time
frame. Finding a new great restaurant
isn’t going to be worth as much if you’re about to move out of town. So explore early on, then exploit. There are several models you can use: “The Gittins index and the Upper Confidence
Bound ... inflate the appeal of lesser-known options beyond what we
actually expect, since pleasant surprises can pay off many times over.” I’ll leave the details for the authors to
explain.
Adaptive Trials
Adaptive trials are interesting, they allow you to gradually
phase out the less promising of two experimental clinical techniques. Imagine starting 50% a-50% b until the A’s
appear to do better, move to 55-45 ... and so on, there's a precise algorithm. You can drastically reduce the number exposed to an inferior treatment. But clinical trials are usually not done this way.
Sort and Search
The discussion of Sort-Search tradeoffs was great; it
introduced Big-O notation. Big-O is how
processing costs change with size (n). The example
was a dinner party. You have to clean
your house once regardless of n, so cleaning is “Big-O of 1”. Passing a dish around the table increases
linearly with every additional guest: “Big-O of n”. Each guest arriving hugs all the guests
already there … that’s quadratic time: “Big-O of n(squared).” Exponential time, “(Big-O)(2 raised to n)”
would happen if each guest doubles your work.
“Big-O (n!)” – factorial time is so much worse. That’s like randomly shuffling a deck of cards until they happen
to fall in order.
So how does Big-O help with sorting?
Bubble sort is Big-O of n (squared). Look at every adjacent pair of books in a
bookshelf and switch them if they’re out of order, then shift over and do it again.
Who would do this? A computer would, or
a slow person with bad eyesight. In
practice Insertion Sort, in which you remove all the books and place each on
the shelf correctly, is not much better than bubble sort although prior
knowledge might save a lot of time (M is in the middle, start looking there….).
Mergesort is the punchline.
Sort smaller batches then shuffle-sort those into bigger batches, repeat
and wala. But if you are not expecting
to search much, why bother, just use
Bucket Sort: put them into categories and quit.
Save a lot on the sort, pay a
little on the search. “Sorting something
that you will never search is a complete waste; searching something you never
sorted is merely inefficient.” (72)
Ranking Athletes
There’s a whole section on competition scoring, like for athletes or sports teams. The problem with Single
Elimination tournaments, where one loss knocks you out, is that while it finds
the best team, “second place is a lie." Round-Robin, where each team plays every other team – Big-O of
n(squared) … so many games required! And so many boring ones. Ladder Tournaments, where each player can challenge the next best is a bubble
sort. The most popular, Bracket
Tournament, divides the field in half at each stage. It’s merge sort. March Madness takes 64 teams to 32, then 16,
then 8, then the “final four” before the determining match. It’s Big-O(n log n). With 64 teams to start it reduces the number
of games needed from 2,016 with Round Robin to 192 games to find the best
team. But it doesn’t find the second
best.
Pecking Order
Chickens have a pecking order, for real. They use displacement
where one member just knows it’s not
worth trying to compete with anyone except that one directly above or below. So place is relatively easy to establish. If crook-beak
beats bug-eyes who just outpecked you, well you just don’t have to fight
crook-beak – you know you’ll lose.
That’s ordinal ranking. In comes
a newcomer. He’ll have a rough go at
first, finding his place, but then it’ll be tensely peaceful again.
Race vs Fight
If it’s a race rather than a fight, it’s not ordinal that
matters, it’s cardinal scores and all so much easier. The skier places precisely by racing the
clock in a couple of runs, where the cage fighter has to take on one nasty
opponent after the other. The authors
explain, “Much as we bemoan the daily rat race, the fact that it’s a race rather than a fight is a key part of what
sets us apart from the monkeys, the chickens – and for that matter, the rats.”
P 83
Caching
There are many ways to purge memory, the crudest maybe is Random
Eviction. Maybe someone with dementia suffers this. Another method is First-in-First-Out,
the oldest things must go. Clairvoyance – using future information -- is best if you can get it, and there’s an formula for
that too: Belady’s Algorithm.
When it all shakes out there are times to use each of these
but generally speaking the last thing
we can expect to need is the thing we used the longest time ago. That’s Last
Recently Used. So if you always put your
books back on the left side of the shelf, if you return your file folders to
the front of the drawer, if you hang your used shirts to one side of the rack …
that’s not a bad idea. So those papers on
top of your paper piles are probably the ones you will want to grab. Sweet! It's that filling drawer on its side. With this line of reasoning, throwing your clothes on the floor actually makes some
sense. I'll tell my son. If you think about it, the brain
works this way, pretty much. Those
things in the more distant past, what we haven't thought of for a long time are the ones we are likely to forget -- and are
the ones we can most afford to lose. Thank
you natural selection.
Sequence
Then there is the order in which we should actually do
things we need to do. Getting Things
Done, the organizational system which I try to use, recommends doing the quick
things immediately. Others will tell you
to do the hardest things first, or the fun things first, or the oldest or most
recent things first.
Gantt charts help
optimize order of operation. For
example, when you have many loads of laundry that need washing and drying find
the one with the shortest cycle. If it’s the washer do that load first,
if it’s the drier do it last, repeat for all loads. You maximize the time both washer and drier
are running, and minimize your time at the laundromat.
If you have a lot of tasks with deadlines you can follow an
algorithm for “minimizing maximum lateness” by prioritizing those with the
Earliest Due Date. First things
first. But if you want to minimize the
sum of lateness, use Shortest Processing Time (always do the quickest task
first). If you can weight each task by
importance just multiply that weight by the time required. “Only prioritize a task that takes twice as
long if it’s twice as important.” (111)
Some tasks can be given an “allow priority,” that is, they aren't high priority in planning but kick in over another when needed. We do this intuitively with bathroom breaks; when
you need one that task trumps most others. If you don’t plan this allowance into a project design
it could spell trouble as the authors point out happened on Mars Pathfinder in
1997 when it thrashed just after landing.
Thrashing
Thrashing is when a system is running at full bore and
accomplishing nothing. In this case some
tasks should simply be interruptible.
Switching tasks (context switching) comes with a cost. Sometimes you can reduce the cost of context
switching by clustering or coalescing tasks – the author suggests scheduling a
“bill paying day” when you get out your checkbook ... once. The GTD system helps with this by coalescing
actionables into folders so you can tackle related projects together. The U.S. mail coalesces correspondence for us. Office hours coalesce interruptions. Answering machines do too. It’s interesting to look at our daily
activities this way.
Bayesian Probabilities
The book goes over Bayes" Rule which combines probabilities
to overcome intuitive traps explaining things.
You basically use known probabilities for hypothetical pasts, figure the
chance they would deliver the known outcome, and work backward to find the most
probable cause. The example given was a random pull from a bag of coins containing 9
fair coins and 1 two headed coin. It
flips heads. How likely is it to be one
of the fair coins? Calculate the chance
of a fair coin being drawn and the chance of it flipping heads that’s 90% X
50% -- and compare it to the chance of
the trick coin drawn and flipping heads (10% X 100%) That’s 45 / 10 or 4.5
times more likely to be a fair coin.
Simple when you think about it, but hard to intuit.
Laplace took it further.
His law predicts that if you try a lottery only once and win, an
estimate of 2/3 for the portion of winning tickets is better than 100% or 50:50. It’s always the number of wins +1 divided by
the number of attempts +2. So if the bus was late 3 out of 12 times the chance
of it being late today is 4/14 or 28.6% . This one I didn’t get, but I suppose that if
68 of the other 70 insights in the book did make sense – that would be 69/72 or
96% chance this one should too? It’s on
page 131
Longevity
Here’s a real cool simple tool. If you want to predict how much longer something
you see will last and you don’t have anything to compare it to, find out how
long it’s been around and guess that. You
simply assume that the timing of your sample is random, so if it’s
normally distributed the best guess for current moment is smack in the
middle, top of the bell curve That’s the Copernican Principle,
good when you know nothing. So how long
will North Korea last? Let’s see…
2017-1948 is 69. So … 2086. Oh my.
If your phenomenon has a known distribution you can do
better with the Multiplicative Rule .
The example given was movie gross receipts. Most movies make little or nothing and some
are blockbusters. This is a power law
distribution. There is plenty of past
data and the calculated multiplier that fits this particular distribution is 1.4 so if
you hear a movie made $10,000 so far … best guess is a total of $14,000. A movie grossed $6m? Probably it will quit at 8.4.
Here’s the difference between normal distribution which call for the Average Rule, and power law distributionss that use Multiplicative Rule:
“Something normally distributed that’s gone on seemingly too long is bound to
end shortly; but the longer something in a power-law distribution has gone on
the longer you can
expect it to keep going.” (140) The
third method is the Additive Rule, good for things that are “memoryless,” that
is, you have priors but they don’t follow a regular pattern; these are things
with a wing-shaped Erlang distribution.
Then predict a constant amount of time.
Like for a slot machine, when is it going to pay off next? After every pull, win or lose, the prediction
is n more pulls.
Poor Priors
The authors noted that humans do pretty well using these
methods, but we inform our own priors (normal, power law, Erlang) by
experiencing life ... and in modern times our information is heavily skewed. We’re a more likely to hear about the guy
who was killed than about those who had a normal day. If we use our priors without understanding
our information bias we’re likely to make stupid predictions. It’s why people are afraid of flying but not
driving on the freeway: a plane crash is big news.
Remember the marshmallow challenge, where kids with the
willpower to wait 10 minutes doubled up, and lo, those showing such willpower did
better in life? The book offers an interesting
twist. When the experiment was preceded by an adult promising
cool art supplies to the kids in the waiting room -- then some kids got the supplies
and some a delay and lame excuse. The
stymied kids were more likely to eat the marshmallow, giving up after a short effort. It seems their priors suggested
these adults were not to be trusted … might as well eat this one sweet while you
can. The authors speculate that the differences between the successful kids and less successful ones in the classic study may
not be just a lack of willpower, but “it could be a result of believing that
adults are not dependable: that they can’t be trusted to keep their word.” Bad priors.
Overfitting
The discussion on overfitting was another lesson on how too
much weight on prior data can go wrong.
The tendency of course is to pile on the independent variables,
expecting a more predictive model if you collect more background information. But imbedded in that assumption is that each
of those things is actually a good predictor of what you’re really trying to
measure! For example, taste is
overfitted by evolution, to crave fat and sugar with no end. That used to help us but today … another bad
prior.
Occam’s Razor suggests the simplest hypothesis is probably
the best. If you want to simplify your
model, try adding a “complexity penalty” to knock out some superfluous
factors. Nature does it: “The burden of
metabolism … acts as a brake on the complexity of organisms, introducing a
caloric penalty for overly elaborate machinery.” (161) And in a similar way the slow pace of
evolution prevents organisms from overfitting their environment – that’s makes
them more resilient. Thanks again.
Optimization
There is a large section on optimization. This covers the problem of local maximums,
where the only way to improve in the long run requires a period of getting worse. A step down is required before larger steps up. You can get around this with randomized “jitters,”
progressively removing random influence: “simulated annealing,” and relaxation
of constraints at least early in the process.
Lagrangian Relaxation is clever – it simply moves constraints over to
the cost structure. You can’t steal cars to go to work! Lagrange would respond: “actually I can, but let's consider the costs.”
Handshakes
In a section on communication the authors point out how, by signaling
receipt, the receiver influences the message.
This might be by nods or facial expressions, or “oh,” “ha,” “hmm..” – in
other words: “message received!” Bad
listeners actually ruin the message because the sender doesn’t know how to
proceed -- what was heard, what was not, should she speed up, slow down, wrap it up, repeat something? This sorry of feedback is what computers do,
every packed or chunk is sent, acknowledged, and the acknowledgement is
acknowledged. Constant handshakes, and that way a missed message is simply resent.
Exponential Backoff
But what if there is a breakdown in communication – what can
be done about it. The answer is
brilliant: Exponential Backoff. Each
time there is a failure double the delay sending it again. The example was in dating or friendship: Oh,
so she didn’t return your text? Wait a
day, send another. No response? Wait 2, then 4 then 8 … Very soon you’re almost
out of touch, gracefully, but not completely. This is apparently how failed password attempts work, to increase security.
It’s like a squirrel, approaching you for that peanut. Additive Increase, Multipilicative Decrease.
Tail Drop
Here’s a nice one: Tail Drop. When we just had house phones people calling
when we were out had to call back. They
couldn’t leave a message. The early answering machines took messages (buffers)
but with limited capacity. I remember "Mailbox Full" But there's no Tail Drop with email, it's an infinite buffer. Everybody expects an answer.
Texts keep coming and the buffer never fills. “We used to reject . Now we defer.” (226) What have we done?
Auctions
The authors write about auctions “Sealed-Bid First-Price,”
“Dutch Auction (descending),” English Auction (ascending)” and the brilliant
“Vickrey Auction” which is a sealed bid but the winner plays the second highest price, not
the highest. Unlike the others, when you
crunch the numbers, “in the Vickrey auction, honest is literally the best
policy.” Ebay uses Vickrey, I’ve always
thought it was brilliant.
So, I’ve combed through the book for many of my own highlights for my own purposes, but I recommend you buy a copy, there's so much more. It’s superbly written, packed with insight,
and easy relate to.