When I’ve recommended a book to more than a dozen people and
bought a few copies as gifts, I like to distill my notes one more iteration. So I’ll jot
down here what I liked most about

scientist at Berkeley and Christian is a science
writer. The book rolls out a series of optimization
algorithms discovered by mathematicians and computer scientists but presented at the
human scale. These solutions help when your system crashed and they also help on your
bookshelves, in your communication with friends, when parking, and in your refrigerator. __Algorithms to Live By__by Brian Christian and Tom Griffiths. Griffiths is a psychologist and cognitive

**Marriage Problem – Look and Leap**

This is called the Secretary Problem but I think marriage is more interesting. If you’re looking for “the one,” exactly how long should you play the field? Assume you can’t go back and propose to an old girlfriend. First decide what age you’d like to marry, count the years to then and shop around for 37% of that time. During that period note the “best” candidate but don’t marry her but after that 37% time period marry anyone you come across who's better than that. This will maximize your chances. Of course it’s a bit more complicated than that… But let’s say you could determine “best” in a reasonable way, what is your chance of marring the “best” of them all this way? 37%.

Oh so now you say you

*can*go back to an earlier date and propose? And there’s a 50% chance she’ll accept? Then look for 61% of your time then leap, simple as that. Oh, marrying just for the money, and you can measure your date’s net worth easily? Then set a threshold at 95^{th}percentile, marry the first who’s worth more. But that threshold will fall as you exhaust your pool, the tables are in the book.
By the way, best chance of getting the
wealthiest this way: 58%. See how fun it is?

**Setting a Home Price**

Sell a house like this, if you know the high and low
expected bids and can calculate the cost of waiting. You can calculate your threshold price, which
you apply immediately and never change. If the range is $400k to $500k, it’s a slow
market and waiting costs $10k an offer, hold out for $455,279. The graph is in the book.

**Parking**

Parking a car and know the occupancy rate? With a 99% of spaces taken, you should start
looking ¼ mile from your destination. If
it’s 85% full you can drive within a half block.

**Getting Caught**
Oh you’re a burglar and if you get caught you lose
everything? You want to know how many
burglaries to do? Just take the chance
you get away and divide it by the chance you’ll get caught, burglarize that
many times, then quit.

**Explore vs Exploit**

How long should you shop around for new friends or
restaurants (explore) and when should you stick with your favorites
(exploit). That depends on your time
frame. Finding a new great restaurant
isn’t going to be worth as much if you’re about to move out of town. So explore early on, then exploit. There are several models you can use: “The Gittins index and the Upper Confidence
Bound ... inflate the appeal of lesser-known options beyond what we
actually expect, since pleasant surprises can pay off many times over.” I’ll leave the details for the authors to
explain.

**Adaptive Trials**

Adaptive trials are interesting, they allow you to gradually
phase out the less promising of two experimental clinical techniques. Imagine starting 50% a-50% b until the A’s
appear to do better, move to 55-45 ... and so on, there's a precise algorithm. You can drastically reduce the number exposed to an inferior treatment. But clinical trials are usually not done this way.

**Sort and Search**

The discussion of Sort-Search tradeoffs was great; it
introduced Big-O notation. Big-O is how
processing costs change with size (n). The example
was a dinner party. You have to clean
your house once regardless of n, so cleaning is “Big-O of 1”. Passing a dish around the table increases
linearly with every additional guest: “Big-O of n”. Each guest arriving hugs all the guests
already there … that’s quadratic time: “Big-O of n(squared).” Exponential time, “(Big-O)(2 raised to n)”
would happen if each guest doubles your work.
“Big-O (n!)” – factorial time is so much worse. That’s like randomly shuffling a deck of cards until they happen
to fall in order.

So how does Big-O help with sorting?

Bubble sort is Big-O of n (squared). Look at every adjacent pair of books in a
bookshelf and switch them if they’re out of order, then shift over and do it again.
Who would do this? A computer would, or
a slow person with bad eyesight. In
practice Insertion Sort, in which you remove all the books and place each on
the shelf correctly, is not much better than bubble sort although prior
knowledge might save a lot of time (M is in the middle, start looking there….).

Mergesort is the punchline.
Sort smaller batches then shuffle-sort those into bigger batches, repeat
and wala. But if you are not expecting
to search much, why bother, just use
Bucket Sort: put them into categories and quit.
Save a lot on the sort, pay a
little on the search. “Sorting something
that you will never search is a complete waste; searching something you never
sorted is merely inefficient.” (72)

**Ranking Athletes**

There’s a whole section on competition scoring, like for athletes or sports teams. The problem with Single
Elimination tournaments, where one loss knocks you out, is that while it finds
the best team, “second place is a lie." Round-Robin, where each team plays every other team – Big-O of
n(squared) … so many games required! And so many boring ones. Ladder Tournaments, where each player can challenge the next best is a bubble
sort. The most popular, Bracket
Tournament, divides the field in half at each stage. It’s merge sort. March Madness takes 64 teams to 32, then 16,
then 8, then the “final four” before the determining match. It’s Big-O(n log n). With 64 teams to start it reduces the number
of games needed from 2,016 with Round Robin to 192 games to find the best
team. But it doesn’t find the second
best.

**Pecking Order**

Chickens have a pecking order, for real. They use

*displacement*where one member just knows it’s not worth trying to compete with anyone except that one directly above or below. So place is relatively easy to establish. If crook-beak beats bug-eyes who just outpecked you, well you just don’t have to fight crook-beak – you know you’ll lose. That’s ordinal ranking. In comes a newcomer. He’ll have a rough go at first, finding his place, but then it’ll be tensely peaceful again.**Race vs Fight**

If it’s a race rather than a fight, it’s not ordinal that
matters, it’s cardinal scores and all so much easier. The skier places precisely by racing the
clock in a couple of runs, where the cage fighter has to take on one nasty
opponent after the other. The authors
explain, “Much as we bemoan the daily rat race, the fact that it’s a

*race*rather than a*fight*is a key part of what sets us apart from the monkeys, the chickens – and for that matter, the rats.” P 83**Caching**

There are many ways to purge memory, the crudest maybe is Random
Eviction. Maybe someone with dementia suffers this. Another method is First-in-First-Out,
the oldest things must go. Clairvoyance – using future information -- is best if you can get it, and there’s an formula for
that too: Belady’s Algorithm.

When it all shakes out there are times to use each of these
but generally speaking the

*last*thing we can expect to need is the thing we used the*longest*time ago. That’s Last Recently Used. So if you always put your books back on the left side of the shelf, if you return your file folders to the front of the drawer, if you hang your used shirts to one side of the rack … that’s not a bad idea. So those papers on top of your paper piles are probably the ones you will want to grab. Sweet! It's that filling drawer on its side. With this line of reasoning, throwing your clothes on the floor actually makes some sense. I'll tell my son. If you think about it, the brain works this way, pretty much. Those things in the more distant past, what we haven't thought of for a long time are the ones we are likely to forget --*and*are the ones we can most afford to lose. Thank you natural selection.**Sequence**

Then there is the order in which we should actually do
things we need to do. Getting Things
Done, the organizational system which I try to use, recommends doing the quick
things immediately. Others will tell you
to do the hardest things first, or the fun things first, or the oldest or most
recent things first.

Gantt charts help optimize order of operation. For example, when you have many loads of laundry that need washing and drying find the one with the shortest cycle. If it’s the washer do that load first, if it’s the drier do it last, repeat for all loads. You maximize the time both washer and drier are running, and minimize your time at the laundromat.

Gantt charts help optimize order of operation. For example, when you have many loads of laundry that need washing and drying find the one with the shortest cycle. If it’s the washer do that load first, if it’s the drier do it last, repeat for all loads. You maximize the time both washer and drier are running, and minimize your time at the laundromat.

If you have a lot of tasks with deadlines you can follow an
algorithm for “minimizing maximum lateness” by prioritizing those with the
Earliest Due Date. First things
first. But if you want to minimize the
sum of lateness, use Shortest Processing Time (always do the quickest task
first). If you can weight each task by
importance just multiply that weight by the time required. “Only prioritize a task that takes twice as
long if it’s twice as important.” (111)

Some tasks can be given an “allow priority,” that is, they aren't high priority in planning but kick in over another when needed. We do this intuitively with bathroom breaks; when
you need one that task trumps most others. If you don’t plan this allowance into a project design
it could spell trouble as the authors point out happened on Mars Pathfinder in
1997 when it thrashed just after landing.

**Thrashing**

Thrashing is when a system is running at full bore and
accomplishing nothing. In this case some
tasks should simply be interruptible.
Switching tasks (

*context switching*) comes with a cost. Sometimes you can reduce the cost of context switching by clustering or coalescing tasks – the author suggests scheduling a “bill paying day” when you get out your checkbook ... once. The GTD system helps with this by coalescing actionables into folders so you can tackle related projects together. The U.S. mail coalesces correspondence for us. Office hours coalesce interruptions. Answering machines do too. It’s interesting to look at our daily activities this way.

**Bayesian Probabilities**

The book goes over Bayes" Rule which combines probabilities
to overcome intuitive traps explaining things.
You basically use known probabilities for hypothetical pasts, figure the
chance they would deliver the known outcome, and work backward to find the most
probable cause. The example given was a random pull from a bag of coins containing 9
fair coins and 1 two headed coin. It
flips heads. How likely is it to be one
of the fair coins? Calculate the chance
of a fair coin being drawn and the chance of it flipping heads that’s 90% X
50% -- and compare it to the chance of
the trick coin drawn and flipping heads (10% X 100%) That’s 45 / 10 or 4.5
times more likely to be a fair coin.
Simple when you think about it, but hard to intuit.

Laplace took it further.
His law predicts that if you try a lottery only once and win, an
estimate of 2/3 for the portion of winning tickets is better than 100% or 50:50. It’s always the number of wins +1 divided by
the number of attempts +2. So if the bus was late 3 out of 12 times the chance
of it being late today is 4/14 or 28.6% . This one I didn’t get, but I suppose that if
68 of the other 70 insights in the book did make sense – that would be 69/72 or
96% chance this one should too? It’s on
page 131

**Longevity**

Here’s a real cool simple tool. If you want to predict how much longer something
you see will last and you don’t have anything to compare it to, find out how
long it’s been around and guess that. You
simply assume that the timing of your sample is random, so if it’s
normally distributed the best guess for current moment is smack in the
middle, top of the bell curve That’s the Copernican Principle,
good when you know nothing. So how long
will North Korea last? Let’s see…
2017-1948 is 69. So … 2086. Oh my.

If your phenomenon has a known distribution you can do
better with the Multiplicative Rule .
The example given was movie gross receipts. Most movies make little or nothing and some
are blockbusters. This is a power law
distribution. There is plenty of past
data and the calculated multiplier that fits this particular distribution is 1.4 so if
you hear a movie made $10,000 so far … best guess is a total of $14,000. A movie grossed $6m? Probably it will quit at 8.4.

Here’s the difference between normal distribution which call for the Average Rule, and power law distributionss that use Multiplicative Rule:
“Something normally distributed that’s gone on seemingly too long is bound to
end shortly; but the longer something in a power-law distribution has gone on
the

*longer*you can expect it to keep going.” (140) The third method is the Additive Rule, good for things that are “memoryless,” that is, you have priors but they don’t follow a regular pattern; these are things with a wing-shaped Erlang distribution. Then predict a constant amount of time. Like for a slot machine, when is it going to pay off next? After every pull, win or lose, the prediction is n more pulls.**Poor Priors**

The authors noted that humans do pretty well using these
methods, but we inform our own priors (normal, power law, Erlang) by
experiencing life ... and in modern times our information is heavily skewed. We’re a more likely to hear about the guy
who was killed than about those who had a normal day. If we use our priors without understanding
our information bias we’re likely to make stupid predictions. It’s why people are afraid of flying but not
driving on the freeway: a plane crash is big news.

Remember the marshmallow challenge, where kids with the
willpower to wait 10 minutes doubled up, and lo, those showing such willpower did
better in life? The book offers an interesting
twist. When the experiment was preceded by an adult promising
cool art supplies to the kids in the waiting room -- then some kids got the supplies
and some a delay and lame excuse. The
stymied kids were more likely to eat the marshmallow, giving up after a short effort. It seems their priors suggested
these adults were not to be trusted … might as well eat this one sweet while you
can. The authors speculate that the differences between the successful kids and less successful ones in the classic study may
not be just a lack of willpower, but “it could be a result of believing that
adults are not dependable: that they can’t be trusted to keep their word.” Bad priors.

**Overfitting**

The discussion on

*overfitting*was another lesson on how too much weight on prior data can go wrong. The tendency of course is to pile on the independent variables, expecting a more predictive model if you collect more background information. But imbedded in that assumption is that each of those things is actually a good predictor of what you’re really trying to measure! For example, taste is overfitted by evolution, to crave fat and sugar with no end. That used to help us but today … another bad prior.
Occam’s Razor suggests the simplest hypothesis is probably
the best. If you want to simplify your
model, try adding a “complexity penalty” to knock out some superfluous
factors. Nature does it: “The burden of
metabolism … acts as a brake on the complexity of organisms, introducing a
caloric penalty for overly elaborate machinery.” (161) And in a similar way the slow pace of
evolution prevents organisms from overfitting their environment – that’s makes
them more resilient. Thanks again.

**Optimization**

There is a large section on optimization. This covers the problem of local maximums,
where the only way to improve in the long run requires a period of getting worse. A step down is required before larger steps up. You can get around this with randomized “jitters,”
progressively removing random influence: “simulated annealing,” and relaxation
of constraints at least early in the process.
Lagrangian Relaxation is clever – it simply moves constraints over to
the cost structure. You can’t steal cars to go to work! Lagrange would respond: “actually I can, but let's consider the costs.”

**Handshakes**

In a section on communication the authors point out how, by signaling
receipt, the receiver influences the message.
This might be by nods or facial expressions, or “oh,” “ha,” “hmm..” – in
other words: “message received!” Bad
listeners actually ruin the message because the sender doesn’t know how to
proceed -- what was heard, what was not, should she speed up, slow down, wrap it up, repeat something? This sorry of feedback is what computers do,
every packed or chunk is sent, acknowledged, and the acknowledgement is
acknowledged. Constant handshakes, and that way a missed message is simply resent.

**Exponential Backoff**

But what if there is a breakdown in communication – what can
be done about it. The answer is
brilliant: Exponential Backoff. Each
time there is a failure double the delay sending it again. The example was in dating or friendship: Oh,
so she didn’t return your text? Wait a
day, send another. No response? Wait 2, then 4 then 8 … Very soon you’re almost
out of touch, gracefully, but not completely. This is apparently how failed password attempts work, to increase security.
It’s like a squirrel, approaching you for that peanut. Additive Increase, Multipilicative Decrease.

**Tail Drop**

Here’s a nice one: Tail Drop. When we just had house phones people calling
when we were out had to call back. They
couldn’t leave a message. The early answering machines took messages (buffers)
but with limited capacity. I remember "Mailbox Full" But there's no Tail Drop with email, it's an infinite buffer. Everybody expects an answer.
Texts keep coming and the buffer never fills. “We used to

*reject*. Now we*defer.”*(226) What have we done?**Auctions**

The authors write about auctions “Sealed-Bid First-Price,”
“Dutch Auction (descending),” English Auction (ascending)” and the brilliant
“Vickrey Auction” which is a sealed bid but the winner plays the second highest price, not
the highest. Unlike the others, when you
crunch the numbers, “in the Vickrey auction, honest is literally the best
policy.” Ebay uses Vickrey, I’ve always
thought it was brilliant.

So, I’ve combed through the book for many of my own highlights for my own purposes, but I recommend you buy a copy, there's so much more. It’s superbly written, packed with insight,
and easy relate to.