Saturday, April 29, 2017

Algorithms To Live By (Book Review)

When I’ve recommended a book to more than a dozen people and bought a few copies as gifts, I like to distill my notes one more iteration.  So I’ll jot down here what I liked most about Algorithms to Live By by Brian Christian and Tom Griffiths.  Griffiths is a psychologist and cognitive
scientist at Berkeley and Christian is a science writer.  The book rolls out a series of optimization algorithms discovered by mathematicians and computer scientists but presented at the human scale.  These things were discovered when a system crashed, for example, and they also happen on your bookshelves, in your communication with friends, and in your refrigerator. 

Marriage Problem – Look and Leap
This is called the Secretary Problem but I think marriage is more interesting.  If you’re looking for “the one,” exactly how long should you play the field?  Assume you can’t go back and propose to an old girlfriend.  First decide what age you’d like to marry, count the years to then and shop around for 37% of that time.   During that period note the “best” candidate but don’t marry her but after that 37% time period marry anyone you come across who's better than that.  This will maximize your chances.  Of course it’s a bit more complicated than that…   But let’s say you could determine “best” in a reasonable way, what is your chance of marring the “best” of them all this way?  37%.

Oh so now you say you can go back to an earlier date and propose?  And there’s a 50% chance she’ll accept?  Then look for 61% of your time then leap, simple as that.  Oh, marrying just for the money, and you can measure your date’s net worth easily?  Then set a threshold at 95th percentile, marry the first who’s worth more.  But that threshold will fall as you exhaust your pool, the tables are in the book.   
By the way, best chance of getting the wealthiest this way: 58%.  See how fun it is?

Setting a Home Price
Sell a house like this, if you know the high and low expected bids and can calculate the cost of waiting.  You can calculate your threshold price, which you apply immediately  and never change.  If the range is $400k to $500k, it’s a slow market and waiting costs $10k an offer, hold out for $455,279.  The graph is in the book.

Parking
Parking a car and know the occupancy rate?  With a 99% of spaces taken, you should start looking ¼ mile from your destination.  If it’s 85% full you can drive within a half block.

Getting Caught
Oh you’re a burglar and if you get caught you lose everything?  You want to know how many burglaries to do?  Just take the chance you get away and divide it by the chance you’ll get caught, burglarize that many times, then quit.

Explore vs Exploit
How long should you shop around for new friends or restaurants (explore) and when should you stick with your favorites (exploit).  That depends on your time frame.  Finding a new great restaurant isn’t going to be worth as much if you’re about to move out of town.  So explore early on, then exploit.  There are several models you can use:  “The Gittins index and the Upper Confidence Bound ... inflate the appeal of lesser-known options beyond what we actually expect, since pleasant surprises can pay off many times over.”  I’ll leave the details for the authors to explain.

Adaptive Trials
Adaptive trials are interesting, they allow you to gradually phase out the less promising of two experimental clinical techniques.  Imagine starting 50% a-50% b until the A’s appear to do better, move to 55-45 ... and so on, there's a precise algorithm. You can drastically reduce the number exposed to an inferior treatment. But clinical trials are usually not done this way.

Sort and Search
The discussion of Sort-Search tradeoffs was great; it introduced Big-O notation.  Big-O is how processing costs change with size (n).  The example was a dinner party.  You have to clean your house once regardless of n, so cleaning is “Big-O of 1”.  Passing a dish around the table increases linearly with every additional guest: “Big-O of n”.  Each guest arriving hugs all the guests already there … that’s quadratic time: “Big-O of n(squared).”  Exponential time, “(Big-O)(2 raised to n)” would happen if each guest doubles your work.  “Big-O (n!)” – factorial time is so much worse.  That’s like randomly  shuffling a deck of cards until they happen to fall in order. 

So how does Big-O help with sorting?

Bubble sort is Big-O of n (squared).  Look at every adjacent pair of books in a bookshelf and switch them if they’re out of order, then shift over and do it again. Who would do this?  A computer would, or a slow person with bad eyesight.  In practice Insertion Sort, in which you remove all the books and place each on the shelf correctly, is not much better than bubble sort although prior knowledge might save a lot of time (M is in the middle, start looking there….).

Mergesort is the punchline.  Sort smaller batches then shuffle-sort those into bigger batches, repeat and wala.  But if you are not expecting to search much, why  bother, just use Bucket Sort: put them into categories and quit.  Save a lot on the sort,  pay a little on the search.  “Sorting something that you will never search is a complete waste; searching something you never sorted is merely inefficient.” (72)


Ranking Athletes
There’s a whole section on competition scoring, like for athletes or sports teams.  The problem with Single Elimination tournaments, where one loss knocks you out, is that while it finds the best team, “second place is a lie." Round-Robin, where each team plays every other team – Big-O of n(squared) … so many games required!  And so many boring ones. Ladder Tournaments, where each player can challenge the next best is a bubble sort.  The most popular, Bracket Tournament, divides the field in half at each stage.  It’s merge sort.  March Madness takes 64 teams to 32, then 16, then 8, then the “final four” before the determining match.  It’s Big-O(n log n).  With 64 teams to start it reduces the number of games needed from 2,016 with Round Robin to 192 games to find the best team.  But it doesn’t find the second best.

Pecking Order
Chickens have a pecking order, for real.  They use displacement  where one member just knows it’s not worth trying to compete with anyone except that one directly above or below.  So place is relatively easy to establish. If crook-beak beats bug-eyes who just outpecked you, well you just don’t have to fight crook-beak – you know you’ll lose.  That’s ordinal ranking.  In comes a newcomer.  He’ll have a rough go at first, finding his place, but then it’ll be tensely peaceful again.

Race vs Fight
If it’s a race rather than a fight, it’s not ordinal that matters, it’s cardinal scores and all so much easier.  The skier places precisely by racing the clock in a couple of runs, where the cage fighter has to take on one nasty opponent after the other.  The authors explain, “Much as we bemoan the daily rat race, the fact that it’s a race  rather than a fight  is a key part of what sets us apart from the monkeys, the chickens – and for that matter, the rats.” P 83

Caching 
There are many ways to purge memory, the crudest maybe is Random Eviction.  Maybe someone with dementia suffers this. Another method is First-in-First-Out, the oldest things must go.  Clairvoyance – using future information -- is best if you can get it, and there’s an formula for that too: Belady’s Algorithm.

When it all shakes out there are times to use each of these but generally speaking the last thing we can expect to need is the thing we used the longest time ago.  That’s Last Recently Used.  So if you always put your books back on the left side of the shelf, if you return your file folders to the front of the drawer, if you hang your used shirts to one side of the rack … that’s not a bad idea.  So those papers on top of your paper piles are probably the ones you will want to grab.  Sweet!  It's that filling drawer on its side.  With this line of reasoning, throwing your clothes on the floor actually makes some sense.  I'll tell my son. If you think about it, the brain works this way, pretty much.  Those things in the more distant past, what we haven't thought of for a long time are the ones we are likely to forget -- and are the ones we can most afford to lose.  Thank you natural selection.

Sequence
Then there is the order in which we should actually do things we need to do.  Getting Things Done, the organizational system which I try to use, recommends doing the quick things immediately.  Others will tell you to do the hardest things first, or the fun things first, or the oldest or most recent things first.

Gantt charts help optimize order of operation.  For example, when you have many loads of laundry that need washing and drying find the one with the shortest cycle. If it’s the washer do that load first, if it’s the drier do it last, repeat for all loads.  You maximize the time both washer and drier are running, and minimize your time at the laundromat.

If you have a lot of tasks with deadlines you can follow an algorithm for “minimizing maximum lateness” by prioritizing those with the Earliest Due Date.  First things first.  But if you want to minimize the sum of lateness, use Shortest Processing Time (always do the quickest task first).  If you can weight each task by importance just multiply that weight by the time required.  “Only prioritize a task that takes twice as long if it’s twice as important.” (111)

Some tasks can be given an “allow priority,” that is, they aren't high priority in planning but kick in over another when needed. We do this intuitively with bathroom breaks; when you need one that task trumps most others.  If you don’t plan this allowance into a project design it could spell trouble as the authors point out happened on Mars Pathfinder in 1997 when it thrashed just after landing.

Thrashing
Thrashing is when a system is running at full bore and accomplishing nothing.  In this case some tasks should simply be interruptible.  Switching tasks (context switching) comes with a cost.  Sometimes you can reduce the cost of context switching by clustering or coalescing tasks – the author suggests scheduling a “bill paying day” when you get out your checkbook ... once.  The GTD system helps with this by coalescing actionables into folders so you can tackle related projects together.  The U.S. mail coalesces correspondence for us.  Office hours coalesce interruptions.  Answering machines do too.  It’s interesting to look at our daily activities this way.

Bayesian Probabilities
The book goes over Bayes" Rule which combines probabilities to overcome intuitive traps explaining things.  You basically use known probabilities for hypothetical pasts, figure the chance they would deliver the known outcome, and work backward to find the most probable cause. The example given was a random pull from a bag of coins containing 9 fair coins and 1 two headed coin.  It flips heads.  How likely is it to be one of the fair coins?  Calculate the chance of a fair coin being drawn and the chance of it flipping heads that’s 90% X 50%  -- and compare it to the chance of the trick coin drawn and flipping heads (10% X 100%) That’s 45 / 10 or 4.5 times more likely to be a fair coin.  Simple when you think about it, but hard to intuit.
 
Laplace took it further.  His law predicts that if you try a lottery only once and win, an estimate of 2/3 for the portion of winning tickets is better than 100% or 50:50.  It’s always the number of wins +1 divided by the number of attempts +2. So if the bus was late 3 out of 12 times the chance of it being late today is 4/14 or 28.6%  .  This one I didn’t get, but I suppose that if 68 of the other 70 insights in the book did make sense – that would be 69/72 or 96% chance this one should too?  It’s on page 131

Longevity
Here’s a real cool simple tool.  If you want to predict how much longer something you see will last and you don’t have anything to compare it to, find out how long it’s been around and guess that.  You simply assume that the timing of your sample is random, so if it’s normally distributed the best guess for current moment is smack in the middle, top of the bell curve  That’s the Copernican Principle, good when you know nothing.  So how long will North Korea last?  Let’s see… 2017-1948 is 69.  So … 2086.  Oh my.

If your phenomenon has a known distribution you can do better with the Multiplicative Rule .  The example given was movie gross receipts.  Most movies make little or nothing and some are blockbusters.  This is a power law distribution.  There is plenty of past data and the calculated multiplier that fits this particular distribution is 1.4 so if you hear a movie made $10,000 so far … best guess is a total of $14,000.  A movie grossed $6m?  Probably it will quit at 8.4.

Here’s the difference between normal distribution which call for the Average Rule, and power law distributionss that use Multiplicative Rule: “Something normally distributed that’s gone on seemingly too long is bound to end shortly; but the longer something in a power-law distribution has gone on the longer  you  can expect it to keep going.” (140)  The third method is the Additive Rule, good for things that are “memoryless,” that is, you have priors but they don’t follow a regular pattern; these are things with a wing-shaped Erlang distribution.  Then predict a constant amount of time.  Like for a slot machine, when is it going to pay off next?  After every pull, win or lose, the prediction is n more pulls.

Poor Priors
The authors noted that humans do pretty well using these methods, but we inform our own priors (normal, power law, Erlang) by experiencing life ... and in modern times our information is heavily skewed.  We’re a more likely to hear about the guy who was killed than about those who had a normal day.  If we use our priors without understanding our information bias we’re likely to make stupid predictions.  It’s why people are afraid of flying but not driving on the freeway: a plane crash is big news.

Remember the marshmallow challenge, where kids with the willpower to wait 10 minutes doubled up, and lo, those showing such willpower did better in life?  The book offers an interesting twist. When the experiment was preceded by an adult promising cool art supplies to the kids in the waiting room -- then some kids got the supplies and some a delay and lame excuse. The stymied kids were more likely to eat the marshmallow, giving up after a short effort. It seems their priors suggested these adults were not to be trusted … might as well eat this one sweet while you can.  The authors speculate that the differences between the successful kids and less successful ones in the classic study may not be just a lack of willpower, but “it could be a result of believing that adults are not dependable: that they can’t be trusted to keep their word.”  Bad priors.

Overfitting
The discussion on overfitting was another lesson on how too much weight on prior data can go wrong.  The tendency of course is to pile on the independent variables, expecting a more predictive model if you collect more background information.  But imbedded in that assumption is that each of those things is actually a good predictor of what you’re really trying to measure!  For example, taste is overfitted by evolution, to crave fat and sugar with no end.  That used to help us but today … another bad prior.

Occam’s Razor suggests the simplest hypothesis is probably the best.  If you want to simplify your model, try adding a “complexity penalty” to knock out some superfluous factors.  Nature does it: “The burden of metabolism … acts as a brake on the complexity of organisms, introducing a caloric penalty for overly elaborate machinery.” (161)  And in a similar way the slow pace of evolution prevents organisms from overfitting their environment – that’s makes them more resilient. Thanks again.

Optimization
There is a large section on optimization.  This covers the problem of local maximums, where the only way to improve in the long run requires a period of getting worse. A step down is required before larger steps up.  You can get around this with randomized “jitters,” progressively removing random influence: “simulated annealing,” and relaxation of constraints at least early in the process.  Lagrangian Relaxation is clever – it simply moves constraints over to the cost structure. You can’t steal cars to go to work!  Lagrange would respond: “actually I can, but let's consider the costs.”

Handshakes
In a section on communication the authors point out how, by signaling receipt, the receiver influences the message.  This might be by nods or facial expressions, or “oh,” “ha,” “hmm..” – in other words: “message received!”  Bad listeners actually ruin the message because the sender doesn’t know how to proceed -- what was heard, what was not, should she speed up, slow down, wrap it up, repeat something?  This sorry of feedback is what computers do, every packed or chunk is sent, acknowledged, and the acknowledgement is acknowledged. Constant handshakes, and that way a missed message is simply  resent.

Exponential Backoff
But what if there is a breakdown in communication – what can be done about it.  The answer is brilliant: Exponential Backoff.  Each time there is a failure double the delay sending it again.  The example was in dating or friendship: Oh, so she didn’t return your text?  Wait a day, send another.  No response?  Wait 2, then 4 then 8 … Very soon you’re almost out of touch, gracefully, but not completely. This is apparently how failed password attempts work, to increase security.  It’s like a squirrel, approaching you for that peanut.  Additive Increase, Multipilicative Decrease.

Tail Drop
Here’s a nice one: Tail Drop.  When we just had house phones people calling when we were out had to call back.  They couldn’t leave a message. The early answering machines took messages (buffers) but with limited capacity.  I remember "Mailbox Full" But there's no Tail Drop with email, it's an infinite buffer. Everybody expects an answer.  Texts keep coming and the buffer never fills.  “We used to reject . Now we defer.” (226)  What have we done?

Auctions
The authors write about auctions “Sealed-Bid First-Price,” “Dutch Auction (descending),” English Auction (ascending)” and the brilliant “Vickrey Auction” which is a sealed bid but the winner plays the second highest price, not the highest.  Unlike the others, when you crunch the numbers, “in the Vickrey auction, honest is literally the best policy.”  Ebay uses Vickrey, I’ve always thought it was brilliant.



So, I’ve combed through the book for many of my own highlights for my own purposes, but I recommend you buy a copy, there's so much more. It’s superbly written, packed with insight, and easy relate to.  

No comments:

Post a Comment

I've allowed comments without login.