Saturday, December 16, 2017

Buddhism and Evolutionary Psychology

Robert Wright’s latest book brings together two big ideas: 1) evolutionary psychology and 2) Theravada Buddhism.  It’s pretty wonderful, as I’m very familiar with both, and yet the connection hadn’t occurred to me.

When I read Wright’s The Moral Animal a decade ago I was reassured that people can accept evolution in its full force without falling wholesale into immorality.  There is plenty of evidence for cooperation and reciprocal altruism in nature – a theme I’ve explored in this blog over the years and one he extended in his second book, NonZero.  Now he’s written Why Buddhism Is True, binding together evolution, psychology, and Thai Forest Buddhist philosophy and practice. Buddhism, he summarizes, “has been studying how the human mind is programmed to react to its environment… Now, with Darwin’s theory we understood what had done the programming.” (p 224)

There are many kinds of Buddhism (“Zen is for poets, Tibetan is for artists, and Vipassana is for psychologists”) and some carry supernatural beliefs, like reincarnation, which Wright and I reject.

Vipassana meditation, from Theravada Buddhism, is sometimes called “mindfulness” meditation and has a great deal of support in the books I read.  I meditated regularly for most of a decade and intend to return; I’ve never doubted the value of sitting quietly and watching the mind.

First, a little background: the basics of evolutionary psychology.  Our brains were groomed over 3.5 billion years to compel us toward things that increase the probability of passing on genes and to steer us away from threats.  Of course that was a very different environment than we have now.  Things that worked well enough were hardwired and things that didn’t were combed away.  There was no premium on accuracy, so we are beset with delusions, illusions, and impulses, all nicely patched up with convenient rationalizations.

This hardwiring becomes a problem when the environment changes faster than the genes do.  Simple example: once storing enough calories was a pretty big problem so craving fat and sugar was selected for.  When fat was scarce, that helped.  We still have the craving but now we put on dangerous weight.  Offense triggers aggression?  That worked for us once, now we get stupid road rage.  Impulses are not always the best guide.

And while we get pleasure from satisfying our impulses it doesn’t last very long.  Why should it?  There’s no traction in satisfaction -- we have to reset quickly to be ready for the next impulse.  Studies have shown that even after winning the lottery people soon go back to feeling a “normal” amount of happiness again.

Thoughts can inform emotions; that’s what cognitive behavioral psychology has shown us.  But emotions, Wright says, are the real drivers of action, and the prefrontal cortex, our "thinking" and most recent part of the brain, then rationalizes the action.  This is not intuitive. We feel as if “we” simply make decisions and act on them.  Brain scan studies show the brain signaling movement before the person herself knows she’s going to move her arm.  On seeing a stranger the amygdala announces within 0.2 seconds “attractive = friend” or “stranger = danger” before our prefrontal cortex -- and our awareness -- registers the thing at all.  In a world of snakes, cougars, and warring tribes this reactive module was useful, so evolution locked it in.

Wright then pointed out the next logical thing, and I was surprised that it hadn’t occurred to me:  Emotions can be true or false. This goes against popular psychology but if the feeling of desire (groomed by evolution) says “eating this bag of chips is good for me” -- and it’s really not -- your desire is lying.

Now a little background in Buddhism.  There are three marks of existence: 1. Anicca, 2. Dukkha, and 3. Anatta.  Nothing is permanent, there is suffering, and there is no self.  The last one probably requires a little explaining.

What are the fundamental qualities of self?  Buddha started with the proposition that to be “self” it must have some consistency through time and it must be something that we control.  Both of these concepts are implicit in the sentence “I did that.”

The Buddha said there are five aspects to our existence: 1. the physical body and its organs, 2. our basic feelings, 3. perceptions of things, like vision and hearing, 4. mental formations like thoughts, habits, etc., and 5. consciousness, which is the awareness of the other four.  That’s a very simple summary, but the question becomes which of these is persistent through time and under our control -- and the answer, of course, is none of them.  Meditation helps you see that, one by one.

Hence, no self.

So why do we have a “sense of self?”  It’s good for the organism to look after its body, the discrete package that carries its genes.  In fact the body is probably the most important thing; you want to wrap it up in a protective sense of self.  If you don’t feel there is something really special about what’s inside your skin you won’t have a reason to protect it.  Good thing, too, if we’re going to live, right?

Of course that's impossible for a human to easily accept, but to get a taste of no-self try this.  Meditate and focus on something that you'd normally consider part of you -- a thought, a desire, a pain ...  When it becomes the object of your attention it seems to become something other than you.

The brain encourages us to recognize a special "self" which we fiercely protect.  That makes evolutionary sense. Wright wonders whether the self should be so discretely bounded.  In meditation he noticed that a pain in his foot sent a signal to his brain – “inside = me.”  A bird chirping sent a signal to his brain – “outside = not me.”  But, he wondered, in what sense is the cramp different from the song?  Both are signals, both are processed and interpreted by the brain.  Both can affect our outlook.

The Modular Theory of the Mind is pretty compelling, I thought.  It goes something like this.  The mind contains competing modules, each of which stimulates the organism toward taking some sort of action.  The strongest module wins.  Plenty of studies show how we can easily fabricate reasons for doing something when we actually had no reason whatsoever.   

A quote shows a good example of how this can work: “A long-term module may generate a sense of guilt when you reach for that chocolate bar. It may also give you a feeling of pride when you resist the allure of chocolate.  On the other side of the contest is the chocolate lust generated by the short-term module.  But the short-term module may have subtler tactics as well.  Is it, perhaps, the module that dredged up the memory of that article about the long-term benefits of antioxidants?  It just thought the long-term module might find that article interesting?” (p 129)

This has a practical implication. Instead of trying to overcome a habit – say, smoking – by strengthening self-discipline, maybe try instead to weaken the module that has taken control.  How do you do that?  Simply by looking at it carefully, and that’s where mindful meditation comes in.  Looking closely at the desire can turn it into an object of interest rather than a transparent impulse.  Mindful observation, he says, keeps the module from getting its reward. 

Wright uses the example of a rat which pushes a lever to get a reward.  Keep the rat away from the lever and the rat will still associate the lever with reward; when you look away he'll push it again.  On the other hand if you disconnect the lever from the reward, the rat will lose interest in the lever.  In the mind, you need only bring the lever into your awareness to dissociate the two.  From my experience, this can work.

Wright also explores the “essence” of things.  These are the adjectives we assign to objects -- that house is a “modest” house, or that person is a “nice” person.  Everything meaningful in our awareness is an association we have assigned to it, he says, and again, this has evolved for practical reasons.   Once you categorize something you don’t have to continually reassess it.  Attaching essence to things is something we do all day.  It's a shortcut we aren't even aware of.  

We also have an “essence preservation mechanism.” If a friend does a bad thing we consider it unusual, just as when a “bad” person does a good thing.  Still good.  Still bad.  Wright suggests that not seeing “essence” consists mainly of not feeling intensely toward things. “Dampening of feelings leads to clarity of vision,” he says (p 165).  “‘Not making judgments’ ultimately means not letting your feelings make judgments for you.” (p 223)

The way to see this for yourself is through meditation, he says.  It’s the practice of “fighting your creator: natural selection.”

Saturday, April 29, 2017

Algorithms To Live By (Book Review)

When I’ve recommended a book to more than a dozen people and bought a few copies as gifts, I like to distill my notes one more iteration.  So I’ll jot down here what I liked most about Algorithms to Live By by Brian Christian and Tom Griffiths.  Griffiths is a psychologist and cognitive scientist at Berkeley and Christian is a science writer.  The book rolls out a series of optimization algorithms discovered by mathematicians and computer scientists but presents them at the human scale.  These solutions help when your system crashes, and they also help on your bookshelves, in your communication with friends, when parking, and in your refrigerator.

Marriage Problem – Look and Leap
This is called the Secretary Problem, but I think marriage is more interesting.  If you’re looking for “the one,” exactly how long should you play the field?  Assume you can’t go back and propose to an old girlfriend.  First decide what age you’d like to marry by, count the years until then, and shop around for the first 37% of that time.  During that period note the “best” candidate but don’t marry her; after that 37% window, marry the first person you come across who’s better than she was.  This will maximize your chances.  Of course it’s a bit more complicated than that…  But let’s say you could determine “best” in a reasonable way: what is your chance of marrying the “best” of them all this way?  37%.
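For the skeptical, the 37% figure is easy to check by simulation.  Here’s a quick sketch (mine, not the authors’) in Python; with 100 candidates the look-then-leap rule really does land the best one about 37% of the time:

```python
import random

def simulate_37_rule(n=100, trials=10_000, seed=1):
    """Look at the first 37% of candidates without committing, then take
    the first one who beats the best of that group (or the last candidate
    if nobody does).  Returns the fraction of trials that land the best."""
    cutoff = int(n * 0.37)
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        candidates = [rng.random() for _ in range(n)]
        best_seen = max(candidates[:cutoff])
        pick = candidates[-1]  # stuck with the last one if nobody qualifies
        for c in candidates[cutoff:]:
            if c > best_seen:
                pick = c
                break
        if pick == max(candidates):
            wins += 1
    return wins / trials

print(simulate_37_rule())  # hovers near 0.37
```

Note the rule fails whenever the overall best happens to sit in the first 37% -- that’s already baked into the 37% success rate.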

Oh, so now you say you can go back to an earlier date and propose?  And there’s a 50% chance she’ll accept?  Then look for 61% of your time, then leap; simple as that.  Oh, marrying just for the money, and you can measure your date’s net worth easily?  Then set a threshold at the 95th percentile and marry the first who’s worth more.  But that threshold falls as you exhaust your pool; the tables are in the book.
By the way, your best chance of getting the wealthiest this way: 58%.  See how fun it is?

Setting a Home Price
Sell a house like this if you know the high and low expected bids and can calculate the cost of waiting.  You can calculate your threshold price, which you apply immediately and never change.  If the range is $400k to $500k, it’s a slow market, and waiting costs $10k per offer, hold out for $455,279.  The graph is in the book.
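That $455,279 falls out of a short formula if you assume offers arrive uniformly in the range: keep waiting while the expected gain of seeing one more offer beats the cost of waiting.  A sketch (the function name is mine):

```python
import math

def sale_threshold(low, high, wait_cost):
    """Accept the first offer at or above this threshold.  With offers
    uniform on [low, high], the expected gain from waiting one more round
    at threshold t is (high - t)^2 / (2 * (high - low)); set that equal
    to the cost of waiting and solve for t."""
    return high - math.sqrt(2 * wait_cost * (high - low))

print(round(sale_threshold(400_000, 500_000, 10_000)))  # 455279
```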

Parking a car and know the occupancy rate?  With 99% of spaces taken, you should start looking a quarter mile from your destination.  If it’s 85% full you can drive to within half a block.

Getting Caught
Oh, you’re a burglar, and if you get caught you lose everything?  You want to know how many burglaries to do?  Just take the chance you get away and divide it by the chance you’ll get caught, burglarize that many times, then quit.
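As a formula that’s just p / (1 - p), where p is the chance of getting away with any single job; it maximizes expected total loot when getting caught forfeits everything.  A tiny sketch:

```python
def burglaries_to_attempt(p_escape):
    """Rule of thumb from the text: plan on p / (1 - p) jobs, where
    p_escape is the chance of getting away with any single burglary."""
    return p_escape / (1 - p_escape)

# A 90% chance of escaping each time says to quit after about 9 jobs.
print(burglaries_to_attempt(0.9))
```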

Explore vs Exploit
How long should you shop around for new friends or restaurants (explore) and when should you stick with your favorites (exploit)?  That depends on your time frame.  Finding a great new restaurant isn’t going to be worth as much if you’re about to move out of town.  So explore early on, then exploit.  There are several models you can use:  “The Gittins index and the Upper Confidence Bound ... inflate the appeal of lesser-known options beyond what we actually expect, since pleasant surprises can pay off many times over.”  I’ll leave the details for the authors to explain.
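Here’s a toy version of the Upper Confidence Bound idea (my sketch, not the book’s code): each option gets an optimism bonus that shrinks the more you try it, so lesser-known restaurants get inflated early and the true favorite wins out over time:

```python
import math
import random

def ucb_choose(pulls, rewards, t):
    """Pick the option with the highest upper confidence bound: its
    average reward so far plus an optimism bonus that shrinks with
    experience.  Untried options always go first."""
    for i, n in enumerate(pulls):
        if n == 0:
            return i
    return max(range(len(pulls)),
               key=lambda i: rewards[i] / pulls[i]
                             + math.sqrt(2 * math.log(t) / pulls[i]))

# Two "restaurants" that delight you 30% and 60% of the time.
rng = random.Random(0)
probs = [0.3, 0.6]
pulls, rewards = [0, 0], [0.0, 0.0]
for t in range(1, 1001):
    arm = ucb_choose(pulls, rewards, t)
    pulls[arm] += 1
    rewards[arm] += 1.0 if rng.random() < probs[arm] else 0.0
# After 1000 nights out, most visits have gone to the better spot.
```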

Adaptive Trials
Adaptive trials are interesting: they let you gradually phase out the less promising of two experimental clinical treatments.  Imagine starting 50% A / 50% B until the A’s appear to do better, then moving to 55/45 ... and so on; there’s a precise algorithm. You can drastically reduce the number of patients exposed to an inferior treatment. But clinical trials are usually not done this way.

Sort and Search
The discussion of sort-search tradeoffs was great; it introduced Big-O notation.  Big-O describes how processing costs change with size (n).  The example was a dinner party.  You have to clean your house once regardless of n, so cleaning is constant time: O(1).  Passing a dish around the table grows linearly with every additional guest: O(n).  Each arriving guest hugs all the guests already there … that’s quadratic time: O(n²).  Exponential time, O(2ⁿ), would happen if each guest doubled your work.  O(n!) – factorial time – is so much worse.  That’s like randomly shuffling a deck of cards until they happen to fall in order.

So how does Big-O help with sorting?

Bubble sort is O(n²).  Look at every adjacent pair of books on a bookshelf and swap them if they’re out of order, then shift over and do it again, pass after pass. Who would do this?  A computer would, or a slow person with bad eyesight.  In practice Insertion Sort, in which you remove all the books and place each one on the shelf in its correct spot, is not much better than bubble sort, although prior knowledge can save a lot of time (M is in the middle; start looking there…).

Mergesort is the punchline.  Sort smaller batches, then shuffle-sort those into bigger batches, repeat, and voilà.  But if you’re not expecting to search much, why bother?  Just use Bucket Sort: put things into categories and quit.  Save a lot on the sort, pay a little on the search.  “Sorting something that you will never search is a complete waste; searching something you never sorted is merely inefficient.” (p 72)
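Both shelving strategies fit in a few lines.  A sketch, with a made-up first-letter bucket key:

```python
def merge_sort(books):
    """Sort by halving, sorting each half, then merging the sorted
    halves -- O(n log n) comparisons."""
    if len(books) <= 1:
        return books
    mid = len(books) // 2
    left, right = merge_sort(books[:mid]), merge_sort(books[mid:])
    merged = []
    while left and right:
        merged.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return merged + left + right

def bucket_sort_ish(books):
    """The 'good enough' version: toss each book into a coarse bucket
    (here, by first letter) and don't bother ordering within a bucket."""
    buckets = {}
    for title in books:
        buckets.setdefault(title[0], []).append(title)
    return buckets

print(merge_sort(["Middlemarch", "Austen", "Zorba", "Beloved"]))
```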

Ranking Athletes
There’s a whole section on competition scoring, as for athletes or sports teams.  The problem with Single Elimination tournaments, where one loss knocks you out, is that while they find the best team, “second place is a lie.” Round-Robin, where each team plays every other team, is O(n²) … so many games required!  And so many boring ones. A Ladder Tournament, where each player can challenge the one directly above, is a bubble sort.  The most popular format, the Bracket Tournament, divides the field in half at each stage.  It’s merge sort.  March Madness takes 64 teams to 32, then 16, then 8, then the “final four” before the deciding match.  It’s O(n log n).  With 64 teams to start, it reduces the number of games needed to find the best team from 2,016 with Round-Robin to 192.  But it doesn’t find the second best.
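The game counts are easy to verify (as I read it, the 192 figure is the full merge-sort bracket, where losing teams keep playing out the lower rounds; a pure knockout bracket needs only n - 1 games to crown a champion):

```python
from math import comb

def round_robin_games(n):
    """Everyone plays everyone: n-choose-2 games."""
    return comb(n, 2)

def knockout_games(n):
    """Each game eliminates exactly one team, so crowning a champion
    among n teams takes n - 1 games."""
    return n - 1

print(round_robin_games(64))  # 2016
print(knockout_games(64))     # 63
```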

Pecking Order
Chickens have a pecking order, for real.  They use displacement, where one member just knows it’s not worth trying to compete with anyone except the one directly above or below.  So place is relatively easy to establish. If crook-beak beats bug-eyes, who just outpecked you, well, you just don’t have to fight crook-beak – you know you’ll lose.  That’s ordinal ranking.  In comes a newcomer?  He’ll have a rough go at first, finding his place, but then it’ll be tensely peaceful again.

Race vs Fight
If it’s a race rather than a fight, it’s not ordinal rank that matters but cardinal scores, and everything is so much easier.  The skier places precisely by racing the clock over a couple of runs, whereas the cage fighter has to take on one nasty opponent after another.  The authors explain, “Much as we bemoan the daily rat race, the fact that it’s a race rather than a fight is a key part of what sets us apart from the monkeys, the chickens – and for that matter, the rats.” (p 83)

Caching
There are many ways to purge memory; the crudest may be Random Eviction.  Maybe someone with dementia suffers this. Another method is First-In-First-Out: the oldest things must go.  Clairvoyance – using future information – is best if you can get it, and there’s a formula for that too: Belady’s Algorithm.

When it all shakes out there are times to use each of these, but generally speaking the last thing we can expect to need is the thing we used longest ago.  That’s Least Recently Used.  So if you always put your books back on the left side of the shelf, if you return your file folders to the front of the drawer, if you hang your used shirts at one end of the rack … that’s not a bad idea.  Those papers on top of your paper piles are probably the ones you’ll want to grab.  Sweet!  A pile is just a filing drawer tipped on its side.  By this line of reasoning, throwing your clothes on the floor actually makes some sense.  I’ll tell my son. If you think about it, the brain works this way, pretty much.  Things in the more distant past, the ones we haven’t thought of for a long time, are the ones we’re likely to forget -- and the ones we can most afford to lose.  Thank you, natural selection.
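The shelf-and-drawer trick is exactly how an LRU cache works in software.  A minimal sketch using Python’s OrderedDict (the class name is mine):

```python
from collections import OrderedDict

class LRUCache:
    """Hold the k most recently used items; evict whatever has sat
    untouched the longest -- the left-side-of-the-shelf rule."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def use(self, key, value=None):
        if key in self.items:
            self.items.move_to_end(key)      # freshly touched
            return self.items[key]
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # drop the least recent
        return value

shelf = LRUCache(2)
shelf.use("novel", "left side")
shelf.use("atlas", "left side")
shelf.use("novel")                  # touched again, so it survives
shelf.use("cookbook", "left side")  # "atlas" gets evicted
```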

Then there is the order in which we should actually do things we need to do.  Getting Things Done, the organizational system which I try to use, recommends doing the quick things immediately.  Others will tell you to do the hardest things first, or the fun things first, or the oldest or most recent things first.

Gantt charts help optimize order of operations.  For example, when you have many loads of laundry that need washing and drying, find the one with the shortest cycle: if it’s the wash cycle do that load first, if it’s the dry cycle do it last, and repeat for the remaining loads.  You maximize the time both washer and drier are running, and minimize your time at the laundromat.
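That washer/drier recipe is known as Johnson’s rule for two-stage scheduling.  A sketch, with made-up loads given as (wash, dry) minutes:

```python
def johnsons_rule(loads):
    """Order loads through washer-then-drier to finish soonest.
    Repeatedly take the load with the shortest single stage: if that
    stage is the wash, schedule the load as early as possible; if it's
    the dry, as late as possible."""
    front, back = [], []
    remaining = dict(loads)
    while remaining:
        name = min(remaining, key=lambda k: min(remaining[k]))
        wash, dry = remaining.pop(name)
        if wash <= dry:
            front.append(name)    # short wash: start it early
        else:
            back.insert(0, name)  # short dry: save it for last
    return front + back

print(johnsons_rule({"towels": (30, 60), "darks": (45, 20), "whites": (20, 40)}))
# ['whites', 'towels', 'darks']
```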

If you have a lot of tasks with deadlines you can follow an algorithm for “minimizing maximum lateness” by prioritizing those with the Earliest Due Date.  First things first.  But if you want to minimize the sum of lateness, use Shortest Processing Time (always do the quickest task first).  If you can weight each task by importance, just divide the time required by that weight and do the lowest ratio first.  “Only prioritize a task that takes twice as long if it’s twice as important.” (111)
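Both rules are one-line sorts.  A sketch, with made-up tasks given as (name, minutes needed, minutes until due):

```python
def earliest_due_date(tasks):
    """Minimize the single worst lateness: work in due-date order."""
    return sorted(tasks, key=lambda t: t[2])

def shortest_processing_time(tasks):
    """Minimize total lateness across tasks: quickest first."""
    return sorted(tasks, key=lambda t: t[1])

tasks = [("taxes", 120, 180), ("email", 10, 240), ("memo", 30, 60)]
print([t[0] for t in earliest_due_date(tasks)])         # ['memo', 'taxes', 'email']
print([t[0] for t in shortest_processing_time(tasks)])  # ['email', 'memo', 'taxes']
```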

Some tasks can be given an “allow priority,” that is, they aren’t high priority in planning but kick in over another when needed. We do this intuitively with bathroom breaks; when you need one, that task trumps most others.  If you don’t plan this allowance into a project design it can spell trouble, as the authors point out happened on Mars Pathfinder in 1997, when a low-priority task holding a shared resource blocked a high-priority one (priority inversion) and the lander thrashed just after landing.

Thrashing is when a system is running at full bore and accomplishing nothing.  In this case some tasks should simply be interruptible.  Switching tasks (context switching) comes with a cost.  Sometimes you can reduce the cost of context switching by clustering or coalescing tasks – the authors suggest scheduling a “bill paying day” when you get out your checkbook ... once.  The GTD system helps with this by coalescing actionables into folders so you can tackle related projects together.  The U.S. mail coalesces correspondence for us.  Office hours coalesce interruptions.  Answering machines do too.  It’s interesting to look at our daily activities this way.

Bayesian Probabilities
The book goes over Bayes’ Rule, which combines probabilities to overcome intuitive traps in explaining things.  You basically take the known probabilities of hypothetical pasts, figure the chance each would deliver the known outcome, and work backward to find the most probable cause. The example given was a random pull from a bag containing 9 fair coins and 1 two-headed coin.  It flips heads.  How likely is it to be one of the fair coins?  Calculate the chance of a fair coin being drawn and flipping heads – that’s 90% × 50% – and compare it to the chance of the trick coin being drawn and flipping heads (10% × 100%). That’s 45 to 10, or 4.5 times more likely to be a fair coin.  Simple when you think about it, but hard to intuit.
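The coin arithmetic is a two-liner.  A sketch:

```python
def chance_fair_given_heads(n_fair=9, n_trick=1):
    """Bayes' Rule on the bag of coins: compare the 'fair coin drawn and
    flipped heads' path (9/10 * 1/2) against the 'trick coin drawn and
    flipped heads' path (1/10 * 1)."""
    total = n_fair + n_trick
    p_fair_path = (n_fair / total) * 0.5
    p_trick_path = (n_trick / total) * 1.0
    return p_fair_path / (p_fair_path + p_trick_path)

print(chance_fair_given_heads())  # 0.818..., i.e. 4.5 times likelier fair
```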
Laplace took it further.  His law predicts that if you try a lottery only once and win, an estimate of 2/3 for the portion of winning tickets is better than 100% or 50:50.  It’s always the number of wins + 1 divided by the number of attempts + 2. So if the bus was late 3 out of 12 times, the chance of it being late today is 4/14, or 28.6%.  This one I didn’t get at first, but I suppose that if 68 of the other 70 insights in the book did make sense, that would be a 69/72, or 96%, chance this one should too?  It’s on page 131.
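Laplace’s rule itself is a one-liner, and both examples above drop right out:

```python
def laplace(wins, attempts):
    """Laplace's Rule of Succession: estimate the chance of success next
    time as (wins + 1) / (attempts + 2), which behaves sensibly even
    after a single observation."""
    return (wins + 1) / (attempts + 2)

print(laplace(1, 1))   # one lottery try, one win -> 0.666... (2/3)
print(laplace(3, 12))  # bus late 3 of 12 days  -> 0.2857... (4/14)
```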

Here’s a really cool, simple tool.  If you want to predict how much longer something will last and you don’t have anything to compare it to, find out how long it’s been around and guess that much again.  You simply assume that the timing of your observation is random within the thing’s lifespan, so the best guess is that you’re at the midpoint.  That’s the Copernican Principle, good when you know nothing else.  So how long will North Korea last?  Let’s see… 2017 - 1948 is 69.  So … 2086.  Oh my.

If your phenomenon has a known distribution you can do better with the Multiplicative Rule.  The example given was movie gross receipts.  Most movies make little or nothing and some are blockbusters: a power-law distribution.  There is plenty of past data, and the multiplier that fits this particular distribution is 1.4, so if you hear a movie made $10,000 so far, the best guess for its total is $14,000.  A movie grossed $6m?  It will probably quit at $8.4m.

Here’s the difference between a normal distribution, which calls for the Average Rule, and power-law distributions, which use the Multiplicative Rule: “Something normally distributed that’s gone on seemingly too long is bound to end shortly; but the longer something in a power-law distribution has gone on the longer you can expect it to keep going.” (140)  The third method is the Additive Rule, good for things that are “memoryless” – you have priors but they don’t follow a regular pattern; these are things with a wing-shaped Erlang distribution.  For those, predict a constant amount of time.  Like a slot machine: when is it going to pay off next?  After every pull, win or lose, the prediction is n more pulls.

Poor Priors
The authors noted that humans do pretty well using these methods, but we inform our own priors (normal, power-law, Erlang) by experiencing life ... and in modern times our information is heavily skewed.  We’re far more likely to hear about the guy who was killed than about those who had a normal day.  If we use our priors without understanding our information bias we’re likely to make stupid predictions.  It’s why people are afraid of flying but not of driving on the freeway: a plane crash is big news.

Remember the marshmallow challenge, where kids with the willpower to wait 10 minutes doubled up, and lo, those showing such willpower did better in life?  The book offers an interesting twist. In one version, the experiment was preceded by an adult promising cool art supplies to the kids in the waiting room; some kids then got the supplies, and some got a delay and a lame excuse. The stymied kids were more likely to eat the marshmallow, giving up after a short effort. It seems their priors suggested these adults were not to be trusted … might as well eat the one sweet while you can.  The authors speculate that the difference between the successful kids and the less successful ones in the classic study may not be just a lack of willpower; “it could be a result of believing that adults are not dependable: that they can’t be trusted to keep their word.”  Bad priors.

The discussion of overfitting was another lesson in how too much weight on prior data can go wrong.  The tendency, of course, is to pile on the independent variables, expecting a more predictive model if you collect more background information.  But embedded in that assumption is the idea that each of those variables is actually a good predictor of what you’re really trying to measure!  Evolution overfitted our taste, for example, to crave fat and sugar without end.  That used to help us, but today … another bad prior.

Occam’s Razor suggests the simplest hypothesis is probably the best.  If you want to simplify your model, try adding a “complexity penalty” to knock out some superfluous factors.  Nature does it: “The burden of metabolism … acts as a brake on the complexity of organisms, introducing a caloric penalty for overly elaborate machinery.” (161)  And in a similar way the slow pace of evolution prevents organisms from overfitting their environment – that makes them more resilient. Thanks again.

There is a large section on optimization.  This covers the problem of local maxima, where the only way to improve in the long run requires a period of getting worse.  A step down is required before larger steps up.  You can get around this with randomized “jitters,” with progressively diminishing randomness (“simulated annealing”), and with relaxation of constraints, at least early in the process.  Lagrangian Relaxation is clever – it simply moves constraints over into the cost structure. You can’t steal cars to get to work!  Lagrange would respond: “actually I can, but let’s consider the costs.”
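Simulated annealing fits in a dozen lines.  This is my sketch, not the authors’ code: random jitters, with uphill moves accepted less and less often as the “temperature” cools, here minimizing a deliberately bumpy function:

```python
import math
import random

def anneal(f, x, steps=20_000, temp=2.0, cooling=0.9995, seed=3):
    """Minimize f by jittering x, sometimes accepting a worse point
    (more often while the temperature is high) so the search can step
    down out of local minima.  Returns the best point visited."""
    rng = random.Random(seed)
    best_x, best = x, f(x)
    for _ in range(steps):
        candidate = x + rng.gauss(0, 1)
        delta = f(candidate) - f(x)
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x = candidate
            if f(x) < best:
                best_x, best = x, f(x)
        temp *= cooling
    return best_x

bumpy = lambda x: x * x + 3 * math.sin(5 * x)  # many dips, one deepest
best = anneal(bumpy, 10.0)
```

Started far away at x = 10, the search should end up in or near the deepest dip (around x ≈ -0.3); at worst it reports the best point the walk ever visited.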

In a section on communication the authors point out how, by signaling receipt, the receiver influences the message.  This might be by nods or facial expressions, or “oh,” “ha,” “hmm…” – in other words: “message received!”  Bad listeners actually ruin the message because the sender doesn’t know how to proceed -- what was heard, what was not, should she speed up, slow down, wrap it up, repeat something?  This sort of feedback is what computers do: every packet or chunk is sent, acknowledged, and the acknowledgement is acknowledged. Constant handshakes, and that way a missed message is simply resent.

Exponential Backoff
But what if there is a breakdown in communication – what can be done about it?  The answer is brilliant: Exponential Backoff.  Each time there is a failure, double the delay before sending again.  The example was dating or friendship: Oh, so she didn’t return your text?  Wait a day, send another.  No response?  Wait 2, then 4, then 8 … Very soon you’re almost out of touch, gracefully, but not completely. This is apparently how failed password attempts are handled too, to increase security.  It’s like a squirrel approaching you for that peanut: Additive Increase, Multiplicative Decrease.
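The whole scheme is a few lines.  Here’s a sketch, with a fake send so nobody actually waits (the jitter term is my addition; network stacks add something like it so that colliding senders don’t retry in lockstep):

```python
import random
import time

def send_with_backoff(send, max_tries=6, base_delay=1.0, sleep=time.sleep):
    """Call send(); after each failure wait twice as long as the time
    before (plus a little random jitter), giving up after max_tries."""
    delay = base_delay
    for _ in range(max_tries):
        if send():
            return True
        sleep(delay + random.uniform(0, 0.1 * delay))
        delay *= 2
    return False

# Fake channel that fails twice, then gets through; record the waits.
waits = []
attempts = iter([False, False, True])
send_with_backoff(lambda: next(attempts), sleep=waits.append)
# waits now holds two delays, the second roughly double the first.
```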

Tail Drop
Here’s a nice one: Tail Drop.  When we just had house phones, people calling while we were out had to call back.  They couldn’t leave a message. The early answering machines took messages (buffers) but with limited capacity.  I remember “Mailbox Full.”  But there’s no Tail Drop with email; it’s an infinite buffer. Everybody expects an answer.  Texts keep coming and the buffer never fills.  “We used to reject. Now we defer.” (226)  What have we done?

The authors write about auctions: “Sealed-Bid First-Price,” the “Dutch Auction” (descending), the “English Auction” (ascending), and the brilliant “Vickrey Auction,” a sealed bid in which the winner pays the second-highest price, not the highest.  Unlike the others, when you crunch the numbers, “in the Vickrey auction, honesty is literally the best policy.”  eBay’s proxy bidding works much like a Vickrey auction; I’ve always thought it was brilliant.
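The Vickrey mechanic is almost trivially short.  A sketch:

```python
def vickrey_winner(bids):
    """Sealed-bid second-price auction: the highest bidder wins but pays
    the runner-up's bid, which is what makes honest bids optimal."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    return ranked[0], bids[ranked[1]]

print(vickrey_winner({"ann": 120, "bob": 100, "cam": 90}))  # ('ann', 100)
```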

So, I’ve combed through the book for my own highlights, for my own purposes, but I recommend you buy a copy; there’s so much more. It’s superbly written, packed with insight, and easy to relate to.