Sunday, October 28, 2018

Book Review: The Book of Why

The Book of Why: The New Science of Cause and Effect
Judea Pearl
Mathematics, computers, statistics

The Book of Why reminded me strongly of The Evolution of Beauty.
  • The author has a Theory, referred to with Capital Letters (here, the Causal Revolution; there, that Beauty Happens).
  • The Theory was actually discovered long ago, but it has been forgotten.
  • Mainstream investigators have sneered at, pooh-poohed, and generally neglected the Theory.
  • Fie on them.
  • They are enslaved to their own outdated notions, which lead them to more and more outlandish convolutions to explain things that the Theory explains quite simply.
  • The Theory is nothing less than revolutionary, and has profound implications.
  • The assertions of significance are maybe a little more extravagant than the book can establish.
There's even a shared villain of sorts, the statistician R. A. Fisher.

Among the differences is that The Book of Why is more technical. I won't say it's written for a specialist audience, but to get a lot out of it you should have at least a basic understanding of probability and statistics. (Bonus points for knowing what Bayes' Theorem is.) You'll need to do some simple probability math if you want to verify what Pearl is saying. 

To put it another way: if you're not familiar with the stock phrase "correlation is not causation," this isn't the book for you--because this is exactly what Pearl is arguing about. Specifically, The Book of Why argues for the power of making inferences based on causal diagrams, and demonstrates rigorous ways to manipulate them to draw powerful conclusions.

This sounds hazy, so let's go with an example from the book: the low-birth-weight paradox. We've all learned that smoking is bad, and that it's especially bad for pregnant mothers. And yet: babies with low birth weight do better if their mothers were smokers. This isn't a fluke; the statistics establish correlation quite firmly. What gives?

What gives, says Pearl, is that we're settling for a correlational answer when we need a causal one. Specifically, we need to understand that low birth weight may have different causes. Smoking can cause it. But so can developmental defects, or malnutrition. What the paradox shows is that babies whose birth weight is low because their mothers smoked do better than babies whose birth weight is low because of other, much more serious conditions. Which makes perfect sense.
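If you want to check that reasoning with the "simple probability math" mentioned earlier, here is a minimal sketch in Python. Every number in it is invented for illustration (none comes from the book or from real data), and it assumes a deliberately crude model: smoking and a serious defect are independent, either one can cause low birth weight, and mortality depends only on those two causes. The names P_SMOKER, P_DEFECT, P_LOW_WEIGHT, P_DEATH, and mortality are my own.

```python
# Toy model of the low-birth-weight paradox. All probabilities are made up.
P_SMOKER = 0.30   # assumed share of mothers who smoke (not used below,
                  # because every result is conditioned on smoker status)
P_DEFECT = 0.05   # assumed share of babies with a serious defect

# Assumed P(low birth weight | smoker, defect), keyed by (smoker, defect).
P_LOW_WEIGHT = {(0, 0): 0.05, (1, 0): 0.40, (0, 1): 0.80, (1, 1): 0.90}

# Assumed P(death | smoker, defect). Smoking always makes things worse,
# but a defect is far more dangerous than smoking.
P_DEATH = {(0, 0): 0.01, (1, 0): 0.02, (0, 1): 0.30, (1, 1): 0.32}


def mortality(smoker, low_weight_only):
    """Exact P(death | smoker status), optionally restricted to low-weight babies."""
    numerator = 0.0
    denominator = 0.0
    for defect in (0, 1):
        # Probability of this defect status (independent of smoking here).
        weight = P_DEFECT if defect else 1 - P_DEFECT
        if low_weight_only:
            # Restrict to the low-birth-weight babies in this group.
            weight *= P_LOW_WEIGHT[(smoker, defect)]
        numerator += weight * P_DEATH[(smoker, defect)]
        denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    for low_weight_only in (False, True):
        label = "low-weight babies" if low_weight_only else "all babies"
        for smoker in (0, 1):
            who = "smokers" if smoker else "non-smokers"
            rate = mortality(smoker, low_weight_only)
            print(f"{label:18s} {who:12s} mortality = {rate:.4f}")
```

In this toy model smoking raises mortality among all babies, yet among low-weight babies the smokers' children come out ahead, because a low-weight baby of a non-smoker is much more likely to have the serious defect.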

Interesting stuff. On the other hand, classical statisticians have good ways to talk about this sort of effect without resorting to Pearl's "Causal Revolution." Maybe it's clearer and simpler with causal calculus--I'm inclined to believe that--but The Book of Why rather implies a stronger claim.

Moreover, Pearl seems to tack sideways around one of the standard arguments against causal thinking. When you create a causal model, you're making assumptions. All the graph-theoretical rigor in the world won't help you answer questions if your graph is wrong--if, say, you assume that plowing the prairie causes rain (don't laugh, people did this), you'll have an incorrect diagram. Sure, this is often trivial--snowy weather causes traffic accidents, not the other way around--but when it's that trivial, why do you need a causal model in the first place?

The power of standard statistics is to tease out correlations that you didn't expect. That's why statisticians, AI researchers, and machine-learning people love it: you dump in a bunch of data, push a button, and poof! you learn something. (In practice, I haven't found it to work that way, but the notion is seductive.) Causal modeling seems like an interesting and powerful approach to quantifying what it is that you've learned. As to whether it merits the designation of "revolution," though, I'm still agnostic.
