In Praise of Observational Evidence¹
As always, I devoured my Asterisk magazine with glee. High recommendations for “Are Prediction Markets Good for Anything” (basically no), “We’re All One Crisis Away From Taking Unlicensed Research Peptides” (because illegible symptoms aren’t well covered by the medical establishment) and “Selling Abstraction” (finance is necromancy).
But the article I enjoyed tussling with the most was “In Praise of Observational Evidence.” It goes through the history of the randomized controlled trial (RCT) and the usual flaws: RCTs often have low sample sizes (exercise), can be unethical (smoking), take way too long (longevity research) or have limited external validity (diet studies). Still, RCTs are the gold standard: if you want your study published, it really helps to have an RCT. But what if there were another way?
Causal inference is having its (well-deserved) moment. Now we can take data off the shelf and use it to answer our question of interest. This is super cheap for the researcher and (lurking in the background) seems automatable by AI. So the pitch goes: let’s deemphasize the RCT and invest heavily in observational data because it’s cheaper and gets the job done.
But let’s consider how this played out with the blue zones debacle. Using administrative records, researchers were able to find regions of the world with unusually high proportions of centenarians. So let’s see what they do and replicate it for ourselves. The advice seems pretty benign: social connection, daily movement, and a reasonable diet. It was missing an important causal factor, though: pension fraud. Yes: quite a high proportion of these centenarians were, um, dead but still collecting their checks.
You could call this confounding but it’s worse than the usual problem. It’s not like you can easily measure the fraud rate. It requires a lot of very careful investigation and rigorous data collection. No amount of modelling can save you here.
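To make that concrete, here is a toy simulation. It is my own sketch, not anything from the article or the blue zones research, and every number in it is made up for illustration. Two regions share exactly the same true chance of reaching 100; the only difference is that in one of them a small share of deaths never gets reported, so the person stays alive on paper.

```python
import random


def recorded_centenarian_share(n_people, p_survive_to_100, fraud_rate, rng):
    """Share of a birth cohort that the records list as alive at 100.

    All parameters are hypothetical; fraud_rate is the chance that a
    death goes unreported (e.g. so the pension keeps being paid).
    """
    recorded = 0
    for _ in range(n_people):
        truly_alive = rng.random() < p_survive_to_100
        # An unreported death leaves the person "alive" in the records.
        unreported_death = (not truly_alive) and rng.random() < fraud_rate
        if truly_alive or unreported_death:
            recorded += 1
    return recorded / n_people


rng = random.Random(0)  # fixed seed so the run is repeatable

# Same true survival in both regions (assumed 0.5%); only reporting differs.
honest = recorded_centenarian_share(200_000, 0.005, 0.00, rng)
fraudulent = recorded_centenarian_share(200_000, 0.005, 0.01, rng)

print(f"honest records:     {honest:.4f}")
print(f"1% of deaths unreported: {fraudulent:.4f}")
```

Because true centenarians are so rare, even a 1% rate of unreported deaths swamps them, and the second region looks like a longevity hotspot. Nothing in the recorded data tells the two regions apart, which is why no model fitted to that data can fix it.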
There’s also the obvious problem that nobody might collect the data you need: if it turns out that eating pomegranates adds years to your life but nobody asked about it on the questionnaire, you’re out of luck. You get a “drunk looking under the lamppost” effect where researchers mostly focus on (and certainly only control for) the covariates which already exist.
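A small simulation shows what the missing pomegranate question does to an analysis. Again, this is a sketch with invented numbers: I assume pomegranates add three years, daily walking adds nothing, and pomegranate eaters happen to walk more. The questionnaire only records walking.

```python
import random


def slope(xs, ys):
    """Ordinary least squares slope of ys on xs (single predictor)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


rng = random.Random(1)  # fixed seed so the run is repeatable
walking, lifespan, eats_pomegranate = [], [], []
for _ in range(20_000):
    eats = rng.random() < 0.5
    # Assumption: pomegranate eaters walk more on average.
    walk = rng.gauss(1.0 if eats else 0.0, 1.0)
    # Assumption: only pomegranates affect lifespan; walking has zero effect.
    years = 80.0 + (3.0 if eats else 0.0) + rng.gauss(0.0, 5.0)
    walking.append(walk)
    lifespan.append(years)
    eats_pomegranate.append(eats)

# What the researcher can estimate: lifespan on walking, pomegranates unseen.
naive = slope(walking, lifespan)

# What they would see if the question had been asked: the slope among eaters only.
eater_walk = [w for w, e in zip(walking, eats_pomegranate) if e]
eater_life = [y for y, e in zip(lifespan, eats_pomegranate) if e]
within_eaters = slope(eater_walk, eater_life)

print(f"walking effect, pomegranates unmeasured: {naive:.2f} years per unit")
print(f"walking effect among pomegranate eaters:  {within_eaters:.2f} years per unit")
```

The naive estimate credits walking with a benefit that belongs entirely to the fruit. Once you can split by pomegranate eating, the walking effect goes to roughly zero, but you can only do that if someone collected the variable in the first place.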
So my worry with observational studies is that by selling them as cost-effective and using whatever administrative data is lying around, we’ll end up worse off. Investments in data collection are important (especially in the face of a data-hostile US administration) and often need to be targeted to the analysis of interest.
All that said, observational studies are super important! In no way do I want to diminish their promise. Somewhat selfishly, I don’t want to wait around for longevity RCTs². But I’ll hold out for better data than is currently on offer. Finke leaves us with an inspiring quote:
Sometimes nature is kinder than we assume. We need not always coerce it to make it answer our questions. Instead, it can be enough to just record what happens to further science. In time, we will understand all those things we cannot control, only observe.
Though this reminds me, to paraphrase Hayek, that the curious task of statistics “is to demonstrate to men how little they really know about what they imagine they can” measure.