Archive for June 12th, 2007|Daily archive page
HowTo: Propensity Score Matching
Sadly, Billy Bragg never wrote a song about Propensity Score Matching (although his Walk Away Renee might do, in a pinch?).
I give you this not because it is of great use or importance (or, to be fair, even interest), but because I’m looking into it for purposes of my own. And misery loves company. Suckers. So this is not a ‘how to’. I’m not even the first person to write about it on a blog. Consider it more of a ‘what is’. Without the maths, even.
I’ve mentioned the evaluation problem previously. I’ll illustrate with an example upon which I’ve worked previously: hysterectomy (NB: I’m not one of those authors). Simplifying that problem a bit, suppose there are two types of hysterectomy: Abdominal, and Vaginal (fellows: a hysterectomy is the removal of the uterus). The evaluation problem is this. You cannot give the same woman both types. Once you’ve given her an Abdominal hysterectomy, that’s it (fellows: women only have one uterus. What?). She cannot have a Vaginal one. There is no perfect counter-factual information with which to compare the factual.
Ergo you cannot compare the effectiveness of two types of hysterectomy on exactly the same person. The missing piece, what would have happened to the untreated had they been treated, is also known as the effect of treatment on the untreated. This is where clinical trials come in. A Randomised Controlled Trial is one where all participants have been randomly allocated treatment or non-treatment (or Treatment A, Treatment B, etc. You get the picture). The idea is that the pool of treated patients is statistically the same as the pool of untreated patients. More importantly the probability of being treated should be exactly the same whatever personal characteristics you might have (and vice versa: the probability that you’re a male, for example, given you were treated, should be exactly the same as the probability that you’re a male given you weren’t treated – assuming treatment was not a sex-change operation. That really was beneath me).
Now it gets interesting. This is all well and good for clinical trials, but something else exists – something called a natural experiment. A natural experiment for our purposes is one without a properly-constructed control group, and usually arises with retrospective data analysis. If you wanted to examine something pertaining to an outbreak of Ebola Zaire, you can hardly randomise villages in central Africa, giving some of them Ebola and others not. If it so happens that you do have delusions of Dr. Mengele, kindly keep them to yourself. Nor, if any outbreak happens to occur, can you easily find a control group – no other region will be exactly alike in terms of culture, climate, etc. and there’s no guarantee that if you use the same region at an earlier period you will be successful (you could have missed the famine or war that led to young men eating a gorilla that was found dead rather than shot, or however the outbreak started).
An example I’ve also used in class is so-called 9/11 (no, I’m not disputing it happened, Americans just have a thoroughly backwards approach to dates. The American 9/11 was actually our 11/9 and it is the Americans who are wrong. I’ve also been to ‘ground zero’ and discovered a truly disgusting entrepreneurial spirit at work. Perhaps more on that another day, but I doubt it. It makes me want to throw up). It is a perfect natural experiment for emergency services responsiveness – but not one that can be carried out under RCT conditions, for obvious reasons.
Now, non-experimental data is likely to contain biases of one kind or another. In health care, for example, we cannot simply look at the health care demand of insured vs. uninsured people – the very fact that an insured person is insured means they will/may
(i) use more health care because they face only a co-pay, or some other deductible, rather than the full price, and/or
(ii) use more health care because they’re sicker, which is why they bought the bloody health insurance in the first place, or
(iii) use more health care because they are more educated and make more money, hence can afford health insurance and understand the benefits of investments in their health stock (this last category consists entirely of Michael Grossman).
This is known as selection bias. And it can be passive or active, if you like. If you conduct a survey in only, say, the north of Italy, you get selection bias, because the north is healthier, wealthier, more educated and more industrialised than the south. If you conduct a survey on Fox News, you get self-selection bias, because only viewers of Fox News will respond. So you can’t compare teaching in public/private schools using test scores, because there will be systematic (non-random) differences. You can’t compare an outbreak of Ebola in Central Africa with no outbreak in West Africa. You can’t compare emergency response to a terrorist attack in New York to one in London or Madrid for the same reason.
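To make the insurance example concrete, here is a minimal sketch in Python. Every number in it is invented for illustration: I assume that being sick adds 4 visits a year, that insurance adds exactly 1, and that sick people are more likely to have bought insurance. The naive insured-vs-uninsured comparison then overstates the effect, while comparing like with like recovers it.

```python
# Hypothetical toy population: every number here is invented for illustration.
# visits = 2 baseline + 4 if sick + 1 if insured, so the true insurance effect is 1.
people = (
    [{"sick": True, "insured": True}] * 3
    + [{"sick": True, "insured": False}] * 1
    + [{"sick": False, "insured": True}] * 1
    + [{"sick": False, "insured": False}] * 3
)

def visits(person):
    """Yearly visits under the assumed (made-up) rule above."""
    return 2 + 4 * person["sick"] + 1 * person["insured"]

def mean_visits(group):
    return sum(visits(p) for p in group) / len(group)

def difference(group):
    """Mean visits of the insured minus mean visits of the uninsured."""
    insured = [p for p in group if p["insured"]]
    uninsured = [p for p in group if not p["insured"]]
    return mean_visits(insured) - mean_visits(uninsured)

naive = difference(people)                                   # mixes sick and healthy
within_sick = difference([p for p in people if p["sick"]])   # sick vs sick
within_healthy = difference([p for p in people if not p["sick"]])

print(naive, within_sick, within_healthy)  # 3.0 1.0 1.0
```

The naive gap of 3 visits is mostly the sick selecting into insurance, not insurance itself.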
There is also simultaneity bias, or reverse causation. That is to wit (I’m old-fashioned), consider this: does greater litigiousness generate more litigation lawyers, or did more litigation lawyers generate more litigation? Some of both, no doubt, but a regression model needs to be built to pick up causation in the right direction.
So, Propensity Score Matching. This is an econometric technique for use with non-experimental data. It is designed to overcome the bias that you will face if and when you decide to compare the effects of treatment on the treated with those of non-treatment on the non-treated, if you get your data from outside an RCT setting. If your treated and control groups are from different areas, different datasets, different time periods, and so forth. Propensity Score Matching is also not the only approach. Follow the link for non-experimental data, and you’ll find a handful more. The elsblog discusses some, as well.
The seminal paper for this stuff, by the by, is Propensity Score-Matching Methods For Nonexperimental Causal Studies by Rajeev Dehejia and Sadek Wahba (he’s from Morgan Stanley – no webpage).
So the trick is matching. Using covariates (Age, Gender, Income, Education, Marital Status, Hair Colour – anything of economic and/or statistical importance to the outcome of interest), we can theoretically break our sample of treated people into groups, or bins. All single males 28 years old, college-educated, non-smoking, living in urban environments, and so forth can then be compared, in terms of treatment, only with other single males 28 years old, college-educated, non-smoking, living in urban environments, and so forth. Repeat. However, see the problem? Every bin needs a corresponding bin in the non-treated group (this could be another sample). If there isn’t one, you have to exclude all of those people. If there is, but the bin is not sufficiently populated, so too out it goes.
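The binning idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the covariates (sex, age, smoker), the records and the `min_size` threshold are all hypothetical, and real data would have far more bins than this.

```python
from collections import defaultdict

def bin_by_covariates(people, keys):
    """Group records into bins keyed by their exact covariate values."""
    bins = defaultdict(list)
    for person in people:
        bins[tuple(person[k] for k in keys)].append(person)
    return bins

def exact_match(treated, untreated, keys, min_size=1):
    """Keep only the bins with at least min_size people on both sides."""
    t_bins = bin_by_covariates(treated, keys)
    u_bins = bin_by_covariates(untreated, keys)
    matched, dropped = {}, []
    for cell, t_members in t_bins.items():
        u_members = u_bins.get(cell, [])
        if len(t_members) >= min_size and len(u_members) >= min_size:
            matched[cell] = (t_members, u_members)
        else:
            dropped.append(cell)  # no usable counterpart: out it goes
    return matched, dropped

# Hypothetical toy data: the covariates and outcomes are illustrative only.
KEYS = ("sex", "age", "smoker")
treated = [
    {"sex": "M", "age": 28, "smoker": False, "outcome": 5.0},
    {"sex": "M", "age": 28, "smoker": False, "outcome": 7.0},
    {"sex": "F", "age": 41, "smoker": True, "outcome": 4.0},
]
untreated = [
    {"sex": "M", "age": 28, "smoker": False, "outcome": 3.0},
    {"sex": "F", "age": 35, "smoker": False, "outcome": 6.0},
]

matched, dropped = exact_match(treated, untreated, KEYS)
print(list(matched))  # [('M', 28, False)]
print(dropped)        # [('F', 41, True)]
```

With only three covariates a third of the treated sample is already lost; add more covariates and the bins empty out quickly.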
The trick, the goal, the purpose of Propensity Score Matching is overcoming the problem of dimensionality. Comparing treated individuals to non-treated with non-experimental data is an enterprise entirely victim to the covariates at hand. Suppose you have 15 econometrically relevant explanatory variables. To employ them all is to render the econometric problem practically unsurpassable, but to use only, say, 5 of the variables will render the estimation and explanation of the response variable practically useless. Propensity score matching does this: it is a matching method that, instead of using every Xi, uses p(Xi), where p(Xi) is the probability of having been treated, given the covariates. The probability p(Xi) is the propensity score.
Bingo! Our dimensionality problem is gone. Instead of n covariates, we have a single score.
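For the curious, here is a rough sketch of how such a score might be estimated. I am assuming a logistic regression of treatment on the covariates, which is a common choice but not the only one, and fitting it by plain gradient ascent to keep the example dependency-free. The data, the function names and the tuning values (`lr`, `steps`) are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(X, t, lr=0.5, steps=5000):
    """Fit logistic-regression weights (intercept first) by gradient ascent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, ti in zip(X, t):
            row = (1.0,) + tuple(xi)
            err = ti - sigmoid(sum(wj * xj for wj, xj in zip(w, row)))
            for j, xj in enumerate(row):
                grad[j] += err * xj
        # Step along the average gradient of the log-likelihood.
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def propensity(w, xi):
    """p(X): estimated probability of treatment given covariates xi."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

# Hypothetical toy data: age in years, smoker flag, and whether treated.
ages = [25, 30, 35, 40, 45, 50, 55, 60]
smoker = [0, 0, 1, 0, 1, 1, 0, 1]
treated = [0, 0, 1, 0, 1, 0, 1, 1]

# Centre age and express it in decades so gradient ascent behaves well.
mean_age = sum(ages) / len(ages)
X = [((a - mean_age) / 10.0, s) for a, s in zip(ages, smoker)]

w = fit_logit(X, treated)
scores = [propensity(w, xi) for xi in X]
for age, s, p in zip(ages, smoker, scores):
    print(age, s, round(p, 2))
```

Two covariates go in, one number per person comes out, and that one number is what gets matched on.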
Then, each treated individual (assuming for the sake of argument that it is an individual with whose treatment we are dealing) is compared to non-treated individuals according to their propensity score. There are a few approaches to this, too. First, the treated individuals have to be ranked. They can be ranked in ascending order, descending order (this is with respect to their propensity score) or randomly. The ranking determines the order in which they are matched.
Once ranked, they are matched. Individuals can be matched with replacement or without replacement. Without replacement, once a non-treated individual has been matched, or ‘paired’ with a treated individual, they are removed from the pool. This can be a problem if you don’t have loads of non-treated individuals, as each subsequent match may involve greater distances (also in terms of the propensity score). Moreover it may not make sense. If you do use RCT data, comparing means (or using regression), you still compare with replacement, effectively. It should be case-by-case, but for me the arguments for matching with replacement are convincing enough.
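A small sketch shows how the distances grow without replacement. It assumes a greedy one-to-one match on the propensity score, with the treated ranked in descending order; the unit ids, the scores and the `greedy_match` helper are made up for the example.

```python
import random

def greedy_match(treated, controls, replace=True, order="descending", seed=0):
    """Pair each treated unit with its nearest control on the propensity score.

    treated, controls: dicts mapping a unit id to its propensity score.
    Returns a list of (treated_id, control_id) pairs in matching order.
    """
    ids = sorted(treated, key=treated.get, reverse=(order == "descending"))
    if order == "random":
        random.Random(seed).shuffle(ids)
    pool = dict(controls)  # copy, so the caller's pool is left untouched
    pairs = []
    for tid in ids:
        if not pool:
            break  # without replacement, the pool can run dry
        cid = min(pool, key=lambda c: abs(pool[c] - treated[tid]))
        pairs.append((tid, cid))
        if not replace:
            del pool[cid]  # a matched control leaves the pool
    return pairs

# Hypothetical propensity scores.
treated = {"T1": 0.80, "T2": 0.75, "T3": 0.30}
controls = {"C1": 0.78, "C2": 0.35, "C3": 0.10}

with_repl = greedy_match(treated, controls, replace=True)
without_repl = greedy_match(treated, controls, replace=False)
print(with_repl)     # [('T1', 'C1'), ('T2', 'C1'), ('T3', 'C2')]
print(without_repl)  # [('T1', 'C1'), ('T2', 'C2'), ('T3', 'C3')]
```

With replacement, T2 reuses C1 at a distance of 0.03. Without it, T2 has to settle for C2 at a distance of 0.40, which is exactly the problem described above.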
Next: how many matches? If you match strictly one-to-one, you guarantee minimum bias, and minimum ‘distance’ between matches. But if you use more matches, you should get more precise estimates, albeit at the risk of greater bias (like the with/without replacement. As this propensity score ‘distance’ increases, so does the likelihood that you are comparing a treated individual with a systematically different non-treated individual). There are a couple of algorithms for this. One is the nearest-neighbour method, which automatically takes the m nearest non-treated propensity scores (but you pick the m – although that too can be optimised), and another is the ‘caliper’ method, which picks however many non-treated individuals are within a pre-specified ‘distance’ (that too could be optimised). This is also a case-by-case concern. There is no set rule for applying the Propensity Score Matching method to non-experimental data.
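The two selection rules are simple enough to sketch. The scores, the helper names and the values of m and the caliper below are hypothetical; how to choose them is the case-by-case part.

```python
def nearest_neighbours(score, controls, m):
    """Return the m control ids whose propensity scores are closest to score."""
    return sorted(controls, key=lambda c: abs(controls[c] - score))[:m]

def within_caliper(score, controls, caliper):
    """Return every control id whose score lies within caliper of score."""
    return [c for c in controls if abs(controls[c] - score) <= caliper]

# Hypothetical propensity scores for the non-treated pool.
controls = {"C1": 0.58, "C2": 0.66, "C3": 0.40, "C4": 0.95}
treated_score = 0.60

print(nearest_neighbours(treated_score, controls, m=2))       # ['C1', 'C2']
print(within_caliper(treated_score, controls, caliper=0.05))  # ['C1']
print(within_caliper(treated_score, controls, caliper=0.25))  # ['C1', 'C2', 'C3']
print(within_caliper(treated_score, controls, caliper=0.01))  # []
```

Nearest-neighbour always returns m matches, however far away they are. The caliper returns however many fall inside the window, which may be none, in which case the treated individual goes unmatched.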
The Dehejia and Wahba (2002) paper uses data from a preceding paper by Robert LaLonde, comparing training programmes. This gave them the advantage of having on hand experimental data with which to compare results from a constructed non-experimental problem. Their results were pretty good, and Propensity Score Matching has entered the methodological world.
Why of interest to me? I intend, along with a colleague, to apply some Bayesian value-of-information standards to the method, to look at some of the preliminary testing that goes on to assess the suitability of comparison groups, as well as to evaluate the likelihood that the ultimately-estimated Treatment Effect is correct for a given individual. Look out for future posts containing discussion of value-of-information analysis. Now that stuff is fun.
Theirs are the skies all dark with bombers/And mine is the peace we knew/Between the wars
Wireless is still dead. What I want to fix it doesn’t seem to exist, and I was as thoroughly soaked while returning from CompUSA as I’ve ever been. I’d rather just sit here and finish Death on the Nile than work. Still.
In yesterday’s IHT (a newspaper I seem to read more on my mobile phone than anywhere/how else) I learned that General Motors (now routinely called GM – and why not, given that cars aren’t exactly buttering their bread these days. England has had some recent experience with their financial services arm), Ford and Chrysler are planning on going into the health insurance business together. They are in talks, secret – but involving at least 5 people who are telling tales out of school – to do something about what we are told is a combined (future) retiree health care cost of USD114bn. The combined cost across the 3 companies last year was USD12bn.
Not bad, not bad at all. We are routinely (for me, anyway) reminded over here how these costs add something like USD1500 per car. I’m a jerk, me. I tend to feel little sympathy for a company that should have been saving money for this instead of playing with it or giving it to shareholders and CEOs (you’ll notice those stories kind of dried up a couple of years ago).
Unless they were actually gambling on all their workers dying before retirement, having retirees on employer pension plans is kind of easy to foresee, when you’re the employer. But back to the story.
As a reference point, Reuters refers to the steel industry, wherein the steelworkers union managed to stitch together a trust (so that billionaire Wilbur Ross could buy up bankrupt steel makers without getting stuck with these costs, to be fair). Frankly, given I teach in the home of the former Bethlehem Steel, I’m not so sure I’d use them as a model for anything. But like I said. I’m a jerk.
It’s going to be a casino now, by the way.
Very little about the deal, if there will be one, is known (not surprising). The thing that strikes me as interesting about 3 auto makers coming together to form a trust for the provision of health care benefits for retirees into perpetuity, say, is the absence of risk pooling/spreading at either end. I don’t really know what auto workers do (if you say “make autos”, you die). I don’t know whether or not they are more likely to suffer specific health problems as they age. But suppose they do. It would surely be better to spread that amongst a bundle of companies in other industries, but make each contribute an actuarially fair sum?
Alternatively these are 3 American auto companies – that doesn’t do much to minimise the risk that they’ll all suffer hard times at the same time - and the trust will lose some contributions. The point, I know, is that once it is a trust, unlike pension funds previously, Peter can’t be robbed to pay Paul and leave Peter’s children destitute (too stretched? I wasn’t sure. See, Peter’s children are the workers. Oh forget it). I’m not saying it’s a bad idea. I’m saying it’s a good idea that could be implemented brilliantly, or reasonably poorly.
Optimally, I’d like to see government involvement. No, seriously, and not just because I’m a social welfarist. Health care fraud costs American taxpayers. So do personal bankruptcies, of which health expenses are the primary cause (this was before the utterly nasty, unforgivable Bankruptcy Bill, so I don’t know now – are you even allowed to go bankrupt anymore?). Finally, we’ve all heard about Walmart employees being on Medicaid, but retirees are, like as not, hooked up to Medicare (the American one) one way or another. That is its purpose, after all. And Medicare has the expertise, such as it is, on administration.
It seems to me that an efficient solution would be one managed broadly that included risk-spreading in terms of contribution, pooling in terms of expenditure and health care utilisation, that included public contribution, possibly oversight and that was administered on a non-profit basis. At least sufficiently such that it doesn’t (i) siphon off great slices of the money in fees (see: privatising Medicare), or (ii) go belly-up itself, leaving workers with nobody left to sue.
Who knows, perhaps that is what’s being cooked up behind those closed doors.