Matching is a set of statistical techniques that "pair" each treated unit with a group of untreated units whose characteristics are most similar. These matched untreated units then serve as a control group that approximates the unobserved counterfactual, yielding more reliable causal inference in observational studies. Note that the same untreated unit can be used multiple times to match different treated units (matching with replacement).
Nearest neighbor matching using Mahalanobis distance
Unit 2 (treated) is matched with untreated units that have similar characteristics: units 8 and 9. We therefore use the average outcome of these matched units as the counterfactual (control) outcome for unit 2.
Unit 3 (treated) has no exact match, but can be paired with the similar untreated units 8–10. In the same way, we fill in the missing potential outcome (treatment or control) for every unit, compute each unit's individual treatment effect, and take the simple average to obtain the average treatment effect.
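The procedure above can be sketched in Python. This is a minimal illustration with made-up covariates and outcomes (the unit indices, effect size, and seed are arbitrary, not the ones from the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: rows are units, columns are covariates (hypothetical values).
X = rng.normal(size=(10, 2))                            # characteristics
treated = np.array([False, True, True] + [False] * 7)   # two treated units
y = X.sum(axis=1) + 2.0 * treated + rng.normal(scale=0.1, size=10)  # outcomes

# Mahalanobis distance weights by the inverse covariance of X, so
# correlated / high-variance covariates are not over-counted.
V_inv = np.linalg.inv(np.cov(X, rowvar=False))

def mahalanobis(a, b):
    d = a - b
    return float(np.sqrt(d @ V_inv @ d))

# For each treated unit, find its nearest untreated neighbor
# (with replacement) and impute the missing control outcome from it.
controls = np.where(~treated)[0]
effects = []
for t in np.where(treated)[0]:
    dists = [mahalanobis(X[t], X[c]) for c in controls]
    match = controls[int(np.argmin(dists))]
    effects.append(y[t] - y[match])     # individual treatment effect

att = float(np.mean(effects))           # simple average over treated units
```

Because matching is with replacement, the same control index can be selected for several treated units.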
Matching with replacement:
- Good 👍: we always get the best available match, without worrying about "using up" untreated units
- Bad 👎: many untreated units will be "reused", which reduces the effective sample size
Solution?: allow multiple matches per treated unit (even if they're second-best, third-best, etc.) to build a larger matched sample.
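A minimal sketch of this multiple-matches idea, assuming made-up data and plain Euclidean distance for brevity; each treated unit's counterfactual is the average outcome over its k closest controls rather than the single nearest one:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(12, 2))            # hypothetical covariates
treated = np.zeros(12, dtype=bool)
treated[:3] = True
y = X.sum(axis=1) + 1.5 * treated + rng.normal(scale=0.1, size=12)

controls = np.where(~treated)[0]
k = 3  # use the k best matches instead of only the nearest one

effects = []
for t in np.where(treated)[0]:
    dists = np.linalg.norm(X[controls] - X[t], axis=1)
    best = controls[np.argsort(dists)[:k]]   # 1st, 2nd, 3rd closest controls
    effects.append(y[t] - y[best].mean())    # average the matched outcomes

att = float(np.mean(effects))
```

Larger k enlarges the matched sample (lower variance) at the cost of using less similar controls (more bias), so k is a bias-variance trade-off.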
Usage
(Sekhon) It is often useful to combine propensity score matching with Mahalanobis distance: the propensity score is particularly good at minimizing the discrepancy along the propensity score itself, while Mahalanobis distance is particularly good at minimizing the distance between the individual coordinates of X (orthogonal to the propensity score).
If Mahalanobis distance is not optimal for achieving balance in a given dataset, one can search over the space of distance metrics for something better, using an algorithm like genetic matching.
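A toy sketch of that search idea. Real genetic matching (e.g. Sekhon's GenMatch) uses a genetic algorithm to optimize a weight matrix for the distance metric; here a plain random search over diagonal weights stands in, scoring each candidate metric by the worst post-matching covariate imbalance:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))            # hypothetical covariates
treated = rng.random(60) < 0.4

def imbalance(w_diag):
    # Match each treated unit to its nearest control under a weighted
    # squared distance, then return the worst absolute standardized
    # mean difference across covariates (smaller = better balance).
    controls = np.where(~treated)[0]
    matched = []
    for t in np.where(treated)[0]:
        d = ((X[controls] - X[t]) ** 2 * w_diag).sum(axis=1)
        matched.append(controls[int(np.argmin(d))])
    diff = X[treated].mean(axis=0) - X[matched].mean(axis=0)
    return float(np.max(np.abs(diff) / X.std(axis=0)))

# Start from equal weights (plain weighted Euclidean), then randomly
# search the space of diagonal metrics for better balance.
best_w = np.ones(3)
best_score = imbalance(best_w)
for _ in range(200):
    w = rng.uniform(0.1, 10.0, size=3)
    score = imbalance(w)
    if score < best_score:
        best_w, best_score = w, score
```

The search can only improve (or keep) the balance achieved by the starting metric, which is the whole point of treating the distance metric as something to optimize.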
Limitations
- Matching requires a large and rich control group with observed characteristics. Even with a large control group, there is a risk of no common support: some treated units have characteristics outside the range covered by the control group.
- It assumes there is no selection bias arising from unobserved characteristics (an assumption that cannot be tested from the data).
- Curse of dimensionality: matching on more variables requires much more data; otherwise many matches are of low quality because of a lack of common support.
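A crude way to check the common-support concern, using hypothetical data (practical diagnostics more often compare propensity-score distributions, but a per-covariate range check conveys the idea):

```python
import numpy as np

rng = np.random.default_rng(3)
X_control = rng.normal(size=(50, 2))                 # control covariates
X_treated = rng.normal(loc=1.5, size=(8, 2))         # shifted distribution

# Flag treated units whose covariates fall outside the per-covariate
# range observed in the control group: they have no common support,
# so any match for them is an extrapolation.
lo, hi = X_control.min(axis=0), X_control.max(axis=0)
off_support = ((X_treated < lo) | (X_treated > hi)).any(axis=1)
print(f"{off_support.sum()} of {len(X_treated)} treated units lack common support")
```

Off-support treated units are typically dropped (changing the estimand) or reported separately, since no control can credibly stand in for them.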