(Diamond) (R Code) (Explanation)

A matching method for causal inference in observational studies. Instead of matching untreated units with treated units individually, this method *assigns units to pairs or strata based on the distances between them, then discards unpaired units*, using a genetic algorithm to search for the covariate weights that define those distances.

The goal of matching is balanced samples, i.e., samples where the distribution of covariates in the treated and control groups is the same so that an estimated treatment effect cannot be said to be due to differences in the covariate distributions.

Algorithm

(2.3.3)

  1. Identify the covariates in the observational data set.
  2. Standardize the variables.
  3. For each treated unit, find the most similar untreated units (based on observable covariates) without looking at the outcome. To reduce selection bias, we scale the covariates so as to achieve the best possible balance between the treatment and comparison groups.
    1. Pick a weight for each covariate; together the weights form a weighting scheme.
    2. Match using the weighted distance.
    3. Check balance on all covariates using strict balance tests:
      1. Student's t-test for the difference in means between groups.
      2. Bootstrap Kolmogorov-Smirnov (KS) test comparing each covariate's distribution across the two groups.
      3. Across both tests, genetic matching maximizes the smallest p-value (associated with one covariate) at every generation. Intuitively, it focuses on improving balance (similarity between the two groups) for the worst-balanced covariate.
      4. We want the smallest p-value to be at least 0.15 with about 200 treated units; the attainable value may be lower with larger samples.
    4. Repeat, using the genetic algorithm to search over the space of weighting schemes and identify weights that improve overall balance.
      1. It might sacrifice balance in some well-balanced covariates to improve the very imbalanced ones.
    5. Stop when balance is achieved or stops improving.
  4. Impute the missing counterfactual outcomes by averaging the observed outcomes of each unit's matches. For example, a treated person aged 22 gets their potential outcome under control from their match (a 22-year-old in the control group).
  5. Compute the average treatment effect and its p-value. We want p-value < 0.05 here for the treatment effect to be statistically significant, i.e., for evidence that the treatment has an effect.
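
The balance check in step 3 can be sketched in base R as follows. This is a simplified illustration, not what `GenMatch` runs internally: it uses an unpaired Welch t-test and the asymptotic KS test rather than paired t-tests and bootstrap KS tests, and the function name `min_balance_pvalue` and the simulated data are made up for the example.

```r
# Fitness of a candidate match: the smallest p-value over all covariates and
# both tests. A larger value means the worst-balanced covariate is better balanced.
min_balance_pvalue <- function(X_treated, X_control) {
  pvals <- sapply(seq_len(ncol(X_treated)), function(k) {
    c(t  = t.test(X_treated[, k], X_control[, k])$p.value,
      ks = ks.test(X_treated[, k], X_control[, k])$p.value)
  })
  min(pvals)
}

# Simulated (hypothetical) covariates for 200 treated units
set.seed(42)
n <- 200
treated <- cbind(age = rnorm(n, 30, 5), educ = rnorm(n, 12, 2))

# One control group drawn from the same distributions, one with older units
balanced_controls   <- cbind(age = rnorm(n, 30, 5), educ = rnorm(n, 12, 2))
imbalanced_controls <- cbind(age = rnorm(n, 35, 5), educ = rnorm(n, 12, 2))

fit_balanced   <- min_balance_pvalue(treated, balanced_controls)
fit_imbalanced <- min_balance_pvalue(treated, imbalanced_controls)

# The genetic algorithm would prefer the weighting scheme with the larger fitness
print(c(balanced = fit_balanced, imbalanced = fit_imbalanced))
```

Taking the minimum rather than the mean is what makes the search concentrate on the worst-balanced covariate, even at some cost to the others.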

Mahalanobis metric

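This uses a generalized Mahalanobis distance with an additional weight matrix $W$:

$$d(X_i, X_j) = \sqrt{(X_i - X_j)^\top \left(S^{-1/2}\right)^\top W \, S^{-1/2} (X_i - X_j)}$$

  • $W$ is a $k \times k$ positive definite weight matrix
  • $S^{1/2}$ is the Cholesky decomposition of $S$, the sample covariance matrix of $X$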
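
As a rough sketch of how such a distance could be computed, the base-R function below assumes the lower-triangular Cholesky factor (so that the covariance matrix equals the factor times its transpose). The name `gen_mahalanobis` and the simulated covariates are illustrative only and are not part of the Matching package.

```r
# Generalized Mahalanobis distance between two covariate vectors:
#   sqrt( (xi - xj)' (S^{-1/2})' W S^{-1/2} (xi - xj) )
# where S^{1/2} is taken to be the lower-triangular Cholesky factor of S.
gen_mahalanobis <- function(xi, xj, S, W) {
  L <- t(chol(S))         # chol() returns the upper factor; transpose so S = L %*% t(L)
  z <- solve(L, xi - xj)  # S^{-1/2} (xi - xj)
  sqrt(drop(t(z) %*% W %*% z))
}

# Simulated (hypothetical) covariates
set.seed(1)
X <- cbind(age = rnorm(50, 30, 8), educ = rnorm(50, 12, 2))
S <- cov(X)

# With W = identity this reduces to the ordinary Mahalanobis distance
d_plain <- gen_mahalanobis(X[1, ], X[2, ], S, diag(2))

# A diagonal W with a larger first entry upweights the first standardized
# direction, so differences along it count more in the distance
d_weighted <- gen_mahalanobis(X[1, ], X[2, ], S, diag(c(5, 1)))

print(c(plain = d_plain, weighted = d_weighted))
```

The genetic algorithm's job is then to pick the entries of the weight matrix; every candidate weighting changes which control unit is "nearest" to each treated unit.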

Code

(Workbook)

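library(Matching)
data(lalonde)
attach(lalonde)
 
# covariates to match on (assumed here: age and educ, the two checked for balance below)
X <- cbind(age, educ)
 
# find the scaling weights for the covariates with genetic matching
# (1-to-1 nearest-neighbor matching, ATT estimand)
genout <- GenMatch(Tr=treat, estimand="ATT", X=X, M=1, pop.size=16, max.generations=10, wait.generations=10)
 
# match using those weights (no outcome supplied, so no effect is estimated yet)
mout <- Match(Tr=treat, X=X, estimand="ATT", M=1, Weight.matrix=genout)
 
# check balance before and after matching
mb <- MatchBalance(treat~age+educ, match.out=mout, nboots=500)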
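
The workflow above stops at the balance check. A possible continuation for steps 4 and 5 of the algorithm is sketched below: once balance looks acceptable, supply the outcome to `Match` and read off the estimate. The choice of `re78` as the outcome and of `age` and `educ` as the only matching covariates are assumptions made for this illustration, and the normal-approximation p-value is computed by hand from the returned estimate and standard error.

```r
library(Matching)
data(lalonde)

# Assumption: match on age and educ only, mirroring the balance formula above
X <- cbind(lalonde$age, lalonde$educ)

# Search for covariate weights (small population, so this is quick but rough)
set.seed(1)
genout <- GenMatch(Tr = lalonde$treat, X = X, estimand = "ATT", M = 1,
                   pop.size = 16, max.generations = 10, wait.generations = 10,
                   print.level = 0)

# Match again, this time supplying the outcome (assumed: 1978 earnings, re78)
mout_y <- Match(Y = lalonde$re78, Tr = lalonde$treat, X = X,
                estimand = "ATT", M = 1, Weight.matrix = genout)

# Estimate, standard error, and matched sample sizes
summary(mout_y)

# Two-sided p-value from the estimate and its standard error
att     <- drop(mout_y$est)
se      <- mout_y$se
p_value <- 2 * pnorm(-abs(att / se))
print(c(ATT = att, SE = se, p = p_value))
```

Keeping the outcome out of the first `Match` call and adding it only here preserves the separation between the design stage (choosing weights and checking balance) and the analysis stage.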

Usage

Limitations

Code

(Fancy Matching)