Causal Library in Python: Part 1

September 23, 2016

In the ninth semester, we are required to work on a 12-credit project. My project is on building a Python library for causal inference. I’m working on this with Chandan Yeshwanth.

Our first task was to find out more about existing libraries. Here is a list of the libraries that we found:

pcalg: This R package supports both structure learning and inference. For structure learning, PC, FCI, RFCI and GIES algorithms are implemented. For causal inference, the IDA algorithm, the Generalized Backdoor Criterion (GBC) and the Generalized Adjustment Criterion (GAC) have been implemented.
CausalInference: This Python library supports:
- Assessment of overlap in covariate distributions
- Estimation of propensity score
- Improvement of covariate balance through trimming
- Subclassification on propensity score
- Estimation of treatment effects via matching, blocking, weighting, and least squares
CausalImpact: This R package estimates the causal effect of a designed intervention on a time series.
SuperLearner and tmle: These are R packages that implement targeted maximum likelihood estimation (TMLE).

The main challenge in conducting any survey in the area of causal inference is that the terminology and language vary widely depending on the author’s background. For example, my introduction was through Judea Pearl’s book Causality, where it was all about Markov models, DAGs and graphs. However, other books on causal inference talk about the Granger causality test, propensity scores and treatment effects.

Ultimately, we decided to implement CausalImpact in Python. We chose this because, firstly, it seemed doable in the time that we have. Secondly, since it is basically the R implementation of this paper, we didn’t have to spend too much time finding resources to understand the theory.
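To make the idea of estimating the effect of an intervention on a time series a bit more concrete, here is a rough sketch of the general approach: fit a model on the period before the intervention, use it to predict what the series would have looked like without the intervention, and compare that with what was actually observed. This is only an illustration and not how CausalImpact itself works. As far as we understand, the package uses a Bayesian structural time-series model, whereas the sketch below swaps in a plain least-squares fit against a single control series. The function names (fit_line, estimate_effect) and the synthetic data are made up for this example.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def estimate_effect(control, target, intervention_index):
    """Return the average and pointwise gap between the observed target
    and its prediction from the control series, after the intervention."""
    # Learn the control -> target relationship on pre-intervention data only.
    intercept, slope = fit_line(control[:intervention_index],
                                target[:intervention_index])
    # Predict the counterfactual: the target had there been no intervention.
    predicted = [intercept + slope * c for c in control[intervention_index:]]
    observed = target[intervention_index:]
    pointwise = [o - p for o, p in zip(observed, predicted)]
    return mean(pointwise), pointwise


# Synthetic data (an assumption for illustration): the target follows
# 2 * control + 1, and we inject a lift of 5 from time step 20 onwards.
control = [10 + (t % 7) for t in range(30)]
target = [2 * c + 1 for c in control]
for t in range(20, 30):
    target[t] += 5

average_effect, pointwise_effect = estimate_effect(control, target, 20)
print(round(average_effect, 6))  # should come out close to the injected lift
```

The control series matters here because it is assumed to be unaffected by the intervention, so whatever relationship held before the intervention can be carried forward to build the counterfactual. A real implementation would also need to report uncertainty around the estimate, which this sketch leaves out.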

Our next steps are to understand the paper first, and then begin its implementation.