Quickstart

Installation

Install pqp using pip:

pip install pqp

Usage

To characterize a causal effect, we need to specify three things:

Causal Assumptions: a graph representing the causal structural relationships between variables
Parametric Assumptions: a parametric model of the joint distribution of the data
Causal Estimand: an algebraic expression representing the causal effect of interest

Setup

To get started, we will fabricate a small data set to run our analysis on.

import pandas as pd

df = pd.DataFrame({
    "x": [0, 0, 0, 1, 1, 0],
    "z": [0, 1, 0, 1, 1, 0],
    "y": [0, 1, 0, 1, 1, 0],
})

We can use the make_vars function to create a list of variables.

from pqp.symbols import make_vars
x, y, z = make_vars("xyz")

Note that the names of these variables match the column names in the data frame.

Causal Assumptions

We can then assemble these variables into a causal diagram using the Graph class. Here we will build the famous front-door model.

Infix operators are used to construct causal relationships. The <= operator indicates causal influence from right to left, so z <= x reads "x causes z". The & operator indicates confounding between the two variables it joins.

from pqp.identification import Graph
g = Graph([
    x & y,
    z <= x,
    y <= z,
])
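
As a rough illustration of what makes this graph a front-door model, the sketch below encodes the same three relationships in plain Python (it does not use the pqp API, and the names directed, confounded, and directed_paths are hypothetical). It checks that x has no direct edge into y and that every directed path from x to y passes through the mediator z.

```python
# Plain-Python encoding of the front-door graph (illustrative, not the pqp API).
directed = {"x": ["z"], "z": ["y"], "y": []}  # z <= x, y <= z
confounded = {frozenset({"x", "y"})}          # x & y

def directed_paths(graph, start, end, path=None):
    """Yield every directed path from start to end as a list of nodes."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for child in graph[start]:
        if child not in path:  # guard against revisiting a node
            yield from directed_paths(graph, child, end, path)

paths = list(directed_paths(directed, "x", "y"))
direct_edge = "y" in directed["x"]
mediated = all("z" in p[1:-1] for p in paths)

print(paths)        # every directed route from x to y
print(direct_edge)  # whether x points straight at y
print(mediated)     # whether z sits on every route
```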

We can use the .draw() method to visualize the causal diagram.

g.draw()

Parametric Assumptions

For this quickstart, we will assume that the data is drawn from a multinomial distribution. We can use the MultinomialEstimator class to specify the parametric assumptions.

from pqp.estimation import MultinomialEstimator
estimator = MultinomialEstimator(df, prior=1)

The prior argument specifies the prior strength of the model. The default is zero, in which case the model is fit by maximum likelihood. We use a nonzero value here because, without a prior, the model may assign zero probability to events that never occur in the data, which can cause problems when estimating causal effects (for example, when such a probability appears in a denominator).

If you don’t specify a prior, don’t worry: if the estimator runs into a problem, it will raise an exception and tell you what to do.
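
To see why a prior helps, the following sketch compares raw relative frequencies with a smoothed estimate on the same six rows, using only the standard library. It is not the pqp implementation: the helper joint is hypothetical, and spreading a total strength of 1 evenly over the 8 joint outcomes is an assumption about how prior=1 might be interpreted.

```python
from collections import Counter
from itertools import product

# Rows of the quickstart data frame as (x, z, y) tuples.
rows = [(0, 0, 0), (0, 1, 1), (0, 0, 0), (1, 1, 1), (1, 1, 1), (0, 0, 0)]
counts = Counter(rows)
cells = list(product([0, 1], repeat=3))  # all 8 joint outcomes

def joint(pseudo_count):
    """Estimate P(x, z, y) after adding pseudo_count to every cell."""
    total = len(rows) + pseudo_count * len(cells)
    return {c: (counts[c] + pseudo_count) / total for c in cells}

mle = joint(0.0)         # maximum likelihood: raw relative frequencies
smoothed = joint(1 / 8)  # ASSUMPTION: strength 1 spread over the 8 cells

zero_cells = [c for c in cells if mle[c] == 0.0]
print(len(zero_cells))              # outcomes with zero estimated probability
print(min(smoothed.values()) > 0)   # smoothing makes every outcome possible
```

Only three of the eight outcomes appear in the data, so the maximum likelihood fit leaves the other five at probability zero, while any positive pseudo-count keeps every cell strictly positive.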

Causal Estimand

For this example, we will estimate the average treatment effect of x on y. First, we need to define the treatment and control conditions.

treatment_condition = [x.val == 1]
control_condition = [x.val == 0]

Then, we can use the ATE class to define the causal estimand.

from pqp.identification import ATE
causal_estimand = ATE(y, treatment_condition, control_condition)

# inspect the expression
causal_estimand.expression().display()
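
In do-notation, the average treatment effect for the treatment and control conditions defined above is the difference between the expected outcome under each intervention. The standard textbook form, which should correspond to the expression displayed here, is:

```latex
\mathrm{ATE} = \mathbb{E}\left[\, y \mid do(x = 1) \,\right] - \mathbb{E}\left[\, y \mid do(x = 0) \,\right]
```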

Identification and Estimation

We now proceed in two steps: first we use the causal assumptions to identify the causal estimand, then we use the parametric assumptions to estimate it.

To identify an estimand against the causal diagram, we can use the graph's .identify() method. For example, to identify the effect of x on y defined above, we can use the following:

estimand = g.identify(causal_estimand).identified_estimand
estimand.display()

We can then use the .estimate() method to estimate the causal effect.

effect = estimator.estimate(estimand)
effect
# => EstimationResult(value=0.4433808167141502)
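
As a cross-check, the front-door adjustment can be computed by hand with the standard library alone. This is a sketch, not the pqp implementation: the helpers prob and expected_y_do are hypothetical, and it assumes that prior=1 acts as a total pseudo-count of 1 spread evenly over the 8 joint outcomes. Under that assumption the result comes out at approximately 0.4434, in agreement with the value reported above.

```python
from collections import Counter
from itertools import product

# Rows of the quickstart data frame as (x, z, y) tuples.
rows = [(0, 0, 0), (0, 1, 1), (0, 0, 0), (1, 1, 1), (1, 1, 1), (0, 0, 0)]
counts = Counter(rows)
cells = list(product([0, 1], repeat=3))

# ASSUMPTION: total prior strength 1, shared equally by the 8 cells.
alpha = 1 / len(cells)
total = len(rows) + alpha * len(cells)
p = {c: (counts[c] + alpha) / total for c in cells}

def prob(x=None, z=None, y=None):
    """Marginal probability of an assignment; None means summed out."""
    return sum(
        v for (cx, cz, cy), v in p.items()
        if x in (None, cx) and z in (None, cz) and y in (None, cy)
    )

def expected_y_do(x):
    """Front-door formula: sum_z P(z | x) * sum_x' P(y=1 | x', z) P(x')."""
    result = 0.0
    for z in (0, 1):
        inner = sum(
            prob(x=xp) * prob(x=xp, z=z, y=1) / prob(x=xp, z=z)
            for xp in (0, 1)
        )
        result += inner * prob(x=x, z=z) / prob(x=x)
    return result

# y is binary, so E[y | do(x)] equals P(y = 1 | do(x)).
ate = expected_y_do(1) - expected_y_do(0)
print(ate)
```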

Interpretability and Robustness

One of the most important features of pqp is its ability to provide human-interpretable explanations of the workings of the code. Many of the routines make very specific assumptions about the structure of the data or the effects of interest. It’s important for users to understand these assumptions so they can understand the potential limitations of an analysis.

The currency of pqp is the Result class. Any calculations that draw conclusions from the data will return instances of this class. This class tracks the transformations and assumptions made by the algorithms. As Results are assembled into successively more complex analyses of the data, pqp builds a dependency graph which tracks how different Result instances relate to each other and allows the user access to a list of steps executed in an analysis and the assumptions made.

To access the list of steps, we can use the .explain() and .explain_all() methods, which detail the current result only or all results in the dependency graph, respectively.

effect.explain_all()

Output:

Data Processing
    Assume: x is on BinaryDomain()
    Assume: z is on BinaryDomain()
    Assume: y is on BinaryDomain()
Identification
    We will identify the average treatment effect using IDC.
    Assume: Noncontradictory evidence
    Assume: Acyclicity
    Assume: Positivty
    IDC
        Input:
        P(y| do(x))
        Output:
        Σ_(z) [ [Σ_(x) [ [P(x) * P(x, z, y) / P(x, z)] ] * P(x, z) / P(x)] ]
    Derived: identified_estimand = E_(y) [ Σ_(z) [ [Σ_(x) [ [P(x) * P(x, z, y) / P(x, z)] ] * P(x = 1, z) / P(x = 1)] ] ] - E_(y) [ Σ_(z) [ [Σ_(x) [ [P(x) * P(x, z, y) / P(x, z)] ] * P(x = 0, z) / P(x = 0)] ] ]
Fit MultinomialEstimator
    Assume: Multinomial likelihood
    Assume: Dirichlet prior
Estimation
    Performing brute force estimation using a multinomial likelihood and dirichlet prior.
    Derived: value = 0.4433808167141502