Currently under review! See issues
This package implements a multivariate conditional independence test and an algorithm for learning directed graphs from data, based on predictive conditional independence testing (PCIT).
- Samuel Burkart: designated point of contact
- Franz Kiraly
If you would like to contribute, please read our contribution guide.
There are three main components:
- MetaEstimator: Estimator class used for independence testing
- PCIT: Multivariate Conditional Independence Test
- find_neighbours: Undirected graph skeleton learning algorithm
In the following examples, X, Y and Z can be univariate or multivariate.
```python
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2;
                                          # any numeric dataset works here

data = load_boston()['data']
X = data[:, 1:2]
Y = data[:, 2:4]
Z = data[:, 4:10]

PCIT(X, Y, confidence=0.01)
```
The direction of the prediction is X -> Y; the p-values therefore correspond to the hypothesis that adding X does not improve the prediction of Y (one p-value for each dimension of Y). If the parameter 'symmetric' is set to True (the default), both directions are tested.
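The principle behind the test can be sketched with plain scikit-learn (an illustrative toy, not the package's actual implementation): if X is informative about Y, a predictor that sees X should beat an uninformed baseline out of sample.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
Y = 2 * X[:, 0] + rng.normal(size=500)  # Y clearly depends on X

X_tr, X_te, y_tr, y_te = train_test_split(X, Y, random_state=0)

# Uninformed baseline: predict Y by its training mean
baseline_mse = np.mean((y_te - y_tr.mean()) ** 2)

# Informed predictor: regress Y on X
informed_mse = np.mean(
    (y_te - LinearRegression().fit(X_tr, y_tr).predict(X_te)) ** 2)

# A clear drop in out-of-sample error is evidence against independence;
# PCIT turns this comparison into a formal hypothesis test with p-values.
print(informed_mse < baseline_mse)
```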
Testing whether X is independent of Y, conditional on Z:

```python
PCIT(X, Y, z=Z)
```
Testing if X is independent of Y, conditional on Z, using a custom MetaEstimator, multiplexing over a manually chosen set of estimators:
```python
from sklearn.linear_model import (RidgeCV, LassoCV,
                                  SGDClassifier, LogisticRegression)

regressors = [RidgeCV(), LassoCV()]
classifiers = [SGDClassifier(), LogisticRegression()]

custom_estim = MetaEstimator(method='multiplexing',
                             estimators=(regressors, classifiers))

PCIT(X, Y, z=Z, estimator=custom_estim)
```
Learning the undirected graph skeleton over all variables:

```python
X = load_boston()['data']
find_neighbours(X)
```
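Conceptually, the skeleton learner can be imagined along these lines (a rough sketch with a hypothetical 10% error-reduction threshold standing in for PCIT's formal significance test): connect two variables whenever one helps predict the other beyond what the remaining variables already explain.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 1000
a = rng.normal(size=n)
b = 2 * a + rng.normal(size=n)   # a and b are linked
c = rng.normal(size=n)           # c is independent of both
data = np.column_stack([a, b, c])

def cv_mse(features, target):
    """Cross-validated mean squared error of a linear predictor."""
    scores = cross_val_score(LinearRegression(), features, target,
                             scoring='neg_mean_squared_error', cv=5)
    return -scores.mean()

edges = set()
p = data.shape[1]
for i in range(p):
    for j in range(i + 1, p):
        rest = [k for k in range(p) if k not in (i, j)]
        Z = data[:, rest]
        # Does adding X_i clearly improve the prediction of X_j
        # beyond the remaining variables? (Hypothetical threshold
        # in place of PCIT's formal test.)
        with_i = np.column_stack([data[:, i], Z])
        if cv_mse(with_i, data[:, j]) < 0.9 * cv_mse(Z, data[:, j]):
            edges.add((i, j))

print(sorted(edges))  # the a-b edge should be recovered
```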
Conditional as well as multivariate independence testing are difficult problems that lack a straightforward, scalable and easy-to-use solution. This project connects the classical independence testing task to the supervised learning workflow, which has the following advantages:
- By linking to the heavily researched supervised learning workflow, classical independence testing gains power as a side effect of every improvement in supervised learning methodology
- Mature, largely automated hyperparameter tuning from supervised prediction removes the manual tuning and ad-hoc choices prevalent in current independence-testing methodology
- As a wrapper around the sklearn package, PCIT is easy to use and to adjust
The package can be installed through pip:

```
pip install pcit
```
The dependencies are:
Three tests can be run:
- Test_PCIT_Power: tests the power for increasing sample sizes on a difficult v-structured problem. Matlab code for the same problem, for comparison with the Kernel Conditional Independence Test, can be found here
- Test_PCIT_Consistency: assesses the consistency of the test under perturbations of the data
- Test_Structure: assesses the power and false-discovery-rate control of the graphical model structure learning algorithm
MIT License