Fast Causal Inference is Tencent's first open-source causal inference project. It is a high-performance computing library for causal inference (statistical models) built on OLAP engines. It addresses the performance bottleneck that existing statistical model libraries (R/Python) hit on big data, and runs causal inference over massive datasets in seconds or less. It also exposes statistical models through SQL, which lowers the barrier to using them and makes them easy to adopt in production environments. It currently supports causal analysis for WeChat-Search, WeChat-Video-Account, and other businesses, greatly improving the work efficiency of data scientists.
- Second-level and sub-second causal inference on massive data
Built on the vectorized OLAP execution engines ClickHouse and StarRocks, whose speed makes for a more responsive user experience
- Basic operators, higher-order causal inference operators, and upper-level application packaging
Supports ttest, OLS, Lasso, tree-based models, matching, bootstrap, DML, etc.
- Minimalist SQL usage
The SQLGateway WebServer lowers the barrier to using statistical models through SQL
and offers a minimalist SQL interface on the upper layer, transparently handling engine-specific SQL expansion and optimization
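To give a sense of why an operator such as OLS can fit an aggregation-oriented engine: the OLS normal equations depend on the data only through the sums X'X and X'y, which can be accumulated in a single pass. The sketch below illustrates that idea in plain Python. It is an assumption-laden illustration, not the project's implementation or API, and the function names (`accumulate`, `solve`, `ols`) are hypothetical.

```python
from typing import List, Sequence


def accumulate(rows: Sequence[Sequence[float]], ys: Sequence[float]):
    """One pass over the data: build X'X and X'y as running sums."""
    p = len(rows[0]) + 1  # +1 for the intercept column
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for row, y in zip(rows, ys):
        x = [1.0, *row]  # prepend the intercept term
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta


def ols(rows, ys):
    """Hypothetical helper: OLS coefficients [intercept, b1, b2, ...]."""
    xtx, xty = accumulate(rows, ys)
    return solve(xtx, xty)


# Toy data generated exactly by y = 1 + 2*x1 - 3*x2 (no noise).
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
ys = [1.0 + 2.0 * a - 3.0 * b for a, b in rows]
beta = ols(rows, ys)
```

Because X'X and X'y are plain sums, partial results computed on separate data partitions can be added together before solving, which is what makes this formulation a natural fit for distributed aggregation.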
Basic causal inference tools
- ttest based on the delta method, with CUPED support
- OLS: sub-second on 100 million rows of data
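As background for the delta-method ttest listed above, here is a minimal sketch of the standard delta-method variance for a ratio metric (for example, clicks per view, where both numerator and denominator vary per user), followed by a two-sample t statistic. It is written in plain Python for illustration only. The function names and the toy numbers are hypothetical, and it does not use or represent the project's own API. CUPED is not covered here.

```python
import math
from statistics import mean


def ratio_and_variance(num, den):
    """Ratio of means mean(num)/mean(den) and its delta-method variance."""
    n = len(num)
    my, mx = mean(num), mean(den)
    r = my / mx
    # Sample variances and covariance (n - 1 denominator).
    vy = sum((y - my) ** 2 for y in num) / (n - 1)
    vx = sum((x - mx) ** 2 for x in den) / (n - 1)
    cxy = sum((y - my) * (x - mx) for y, x in zip(num, den)) / (n - 1)
    # First-order Taylor expansion of the ratio around (mx, my).
    var = (vy - 2.0 * r * cxy + r * r * vx) / (n * mx * mx)
    return r, var


def delta_ttest(num_t, den_t, num_c, den_c):
    """Difference in ratio metrics, its standard error, and the t statistic."""
    r_t, v_t = ratio_and_variance(num_t, den_t)
    r_c, v_c = ratio_and_variance(num_c, den_c)
    diff = r_t - r_c
    se = math.sqrt(v_t + v_c)  # assumes independent groups
    return diff, se, diff / se


# Hypothetical per-user (clicks, views) for treatment and control.
clicks_t, views_t = [3, 5, 2, 8, 6, 4], [10, 12, 8, 20, 15, 11]
clicks_c, views_c = [2, 4, 1, 6, 5, 3], [10, 13, 7, 19, 16, 12]
diff, se, t_stat = delta_ttest(clicks_t, views_t, clicks_c, views_c)
```

When every denominator equals 1, the formula reduces to the usual variance of a sample mean, which is a quick sanity check on the implementation.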
Advanced causal inference tools
- OLS-based IV, WLS and other GLS, DID, synthetic control, CUPED, and mediation are incubating
- uplift: minute-level computation on tens of millions of rows
- Data simulation frameworks such as bootstrap/permutation are under development, to handle variance estimation when no closed-form solution exists
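To illustrate the bootstrap idea mentioned in the last item, the following is a minimal sketch of a nonparametric bootstrap standard error for a statistic with no convenient closed-form variance (the median is used here). It is plain Python for illustration only. `bootstrap_se` and its parameters are hypothetical names, not part of the project's API.

```python
import random
from statistics import median, stdev


def bootstrap_se(data, statistic, n_resamples=2000, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(sample))
    # The spread of the resampled statistics approximates the standard error.
    return stdev(estimates)


# Hypothetical observations.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.7]
se_median = bootstrap_se(data, median)
```

The same function works for any statistic that takes a list and returns a number, which is why resampling is a common fallback when an analytic variance is unavailable.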
Already supports multiple businesses within WeChat, such as WeChat-Video-Account and WeChat-Search.
GitHub: https://github.com/Tencent/fast-causal-inference
- Docker must be installed and the docker service running on the machine
- Linux:
  - CentOS:
    yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
    yum install docker-ce
    systemctl start docker
  - Ubuntu:
    sudo apt-get install docker-ce
  - Verify the docker service status:
    systemctl status docker
  - Install the docker-compose container orchestration tool:
    pip3 install --upgrade pip && pip3 install docker-compose
- macOS:
  - Follow https://docs.docker.com/desktop/install/mac-install/: download the .dmg package, double-click to install it, and make sure the docker service is running
  - Add Docker to PATH:
    echo 'export PATH="/Applications/Docker.app/Contents/Resources/bin:$PATH"' >> ~/.bash_profile && . ~/.bash_profile
  - Verify the docker service status:
    docker ps
- Clone the repository and deploy:
  git clone https://github.com/Tencent/fast-causal-inference
  cd fast-causal-inference && sh bin/deploy.sh
- Open http://127.0.0.1 in a browser. To start causal analysis, refer to the built-in demo.ipynb