tencent / fast-causal-inference

A high-performance causal inference (statistical model) computing library built on OLAP engines. It addresses the performance bottleneck that existing statistical model libraries (R/Python) face on big data.

License: Other

Shell 0.10% Python 4.01% C++ 3.39% Dockerfile 0.02% Kotlin 0.14% FreeMarker 0.11% Java 91.87% HTML 0.16% Batchfile 0.02% PigLatin 0.01% Ruby 0.01% SCSS 0.15% JavaScript 0.01% Makefile 0.01% CMake 0.01%

fast-causal-inference's Introduction

Fast-Causal-Inference

Introduction

Fast Causal Inference is Tencent's first open-source causal inference project. It is an OLAP-based, high-performance causal inference (statistical model) computing library that addresses the performance bottleneck existing statistical model libraries (R/Python) face on big data, running causal inference over massive datasets in seconds or less. Exposing the statistical models through SQL also lowers the barrier to using them and makes them easy to adopt in production environments. The project currently supports causal analysis for WeChat-Search, WeChat-Video-Account, and other businesses, greatly improving the efficiency of data scientists.

Main advantages of the project:

  1. Second-level and sub-second causal inference on massive data
    Built on the vectorized OLAP execution engines ClickHouse/StarRocks, it is fast enough to support a responsive user experience.
  2. Basic operators, high-order causal inference operators, and upper-level application packaging
    Supports ttest, OLS, Lasso, tree-based models, matching, bootstrap, DML, etc.
  3. Minimalist SQL usage
    The SQLGateway WebServer lowers the barrier to using statistical models through SQL. It offers a minimalist SQL interface on the upper layer and transparently handles engine-specific SQL expansion and optimization.

The first version already supports the following features:

Basic causal inference tools

  1. t-test based on the delta method, with CUPED support
  2. OLS on 100 million rows of data at sub-second level
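
As an illustration of the first item, a delta-method t-test for a ratio metric (for example, clicks per impression) can be sketched in plain Python. This is a minimal sketch of the general technique, not the project's implementation; the function names and the sample data are hypothetical.

```python
import math
from statistics import mean


def ratio_and_variance(numer, denom):
    """Return mean(numer) / mean(denom) and its delta-method variance.

    numer and denom hold one value per experimental unit (e.g. per user).
    """
    n = len(numer)
    mean_y, mean_x = mean(numer), mean(denom)
    ratio = mean_y / mean_x
    # Sample variances and covariance, using the n - 1 denominator.
    var_y = sum((y - mean_y) ** 2 for y in numer) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in denom) / (n - 1)
    cov_xy = sum((y - mean_y) * (x - mean_x)
                 for x, y in zip(denom, numer)) / (n - 1)
    # First-order Taylor expansion of y_bar / x_bar around the means.
    variance = (var_y - 2 * ratio * cov_xy + ratio ** 2 * var_x) / (n * mean_x ** 2)
    return ratio, variance


def delta_method_ttest(treat_y, treat_x, ctrl_y, ctrl_x):
    """Return (difference in ratios, standard error, t statistic)."""
    r_t, v_t = ratio_and_variance(treat_y, treat_x)
    r_c, v_c = ratio_and_variance(ctrl_y, ctrl_x)
    diff = r_t - r_c
    std_err = math.sqrt(v_t + v_c)
    return diff, std_err, diff / std_err


# Hypothetical per-user clicks (numerator) and impressions (denominator).
diff, std_err, t_stat = delta_method_ttest(
    treat_y=[3, 5, 2, 6, 4], treat_x=[10, 12, 8, 15, 11],
    ctrl_y=[2, 3, 2, 4, 3], ctrl_x=[10, 11, 9, 14, 12],
)
print(diff, std_err, t_stat)
```

When every denominator equals 1, the variance reduces to the usual variance of a sample mean, which is a quick sanity check for the formula.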

Advanced causal inference tools

  1. OLS-based IV, WLS, and other GLS methods, DID, synthetic control, CUPED, and mediation are incubating
  2. Uplift: minute-level computation on tens of millions of rows
  3. Data simulation frameworks such as bootstrap/permutation are under development, to handle variance estimation where no closed-form solution exists
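
The bootstrap idea in the last item can be sketched as follows: resample the data with replacement many times, recompute the statistic each time, and use the spread of those estimates as the standard error. This is a generic single-machine sketch, not the project's framework; `bootstrap_std_error` and its parameters are hypothetical names.

```python
import random
from statistics import mean, median, stdev


def bootstrap_std_error(values, statistic=mean, n_resamples=1000, seed=0):
    """Estimate the standard error of statistic(values) by resampling.

    Each resample draws len(values) items with replacement; the standard
    deviation of the resampled statistics approximates the standard error.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(values)
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(values, k=n)
        estimates.append(statistic(resample))
    return stdev(estimates)


# The median has no simple closed-form variance, so it is a typical use case.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
print(bootstrap_std_error(data, statistic=median))
```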

Project application:

The project already supports multiple businesses within WeChat, such as WeChat-Video-Account and WeChat-Search.

Project open source address

github: https://github.com/Tencent/fast-causal-inference

Getting started

Prerequisites
  1. Docker must be installed on the machine and its service must be running
    • Linux:

      • CentOS:

        yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
        yum install docker-ce
        systemctl start docker

      • Ubuntu:

        sudo apt-get install docker-ce

      • Verify the Docker service status:

        systemctl status docker

      • Install docker-compose, the container orchestration tool:

        pip3 install --upgrade pip && pip3 install docker-compose

    • macOS:
      Download the .dmg package from https://docs.docker.com/desktop/install/mac-install/ and double-click it to install. Make sure the Docker service is running.
      Add Docker to PATH:

      echo 'export PATH="/Applications/Docker.app/Contents/Resources/bin:$PATH"' >> ~/.bash_profile && . ~/.bash_profile

    • Verify the Docker service status:

      docker ps

One-Click Deployment:

git clone https://github.com/Tencent/fast-causal-inference
cd fast-causal-inference && sh bin/deploy.sh

Then open http://127.0.0.1 in a browser.

To start causal analysis, refer to the built-in demo.ipynb.
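
Containers can take a while to come up after the deploy script returns. A small script along these lines could poll the local address until it answers. It is only a sketch: `wait_for_service` and its default values are hypothetical, and it assumes the web interface is served over plain HTTP at http://127.0.0.1 as shown above.

```python
import time
import urllib.error
import urllib.request


def wait_for_service(url="http://127.0.0.1", attempts=30, delay=2.0, timeout=3.0):
    """Poll url until it answers an HTTP request; return True once it does."""
    # Bypass any configured proxy, since the target is the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    for attempt in range(attempts):
        try:
            with opener.open(url, timeout=timeout):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status code.
            return True
        except OSError:
            # Connection refused or timed out: the service is not ready yet.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False


# Example usage (left as a comment so importing this file does not block):
# if wait_for_service():
#     print("Service is up; open http://127.0.0.1 and load demo.ipynb")
```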

fast-causal-inference's People

Contributors

fffffffhhhhhhh, fhbai, huangyanyanyan

fast-causal-inference's Issues

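No StarRocks demo found

Thanks for open-sourcing this. The introduction says:
"Based on the vectorized OLAP execution engine ClickHouse/StarRocks, the speed is more conducive to the ultimate user experience"
However, the demo only contains ClickHouse configuration. Is StarRocks supported yet, and could a demo for it be added?

Spark support

Hello! Have you considered adding support for Spark? If so, would you consider Scala?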
