packtpublishing / causal-inference-and-discovery-in-python


Causal Inference and Discovery in Python by Packt Publishing

License: MIT License

Jupyter Notebook 99.83% Python 0.17%


Causal Inference and Discovery in Python

This is the code repository for Causal Inference and Discovery in Python, published by Packt.

Unlock the secrets of modern causal machine learning with DoWhy, EconML, PyTorch and more

What is this book about?

Causal methods present unique challenges compared to traditional machine learning and statistics. Learning causality can be challenging, but it offers distinct advantages that elude a purely statistical mindset. Causal Inference and Discovery in Python helps you unlock the potential of causality.

You’ll start with basic motivations behind causal thinking and a comprehensive introduction to Pearlian causal concepts, such as structural causal models, interventions, counterfactuals, and more. Each concept is accompanied by a theoretical explanation and a set of practical exercises with Python code.

Next, you’ll dive into the world of causal effect estimation, consistently progressing towards modern machine learning methods. Step by step, you’ll discover the Python causal ecosystem and harness the power of cutting-edge algorithms. You’ll further explore the mechanics of how “causes leave traces” and compare the main families of causal discovery algorithms.

The final chapter gives you a broad outlook into the future of causal AI where we examine challenges and opportunities and provide you with a comprehensive list of resources to learn more.

This book covers the following exciting features:

Master the fundamental concepts of causal inference
Decipher the mysteries of structural causal models
Unleash the power of the 4-step causal inference process in Python
Explore advanced uplift modeling techniques
Unlock the secrets of modern causal discovery using Python
Use causal inference for social impact and community benefit

If you feel this book is for you, get your copy today!

Instructions and Navigations

All of the code is organized into folders.

The code will look like the following:

preds = causal_bert.inference(
    texts=df['text'],
    confounds=df['has_photo'],
)[0]

Following is what you need for this book:

This book is for machine learning engineers, data scientists, and machine learning researchers looking to extend their data science toolkit and explore causal machine learning. It will also help developers familiar with causality who have worked in another technology and want to switch to Python, and data scientists with a history of working with traditional causality who want to learn causal machine learning. It’s also a must-read for tech-savvy entrepreneurs looking to build a competitive edge for their products and go beyond the limitations of traditional machine learning.

With the following software and hardware list you can run all code files present in the book (Chapter 1-15).

Software and Hardware List

Chapter	Software required	OS required
1-15	Python 3.9	Windows, macOS, or Linux
1-15	DoWhy 0.8	Windows, macOS, or Linux
1-15	EconML 0.12.0	Windows, macOS, or Linux
1-15	CATENets 0.2.3	Windows, macOS, or Linux
1-15	gCastle 1.0.3	Windows, macOS, or Linux
1-15	Causica 0.2.0	Windows, macOS, or Linux
1-15	Causal-learn 0.1.3.3	Windows, macOS, or Linux
1-15	Transformers 4.24.0	Windows, macOS, or Linux
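
A quick way to compare an existing environment against the table above is to query the installed package metadata. The sketch below is only an illustration: the `PINNED` dictionary and the `check_versions` helper are hypothetical names, and the distribution names (for example `gcastle` and `causal-learn`) are assumed PyPI names rather than entries copied from the repository's environment files.

```python
from importlib import metadata

# Versions from the table above; distribution names are assumed PyPI names.
PINNED = {
    "dowhy": "0.8",
    "econml": "0.12.0",
    "catenets": "0.2.3",
    "gcastle": "1.0.3",
    "causica": "0.2.0",
    "causal-learn": "0.1.3.3",
    "transformers": "4.24.0",
}


def check_versions(pinned=PINNED):
    """Return {package: (expected, installed or None, exact string match)}."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed, installed == expected)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:<14} expected {expected:<8} installed {installed!s:<10} {status}")
```

The comparison is an exact string match, so a newer release is reported as a mismatch even if it happens to work with the notebooks.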

Join our Discord server

Join our Discord community to meet like-minded people and learn alongside more than 2000 members at Discord

Get to Know the Author

Aleksander Molak is a Machine Learning Researcher and Consultant who gained experience working with Fortune 100, Fortune 500, and Inc. 5000 companies across Europe, the USA, and Israel, designing and building large-scale machine learning systems. On a mission to democratize causality for businesses and machine learning practitioners, Aleksander is a prolific writer, creator, and international speaker. As a co-founder of Lespire, an innovative provider of AI and machine learning training for corporate teams, Aleksander is committed to empowering businesses to harness the full potential of cutting-edge technologies that allow them to stay ahead of the curve. He's the host of the Causal AI-centered Causal Bandits Podcast.

Note from the Author:

Environment installation

See the section Using graphviz and GPU below
To install the basic environment run: conda env create -f causal_book_py39_cuda117.yml
To install the environment for notebook Chapter_11.2.ipynb run: conda env create -f causal-pymc.yml

NOTE: We added an experimental environment for Apple M1 as suggested by @ferrari-leo here. This environment hasn't been thoroughly tested so please use it at your own risk.

Selecting the kernel

After a successful installation of the environment, open your notebook and select the kernel causal_book_py39_cuda117

For notebook Chapter_11.2.ipynb change kernel to causal-pymc

Using `graphviz` and GPU

Note: Depending on your system settings, you might need to install graphviz manually in order to recreate the graph plots in the code. Check https://pypi.org/project/graphviz/ for instructions specific to your operating system.

Note 2: To use GPU you'll need to install CUDA 11.7 drivers. This can be done here: https://developer.nvidia.com/cuda-11-7-0-download-archive

Citation

BibTeX

@book{Molak2023,
    title={Causal Inference and Discovery in Python: Unlock the secrets of modern causal machine learning with DoWhy, EconML, PyTorch and more},
    author={Molak, Aleksander},
    publisher={Packt Publishing},
    address={Birmingham},
    edition={1.},
    year={2023},
    isbn={1804612987},
    note={\url{https://amzn.to/3RebWzn}}
}

APA

Molak, A. (2023). Causal Inference and Discovery in Python: Unlock the secrets of modern causal machine learning with DoWhy, EconML, PyTorch and more. Packt Publishing.

‼️ Known mistakes // errata

For known errors and corrections check:

If you spotted a mistake, let us know at book(at)causalpython.io or just open an issue in this repo. Thank you 🙏🏼


causal-inference-and-discovery-in-python's Issues

Chapter 6 - A milestone, but would a little bit of math also help here?

Hi again @AlxndrMlk ,

I am really enjoying your book, and, again: Congratulations!

After reading Chapter 6, my impression is that the book engages in a very practical topic on how to use Causal Inference for many chapters, which is very good. However, I missed some technical details in Chapter 6 that I am trying to reproduce.

I took the Chapter 6 notebook (.ipynb) and combined it with the Chapter 7 notebook to reproduce the pipeline you used to demonstrate the interventional versus observational calculation of the effect of X on Y across diverse DAGs representing the data-generating process. The way you constructed the Chapter 6 and 7 notebooks was very didactic.

I was able to reproduce the results for the IV, front-door, and back-door cases: the E(Y|X) obtained directly from the data-generating process with interventional data matched, reassuringly, the values obtained from the observational data with both the formula and DoWhy. However, to get there I relied on the DoWhy estimand's expression.

I would feel more confident if I had learned to derive the expressions, at least for the base cases. My gut feeling is that if I combine the data-generating equations and write them in the form Y=f(X,Y), I could rearrange the partial derivatives to find the E(Y|X) expression. This worked for the front-door case, but when I wrote it out for the IV case, the expression diverged from the DoWhy estimand, so I am missing some point(s).

Can you guide me on where to find the mathematical derivations of the expressions for such simple didactic cases? And maybe consider adding them in the next edition.

Thank you
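
For the linear IV case, one standard textbook argument (not taken from the book or its notebooks) runs as follows: if Z affects Y only through X and is independent of the unobserved confounder U, then Cov(Z, Y) = β · Cov(Z, X), so β = Cov(Z, Y) / Cov(Z, X). The sketch below checks this ratio numerically on a hypothetical data-generating process. The coefficients, the helper names `cov` and `simulate_iv`, and the structural equations are all illustrative assumptions.

```python
import random


def cov(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def simulate_iv(n=50_000, beta=0.7, seed=0):
    """Hypothetical linear SCM: Z -> X -> Y, with U confounding X and Y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
    u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
    x = [0.5 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [beta * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return z, x, y


z, x, y = simulate_iv()

# Ratio of covariances: the confounder drops out because Z is independent of U
iv_estimate = cov(z, y) / cov(z, x)

# Regressing Y on X directly absorbs the confounding path through U
naive_estimate = cov(x, y) / cov(x, x)

print(f"IV estimate:    {iv_estimate:.3f}")
print(f"Naive estimate: {naive_estimate:.3f}")
```

In this setup the ratio should land near the true coefficient of 0.7, while the naive regression slope is biased upward by the confounder.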

[Chapter9] Causal_estimator.effect for prediction

Hello, I am Jake Lee from Korea, a passionate reader of your book.

I have found that some of the code for the prediction part in Ch09 does not work.
Your code flows as follows:

1. Instantiate the causal model
2. Estimand
3. Estimate
4. Predict test data using .causal_estimator.effect

However, #4 does not work on my side (the error says there is no causal_estimator object).
I would appreciate your help with this, especially for running the code with the latest DoWhy version (11.0).

Thanks in advance!
Jake

Add bibtex reference to the book in the `README.md`

It would be a real help for anyone wanting to cite the book if you could add the bibtex entry text to the bottom of the README.md 👍

Along the same lines... I'm not sure if it's under your control, but I also can't find the bibtex citation through a Google Scholar search.

Chap 4 - `from_numpy_matrix` is deprecated

In networkx 3.0, the changelog shows the following: "Remove to_numpy_matrix & from_numpy_matrix".

Update the line of code: graph = nx.from_numpy_array(adj_matrix, create_using=nx.DiGraph)
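
For context, `nx.from_numpy_array` with `create_using=nx.DiGraph` reads entry `[i, j]` of the adjacency matrix as an edge from node `i` to node `j`. The dependency-free sketch below mimics that convention in plain Python so the edge orientation is easy to check by hand. `edges_from_adjacency` is a hypothetical helper written for this illustration, not part of networkx.

```python
def edges_from_adjacency(adj_matrix):
    """List directed edges (source, target, weight) for all non-zero entries."""
    edges = []
    for i, row in enumerate(adj_matrix):
        for j, weight in enumerate(row):
            if weight != 0:
                edges.append((i, j, weight))
    return edges


# Chain 0 -> 1 -> 2: row index is the source, column index is the target
adj_matrix = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]

edges = edges_from_adjacency(adj_matrix)
print(edges)
```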

dowhy version 0.11.1 - Incompatibilities, can they be fixed?

First of all, congratulations on the book! It is a great work.

In parallel with reading, I am trying to run the .ipynb notebooks with an updated version of dowhy under Python 3.11.8, which is closer to my data science environment than Python 3.9 and dowhy 0.8.

Unfortunately, I am not able to fix the error in Chapter 7 and Chapter 8.

In chapter 7, the cell 21:

estimate = model.estimate_effect(
    identified_estimand=estimand,
    method_name='backdoor.econml.dml.DML',
    method_params={
        'init_params': {
            'model_y': GradientBoostingRegressor(),
            'model_t': GradientBoostingRegressor(),
            'model_final': LassoCV(fit_intercept=False),
        },
        'fit_params': {}}
)

print(f'Estimate of causal effect (DML): {estimate.value}')

leads to an ImportError: "Error loading econml.dml.DML. Double-check the method name and ensure that all econml dependencies are installed.".

This looks similar to cell 5, which also tries to use DML:

# Get estimate (DML)
estimate_dml = model.estimate_effect(
    identified_estimand=estimand,
    method_name='backdoor.econml.dml.DML',
    method_params={
        'init_params': {
            'model_y': GradientBoostingRegressor(),
            'model_t': GradientBoostingRegressor(),
            'model_final': LassoCV(fit_intercept=False),
        },
        'fit_params': {}}
)

Complaining that: "ImportError: Error loading econml.dml.DML. Double-check the method name and ensure that all econml dependencies are installed."

Any clue/ guidance on how to fix this?

Chapter 7 notebook array shape error messages

Possibly related to the numba deprecation warning, the following code produces an array shape error, which then propagates errors through the remaining cells in the Chapter 7 notebook. I tried upgrading shap to 0.42.0, which resolved some but not all of the errors.

estimate = model.estimate_effect(
    identified_estimand=estimand,
    method_name='backdoor.econml.dml.DML',
    method_params={
        'init_params': {
            'model_y': GradientBoostingRegressor(),
            'model_t': GradientBoostingRegressor(),
            'model_final': LassoCV(fit_intercept=False),
        },
        'fit_params': {}}
)

print(f'Estimate of causal effect (DML): {estimate.value}')

A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().

Notebook 13 np.np typo

Notebook 13 seems to have a typo of np.np.tril, which causes an error. Changing to np.tril resolves this error.

Small errata for p27 of book (post-June 2023)

Loving that these resources are on GitHub - thank you @AlxndrMlk!

Quick notational suggestion: on p27, the line $X_{sample} = 1.9 < X < -1.9$ doesn't make sense at the moment.
The right-side reads as a boolean, and the condition can never be satisfied, as written.

Would be clearer what's meant if it was "...sampled according to the condition $X < - 1.9$ or $X > 1.9$".
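
Read this way, the condition keeps only the tails of the distribution. A minimal sketch, assuming X is drawn from a standard normal distribution (an assumption made here for illustration; the snippet and the `sample_tails` helper are not taken from the book's notebooks):

```python
import random


def sample_tails(n, threshold=1.9, seed=42):
    """Draw n standard-normal values and keep those with X < -t or X > t."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    return [v for v in x if v < -threshold or v > threshold]


x_sample = sample_tails(10_000)

# The chained form 1.9 < v < -1.9 can never hold, so it selects nothing
never_satisfied = [v for v in x_sample if 1.9 < v < -1.9]

print(f"kept {len(x_sample)} of 10000 draws")
print(f"matches for the chained condition: {len(never_satisfied)}")
```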

Solving environment not completing

The command conda env create -f causal_book_py39_cuda117.yml never finishes. Tested on Windows and WSL. Anyone else encountering this issue? Found a solution?

Dowhy CausalModel does not have 'causal_estimator' attribute

Both chapter 9 and chapter 10 notebooks have code like effect_pred = model.causal_estimator.effect().
I got an error running them: AttributeError: 'CausalModel' object has no attribute 'causal_estimator'

The book states that it uses DoWhy 0.8, but I am currently using DoWhy 0.10.1 (I just want to keep my learning experience up to date), and I cannot determine whether that is the cause. If it is, how do I apply the model to the test dataset with the current version of DoWhy? If not, what have I missed?

Thanks!

Environment for M1 Silicon

Hi Alex,

Please find below an environment file that successfully runs the GPU-related code in Chapters 11.1 and 14 (attached as .txt because yml is not accepted; you should just be able to change the extension back):
causal_book_py39_for_m1.txt

The changes from the yml provided in your repo are:

remove - nvidia from channels
remove - pytorch and - pytorch-cuda=11.7 from dependencies
add - notebook=6.5 to dependencies

Then replace the set device cell with

# Set device
device = "mps" if torch.backends.mps.is_available() else "cpu"

I still then had to pip install CausalPy once the env was activated.

The full yml as exported by conda is
causal_book_py39_applem1.txt

Notes:

This has only been tested on notebooks 11.1 and 14, and I did not closely monitor whether the results were the same. At this point I'm only assuming it should run fine on the other chapters.
In notebook 14, "Expert knowledge" section, in the cell after the one with the augmented Lagrangian loss objects (first line assert len(dataset_train.batch_size) == 1, "Only 1D batch size is supported"), an error occurs with the message "NotImplementedError: The operator 'aten::triu_indices' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS."
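
The temporary fix named in that error message can also be applied from inside the notebook. A minimal sketch, assuming the variable is set in the first cell, before `torch` is imported (the variable name comes from the error message; the placement is an assumption about the workflow, and `fallback_enabled` is just an illustrative name):

```python
import os

# Set this before `import torch`, e.g. in the first cell of the notebook,
# so unsupported MPS operators fall back to the CPU.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

fallback_enabled = os.environ.get("PYTORCH_ENABLE_MPS_FALLBACK") == "1"
print(f"MPS CPU fallback enabled: {fallback_enabled}")
```

As the error message warns, operators that fall back to the CPU will run slower than they would natively on MPS.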