
causica's Introduction


Causica

Overview

Causal machine learning enables individuals and organizations to make better data-driven decisions. In particular, causal ML allows us to answer “what if” questions about the effect of potential actions on outcomes.

Causal ML is a nascent area. We aim to enable a scalable, flexible, real-world applicable end-to-end causal inference framework. In particular, we bridge causal discovery, causal inference, and deep learning to achieve this goal. We aim to develop technology that can automate causal decision-making using existing observational data alone, outputting both the discovered causal relationships and estimates of the effect of actions simultaneously.

Causica is a deep learning library for end-to-end causal inference, including both causal discovery and inference. It implements the deep end-to-end causal inference framework [2] and different alternatives.

This project splits the interventional decision making out from the observational decision making in the Azua repo, found here: Azua.

This codebase has been heavily refactored; you can find the previous version of the code here.

DECI: End to End Causal Inference

Installation

Causica is on PyPI, so it can be installed with pip:

pip install causica

About

Real-world data-driven decision making requires causal inference to ensure the validity of drawn conclusions. However, it is very uncommon to have a-priori perfect knowledge of the causal relationships underlying relevant variables. DECI allows the end user to perform causal inference without having complete knowledge of the causal graph. This is done by combining the causal discovery and causal inference steps in a single model. DECI takes in observational data and outputs ATE and CATE estimates.
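For reference, the two quantities can be written in do-notation. This is the standard textbook form rather than anything specific to DECI; the symbols $T$ (treatment), $Y$ (outcome), $X$ (conditioning variables), $a$ (intervention value), $b$ (reference value) and $c$ (conditioning value) are notation introduced here for illustration.

```latex
\mathrm{ATE}(a, b) = \mathbb{E}\left[Y \mid \mathrm{do}(T = a)\right] - \mathbb{E}\left[Y \mid \mathrm{do}(T = b)\right]

\mathrm{CATE}(a, b \mid c) = \mathbb{E}\left[Y \mid \mathrm{do}(T = a), X = c\right] - \mathbb{E}\left[Y \mid \mathrm{do}(T = b), X = c\right]
```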

For more information, please refer to the paper.

Model Description

DECI is a generative model that employs an additive noise structural equation model (ANM-SEM) to capture the functional relationships among variables and exogenous noise, while simultaneously learning a variational distribution over causal graphs. Specifically, the relationships among variables are captured with flexible neural networks while the exogenous noise is modelled as either a Gaussian or spline-flow noise model. The SEM is reversible, meaning that we can generate an observation vector from an exogenous noise vector through forward simulation, and given an observation vector we can recover a unique corresponding exogenous noise vector. In this sense, the DECI SEM can be seen as a flow from exogenous noise to observations. We employ a mean-field approximate posterior distribution over graphs, which is learnt together with the functional relationships among variables by optimising an evidence lower bound (ELBO). Additionally, DECI supports learning under partially observed data.
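The reversibility of an additive noise SEM can be illustrated with a small standard-library sketch. It is not DECI code: the three-node chain graph and the `tanh` function standing in for the learnt neural networks are assumptions made only for this example.

```python
import math
import random

# Toy graph X0 -> X1 -> X2; node indices are already in topological order.
PARENTS = {0: [], 1: [0], 2: [1]}


def f(node, x):
    """Hypothetical stand-in for the learnt function of a node's parents."""
    return sum(math.tanh(x[p]) for p in PARENTS[node])


def noise_to_observation(z):
    """Forward simulation: x_i = f_i(parents of i) + z_i."""
    x = [0.0] * len(z)
    for node in sorted(PARENTS):
        x[node] = f(node, x) + z[node]
    return x


def observation_to_noise(x):
    """Inverse map: z_i = x_i - f_i(parents of i)."""
    return [x[node] - f(node, x) for node in sorted(PARENTS)]


rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in PARENTS]
x = noise_to_observation(z)
z_recovered = observation_to_noise(x)

# Largest round-trip error; it should be at the level of floating-point noise.
print(max(abs(a - b) for a, b in zip(z, z_recovered)))
```

Because the noise enters additively, the inverse only needs the observed parents, which is why each observation corresponds to exactly one noise vector.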

Simulation-based Causal Inference

DECI estimates causal quantities (ATE) by applying the relevant interventions to its learnt causal graph (i.e. mutilating incoming edges to intervened variables) and then sampling from the generative model. This process involves first sampling a vector of exogenous noise from the learnt noise distribution and then forward simulating the SEM until an observation vector is obtained. The ATE can be computed by estimating an expectation over the effect variable of interest using Monte Carlo samples from the intervened distribution of observations.
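The procedure described above can be sketched on a hand-written toy SEM. This is a minimal illustration under stated assumptions, not the Causica API: the variables `C`, `T`, `Y`, their coefficients, and the helper names `simulate` and `monte_carlo_ate` are all made up for this example.

```python
import random


def simulate(rng, do_t=None):
    """Sample Y from a toy SEM with edges C -> T, C -> Y and T -> Y.

    Passing do_t replaces T's structural equation with a constant, i.e. it
    removes the incoming edge C -> T (the mutilated graph).
    """
    c = rng.gauss(0.0, 1.0)
    t = do_t if do_t is not None else 0.8 * c + rng.gauss(0.0, 1.0)
    return 2.0 * t + 1.5 * c + rng.gauss(0.0, 1.0)


def monte_carlo_ate(intervention, reference, n_samples=20000, seed=0):
    """Difference of Monte Carlo means of Y under two interventions on T."""
    rng = random.Random(seed)
    y_int = [simulate(rng, do_t=intervention) for _ in range(n_samples)]
    y_ref = [simulate(rng, do_t=reference) for _ in range(n_samples)]
    return sum(y_int) / n_samples - sum(y_ref) / n_samples


ate = monte_carlo_ate(intervention=1.0, reference=0.0)
# The true ATE of this toy SEM is 2.0, the coefficient of T in Y's equation.
print(f"estimated ATE: {ate:.2f}")
```

In DECI the graph and the functions are learnt and the graph itself is sampled from the variational posterior; here both are fixed by hand to keep the example short.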

How to run

The best place to start is the examples/multi_investment_sales_attribution.ipynb notebook. This explains how to fit a model using PyTorch Lightning and test ATE and ITE results.

For a more detailed introduction to the components and how they fit together, see the notebook examples/csuite_example.ipynb, which shows how to train a DECI model and check the causal discovery.

That notebook downloads the data from the CSuite Azure blob storage and trains DECI on it. See here for more info about CSuite datasets. The notebook will work on any of the available CSuite datasets.

Specifying a noise model

The exogenous noise model can be modified by changing the noise_dist field within TrainingConfig; either Gaussian or Spline is allowed.

The Gaussian model has a Gaussian exogenous noise distribution with mean set to 0, while its variance is learnt.

The Spline model uses a flexible spline flow that is learnt from the data. This model provides most gains in heavy-tailed noise settings, where the Gaussian model is at risk of overfitting to outliers, but can take longer to train.
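One way to see why a zero-mean Gaussian is sensitive to outliers: its maximum-likelihood variance is the mean of the squared residuals, so a single large residual dominates the fit. The sketch below is a standalone illustration with made-up residual values; `fit_zero_mean_gaussian` is a hypothetical helper, not a Causica function.

```python
def fit_zero_mean_gaussian(residuals):
    """Maximum-likelihood variance of a zero-mean Gaussian: mean of squares."""
    return sum(r * r for r in residuals) / len(residuals)


# Made-up residuals: eight small values, then the same values plus one outlier.
clean = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.15, 0.0]
with_outlier = clean + [5.0]

var_clean = fit_zero_mean_gaussian(clean)
var_outlier = fit_zero_mean_gaussian(with_outlier)

# The single outlier inflates the fitted variance by roughly two orders of magnitude.
print(var_clean, var_outlier)
```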

Using a known Causal graph

To use DECI to learn the functional relationships, remove the variational distribution terms from the loss and replace the sample with the known graph.
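The idea of fixing the graph and learning only the functional relationships can be sketched as follows. This is a simplified stand-in, not DECI: it assumes linear functions through the origin and at most one parent per node, and the adjacency matrix `ADJ`, the synthetic data and the helper `fit_slope` are all assumptions of this example.

```python
import random

# Known graph as an adjacency matrix: ADJ[i][j] == 1 means i -> j.
ADJ = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]


def fit_slope(xs, ys):
    """Least-squares slope through the origin for y = w * x + noise."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Synthetic data from X0 -> X1 -> X2 with true weights 2.0 and -1.0.
rng = random.Random(0)
rows = []
for _ in range(2000):
    x0 = rng.gauss(0.0, 1.0)
    x1 = 2.0 * x0 + rng.gauss(0.0, 0.1)
    x2 = -1.0 * x1 + rng.gauss(0.0, 0.1)
    rows.append((x0, x1, x2))

# The graph is never sampled or learnt: only edges present in ADJ get a weight.
weights = {}
for i, row in enumerate(ADJ):
    for j, edge in enumerate(row):
        if edge:
            weights[(i, j)] = fit_slope([r[i] for r in rows], [r[j] for r in rows])

print(weights)
```

With the graph fixed there is no distribution over graphs to optimise, which mirrors dropping the variational distribution terms from the loss.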

Example using the CLI

An example of how to run a training job with the noise distribution specified in the config src/causica/config/lightning/default_gaussian.yaml and the data configuration specified in src/causica/config/lightning/default_data.yaml:

python -m causica.lightning.main \
    --config src/causica/config/lightning/default_gaussian.yaml --data src/causica/config/lightning/default_data.yaml

Further extensions

For now, we have removed Rhino and DDECI from the codebase but they will be added back. You can still access the previously released versions here.

References

If you have used the models in our code base, please consider citing the corresponding papers:

[1] (VISL) Pablo Morales-Alvarez, Wenbo Gong, Angus Lamb, Simon Woodhead, Simon Peyton Jones, Nick Pawlowski, Miltiadis Allamanis, Cheng Zhang. Simultaneous Missing Value Imputation and Structure Learning with Groups. arXiv preprint

[2] (DECI) Tomas Geffner, Javier Antoran, Adam Foster, Wenbo Gong, Chao Ma, Emre Kiciman, Amit Sharma, Angus Lamb, Martin Kukla, Nick Pawlowski, Miltiadis Allamanis, Cheng Zhang. Deep End-to-end Causal Inference. arXiv preprint (2022)

[3] (DDECI) Matthew Ashman, Chao Ma, Agrin Hilmkil, Joel Jennings, Cheng Zhang. Causal Reasoning in the Presence of Latent Confounders via Neural ADMG Learning. ICLR (2023)

[4] (Rhino) Wenbo Gong, Joel Jennings, Cheng Zhang, Nick Pawlowski. Rhino: Deep Causal Temporal Relationship Learning with History-dependent Noise. ICLR (2023)

Development

Poetry

We use Poetry to manage the project dependencies, which are specified in pyproject.toml. To install Poetry, run:

    curl -sSL https://install.python-poetry.org | python3 -

To install the environment, run poetry install. This creates a virtualenv that you can use by running either poetry shell or poetry run {command}. You can also interact with this virtualenv in the normal way.

More information about Poetry can be found here.

mlflow

We use mlflow for logging metrics and artifacts. By default it will run locally and store results in ./mlruns.

causica's People

Contributors

agrinh, chengzhangmsrc, confoundry, meyerscetbon, microsoft-github-operations[bot], pawni, wenbogong


causica's Issues

Access weighted adjacency matrix of a trained model

Hi team,

I was running DECI on my own dataset. According to examples/multi_investment_sales_attribution.ipynb, I was able to output a binary adjacency matrix from my trained model.

I am wondering if it is possible to access the weighted adjacency matrix the model actually optimized during the training process, instead of only the binary matrix.

Thank you and look forward to your reply!

Impossible to download the data in example multi_investment_sales_attribution.ipynb

Hi,

Inside the notebook example multi_investment_sales_attribution.ipynb I have an error when I try to download the data:

The command:

root_path = "https://azuastoragepublic.blob.core.windows.net/datasets/causal_ai_suite"
df = pd.read_csv(root_path + "/multi_attribution_data_20220819.csv")

The error message:

642 def http_error_default(self, req, fp, code, msg, hdrs):
--> 643 raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: HTTP Error 409: Public access is not permitted on this storage account.

It seems that there is an issue with the data server. Has the address of the data changed?

Thanks a lot for your help

What does the active_learning argument mean?

I notice there is an active_learning argument here.

However, no explanations or algorithm backends are provided.

What does this argument mean? Which active learning algorithm does it call?

Add penalty when true CATE is known

Typically, we have some past experiments (A/B tests) that give us estimates
of the true CATE.

When using DECI, we obtain a CATE estimate that is biased because of a wrong causal graph.
Is there a way to add a penalty based on the true CATE to the loss, so that the true graph can be recovered?

One possible way (a very rough idea) would be:

  1. Generate some graphs
  2. Calculate the CATE
  3. Back-propagate error = (CATE - CATE_true)**2
  4. Re-train the model.

Do you have any suggestions on how to adjust the learning of the graph
in DECI?

Results stability of CATE conditioning on categorical variables

I was trying to get the CATE conditioning on a categorical variable with 5 unique values.
The model had converged to a DAG.
I used:
model.cate_rff_n_features = 1000
Nsamples_per_graph=2000,
Ngraphs=1,
most_likely_graph=True

The results varied wildly from run to run.
Can someone help me understand what might be the reasons and how to fix the issue?
Thanks,
Daili

Performance with respect to graph scale

Hello,

I am just wondering whether the training can handle
up to 1 million nodes, or rather 100k nodes?

I am also wondering whether there are any scientific methods to reduce the size of the graph (e.g. KMeans aggregation; won't that create additional issues, such as merging co-confounders?). What about distance?

Sorry for the many questions, thanks

DECI examples not runnable

Hi Causica team,

The code does not seem to be up to date, which is a problem especially for those who would like to try DECI for the first time by following the examples. I understand that DECI is extremely flexible, but it would be nice to have a guide on which dependencies need to be met for which release, and to have checked, working examples. E.g. one cannot import CAUSICA_DATASETS_PATH from the latest release, so it's not clear how to make the examples work.

Thank you in advance for your support!

Optimization of interventions

Hi
Once we have our models and have determined the important causal drivers of our outcome, we may want to determine the optimal set of interventions. I imagine that this is step 6 in your diagram:
image

However, in your paper I don't see any details on this step. Given the nonlinearity of your models, I imagine there are interaction effects (in the sense that the effect of X2 on the endpoint may change with the value of X1), as opposed to the effects being entirely additive. Therefore, we can't optimize the interventions on multiple variables independently; they need to be jointly optimized.
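The point about interactions can be made concrete with a toy response surface. This is a sketch under assumptions, not part of the framework: the `outcome` function and the two intervention levels are invented for illustration.

```python
import itertools


def outcome(x1, x2):
    """Hypothetical response surface with an interaction term x1 * x2."""
    return x1 + x2 - 3.0 * x1 * x2


LEVELS = [0.0, 1.0]

# Independent optimisation: tune each variable with the other held at baseline 0.
best_x1 = max(LEVELS, key=lambda v: outcome(v, 0.0))
best_x2 = max(LEVELS, key=lambda v: outcome(0.0, v))
independent = outcome(best_x1, best_x2)

# Joint optimisation: search over all combinations of levels.
joint_choice = max(itertools.product(LEVELS, LEVELS), key=lambda p: outcome(*p))
joint = outcome(*joint_choice)

# Interaction check: does the effect of x1 depend on the level of x2?
effect_at_x2_0 = outcome(1.0, 0.0) - outcome(0.0, 0.0)
effect_at_x2_1 = outcome(1.0, 1.0) - outcome(0.0, 1.0)

print(independent, joint, effect_at_x2_0 - effect_at_x2_1)
```

Here the independently chosen levels combine into a worse outcome than the joint search finds, and the non-zero difference between the two effects of x1 is what signals an interaction. With a learnt model, `outcome` would be replaced by simulated expectations under joint interventions.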

Does your framework have a way to optimize the interventions on multiple variables efficiently? How would you determine whether there is an interaction effect?
thanks
FKG

Is it possible to use an existing graph as starting point?

Are there functions to take an existing graph and data, fit the functional relationships assuming the structure of the graph and the provided data, and then do simulations, causal effect estimates, etc.? So essentially steps 3-6 in Figure 1 of the paper.
image

Accessing generated causal graph(s)

Hi,

I was wondering what's the best way to obtain the learnt causal graphs after training DECI on my dataset. I know there are models stored in best_model.pt and last_model.pt but I'm not sure how to get the causal graphs from there (for the purpose of visualization). Happy to explain further if the question isn't clear and thanks for the great library!

Insufficient References in README.md

The README.md description cites some references like [1,2,6] and [8,3]. But the References section contains only papers [1] and [2], along with two more unnumbered papers.

Constraint derived from calculate_dagness() is infinity

Hi, I am using causica 0.2.0 on the Diabetes data set from the BN Learn repository. It has 10k samples, 413 nodes, 602 edges and contains mixed data.

During the first backward prop, I get the error "nan values found" due to the constraint generated from calculate_dagness() function being infinity. I have played around with the following parameters to try and fix the issue but to no avail:

class TrainingConfig:
    noise_dist: ContinuousNoiseDist = ContinuousNoiseDist.GAUSSIAN
    batch_size: int = 100
    max_epoch: int = 1 
    gumbel_temp: float = 0.5
    averaging_period: int = 100
    prior_sparsity_lambda: float = 0
    init_rho: float = 0
    init_alpha: float = 0

training_config = TrainingConfig()
auglag_config = AugLagLRConfig(lr_init_dict={"vardist": 1e-3, "icgnn": 3e-4, "noise_dist": 3e-3})

Do let me know if there's anything I can do to fix this issue, thanks so much :) !
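For context, a common form of the acyclicity constraint (from NOTEARS) is h(A) = tr(exp(A ∘ A)) - d, which is zero exactly when A is acyclic. Whether calculate_dagness uses exactly this form is an assumption here, and the pure-Python `dagness` helper below is a hypothetical re-implementation for small matrices only. It shows how quickly the value grows with graph size and density, which may be one contributor to overflow on a graph with hundreds of nodes.

```python
def matmul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def dagness(adj, terms=30):
    """tr(exp(A * A)) - d via the power series sum_{k>=1} tr((A * A)^k) / k!."""
    n = len(adj)
    squared = [[v * v for v in row] for row in adj]
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    total = 0.0
    factorial = 1.0
    for k in range(1, terms + 1):
        power = matmul(power, squared)
        factorial *= k
        total += sum(power[i][i] for i in range(n)) / factorial
    return total


def fully_connected(n):
    """Dense adjacency matrix with every off-diagonal entry equal to 1."""
    return [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]


chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # acyclic
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # one 3-cycle

print(dagness(chain), dagness(cycle))
print(dagness(fully_connected(5)), dagness(fully_connected(10)))
```

For the fully connected case the value grows roughly like exp(d - 1), so it exceeds the range of 32-bit floats well before d reaches a few hundred. If this is what happens in practice, the initial edge probabilities would be a natural thing to inspect, though that is a guess rather than a confirmed diagnosis.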

Does Causica support missing data in the training dataset?

As claimed in the README on GitHub, "Additionally, DECI supports learning under partially observed data."
I purposely injected some NaN values into the pandas DataFrame in the multi_investment_sales_attribution example, but training fails with many errors complaining about missing values. How can I get DECI working on real-world data, where missing values are inevitable?

Run time using GPU is very slow

Hello

I used a dataset, a CSV file with 58 columns and 50,000 rows, on a 16 GB GPU. It took more than 1 hour to get the causal graph.

Parameter tuning for applying DECI to large graphs

Hello,

Do you have some pieces of advice for tuning the parameters of the DECI method when applied to large graphs for Causal Discovery?

I have tried to apply the DECI method to datasets of simulated graphs with 10, 20, 50 and 100 nodes (with the number of edges equal to 1x or 4x the number of nodes) and different types of nonlinear SEMs (but all with Gaussian additive noise).

For all datasets, the training seems to be going correctly (the loss curves are correctly decreasing, there is no numerical warning), so DECI seems to be converging.
For all but 100-nodes graph datasets (and some 50-nodes graphs) I obtain valid graph estimates, more or less correct depending on the situation.
For all 100-nodes graph datasets (and some 50-nodes graphs) I obtain invalid "empty" graphs (i.e. graphs with adjacency matrix made of only 0 elements).

Could you please help me in making DECI work for these 100-nodes graphs?

Here is my setting:

  1. Using the following snippet from the current gcastle package (v1.0.3), I have simulated several datasets of 3000 graphs with 100 nodes (and 100 or 400 edges) and different nonlinear SEMs (gp, quadratic and mlp):
weighted_random_dag = DAG.erdos_renyi(n_nodes=n_nodes, n_edges=n_edges, weight_range=(0.5, 2.0), seed=seed)
dataset = IIDSimulation(W=weighted_random_dag, n=n, method=method_type, sem_type=sem_type)
true_dag, X = dataset.B, dataset.X
  2. I have adapted the source code from examples/multi_investment_sales_attribution.ipynb to process my own datasets.
    So I am using the default parameters plus those specified in this example.
    I have only changed the batch size from 1024 to 128 to better fit my datasets containing 3000 samples.

Thank you very much for your help,

Looking for the dataset csuite_fork_collider_nonlin_gauss_latent_confounder

Hello,

I would like to reproduce the results of the D-DECI method (from v0.0.0 version) on the synthetic data in the corresponding paper.
But I cannot find the csuite_fork_collider_nonlin_gauss_latent_confounder dataset.
Neither here in https://github.com/microsoft/causica nor in https://github.com/microsoft/csuite.

Could you please tell me where I can get this dataset, or provide some source code so that I can generate it myself?

Thank you very much for your help,

Bests,

Frederic

Relative imports

I'm not sure of the choice to have a sea of relative imports, but just trying to run out of the box:

python run_experiment.py csuite_symprod_simpson --model_type deci_spline --model_config configs/deci/true_graph_deci_spline.json -dc configs/dataset_config_causal_dataset.json -c -te

You get an 'ImportError: attempted relative import with no known parent package'

I get similar errors while trying to generate data:

The CSuite datasets can be generated by running `causica/data_generation/csuite/simulate.py` from the top-level directory of the package.

How does this run out of the box? Thanks

Prior setting

Hi team,

I am trying to do causal discovery on my dataset by following the steps in csuite_example. However, I cannot get a reasonable result. The key difference between my use case and the example is the prior setting.

  • I only set 0 in the expert matrix (EM). For example, there shouldn't be any cause of gender, so if j is gender, then EM[i,j] = 0. I have set a lot of edges to 0 in the expert matrix, and everything else is 1.
  • The relevance mask (RM) is the inverse of the expert matrix: if EM[i,j] == 0 then RM[i,j] = 1, and if EM[i,j] == 1 then RM[i,j] = 0.
  • The confidence matrix (CM) is a clone of the RM; however, I think CM can be either 0 or 1 for edges set to 0 in EM.

All of these settings are based on my understanding of the formula below:
self._expert_graph_container.mask
* (A - self._expert_graph_container.confidence * self._expert_graph_container.dag)
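The formula above can be evaluated by hand on a two-variable example. This is a sketch only: `masked_prior_residual` is a hypothetical helper that mirrors the expression elementwise, the convention that entry [i][j] means the edge i -> j is an assumption, and how the library reduces the result to a scalar penalty is not shown here.

```python
def masked_prior_residual(adjacency, expert_dag, mask, confidence):
    """Elementwise mask * (A - confidence * dag), mirroring the formula above."""
    n = len(adjacency)
    return [
        [mask[i][j] * (adjacency[i][j] - confidence[i][j] * expert_dag[i][j]) for j in range(n)]
        for i in range(n)
    ]


# Variables: 0 = gender, 1 = outcome. The expert says nothing causes gender,
# so the entry for the edge 1 -> 0 is 0 in the expert graph and 1 in the mask.
expert_dag = [[0, 0], [0, 0]]
mask = [[0, 0], [1, 0]]
confidence = [[0.0, 0.0], [1.0, 0.0]]

with_forbidden_edge = [[0, 1], [1, 0]]     # contains 1 -> 0
without_forbidden_edge = [[0, 1], [0, 0]]  # only 0 -> 1

residual_bad = masked_prior_residual(with_forbidden_edge, expert_dag, mask, confidence)
residual_ok = masked_prior_residual(without_forbidden_edge, expert_dag, mask, confidence)
print(residual_bad)
print(residual_ok)
```

Only the masked entry contributes, and it is non-zero exactly when the forbidden edge is present. Since the expert entry is 0 there, the confidence is multiplied by 0 and has no effect on this term, which is consistent with the remark that CM could be either 0 or 1 for such edges.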

In the results, none of my dependent variables has any cause (I have 6 dependent variables and 40+ variables in total). I am wondering:

  1. Is my prior setting wrong? I have only seen examples that set edges to 1 in EM, and my use case is totally different.
  2. Is i the parent node and j the child node?
  3. Maybe there is another issue, such as adjacency_dist; I followed csuite_example exactly for the other parts.

I hope you can help with the above questions, and thanks in advance!

How to make nonlinear predictions on categorical variables in causal inference given a causal graph?

I have been learning causal inference and discovery recently and have struggled with this question for a long time.

From my understanding of the literature, causal inference seems quite different from traditional machine learning. For traditional machine learning, once the model is trained, and given a set of X, the model directly predicts Y's value.

However, for the causal inference, the model answers if X1 changes from 1 to 2, for example, it will return the causal effects on Y.

So how can I answer the prediction question using causal inference?

Here are some simulated data using this graph:

import networkx as nx
import matplotlib.pyplot as plt
# Create a directed graph
G = nx.DiGraph()
# Add nodes X, Y, and Z
G.add_nodes_from(['X', 'Y', 'Z'])
# Add edges representing causal relationships
G.add_edge('X', 'Y')
G.add_edge('Z', 'Y')
# Draw the graph
pos = nx.spring_layout(G)
nx.draw_networkx(G, pos, with_labels=True, node_color='lightblue', node_size=500, font_size=12, edge_color='gray')
plt.title('Causal Graph')
plt.show()

image

# Create the nonlinear relationships:
import numpy as np
import pandas as pd

# Generate X values
X = np.linspace(0, 10, 100)

# Generate Z values
Z = np.linspace(10, 20, len(X))

# Generate Y values using a non-linear relationship with X
Y = np.sin(X) + np.cos(Z) + np.random.normal(0, 0.1, len(X))

# Combine X, Z, Y into one pandas DataFrame
df = pd.DataFrame({'X': X, 'Z': Z, 'Y': Y})

# Print the DataFrame
print(df)

           X         Z         Y
0    0.00000  10.00000 -0.781419
1    0.10101  10.10101 -0.691126
2    0.20202  10.20202 -0.603684
3    0.30303  10.30303 -0.418206
4    0.40404  10.40404 -0.087543
..       ...       ...       ...
95   9.59596  19.59596  0.748085
96   9.69697  19.69697  0.455545
97   9.79798  19.79798  0.365235
98   9.89899  19.89899  0.023566
99  10.00000  20.00000 -0.193462

[100 rows x 3 columns]

# plot the data
import matplotlib.pyplot as plt

# Plot X, Y, and Z
plt.plot(df['X'], label='X')
plt.plot(df['Y'], label='Y')
plt.plot(df['Z'], label='Z')

# Add labels and legend
plt.xlabel('Index')
plt.ylabel('Value')
plt.legend()

# Show the plot
plt.show()

image

The problem is:

How to make predictions on Y when X = 10, Z = 20 pretending that you only know the causal graph but not the detailed causal function?

I have tried using Microsoft Causica to identify the causal graph and to do causal inference, but neither of these is a prediction problem.

Model not saving when using distributed training

Hi -

I am trying to use the DECIModule with distributed training on 4 GPUs. Saving the model in the distributed case yields an empty module whereas when training on one GPU I save the anticipated model with sem_module.

Is there a known approach to saving the model properly in the distributed training case? I am using the pytorch default DDP strategy.

    trainer = pl.Trainer(
        accelerator="gpu",
        devices=4, # distribute training
        max_epochs=1000,
        fast_dev_run=test_run,
        callbacks=[TQDMProgressBar(refresh_rate=19), checkpoint_callback],
        enable_checkpointing=True,
    )

    # Training the model
    trainer.fit(lightning_module, datamodule=data_module)
    torch.save(lightning_module.sem_module, "model.pt")

Version mismatch? example codes not working

When I pip install causica, it installs correctly, but I then have various issues with both sample notebooks.

For csuite_example.ipynb, when I run the first cell I get:

ModuleNotFoundError Traceback (most recent call last)
Cell In[1], line 10
7 from tensordict import TensorDict
8 from torch.utils.data import DataLoader
---> 10 from causica.datasets.causica_dataset_format import DataEnum, load_data
11 from causica.datasets.tensordict_utils import tensordict_shapes
12 from causica.distributions import (
13 AdjacencyDistribution,
14 ContinuousNoiseDist,
(...)
20 create_noise_modules,
21 )

ModuleNotFoundError: No module named 'causica.datasets.causica_dataset_format'

Similarly, when I run multi_investment_sales_attribution.ipynb, I get:

ImportError Traceback (most recent call last)
Cell In[2], line 18
15 from pytorch_lightning.callbacks import TQDMProgressBar
16 from tensordict import TensorDict
---> 18 from causica.distributions import (
19 ContinuousNoiseDist,
20 SEMDistributionModule,
21 )
22 from causica.lightning.data_modules.basic_data_module import BasicDECIDataModule
23 from causica.lightning.modules.deci_module import DECIModule

ImportError: cannot import name 'ContinuousNoiseDist' from 'causica.distributions' (/opt/homebrew/lib/python3.8/site-packages/causica/distributions/__init__.py)

Could you please suggest what to change?

CATE: conditioning on categorical variable

Thanks for the awesome library.
However, when I try to get the CATE conditioning on a categorical variable, it keeps throwing errors.
Can someone help by providing an example of the inputs for model.cate() when conditioning on a categorical variable?
The variable gender_enc can take 3 values. I tried conditioning_values=conditioning_values.reshape((1,3)), .reshape((3,1)), and .reshape(-1).
Each attempt ran into either error 1 or error 2 (please see the end).

# conditioning
conditioning_idxs = dataset.variables.name_to_idx['gender_enc']
conditioning_idxs = np.array([conditioning_idxs])
conditioning_values=model.data_processor.process_data_subset_by_group(np.array([[0]]),conditioning_idxs)
print(conditioning_values.shape)
# conditioning_values=conditioning_values.reshape((1,3))
# conditioning_values=np.array([1])
print(conditioning_values)
print(conditioning_values.shape)

outcome_cols='retention'
outcome_idxs=dataset.variables.name_to_idx[outcome_cols]
effect_idxs=np.array([outcome_idxs])

model.cate_rff_n_features = 100

for treatment in ['seasonaldecor_purchase']:
    treatment_idxs = dataset.variables.name_to_idx[treatment]
    intervention_idxs=np.array([treatment_idxs])
    intervention_values=model.data_processor.process_data_subset_by_group(np.array([nodes_sel_val[treatment][1]]),intervention_idxs)
    reference_values=model.data_processor.process_data_subset_by_group(np.array([nodes_sel_val[treatment][1]]),intervention_idxs)

    ate = model.cate(
        intervention_idxs=intervention_idxs,
        intervention_values=intervention_values,
        reference_values=reference_values,
        effect_idxs=effect_idxs,
        conditioning_idxs=conditioning_idxs,
        conditioning_values=conditioning_values,
        Nsamples_per_graph=100,
        Ngraphs=1,
        most_likely_graph=True,
    )
    causica_estimated_ate[treatment] = ate[0][0]
    print(f"{treatment}: {causica_estimated_ate[treatment]}")

Error1

image

Error2

image

Thanks a lot.
Regards,
Daili

Run with my own dataset

Hi,

I'm wondering if there is a guide on how to run an experiment with my own dataset?

Thank you so much.

How to test the performance of Rhino?

Hi there.

I wonder which way to test the Rhino model is more reasonable?

Namely:

  • Should I use the same full time causal graph to generate all time series in the training set?
  • Should I use an extra test set, or should I directly test the performance (e.g. F1 score between the recovered graph and the ground-truth graph) on the training set?
  • If an extra test set is required, should I use the same or different causal graph to generate data for the training and test set?
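For the metric mentioned above, an F1 score between a recovered graph and the ground-truth graph can be computed over directed edges as in the sketch below. The function `edge_f1` and the two example matrices are assumptions for illustration; this is not Rhino's evaluation code, and a temporal graph would first have to be flattened into a comparable adjacency matrix.

```python
def edge_f1(predicted, truth):
    """F1 score over the directed edges of two binary adjacency matrices."""
    n = len(truth)
    tp = fp = fn = 0
    for i in range(n):
        for j in range(n):
            if predicted[i][j] and truth[i][j]:
                tp += 1  # edge present in both graphs
            elif predicted[i][j]:
                fp += 1  # edge predicted but not in the ground truth
            elif truth[i][j]:
                fn += 1  # true edge that was missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


truth = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
predicted = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]

# Two correct edges, one spurious edge (2 -> 0) and one missed edge (0 -> 2).
print(edge_f1(predicted, truth))
```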

Putting back the initial high-level DECI API

Dear Causica Dev Team,

When can the re-introduction of the original (v0.0.0) high-level DECI API be expected?
The recent low-level API makes it much harder to start a project and more difficult to tune up everything manually.
I loved the high-level API, especially the part for generating counterfactual estimates. I don't know how to calculate counterfactuals with the low-level API (existing tutorials do not cover it, as far as I saw).
Thank you!

AttributeError: 'DECIModule' object has no attribute 'sem_module'

Hi,

Thank you for the great work. I was going through the sample code in multi_investment_sales_attribution.ipynb and everything seemed to be running smoothly during training. However, I encountered an error when executing torch.save(lightning_module.sem_module, "deci.pt"):
AttributeError: 'DECIModule' object has no attribute 'sem_module'.
Could you please help troubleshoot the error? Thanks.

I installed causica with pip; the latest release version seems to be 0.3.4.

DECI Examples

Hello,
Really amazing work!
A small issue: is it possible to fix the notebooks under the DECI examples so that they can be run?
Currently there are several issues that prevent this:

  1. metrics_logger is not defined under the run_context
  2. model run_train doesn't accept run_context=run_context
  3. mode_f_sem is not supported under model_config when running DECI()
  4. RunContext has no attribute get (I suspect that since "evaluation_pipline.aml_run_context" is not available, there is a mismatch in RunContext functionality)

Many thanks

Transfer Learning and Anomaly Attribution

Hey,

Causica looks super promising, thanks for making it open!
I've been playing around with it for a few days now, trying to get a good grasp of it, and there are a couple of points that are a bit unclear to me:

  1. How could I do transfer learning or fine-tuning? I've trained a model on the entirety of the data, but I'd like to fine-tune it for a single customer who potentially doesn't have enough data to train on their data alone.
  2. How could I do anomaly attribution or distributional-change attribution? I.e., if something changed, could the model tell me what the root cause was? So far the only way I could find was to take the Causica-generated graph to DoWhy to perform this, but this means, I think, that I'd lose the forward methods, as DoWhy does not work with them.

Looking forward to hearing any suggestions!

Cheers,
Jason
