Hello,
Even with strong L1 regularisation, the output of from_pandas or from_pandas_lasso will never be exactly sparse: there will always be values that are small but different from 0.
The w_threshold parameter removes all edges whose absolute weight is below the given value.
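To make the thresholding rule concrete, here is a dependency-free sketch that applies it to a small, made-up dictionary of edge weights. It assumes, as described above, that the absolute value of each weight is compared and that an edge whose weight equals the threshold is kept. The helper name apply_threshold and the weights are hypothetical and not part of CausalNex.

```python
def apply_threshold(weights, w_threshold):
    """Hypothetical helper: keep only edges whose |weight| >= w_threshold.

    `weights` maps (source, target) pairs to learned edge weights.
    """
    return {edge: w for edge, w in weights.items() if abs(w) >= w_threshold}


# Made-up weights: NOTEARS output is dense, with many small non-zero values.
learned = {
    ("a", "b"): 0.91,
    ("b", "c"): -0.45,  # negative weight: compared by its absolute value
    ("a", "c"): 0.03,
    ("c", "a"): -0.002,
}

print(apply_threshold(learned, 0.0))  # all four edges survive
print(apply_threshold(learned, 0.1))  # {('a', 'b'): 0.91, ('b', 'c'): -0.45}
```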
If you want to automate the selection, you could use cross-validation to pick the "best-performing" threshold.
The least aggressive approach is to use the smallest threshold that yields a valid DAG (as required by the acyclicity constraint of NOTEARS):
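import networkx as nx
from causalnex.structure.notears import from_pandas

sm = from_pandas(df, w_threshold=0)  # df: your data; keep every learned edge
thresh = 0
step = 0.01
# raise the threshold in small steps until the remaining graph is acyclic
while not nx.algorithms.is_directed_acyclic_graph(sm):
    thresh += step
    sm.remove_edges_below_threshold(thresh)
# afterwards, thresh holds the threshold that produced the DAG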
A more efficient version loops over the actual edge weights instead of fixed steps:
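import networkx as nx
from causalnex.structure.notears import from_pandas

sm = from_pandas(df, w_threshold=0)
# candidate thresholds: distinct absolute edge weights, smallest first
# (weights can be negative, and remove_edges_below_threshold compares |weight|)
sorted_weights = sorted({abs(w) for _, _, w in sm.edges(data="weight")})
for thresh in sorted_weights:
    if nx.algorithms.is_directed_acyclic_graph(sm):
        break
    # drop every edge whose |weight| is below thresh
    sm.remove_edges_below_threshold(thresh)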
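Because deleting edges can never create a cycle, acyclicity is monotone in the threshold: once some threshold gives a DAG, every larger threshold does too. So the smallest DAG-producing threshold can also be found by binary search over the sorted absolute weights, needing roughly log2(E) acyclicity checks instead of up to E. The sketch below swaps in that binary search. It is plain Python, uses Kahn's algorithm in place of networkx for the DAG check, and the function names and toy weights are hypothetical rather than CausalNex API.

```python
from collections import defaultdict, deque


def is_dag(edges):
    """Kahn's algorithm: True if the directed graph given by `edges` has no cycle."""
    indegree = defaultdict(int)
    children = defaultdict(list)
    nodes = set()
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
        nodes.update((u, v))
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # every node can be ordered only if there is no cycle
    return visited == len(nodes)


def min_dag_threshold(weights):
    """Smallest candidate threshold whose kept edges (|w| >= t) form a DAG.

    Candidates are 0 plus the distinct |weights|. Returns None if even the
    largest candidate leaves a cycle (possible only with tied weights).
    """
    candidates = sorted({0.0} | {abs(w) for w in weights.values()})

    def kept(t):
        return [e for e, w in weights.items() if abs(w) >= t]

    lo, hi = 0, len(candidates) - 1
    if not is_dag(kept(candidates[hi])):
        return None
    # invariant: candidates[hi] gives a DAG; find the first index that does
    while lo < hi:
        mid = (lo + hi) // 2
        if is_dag(kept(candidates[mid])):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]


weights = {
    ("a", "b"): 0.9,
    ("b", "c"): 0.7,
    ("a", "c"): 0.3,
    ("b", "a"): -0.2,  # forms the cycle a -> b -> a
    ("c", "a"): 0.05,  # closes cycles back to a
}
print(min_dag_threshold(weights))  # 0.3
```

With these toy weights, thresholds 0.05 and 0.2 still leave a cycle and 0.3 is the first that removes them all. The binary search only pays off on graphs with many edges; for a handful of edges the linear loop above is just as good.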
Thank you for your careful explanation; I have benefited a lot from it.
By the way, could you please show me how to use cross-validation to pick the "best-performing" threshold rather than the "worst-performing" one? I believe many people would like to see this in the docs.
Thanks again.
As you said, we can get a DAG with the minimum w_threshold, but I need a better StructureModel. For example, the first CausalNex tutorial in the docs ('whether a student will pass or fail an exam') sets w_threshold to 0.8. So I want to know which value of w_threshold is good for my dataset.
Hi @1021808202,
I suggest treating w_threshold as a hyperparameter and using a tool like hyperopt to search a specified range of w_threshold values for the best one. Thanks 🙂
I have the same problem.
The so-called "good" or "correct" graph structure should be validated against domain knowledge. In this case, you may want to define a structure quality metric based on what you know about the data/domain, and perform a grid search (or use hyperopt as above) over w_threshold until you find the structure that optimises this metric. One way to implement this, for example, is to use our DAGRegressor or DAGClassifier interface together with scikit-learn's GridSearchCV, providing your own custom scoring function.
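As a dependency-free illustration of the grid-search idea, the sketch below scores each candidate w_threshold with a made-up structure quality metric: +1 for every edge that domain knowledge says must exist and -1 for every edge it says must not. The metric, the variable names, the edge lists and the weights are all hypothetical; in practice you would threshold your own StructureModel and use a metric that reflects your own domain.

```python
def structure_score(edges, required, forbidden):
    """Hypothetical domain-knowledge metric: reward expected edges, penalise forbidden ones."""
    edges = set(edges)
    return len(edges & required) - len(edges & forbidden)


def grid_search_threshold(weights, grid, required, forbidden):
    """Return (best_threshold, best_score) over `grid`; ties keep the smaller threshold."""
    best_t, best_score = None, float("-inf")
    for t in sorted(grid):
        # same rule as w_threshold: keep edges whose |weight| >= t
        kept = [e for e, w in weights.items() if abs(w) >= t]
        score = structure_score(kept, required, forbidden)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Made-up learned weights and domain knowledge, for illustration only.
learned = {
    ("study_time", "grade"): 0.85,
    ("absences", "grade"): -0.40,
    ("grade", "study_time"): 0.12,  # reversed edge that domain knowledge rules out
    ("age", "absences"): 0.05,
}
required = {("study_time", "grade"), ("absences", "grade")}
forbidden = {("grade", "study_time"), ("age", "absences")}

grid = [0.0, 0.1, 0.2, 0.5, 0.9]
print(grid_search_threshold(learned, grid, required, forbidden))  # (0.2, 2)
```

Here 0.2 wins because it keeps both expected edges while dropping both forbidden ones; a larger threshold starts discarding edges you want. The same scoring function could be handed to hyperopt or wrapped as a custom scorer for GridSearchCV instead of this hand-written loop.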
Integrating GridSearchCV or hyperopt into CausalNex would, however, be beyond the scope of this project, so I propose we close this issue for now. Nevertheless, feel free to raise a new issue if you still have difficulties tuning w_threshold.