bd2kccd / py-causal

License: Other

Python 2.73% Batchfile 0.01% Shell 0.01% Jupyter Notebook 97.26%

py-causal's Issues

java : heap space

Is there any way to change the heap size when I run code from Jupyter?
I am working with GFCI and trying testId = 'kci-test' (and scoreId = 'mvp-bic-score'). My dataset consists of 120 variables, and I run into a heap space error.

Here is my code:

tetrad.run(algoId = 'gfci', dfs = df, testId = 'kci-test', alpha = 0.05, kernelType = 2, scoreId = 'mvp-bic-score',
           maxDegree = 3, maxPathLength = -1,
           completeRuleSetUsed = False, faithfulnessAssumed = True, verbose = True,
           resamplingEnsemble = 1, addOriginalDataset = True, colliderDiscoveryRule = 2)
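
Another issue on this page starts the JVM with `pc.start_vm(java_max_heap_size='100M')`, which suggests the heap size is chosen when the VM is started, before any `tetrad.run` call. The following is a minimal sketch under that assumption; `heap_arg` and `start_with_heap` are hypothetical helpers, the 4000 MB figure is arbitrary, and whether the keyword exists in a given py-causal version should be checked against the installed package.

```python
def heap_arg(megabytes):
    """Format a heap size as a JVM-style string such as '4000M'."""
    if megabytes <= 0:
        raise ValueError("heap size must be positive")
    return "{}M".format(int(megabytes))


def start_with_heap(megabytes):
    """Start the py-causal JVM with a larger heap (requires pycausal and Java).

    Assumes start_vm accepts java_max_heap_size, as in the
    start_vm(java_max_heap_size='100M') call quoted elsewhere on this page.
    """
    from pycausal.pycausal import pycausal  # imported lazily; needs a JVM
    pc = pycausal()
    pc.start_vm(java_max_heap_size=heap_arg(megabytes))
    return pc


# The string that would be handed to start_vm for a 4000 MB heap.
print(heap_arg(4000))
```

In a notebook, the kernel probably has to be restarted first if a VM is already running, since another report on this page says the VM cannot be stopped and restarted within one process.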

Problem with dataset size

Hi, can I get help with this question, which was asked at the CCD help desk? The user is working with the py-causal implementation.
Thanks.
I paste it below:

Also, I have a large (8GB) csv file that is able to eventually break your code:

Exception in thread "Thread-0" java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:869)
at edu.cmu.tetrad.io.AbstractContinuousDataReader.extractVariables(AbstractContinuousDataReader.java:75)
at edu.cmu.tetrad.io.AbstractContinuousDataReader.analyzeData(AbstractContinuousDataReader.java:59)
at edu.cmu.tetrad.io.TabularContinuousDataReader.readInData(TabularContinuousDataReader.java:62)
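
The exception is raised by `FileChannelImpl.map`, which cannot map more than `Integer.MAX_VALUE` (2,147,483,647) bytes in one call, so an 8 GB file is well past the limit regardless of available memory. A small sketch for checking a file before handing it to the reader; the function names are hypothetical and the sizes are illustrative.

```python
import os

JAVA_INT_MAX = 2**31 - 1  # Integer.MAX_VALUE, the cap on one mapped region


def exceeds_map_limit(size_in_bytes):
    """Return True if a file of this size cannot be mapped in a single call."""
    return size_in_bytes > JAVA_INT_MAX


def file_exceeds_map_limit(path):
    """Same check for a file on disk."""
    return exceeds_map_limit(os.path.getsize(path))


print(exceeds_map_limit(8 * 1000**3))  # an 8 GB file -> True
print(exceeds_map_limit(1 * 1000**3))  # a 1 GB file -> False
```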

reference / acknowledgement

I'm probably just not seeing it, but any chance you could put a reference in the readme so we can cite the work (and/or point me to one)? : ]

Thanks!!

Parameter learning/CPT tables

Hi,

Are there code examples of parameter learning in py-causal given a DAG and data? I see that it is possible in Tetrad.

Thanks!

Bootstrapping FGES Continuous in Action.ipynb: sem-bic-score changed to sem-bic

In the example notebook Bootstrapping FGES Continuous in Action.ipynb,
sem-bic-score is not on the list of scores; sem-bic is used instead.

JavaException: String index out of range: 2

I am working on a medical dataset with approximately 120 discrete variables, and I would like to run the GFCI algorithm with bootstrapping and no prior knowledge.

tetrad.run(algoId = 'gfci', dfs = df, testId = 'disc-bic-test', scoreId = 'bdeu-score',
priorKnowledge = None, dataType = 'discrete',
maxDegree = 3, maxPathLength = -1,
completeRuleSetUsed = False, faithfulnessAssumed = True, verbose = True,
numberResampling = 5, resamplingEnsemble = 1, addOriginalDataset = True)

I am getting this error: JavaException: String index out of range: 2
Can you help me find the error?

Thank you,
Angela

Different output compared to Tetrad GUI

I'm getting different graphs compared to using the Tetrad-GUI using the same algorithm on the example files (FOFC Continuous), when using the same parameters (alpha=0.01 all else False).

This also occurs with my dataset where I get 5 latent variables on the GUI and 3 using python.

Inconsistent notebook example output

When I run py-causal/example/py-causal - Bootstrapping GFCI Discrete in Action.ipynb, all edges printed are of score 1.000, like this:

Graph Edges:

air --> speech [air --> speech dd nl]:1.0000; dd nl

ar_c --> o_ar_u [ar_c --> o_ar_u dd nl]:1.0000; dd nl

ar_u --> o_ar_u [ar_u --> o_ar_u dd nl]:1.0000; dd nl

bone --> air [air <-- bone dd nl]:1.0000; dd nl

class --> o_ar_c [class --> o_ar_c dd nl]:1.0000; dd nl

history_buzzing --> history_fullness [history_buzzing --> history_fullness dd nl]:1.0000; dd nl

history_buzzing --> m_m_sn_gt_2k [history_buzzing --> m_m_sn_gt_2k dd nl]:1.0000; dd nl

history_dizziness --> history_nausea [history_dizziness --> history_nausea dd nl]:1.0000; dd nl

history_fluctuating --> m_sn_lt_1k [history_fluctuating --> m_sn_lt_1k dd nl]:1.0000; dd nl

history_fluctuating --> static_normal [history_fluctuating --> static_normal dd nl]:1.0000; dd nl

Rollback to v1.1.1 gives the right output.
Could you please fix the example?

Building from source: "Could not initialize class edu.cmu.tetrad.util.TetradLogger"

from pycausal import pycausal as pc
from pycausal import search as s
import pandas as pd
import pydot
df = pd.read_table("data/audiology.txt", sep="\t")
pc.start_vm(java_max_heap_size='100M')
fges = s.fgesDiscrete(df, structurePrior=1.0, samplePrior=1.0, maxDegree=3, faithfulnessAssumed=True, verbose=True, numofthreads=2)

yields:

JavaException: Could not initialize class edu.cmu.tetrad.util.TetradLogger

Any thoughts? I should mention that I am running this within a Singularity container on Arch Linux with:

pacman -S --noconfirm jdk8-openjdk
conda install -y pandas numpy pydot graphviz nomkl javabridge ipykernel

Bug

Hi, the CCD help desk also received the following bug report about py-causal:

"BTW, there's some bugs in stop_vm/start_vm, as in you can't stop and restart to clear memory."

Any ideas?

Thanks

Ruben

add api documentation

Need to have some documentation of the capabilities of the API

Both Examples have errors

When I try to run the examples, both of them throw errors.
python py-causal-fges-continuous-example.py
python py-causal-fges-discrete-example.py

From what I saw, the discrete case has one error because the VM is used before it is started.
Both examples also have errors related to the datasets:
audiology.txt
charity.txt

But they work with the other available datasets:
sim_discrete_data_20vars_100cases.txt (once the VM order is fixed)
and everything in the 9var folder.

clarify instructions

Not sure what is meant by the comment at the end of the README about pydot. That's not part of tetrad, is it?

JavaException: Sorry, I was expecting a discrete data set.

My dataset consists of 7000 patients and 121 variables, and I would like to apply the PC algorithm, so I chose
algoId = 'pc-all', dfs = df, testId = 'chi-square-test'. All my variables are discrete. Please have a look:
variable1 variable2 variable3 variable4 ... variable5 variable6 variable7 variable8 ....etc
0 18 11 15 9 ... 0.0 0.0 3.0 3.0
1 17 97 47 93 ... 0.0 0.0 3.0 3.0
2 14 56 30 60 ... 0.0 0.0 1.0 2.0

When I run the code, this is the error that I get:
JavaException: Sorry, I was expecting a discrete data set.
Exception in thread "Thread-0" java.lang.IllegalArgumentException: Sorry, I was expecting a discrete data set.
at edu.cmu.tetrad.data.DataUtils.getDiscreteDataSet(DataUtils.java:1888)
at edu.cmu.tetrad.algcomparison.independence.ChiSquare.getTest(ChiSquare.java:30)
at edu.cmu.tetrad.algcomparison.algorithm.oracle.pattern.PcAll.search(PcAll.java:91)

Could you please help me?

need to install javabridge

should mention javabridge as a dependency

Issue in installing due to javabridge

Hi,

I've tried various ways to install py-causal, but it fails because javabridge does not install correctly.
I see other users are also facing this issue. Can you please provide a way to install it?
I don't mind if you share a web app instead; that works as well, but the web-app link doesn't seem to be working. Can you please update it?

Either one should do.

Thank you

Is it parallel?

I'm running it on a really large dataset and checking the resource usage. Apparently it is using only one core. Is it possible to make it use all cores so that it runs faster?

BIC score for a graph, given data

Hi,

I'm trying to get a BIC score for a graph given data, but most of the time print('Graph BIC: {}'.format(graph.getAttribute('BIC'))) returns None. I saw the thread about the FCI algorithm returning None if the returned graph was a PAG, but I thought that was fixed?
On another note, is it possible to get a BIC score on any graph (not just one that was learned), given data? Such as in:
http://bnlearn.com/examples/score/

Thanks!

Issue with background knowledge

I cannot get background knowledge to work without error. I get an error on line 31 in pycausal/prior.py:

prior = javabridge.JClassWrapper('edu.cmu.tetrad.data.Knowledge2')()

When the above line is called I get the following error (in quotes):

" File "/usr/local/lib/python2.7/dist-packages/pycausal/prior.py", line 31, in knowledge
prior = javabridge.JClassWrapper('edu.cmu.tetrad.data.Knowledge2')()
File "/usr/local/lib/python2.7/dist-packages/javabridge/wrappers.py", line 182, in __init__
STATIC = J.get_static_field("java/lang/reflect/Modifier", "STATIC", "I")
File "/usr/local/lib/python2.7/dist-packages/javabridge/jutil.py", line 979, in get_static_field
klass = env.find_class(class_name)
AttributeError: 'NoneType' object has no attribute 'find_class' "

Issue with running FGES

Hi I am using Ubuntu and I installed using pip, however I get the following error:

JavaException Traceback (most recent call last)
in ()
1 from pycausal import search as s
----> 2 fges = s.fges(df,penaltyDiscount = 2, maxDegree = -1, faithfulnessAssumed = True, verbose = True, priorKnowledge = prior)

/usr/local/lib/python2.7/dist-packages/pycausal/search.pyc in __init__(self, df, penaltyDiscount, maxDegree, faithfulnessAssumed, verbose, priorKnowledge, numBootstrap, ensembleMethod)
454 score.setPenaltyDiscount(penaltyDiscount) # set to 2 if variable# <= 50 otherwise set it to 4
455
--> 456 fges = javabridge.JClassWrapper('edu.cmu.tetrad.search.Fges')(score)
457 fges.setMaxDegree(maxDegree)
458 fges.setNumPatternsToStore(0)

/usr/local/lib/python2.7/dist-packages/javabridge/wrappers.pyc in __call__(self, *args)
349 args_sig = "".join(map(sig, params))
350 msig = "(%s)V" % (args_sig)
--> 351 result = J.make_instance(self.cname, msig, *cargs)
352 result = JWrapper(result)
353 return result

/usr/local/lib/python2.7/dist-packages/javabridge/jutil.pyc in make_instance(class_name, sig, *args)
1726 jexception = get_env().exception_occurred()
1727 if jexception is not None:
-> 1728 raise JavaException(jexception)
1729 return result
1730

JavaException: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created

Is it possible to get the edge scores?

Hello,
I'm using pycausal with FGES on a real dataset. I was able to generate the graph and extract the nodes and edges. If I use the bootstrap I can get the bootstrap score, but I have a huge amount of data, and bootstrapping makes it slow.
I know that building the graph requires computing scores for the edges (in my case 'sem-bic').
Is it possible to retrieve this information?
My graph is too complex, and I'd like to use this information to choose which edges I can remove.
I'm using the pycausal module, but if necessary I can do it with javabridge, as in the examples you posted.
Are there any other methods to reduce the graph?

Thank you,

java imports are broken in pycausal

On trying to run any of the models, it fails due to incorrect imports of the Data utilities

Python-Javabridge calling directly Tetrad FGS Example.ipynb

this file needs some additional explanation - why would a user want to call directly?

installation instructions

should probably indicate that pydot and pandas must be installed.

Prior Knowledge Module not Registering all Forbidden Variables

Regards,

I am using pycausal in:

Anaconda's distribution of Python 3.7.
Windows 8.1

Everything is working superbly. However, I am having issues with the Prior Knowledge module.

My dataframe has:

Two independent variables named "y1" and "y2". There is no causal relationship between the variables, and, temporally, they both precede the dependent variables that are listed below;
19 dependent variables named "v1" to "v19". While they are interesting, at this stage I do not seek to discover the interrelationships between these dependent variables. I am only interested in their relationships to "y1" and "y2".

I fed this prior knowledge into pycausal as follows:

# Prior Knowledge

from pycausal import prior as pk

forbidden = [['v1', 'v2', 'v3', 'v4', 'v5', 'v6', 'v7',
              'v8', 'v9', 'v10', 'v11', 'v12', 'v13', 'v14',
              'v15', 'v16', 'v17', 'v18', 'v19']]

tempForbid = pk.ForbiddenWithin(['y1', 'y2'])

temporal = [tempForbid, ['v1', 'v2', 'v3', 'v4', 'v5', 'v6', 'v7',
              'v8', 'v9', 'v10', 'v11', 'v12', 'v13', 'v14',
              'v15', 'v16', 'v17', 'v18', 'v19']]

prior = pk.knowledge(forbiddirect = forbidden, addtemporal = temporal)
prior

However, only two of the variables are being registered, as the "forbiddirect" section of the following output shows:

Instance of edu.cmu.tetrad.data.Knowledge2: /knowledge
addtemporal

1* y1 y2
2  v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15 v16 v17 v18 v19

forbiddirect
v1 v2

requiredirect

Can you advise if there is anything wrong with the approach I am taking to code in prior knowledge?

Everything else works well; the subsequent parts of my search, the causal graph output etc. work beautifully.

However, I just don't want to waste valuable time computing causal relationships (between "v#" variables) I am not interested in (my dataset is fairly big and I am using the bayesian search).
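
The output above suggests that `forbiddirect` reads each inner list as a single [from, to] pair, which would explain why only `v1 v2` was registered from one 19-element list. Assuming that reading is correct (it is inferred from the printed output, not confirmed against the API), one way to forbid every directed edge among the `v#` variables is to generate all ordered pairs. Wrapping the second tier in `pk.ForbiddenWithin`, as is already done for `y1` and `y2`, may achieve the same thing more directly.

```python
from itertools import permutations

# v1 ... v19, the dependent variables from the question
v_names = ["v{}".format(i) for i in range(1, 20)]

# Every ordered pair [from, to] with from != to, assuming forbiddirect
# expects a list of two-element lists.
forbidden = [list(pair) for pair in permutations(v_names, 2)]

print(len(forbidden))  # 19 * 18 ordered pairs
print(forbidden[:2])
```

The resulting list would then be passed in the same way as before, `pk.knowledge(forbiddirect = forbidden, addtemporal = temporal)`.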

How to specify intervention variables in py-causal

Can you provide an example or any information on how to specify intervention variables using py-causal?

Is it possible to get the model scores?

It is displayed in the printed info but I was wondering how I could get it from the tetradrunner() object?

I am running continuous FGES with 'sem-bic-score'. Thanks in advance!

Bug with using LiNGAM

I'm using the latest version of the package (1.2.1) and have come across what I think must be a bug when trying to use LiNGAM. I originally noticed this when looking at my own data, but this minimal script demonstrates the problem:

import numpy as np
import pandas as pd
import pycausal.pycausal
import pycausal.search


if __name__ == "__main__":
    pc = pycausal.pycausal.pycausal()
    pc.start_vm()

    tetrad = pycausal.search.tetradrunner()

    rng = np.random.default_rng(1234)
    X = rng.normal(5, 1.5, 1000)
    Y = X + rng.uniform(-2, 2, 1000)

    df = pd.DataFrame({
        "X": X,
        "Y": Y,
    })
    df.to_csv("~/two_var.csv", index=False)

    tetrad.run(
        dfs=df, verbose=True, algoId="lingam",
        penaltyDiscount=1,
    )
    print(tetrad.getEdges())

    pc.stop_vm()

This gives the output:

graph Returning this graph: Graph Nodes:
X;Y

Graph Edges:
1. Y --> X

Graph Attributes:
BIC: -1110.654178

Graph Node Attributes:
BIC: [X: 155.157065;Y: -1265.811243]

['Y --> X']

I expected the discovered edge to be the opposite of this. When I run this file with causal-cmd, I see:

~$ java -jar causal-cmd-1.1.3-jar-with-dependencies.jar --algorithm lingam --penaltyDiscount 1 --data-type continuous --dataset ~/two_var.csv --delimiter comma --skip-latest
Jun 24, 2020 4:42:31 AM java.util.prefs.FileSystemPreferences$6 run
WARNING: Prefs file removed in background /home/ubuntu/.java/.userPrefs/prefs.xml
W = 
       V1       V2
  -0.3731  -0.9278
   0.9278  -0.3731

WTilde before normalization = 
       V1       V2
  -0.3731   0.9278
  -0.9278  -0.3731

WTilde after normalization = 
       V1       V2
   1.0000  -0.9278
   2.4867   1.0000

B = 
       V1       V2
            0.9278
  -2.4867         

BTilde = 
       V1       V2
            0.9278
  -2.4867         

graph Returning this graph: Graph Nodes:
X;Y

Graph Edges:
1. X --> Y

Graph Attributes:
BIC: -1110.654178

Graph Node Attributes:
BIC: [X: -849.681780;Y: -260.972398]

as expected. When I run the same file in the Tetrad GUI (explicitly setting penaltyDiscount = 1), the graph edges match the output from causal-cmd and the BIC values match exactly. I expected the same results from py-causal. Am I doing something wrong here? I've not noticed discrepancies between Tetrad and py-causal when using other algorithms.

Issue when running the given examples by my own data

Hi! When running "Calling Directly Py-Causal TsGFCI Continuous Example.ipynb", an error has occurred:

JavaException                             Traceback (most recent call last)
<ipython-input-19-98e1ee2644e7> in <module>
----> 1 tetradGraph = tsgfci.search()
      2 tetradGraph

c:\users\sx-liang\appdata\local\programs\python\python37\lib\site-packages\javabridge\wrappers.py in <lambda>(*args)
     67             if name not in methods:
     68                 methods[name] = []
---> 69                 fn = lambda naame=name: lambda *args: self.__call(naame, *args)
     70                 fn = fn()
     71                 fn.__doc__ = J.to_string(jmethod)

c:\users\sx-liang\appdata\local\programs\python\python37\lib\site-packages\javabridge\wrappers.py in __call(self, method_name, *args)
    153             rsig = sig(rtype)
    154             msig = "(%s)%s" % (args_sig, rsig)
--> 155             result =  J.call(self.o, method_name, msig, *cargs)
    156             if isinstance(result, J.JB_Object):
    157                 result = JWrapper(result)

c:\users\sx-liang\appdata\local\programs\python\python37\lib\site-packages\javabridge\jutil.py in call(o, method_name, sig, *args)
    890     ret_sig = sig[sig.find(')')+1:]
    891     nice_args = get_nice_args(args, args_sig)
--> 892     result = fn(*nice_args)
    893     x = env.exception_occurred()
    894     if x is not None:

c:\users\sx-liang\appdata\local\programs\python\python37\lib\site-packages\javabridge\jutil.py in fn(*args)
    857             x = env.exception_occurred()
    858             if x is not None:
--> 859                 raise JavaException(x)
    860             return result
    861     else:

JavaException: Index -1 out of bounds for length 0

My data is attached, and these data could be run in KCI test.

Could you help me fix it? Thanks in advance!

Regression_uniform1_X.txt

Issue with anaconda installation

The anaconda install instructions did not work for me perhaps because python-javabridge is imported instead of javabridge. Here is what I ran (in quotes):

"~/ conda install -c https://conda.anaconda.org/chirayu pycausal"

and here is the output (in quotes):

"Fetching package metadata ...........
Solving package specifications:

PackageNotFoundError: Dependency missing in current linux-64 channels:

pycausal -> python-javabridge

Close matches found; did you mean one of these?

python-javabridge: javabridge"

How to handle missing values in data?

Hello, I am uncertain how to properly specify missing continuous data in my pandas data frame for use by pycausal search. Specifically, I am trying to search over multiple data sets, and some data sets do not have values for some variables. Is this possible? I have tried numpy.nan, "?", "*" as well as an empty pandas.Series.

I receive an error like:

Exception in thread "Thread-0" edu.pitt.dbmi.data.reader.DataReaderException: Insufficient data on line 2. Extracted 35 value(s) but expected 44.
at edu.pitt.dbmi.data.reader.tabular.TabularDataFileReader.determineDiscreteDataColumns(TabularDataFileReader.java:178)
at edu.pitt.dbmi.data.reader.tabular.MixedTabularDatasetFileReader.readInData(MixedTabularDatasetFileReader.java:74)
at edu.pitt.dbmi.data.reader.tabular.MixedTabularDatasetFileReader.readInData(MixedTabularDatasetFileReader.java:55)
Traceback (most recent call last):
File "", line 1, in
File "/home/dana.mastrovito/.conda/envs/pytorch1.7.0/lib/python3.8/site-packages/pycausal/search.py", line 250, in run
dataset = pc.loadMixedData(df, numCategoriesToDiscretize)
File "/home/dana.mastrovito/.conda/envs/pytorch1.7.0/lib/python3.8/site-packages/pycausal/pycausal.py", line 128, in loadMixedData
tetradData = dataReader.readInData()
File "/home/dana.mastrovito/.conda/envs/pytorch1.7.0/lib/python3.8/site-packages/javabridge/wrappers.py", line 69, in
fn = lambda naame=name: lambda *args: self.__call(naame, *args)
File "/home/dana.mastrovito/.conda/envs/pytorch1.7.0/lib/python3.8/site-packages/javabridge/wrappers.py", line 155, in __call
result = J.call(self.o, method_name, msig, *cargs)
File "/home/dana.mastrovito/.conda/envs/pytorch1.7.0/lib/python3.8/site-packages/javabridge/jutil.py", line 892, in call
result = fn(*nice_args)
File "/home/dana.mastrovito/.conda/envs/pytorch1.7.0/lib/python3.8/site-packages/javabridge/jutil.py", line 859, in fn
raise JavaException(x)
javabridge.jutil.JavaException: Insufficient data on line 2. Extracted 35 value(s) but expected 44.
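
The reader reports 35 values on line 2 where 44 were expected, which would be consistent with blank fields being skipped rather than read as missing. That is an assumption; the trace does not show how blanks are tokenized. A sketch for locating such rows in a tab-delimited export before running a search, where `short_rows` is a hypothetical helper:

```python
import csv
import io


def short_rows(text, delimiter="\t"):
    """Return (line_number, non_empty_count) for every data row that has fewer
    non-empty fields than the header has columns. The header is line 1."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    expected = len(rows[0])
    result = []
    for number, row in enumerate(rows[1:], start=2):
        filled = sum(1 for field in row if field.strip() != "")
        if filled < expected:
            result.append((number, filled))
    return result


# Three columns; the second data field on line 2 is blank.
sample = "a\tb\tc\n1.0\t\t3.0\n4.0\t5.0\t6.0\n"
print(short_rows(sample))
```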

JVM size?

Just got an out of memory error on a moderately sized dataset. ~ 250 rows, 17K columns. Is there any way to adjust the memory that pycausal allots for the JVM?

Failed to get Delimiter class

Hi,

I just tried to run the latest version (from github) on a dataset, and got some strange behavior -- on a subset of columns it works fine, but on some other columns I get this error:


~/.virtualenvs/greed/lib/python3.6/site-packages/pycausal/search.py in run(self, algoId, dfs, testId, scoreId, priorKnowledge, dataType, numCategoriesToDiscretize, **parameters)
    202             if dataType == 'continuous':
    203                 if 'bootstrapSampleSize' in parameters and parameters['bootstrapSampleSize'] > 0:
--> 204                     tetradData = pc.loadContinuousData(dfs, outputDataset = True)
    205                 else:
    206                     tetradData = pc.loadContinuousData(dfs)

~/.virtualenvs/greed/lib/python3.6/site-packages/pycausal/pycausal.py in loadContinuousData(self, df, outputDataset)
    156             # Read Data from File
    157             f = javabridge.JClassWrapper('java.io.File')(temp_data_path)
--> 158             delimiter = javabridge.get_static_field('edu/pitt/dbmi/data/Delimiter','TAB','Ledu/pitt/dbmi/data/Delimiter;')
    159             dataReader = javabridge.JClassWrapper('edu.pitt.dbmi.data.reader.tabular.ContinuousTabularDataFileReader')(f,delimiter)
    160             tetradData = dataReader.readInData()

~/.virtualenvs/greed/lib/python3.6/site-packages/javabridge/jutil.py in get_static_field(klass, name, sig)
    982         if klass is None:
    983             jexception = get_env().exception_occurred()
--> 984             raise JavaException(jexception)
    985     field_id = env.get_static_field_id(klass, name, sig)
    986     if field_id is None:

JavaException: edu/pitt/dbmi/data/Delimiter

loadDiscreteData doesn't work on Python 3

Need to debug it!

revise README

A couple of small points

Can we say IPython/Jupyter so users understand that Jupyter tools are fine?
Can we clarify which instructions go with which? It's hard to read.
A full script (.py) file might be helpful for folks who don't want to install Jupyter.

Different causal graphs for the same algorithm parameters

I am trying to infer a causal graph with the PC All algorithm, using the following command:

tetrad.run(algoId = 'pc-all', dfs = df, testId = 'cg-lr-test', alpha = 0.01,
dataType = 'mixed', numCategoriesToDiscretize = 7,
discretize = False, concurrentFAS = True, maxPathLength = 0,
conflictRule = 1, colliderDiscoveryRule = 1)

Executing the above command yields a different causal graph each time. Is there some randomness involved? If so, is there a seed value that we can set so that we get the same result each time?

Furthermore, is there any documentation to read more about different types of parameters that can be passed and what they mean?

Error with PC-All - JavaException: Sorry, I was expecting a discrete data set.

When attempting the code below, I get this error about the data set.

However, I've successfully run FGES discrete (with discrete score types) on the same data set, so I'm wondering what the problem is.

--- Code ---

'''
df reads a file like this (with more lines obviously)

Cancer,Dyspnoea,Pollution,Smoker,Xray
1,1,0,1,1
1,1,0,1,0
1,1,0,1,1
1,1,0,1,1
explicitly make sure that all values are ints
'''
int_df = df.astype(int)

ind_test = 'chi-square-test'

tetrad.run(algoId = 'pc-all', dfs = int_df, testId = ind_test,
colliderDiscoveryRule = 1, fasRule = 2, depth = 2,
stableFAS = True ,conflictRule = 1, concurrentFAS = True,
useMaxPOrientationHeuristic = True, verbose = True)

cmu-phil/tetrad

Results changed in v1.1.1

Has the usage changed in v1.1.1?

When changed from v1.1.0 to v1.1.1, the result of the following example changed:
https://github.com/bd2kccd/py-causal/blob/development/example/py-causal%20-%20Bootstrapping%20GFCI%20Continuous%20in%20Action.ipynb

The result of `tetrad.getEdges()` in v1.1.0:

['AmountDonated o-o Sympathy [AmountDonated o-o Sympathy]:0.4000;[no edge]:0.6000;',
'Sympathy o-> Impact [Impact <-o Sympathy]:1.0000;',
'Sympathy --> TangibilityCondition [Sympathy --> TangibilityCondition nl pd]:1.0000; nl pd']

The result of `tetrad.getEdges()` in v1.1.1:

['AmountDonated o-o Sympathy',
'Sympathy o-> Impact',
'Sympathy --> TangibilityCondition']
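
For anyone comparing the two formats, the bracketed part of a v1.1.0 edge string can be split into per-edge-type probabilities with a regular expression. This sketch is based only on the three strings quoted above, so other edge annotations may not fit the pattern; `edge_probabilities` is a hypothetical helper, not part of py-causal.

```python
import re

# Matches "[label]:0.4000" segments in a v1.1.0 bootstrap edge string.
SEGMENT = re.compile(r"\[([^\]]+)\]:([0-9.]+)")


def edge_probabilities(edge_string):
    """Map each bracketed label to its probability; empty if none are present."""
    return {label: float(p) for label, p in SEGMENT.findall(edge_string)}


old = "AmountDonated o-o Sympathy [AmountDonated o-o Sympathy]:0.4000;[no edge]:0.6000;"
new = "AmountDonated o-o Sympathy"

print(edge_probabilities(old))
print(edge_probabilities(new))
```

A v1.1.1-style string yields an empty dictionary, which makes the loss of the bootstrap probabilities easy to detect programmatically.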

Example code throws java errors

On a fresh install, I get the exception JavaException: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created when I run py-causal-continuous-example.py

Details
OS: Arch linux 64 bit

Java:
openjdk version "1.8.0_141"
OpenJDK Runtime Environment (build 1.8.0_141-b15)
OpenJDK 64-Bit Server VM (build 25.141-b15, mixed mode)

Python version: 2.7.13

Stacktrace:

Exception in thread "Thread-0" javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
        at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
        at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
        at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
        at java.util.prefs.XmlSupport.loadPrefsDoc(XmlSupport.java:241)
        at java.util.prefs.XmlSupport.importMap(XmlSupport.java:375)
        at java.util.prefs.FileSystemPreferences$6.run(FileSystemPreferences.java:575)
        at java.util.prefs.FileSystemPreferences$6.run(FileSystemPreferences.java:568)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.util.prefs.FileSystemPreferences.loadCache(FileSystemPreferences.java:567)
        at java.util.prefs.FileSystemPreferences.initCacheIfNecessary(FileSystemPreferences.java:550)
        at java.util.prefs.FileSystemPreferences.getSpi(FileSystemPreferences.java:527)
        at java.util.prefs.AbstractPreferences.get(AbstractPreferences.java:287)
        at java.util.prefs.AbstractPreferences.getBoolean(AbstractPreferences.java:500)
        at edu.cmu.tetrad.util.TetradLogger.<init>(TetradLogger.java:65)
        at edu.cmu.tetrad.util.TetradLogger.<clinit>(TetradLogger.java:108)
        at edu.cmu.tetrad.search.Fges.<init>(Fges.java:108)
Caused by: java.lang.RuntimeException: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
        at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:308)
        ... 15 more
Caused by: java.util.ServiceConfigurationError: javax.xml.parsers.DocumentBuilderFactory: Provider org.apache.xerces.jaxp.DocumentBuilderFactoryImpl not found
        at java.util.ServiceLoader.fail(ServiceLoader.java:239)
        at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
        at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:372)
        at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
        at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
        at javax.xml.parsers.FactoryFinder$1.run(FactoryFinder.java:294)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:289)
        ... 15 more
Traceback (most recent call last):
  File "py-causal-fges-continuous-example.py", line 19, in <module>
    faithfulnessAssumed = True, verbose = True)
  File "/home/ccheek/.virtualenvs/rewrite/lib/python2.7/site-packages/pycausal/search.py", line 565, in __init__
    fges = javabridge.JClassWrapper('edu.cmu.tetrad.search.Fges')(score)
  File "/home/ccheek/.virtualenvs/rewrite/lib/python2.7/site-packages/javabridge/wrappers.py", line 311, in __call__
    result =  J.make_instance(self.cname, msig, *cargs)
  File "/home/ccheek/.virtualenvs/rewrite/lib/python2.7/site-packages/javabridge/jutil.py", line 1726, in make_instance
    raise JavaException(jexception)
javabridge.jutil.JavaException: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created

fail to install javabridge with pip

I use Python 2.7, and when I try to pip install javabridge, it fails with env errors. anaconda.org gives instructions for installing it properly: https://anaconda.org/anaconda/javabridge
To install this package with conda run:
conda install -c anaconda javabridge

BIC score for FCI

Hi,

I have a similar issue to #86 -- I would like to get the BIC score for the FCI algorithm. I'm working with a team and they confirm that while FCI does not use the BIC for search, it does output it. However, when I run:

graph = tetrad.getTetradGraph()
print('Graph BIC: {}'.format(graph.getAttribute('BIC')))

I get:

Graph BIC: None

when I choose the fci algorithm. Any chance you could surface this score for me? I was told by the same team that TETRAD does output it for FCI, so it should be in the source somewhere.

TetradDataUtils class does not load

Hi,

when trying to use FCI with pycausal in a Jupyter Notebook, I get the following error:

tetrad.run(algoId = 'fci', dfs = df, testId = 'fisher-z', 
           alpha = 0.05, depth = -1, maxPathLength = -1, 
           completeRuleSetUsed = False, faithfulnessAssumed = True, verbose = True)
Failed to get class edu/pitt/dbmi/causal/cmd/util/TetradDataUtils

I have pycausal 1.1.0 (as far as I can tell -- I have a couple of different installations of pycausal, unfortunately, and pycausal.version does not seem to be available in the interpreter) and running this on Python 3.6.5 with JRE 1.8.0_181. This might be related to issue 58, which looks like it was fixed about a month ago, but I'm not sure if the fix made it into 1.1.0. Trying to use pycausal for a research project, and I would really appreciate your help!

Thanks,

Vlad

notebook example throws exception

The following example:
https://github.com/bd2kccd/py-causal/blob/development/example/javabridge/Python-Javabridge%20calling%20directly%20Py-Causal%20FGES%20Continuous%20Example.ipynb

Throws an exception when running the following cell:
node_list = javabridge.JClassWrapper("java.util.ArrayList")()

# load dataset
for col in dframe.columns:
    nodi = javabridge.JClassWrapper("edu.cmu.tetrad.data.ContinuousVariable") (col)
    node_list.add(nodi)

Exception content:

JavaException Traceback (most recent call last)
in ()
2 # load dataset
3 for col in dframe.columns:
----> 4 nodi = javabridge.JClassWrapper("edu.cmu.tetrad.data.ContinuousVariable") (col)
5 node_list.add(nodi)

C:\Users\mot16\AppData\Local\Continuum\Anaconda3\lib\site-packages\javabridge\wrappers.py in __init__(self, class_name)
222 STATIC = J.get_static_field("java/lang/reflect/Modifier", "STATIC", "I")
223 self.cname = class_name.replace(".", "/")
--> 224 self.klass = J.get_class_wrapper(J.class_for_name(class_name), True)
225 self.static_methods = {}
226 env = J.get_env()

C:\Users\mot16\AppData\Local\Continuum\Anaconda3\lib\site-packages\javabridge\jutil.py in class_for_name(classname, ldr)
1741 '(Ljava/lang/String;ZLjava/lang/ClassLoader;)'
1742 'Ljava/lang/Class;',
-> 1743 classname, True, ldr)
1744
1745 def get_class_wrapper(obj, is_class = False):

C:\Users\mot16\AppData\Local\Continuum\Anaconda3\lib\site-packages\javabridge\jutil.py in static_call(class_name, method_name, sig, *args)
936 ret_sig = sig[sig.find(')')+1:]
937 nice_args = get_nice_args(args, args_sig)
--> 938 result = fn(*nice_args)
939 return get_nice_result(result, ret_sig)
940

C:\Users\mot16\AppData\Local\Continuum\Anaconda3\lib\site-packages\javabridge\jutil.py in fn(*args)
916 jexception = env.exception_occurred()
917 if jexception is not None:
--> 918 raise JavaException(jexception)
919 return result
920 return fn

JavaException: edu.cmu.tetrad.data.ContinuousVariable

After 5-10K insertions the process becomes very slow and not parallel.

Hi,

I am running py-causal, and I also tested the Java command-line version, on about 3K time series. After a little while it starts inserting; at first it seems to run fast and in parallel, but after 5-10K insertions it becomes extremely slow.
I don't know what the issue is, but it seems that either

- the network becomes big and therefore extremely slow, or
- there is some bug that prevents it from running in parallel.

I have checked both CPU usage and memory; both are OK and not maxed out.

I am running on a MacBook Pro (i7, 16 GB RAM).

Thanks
Artak

Add ccd-helpdesk as member

Hi,

This is Ruben, on behalf of the CCD-helpdesk. Can you add the GitHub user "ccd-helpdesk" as a member of the bd2kccd/py-causal repository? We just want to be notified in our Gmail account about issues we can respond to, for example, problems formatting data, interpretation of algorithm output, theoretical questions about causal search, etc.

Thanks,

Ruben

conda package out of date (broken on Python 3.6)

The conda package hasn't been updated since 2018-05-04 and is largely non-functional on Python 3.6 due to syntax bugs.

For example, the method tetrad.listScores() causes an error because it attempts to use .sort() on a dictionary. This issue is fixed on this repo, but none of the fixes are in the conda package yet.
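
The `.sort()` failure looks like a common Python 2 to 3 porting problem. Below is a minimal sketch of the pattern and a portable fix. The `scores` dictionary and its contents are placeholders, not the actual data structure inside `tetrad.listScores()`, and the sketch assumes the original code called `.sort()` on the dictionary's keys.

```python
# Placeholder data; the real structure used by py-causal may differ.
scores = {"mvp-bic-score": "placeholder A", "another-score": "placeholder B"}

# Python 2 idiom: keys() returned a list, so in-place sorting worked.
# In Python 3, keys() returns a view object that has no .sort() method.
try:
    names = scores.keys()
    names.sort()
    error = None
except AttributeError as exc:
    error = str(exc)

# Portable alternative: sorted() accepts any iterable and returns a new list.
names = sorted(scores)
print(names)
```

`sorted()` behaves the same on Python 2.7 and Python 3, which is why it is the usual replacement when one code base has to support both.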

Error when running "Calling Directly Py-Causal PC Example.ipynb"

Hi! When I run the code in "Calling Directly Py-Causal PC Example.ipynb" :
nodi = javabridge.JClassWrapper("edu.cmu.tetrad.data.ContinuousVariable")(col)

I get the error below:
`JavaException Traceback (most recent call last)
in
2 # load dataset
3 for col in dframe.columns:
----> 4 nodi = javabridge.JClassWrapper("edu.cmu.tetrad.data.ContinuousVariable")(col)
5 node_list.add(nodi)

c:\users\sx-liang\appdata\local\programs\python\python37\lib\site-packages\javabridge\wrappers.py in __init__(self, class_name)
229 STATIC = J.get_static_field("java/lang/reflect/Modifier", "STATIC", "I")
230 self.cname = class_name.replace(".", "/")
--> 231 self.klass = J.get_class_wrapper(J.class_for_name(class_name), True)
232 self.static_methods = {}
233 env = J.get_env()

c:\users\sx-liang\appdata\local\programs\python\python37\lib\site-packages\javabridge\jutil.py in class_for_name(classname, ldr)
1746 '(Ljava/lang/String;ZLjava/lang/ClassLoader;)'
1747 'Ljava/lang/Class;',
-> 1748 classname, True, ldr)
1749
1750 def get_class_wrapper(obj, is_class = False):

c:\users\sx-liang\appdata\local\programs\python\python37\lib\site-packages\javabridge\jutil.py in static_call(class_name, method_name, sig, *args)
941 ret_sig = sig[sig.find(')')+1:]
942 nice_args = get_nice_args(args, args_sig)
--> 943 result = fn(*nice_args)
944 return get_nice_result(result, ret_sig)
945

c:\users\sx-liang\appdata\local\programs\python\python37\lib\site-packages\javabridge\jutil.py in fn(*args)
921 jexception = env.exception_occurred()
922 if jexception is not None:
--> 923 raise JavaException(jexception)
924 return result
925 return fn

JavaException: edu.cmu.tetrad.data.ContinuousVariable
`

Should I check my Internet settings or pre-download the necessary package? If I need to pre-download the necessary package, where should it be put?

Thanks in advance!
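
The exception message is just the class name, which in Java usually means the class could not be found on the JVM's classpath. That is an interpretation, not something confirmed in this thread. Under that assumption, one way to narrow the problem down is to check whether a given jar actually contains the class. The sketch below uses only the standard library; `jar_has_class` is a hypothetical helper, and the jar path you pass in is whatever jar your environment is supposed to load.

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the jar (a zip archive) holds the compiled class."""
    # Same dotted-to-slashed conversion that javabridge applies in the traceback.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Calling `jar_has_class(path_to_jar, "edu.cmu.tetrad.data.ContinuousVariable")` returns `False` if the jar lacks the class. In that case the network is unlikely to be the problem, and the next thing to check is which jar is on the classpath.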

Installation trouble

I had trouble installing through Anaconda. I ran through the steps described in the "Anaconda/Jupyter" section of the Readme, and found that I was able to import pydot, but not graphviz (not sure if this matters?) or pycausal.

The output from conda list included the following:

graphviz                  2.40.1               hefbbd9a_2 
pycausal                  0.1.3                    py27_0    chirayu
pydot                     1.4.1                    py27_0

At this point I could import pydot, but not graphviz or pycausal.

Poking around a little bit, it sounded like conda install graphviz doesn't install python bindings for graphviz, so I ran pip install graphviz. conda list now includes

python-graphviz           0.11.1                   pypi_0    pypi

and I can now import graphviz successfully. Unfortunately, pycausal is still borked:

>>> import pycausal
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named pycausal

Any guidance?
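
When a package shows up in `conda list` but fails to import, one possible cause is that the running interpreter belongs to a different environment than the one conda installed into. That is only a guess for this report. A small diagnostic sketch, using only standard-library calls available on both Python 2.7 and Python 3 (`import_report` is a made-up helper name):

```python
import importlib
import sys


def import_report(names):
    """Map each module name to True if it imports in this interpreter."""
    report = {}
    for name in names:
        try:
            importlib.import_module(name)
            report[name] = True
        except ImportError:
            report[name] = False
    return report


if __name__ == "__main__":
    # The interpreter path shows which environment is actually running.
    print(sys.executable)
    print(import_report(["pydot", "graphviz", "pycausal"]))
```

If `sys.executable` points outside the conda environment listed above, the import failure is explained by the environment mismatch rather than by the package itself.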
