probcomp / crosscat
A domain-general, Bayesian method for analyzing high-dimensional data tables
Home Page: http://probcomp.csail.mit.edu/crosscat/
License: Apache License 2.0
simple_predictive_sample_unobserved calls numpy.random.multinomial with no specified seed, so numpy presumably draws it from the environment.
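One way to make such a draw reproducible is to route it through an explicitly seeded generator instead of the module-level numpy.random functions. The sketch below is illustrative only: draw_multinomial and its parameters are hypothetical names, not part of the CrossCat API.

```python
import numpy

def draw_multinomial(seed, n, pvals):
    # A private RandomState keeps this draw independent of numpy's
    # global generator state.
    rng = numpy.random.RandomState(seed)
    return rng.multinomial(n, pvals)

pvals = [0.2, 0.3, 0.5]
first = draw_multinomial(42, 10, pvals)
second = draw_multinomial(42, 10, pvals)
```

With the same seed, the two calls return identical counts, which is the property the caller would need in order to reproduce a run.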
Requires
extra_link_args = ["-stdlib=libstdc++","-mmacosx-version-min=10.6"],
LDFLAGS isn't respected, and I'm not sure of the right way to detect when this is needed.
Both View::remove_all() and State::remove_all() start with unused calls to _lookup.empty().
Should they be calling .clear() instead?
The multiprocessing example for DHA is missing the seed argument in its calls to engine.
E.g., X_L_list, X_D_list = engine.initialize(M_c, M_r, T, n_chains=num_chains) should be
engine.initialize(M_c, M_r, T, get_next_seed(), n_chains=num_chains)
After adding these, the code is still very slow, much slower than the non-multiprocessing version.
Also, top shows that it is only using one core rather than the 4 I specified.
It would be nice to distribute binary builds (wheels) on common platforms, to shorten pip install time. This requires a bdist_wheel step.
According to http://python-packaging-user-guide.readthedocs.org/en/latest/distributing/#wheels platform specific wheels are only supported on OSX and Windows, not Linux.
I'm just trying to install in an environment managed with the pyenv version manager, and I'm running into a numpy/arrayobject.h problem. FWIW, I notice that I don't have a /usr/lib/python2.7/site-packages directory. I'm somewhat new to Python tools and environments.
$ python2.7 crosscat/setup.py install
running install
running build
running build_py
running build_ext
skipping 'crosscat/cython_code/ContinuousComponentModel.cpp' Cython extension (up-to-date)
skipping 'crosscat/cython_code/MultinomialComponentModel.cpp' Cython extension (up-to-date)
skipping 'crosscat/cython_code/State.cpp' Cython extension (up-to-date)
building 'crosscat.cython_code.State' extension
/usr/bin/gcc -pthread -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/usr/lib/python2.7/site-packages/numpy/core/include/ -fPIC -Icpp_code/include/CrossCat -I/.pyenv/versions/2.7.5/include/python2.7 -c crosscat/cython_code/State.cpp -o build/temp.linux-x86_64-2.7/crosscat/cython_code/State.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
crosscat/cython_code/State.cpp:262:31: fatal error: numpy/arrayobject.h: No such file or directory
compilation terminated.
error: command '/usr/bin/gcc' failed with exit status 1
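The compile line above passes -I/usr/lib/python2.7/site-packages/numpy/core/include/, a directory that does not exist in this pyenv setup. A common remedy is to ask the running interpreter's NumPy for its header directory with numpy.get_include() and pass that to the extension's include_dirs. The check below is a small diagnostic sketch; find_arrayobject_header is a hypothetical helper name, not something in the CrossCat build.

```python
import os
import numpy

def find_arrayobject_header():
    # numpy.get_include() returns the header directory of the NumPy
    # installed for the interpreter that is actually running.
    include_dir = numpy.get_include()
    header = os.path.join(include_dir, "numpy", "arrayobject.h")
    return include_dir, header, os.path.isfile(header)

include_dir, header, found = find_arrayobject_header()
print("pass to include_dirs:", include_dir)
print("arrayobject.h present:", found)
```

Running this with the same python2.7 that runs setup.py shows which directory the build should be given.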
The Mersenne twister sucks, the seed space we use is tiny, and the /ad hack/ seed management we do makes it even more ridiculous by seeding separate MT instances with sequential integers. All of it, including all uses of boost random and numpy.random, should be replaced by a ChaCha-based PRNG with 256-bit seeds and small states.
Fortunately, most of the work to identify sources of nondeterminism has been done, and some seed parameter is passed in explicitly to every routine that makes random choices, so fixing this is a matter of macheteing your way through the mess you can see, rather than scrutinizing the whole code base to guess where it might be nondeterministic.
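To illustrate the seeding concern, the sketch below derives a distinct 256-bit seed for each chain from one master seed by hashing, rather than handing out sequential integers. It uses SHA-256 from the standard library purely as a stand-in: it is not the ChaCha-based generator proposed above, and derive_seed is a hypothetical helper.

```python
import hashlib
import os

def derive_seed(master_seed, index):
    # Hash the 32-byte master seed together with the chain index to get
    # a 32-byte child seed; neighbouring indices give unrelated-looking seeds.
    if len(master_seed) != 32:
        raise ValueError("master seed must be 32 bytes (256 bits)")
    return hashlib.sha256(master_seed + index.to_bytes(8, "big")).digest()

master = os.urandom(32)  # or a caller-supplied seed for reproducibility
chain_seeds = [derive_seed(master, i) for i in range(4)]
```

The derivation is deterministic, so a caller who records the master seed can regenerate every per-chain seed.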
[probcomp-4 pc/crosscat]$ py.test
================================================================== test session starts ===================================================================
platform linux2 -- Python 2.7.6 -- pytest-2.5.1
plugins: flaky
collecting 0 items / 4 errorsSegmentation fault
[probcomp-4 pc/crosscat]$
I ran into some problems during the install of crosscat where it tries to install the hcluster package (see my comment on closed issue #8). I looked into why a simple pip install hcluster wouldn't work, and found that hcluster has been incorporated into scipy.
Specifically, these are the hcluster functions that I can see used in crosscat, and their scipy counterparts:
| hcluster | scipy |
|---|---|
| pdist | spatial.distance.pdist |
| linkage | cluster.hierarchy.linkage |
| dendrogram | cluster.hierarchy.dendrogram |
Note that I haven't compared them in detail, but I presume they haven't changed much, if at all; their docstrings look very similar.
It seems that a niggly dependency could be eliminated by switching to the scipy versions, so I'm wondering whether you'd accept pull requests implementing that (if so, are there any tests I should run), or have you considered all this and is there a good reason for using the original hcluster versions?
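For reference, a minimal sketch of the three scipy calls used together on made-up data. It only shows that the scipy functions exist and compose; it does not establish that their output matches hcluster's for CrossCat's inputs.

```python
import numpy
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

# Four toy points forming two obvious clusters.
points = numpy.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

distances = pdist(points)            # condensed pairwise distances
Z = linkage(distances)               # single linkage by default
tree = dendrogram(Z, no_plot=True)   # compute the layout without plotting
leaf_order = tree["leaves"]
```

A switch would then mostly be a matter of changing the import lines, with a comparison against hcluster's results on a known table as the obvious regression test.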
Rather than use a random seed in the engine, Crosscat should store a random seed in each model, and the state transitions performed by LocalEngine and MultiprocessingEngine should be the same given the same seed in the model. That way, the only source of nondeterminism in Crosscat will be the initial selection of random seeds, which can be provided entirely by the caller, and LocalEngine and MultiprocessingEngine will yield exactly the same results for the same inputs.
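A toy sketch of that idea, assuming each model is a plain dict that carries its own seed. The names transition, run_in_order and run_reversed are hypothetical and stand in for two engines that schedule the same chains differently.

```python
import random

def transition(model):
    # All randomness comes from the seed stored in the model itself.
    rng = random.Random(model["seed"])
    new_value = model["value"] + rng.gauss(0.0, 1.0)
    # Store a fresh seed so the next transition continues deterministically.
    return {"value": new_value, "seed": rng.getrandbits(64)}

def run_in_order(models, n_steps):
    # Advance every chain one step at a time, in list order.
    for _ in range(n_steps):
        models = [transition(m) for m in models]
    return models

def run_reversed(models, n_steps):
    # Stand-in for an engine that schedules the chains differently:
    # finish each chain completely, last chain first.
    results = []
    for m in reversed(models):
        for _ in range(n_steps):
            m = transition(m)
        results.append(m)
    return list(reversed(results))

initial = [{"value": 0.0, "seed": s} for s in (11, 22, 33)]
```

Because no generator state lives in the engine, both schedules produce the same models from the same initial seeds.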
Conditional draws from each model are given equal weight. This is inaccurate. Models should be drawn from the multinomial given by the total probability of the conditions.
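A minimal sketch of the proposed weighting, assuming condition_logps holds the log probability of the conditions under each model. The function name and inputs are hypothetical, not CrossCat's API.

```python
import math
import random

def sample_model_index(condition_logps, rng):
    # Normalize in log space (subtract the max) to avoid underflow.
    top = max(condition_logps)
    weights = [math.exp(lp - top) for lp in condition_logps]
    u = rng.random() * sum(weights)
    cumulative = 0.0
    for index, w in enumerate(weights):
        cumulative += w
        if u < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point round-off

rng = random.Random(0)
logps = [math.log(0.9), math.log(0.1)]
counts = [0, 0]
for _ in range(10000):
    counts[sample_model_index(logps, rng)] += 1
```

Here the first model, under which the conditions are nine times more probable, is chosen roughly nine times as often, instead of half the time under equal weighting.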
Playing around with an install using pyenv and Ubuntu 12.10.
I see a rather joyous Cython compile error.
$ python2.7 crosscat/setup.py install
running install
running build
running build_py
running build_ext
skipping 'crosscat/cython_code/ContinuousComponentModel.cpp' Cython extension (up-to-date)
skipping 'crosscat/cython_code/MultinomialComponentModel.cpp' Cython extension (up-to-date)
cythoning crosscat/cython_code/State.pyx to crosscat/cython_code/State.cpp
...
import crosscat.utils.general_utils as gu
cdef double set_double(double& to_set, double value):
to_set = value
crosscat/cython_code/State.pyx:35:12: Assignment to reference 'to_set'
building 'crosscat.cython_code.State' extension
/usr/bin/gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icpp_code/include/CrossCat -I/scratch2/apps/.pyenv/versions/2.7.5/include/python2.7 -c crosscat/cython_code/State.cpp -o build/temp.linux-x86_64-2.7/crosscat/cython_code/State.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
crosscat/cython_code/State.cpp:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.
error: command '/usr/bin/gcc' failed with exit status 1
When all the values in a column are the same, construct_continuous_specific_hyper_grid computes 0 for the sum of squared deviations from the mean, and then fills s_grid with the smallest positive normal floating-point number, due to a workaround I put into log_linspace a while ago to avoid zeros (which probably had the effect of masking this problem, unbeknownst to ignorant me back then).
The consequences manifest as NaN for marginal logps / logscores when analysis is attempted later. Of course, Crosscat can't conclude anything useful about the column if all its values are the same. But that's no reason for it to barf and choke.
We who are still paying attention to Crosscat ought to sit down and figure out what this hyperparameter gridding business is supposed to accomplish and accomplish it in a way that does not cause mysterious crashes later on.
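As a starting point for that discussion, here is a Python sketch of one possible guard. It is not the C++ implementation: the fallback scale and the lower grid bound (ssd / 100) are assumptions chosen only for illustration, and the grid size of 31 is an assumed default.

```python
import math

def sum_squared_deviations(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def log_linspace(low, high, n):
    # n points evenly spaced in log space between low and high (both > 0).
    step = (math.log(high) - math.log(low)) / (n - 1)
    return [math.exp(math.log(low) + i * step) for i in range(n)]

def variance_grid(values, n=31, fallback=1.0):
    ssd = sum_squared_deviations(values)
    if ssd <= 0.0:
        # Constant column: use an explicit fallback scale (an assumption)
        # instead of silently collapsing the grid to a tiny number.
        ssd = fallback
    return log_linspace(ssd / 100.0, ssd, n)

constant_grid = variance_grid([3.0, 3.0, 3.0])
```

The point of the sketch is that the degenerate case is handled where it arises, so later log-score computations see an ordinary, finite grid.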
def test_vonmises_vonmises_model(self):
> assert(check_one_feature_mixture(cycmext.p_CyclicComponentModel,
show_plot=self.show_plot) > .1)
E AssertionError: assert 0.093436305410546178 > 0.1
E + where 0.093436305410546178 = check_one_feature_mixture(<class 'crosscat.tests.component_model_extensions.CyclicComponentModel.p_CyclicComponentModel'>, show_plot=True)
E + where <class 'crosscat.tests.component_model_extensions.CyclicComponentModel.p_CyclicComponentModel'> = cycmext.p_CyclicComponentModel
E + and True = <test_mixture_inference_quality.TestComponentModelQuality testMethod=test_vonmises_vonmises_model>.show_plot
Hi,
I'm having trouble installing CrossCat using pip in my local virtual environment. I simply cloned the crosscat repository from GitHub, and in that repository ran:
pip install .
The results were:
$ pip install .
Unpacking /ahg/regevdata/users/yarden/software/crosscat
Running setup.py egg_info for package from file:///ahg/regevdata/users/yarden/software/crosscat
which: no ccache in (/ahg/regevdata/users/yarden/myenv/bin:/home/unix/yarden/bin:/broad/lsf9/9.1/linux2.6-glibc2.3-x86_64/bin:/ahg/regevdata/users/yarden/myenv/bin:/home/unix/yarden/bin:/ahg/regevdata/users/yarden/myenv/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/emacs_24.2/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/mysql_5.6.20/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/perl_5.14.2/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bioperl_1.6.923/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/sqlite_3.6.19/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/tophat_2.0.10/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bowtie2_2.0.5:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bowtie_0.12.9:/broad/software/free/Linux/redhat_6_x86_64/pkgs/samtools/samtools_0.1.19/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/zlib_1.2.6/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bedtools_version-2.16.1/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/scipy_0.13.0-python-2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/matplotlib_1.3.1-python-2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/db_4.7.25/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/tcltk8.5.9/bin:/broad/lsf9/9.1/linux2.6-glibc2.3-x86_64/etc:/broad/software/free/Linux/redhat_6_x86_64/pkgs/lsf_drmaa-1.1.1/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/r_3.1.0-bioconductor-2.14/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_4.9.0/bin:/home/unix/yarden/bin:/broad/tools/NoArch/pkgs/local:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4.2-ubuntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4.2-u
buntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4.2-ubuntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/)
Requirement already satisfied (use --upgrade to upgrade): scipy>=0.11.0 in /broad/software/free/Linux/redhat_6_x86_64/pkgs/scipy_0.13.0-python-2.7.1-sqlite3-rtrees/lib/python2.7/site-packages (from CrossCat==0.1)
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.7.0 in /ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages (from CrossCat==0.1)
Installing collected packages: CrossCat
Running setup.py install for CrossCat
which: no ccache in (/ahg/regevdata/users/yarden/myenv/bin:/home/unix/yarden/bin:/broad/lsf9/9.1/linux2.6-glibc2.3-x86_64/bin:/ahg/regevdata/users/yarden/myenv/bin:/home/unix/yarden/bin:/ahg/regevdata/users/yarden/myenv/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/emacs_24.2/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/mysql_5.6.20/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/perl_5.14.2/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bioperl_1.6.923/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/sqlite_3.6.19/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/tophat_2.0.10/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bowtie2_2.0.5:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bowtie_0.12.9:/broad/software/free/Linux/redhat_6_x86_64/pkgs/samtools/samtools_0.1.19/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/zlib_1.2.6/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bedtools_version-2.16.1/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/scipy_0.13.0-python-2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/matplotlib_1.3.1-python-2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/db_4.7.25/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/tcltk8.5.9/bin:/broad/lsf9/9.1/linux2.6-glibc2.3-x86_64/etc:/broad/software/free/Linux/redhat_6_x86_64/pkgs/lsf_drmaa-1.1.1/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/r_3.1.0-bioconductor-2.14/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_4.9.0/bin:/home/unix/yarden/bin:/broad/tools/NoArch/pkgs/local:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4.2-ubuntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4.2-u
buntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4.2-ubuntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/)
building 'crosscat.cython_code.CyclicComponentModel' extension
gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c crosscat/cython_code/CyclicComponentModel.c -o build/temp.linux-x86_64-2.7/crosscat/cython_code/CyclicComponentModel.o
gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/utils.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/utils.o
gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/numerics.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/numerics.o
gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/RandomNumberGenerator.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/RandomNumberGenerator.o
gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/ComponentModel.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/ComponentModel.o
gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/CyclicComponentModel.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/CyclicComponentModel.o
gcc: error: crosscat/cython_code/CyclicComponentModel.c: No such file or directory
gcc: fatal error: no input files
compilation terminated.
error: command 'gcc' failed with exit status 1
Complete output from command /ahg/regevdata/users/yarden/myenv/bin/python -c "import setuptools;__file__='/tmp/pip-ZYN_sU-build/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-5armZF-record/install-record.txt --single-version-externally-managed --install-headers /ahg/regevdata/users/yarden/myenv/include/site/python2.7:
which: no ccache in (/ahg/regevdata/users/yarden/myenv/bin:/home/unix/yarden/bin:/broad/lsf9/9.1/linux2.6-glibc2.3-x86_64/bin:/ahg/regevdata/users/yarden/myenv/bin:/home/unix/yarden/bin:/ahg/regevdata/users/yarden/myenv/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/emacs_24.2/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/mysql_5.6.20/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/perl_5.14.2/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bioperl_1.6.923/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/sqlite_3.6.19/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/tophat_2.0.10/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bowtie2_2.0.5:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bowtie_0.12.9:/broad/software/free/Linux/redhat_6_x86_64/pkgs/samtools/samtools_0.1.19/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/zlib_1.2.6/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bedtools_version-2.16.1/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/scipy_0.13.0-python-2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/matplotlib_1.3.1-python-2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/db_4.7.25/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/tcltk8.5.9/bin:/broad/lsf9/9.1/linux2.6-glibc2.3-x86_64/etc:/broad/software/free/Linux/redhat_6_x86_64/pkgs/lsf_drmaa-1.1.1/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/r_3.1.0-bioconductor-2.14/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_4.9.0/bin:/home/unix/yarden/bin:/broad/tools/NoArch/pkgs/local:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4.2-ubuntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4.2-u
buntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4.2-ubuntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/)
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/crosscat
copying crosscat/JSONRPCEngine.py -> build/lib.linux-x86_64-2.7/crosscat
copying crosscat/CrossCatClient.py -> build/lib.linux-x86_64-2.7/crosscat
copying crosscat/MultiprocessingEngine.py -> build/lib.linux-x86_64-2.7/crosscat
copying crosscat/LocalEngine.py -> build/lib.linux-x86_64-2.7/crosscat
copying crosscat/__init__.py -> build/lib.linux-x86_64-2.7/crosscat
copying crosscat/starcluster_plugin.py -> build/lib.linux-x86_64-2.7/crosscat
copying crosscat/EngineTemplate.py -> build/lib.linux-x86_64-2.7/crosscat
copying crosscat/HadoopEngine.py -> build/lib.linux-x86_64-2.7/crosscat
copying crosscat/IPClusterEngine.py -> build/lib.linux-x86_64-2.7/crosscat
copying crosscat/settings.py -> build/lib.linux-x86_64-2.7/crosscat
creating build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/convergence_test_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/api_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/diagnostic_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/inference_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/xnet_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/hadoop_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/__init__.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/mutual_information_test_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/enumerate_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/timing_test_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/validate_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/plot_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/geweke_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/sample_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/useCase_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/file_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/data_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
copying crosscat/utils/general_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils
creating build/lib.linux-x86_64-2.7/crosscat/convergence_analysis
copying crosscat/convergence_analysis/parse_convergence_results.py -> build/lib.linux-x86_64-2.7/crosscat/convergence_analysis
copying crosscat/convergence_analysis/automated_convergence_tests.py -> build/lib.linux-x86_64-2.7/crosscat/convergence_analysis
copying crosscat/convergence_analysis/__init__.py -> build/lib.linux-x86_64-2.7/crosscat/convergence_analysis
copying crosscat/convergence_analysis/generate_convergence_script.py -> build/lib.linux-x86_64-2.7/crosscat/convergence_analysis
copying crosscat/convergence_analysis/plot_convergence_results.py -> build/lib.linux-x86_64-2.7/crosscat/convergence_analysis
copying crosscat/convergence_analysis/convergence_test.py -> build/lib.linux-x86_64-2.7/crosscat/convergence_analysis
creating build/lib.linux-x86_64-2.7/crosscat/jsonrpc_http
copying crosscat/jsonrpc_http/test_resume.py -> build/lib.linux-x86_64-2.7/crosscat/jsonrpc_http
copying crosscat/jsonrpc_http/stub_client_jsonrpc.py -> build/lib.linux-x86_64-2.7/crosscat/jsonrpc_http
copying crosscat/jsonrpc_http/__init__.py -> build/lib.linux-x86_64-2.7/crosscat/jsonrpc_http
copying crosscat/jsonrpc_http/test_engine.py -> build/lib.linux-x86_64-2.7/crosscat/jsonrpc_http
copying crosscat/jsonrpc_http/server_jsonrpc.py -> build/lib.linux-x86_64-2.7/crosscat/jsonrpc_http
creating build/lib.linux-x86_64-2.7/crosscat/cython_code
copying crosscat/cython_code/test_pred_prob_and_density.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code
copying crosscat/cython_code/test_multinomial_impute.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code
copying crosscat/cython_code/__init__.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code
copying crosscat/cython_code/setup.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code
copying crosscat/cython_code/test_missing_value.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code
copying crosscat/cython_code/mixed_state_test.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code
copying crosscat/cython_code/state_test.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code
copying crosscat/cython_code/test_sample.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code
copying crosscat/cython_code/continuous_component_model_test.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code
creating build/lib.linux-x86_64-2.7/crosscat/tests
copying crosscat/tests/test_pred_prob_and_density.py -> build/lib.linux-x86_64-2.7/crosscat/tests
copying crosscat/tests/cpp_long_tests.py -> build/lib.linux-x86_64-2.7/crosscat/tests
copying crosscat/tests/test_sampler_enumeration.py -> build/lib.linux-x86_64-2.7/crosscat/tests
copying crosscat/tests/cpp_unit_tests.py -> build/lib.linux-x86_64-2.7/crosscat/tests
copying crosscat/tests/__init__.py -> build/lib.linux-x86_64-2.7/crosscat/tests
copying crosscat/tests/timing_analysis.py -> build/lib.linux-x86_64-2.7/crosscat/tests
copying crosscat/tests/test_log_likelihood.py -> build/lib.linux-x86_64-2.7/crosscat/tests
copying crosscat/tests/geweke_on_schemas.py -> build/lib.linux-x86_64-2.7/crosscat/tests
creating build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests
copying crosscat/tests/quality_tests/synthetic_data_generator.py -> build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests
copying crosscat/tests/quality_tests/test_impute_quality.py -> build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests
copying crosscat/tests/quality_tests/test_mixture_inference_quality.py -> build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests
copying crosscat/tests/quality_tests/test_kl_divergence_quality.py -> build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests
copying crosscat/tests/quality_tests/test_component_model_quality.py -> build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests
copying crosscat/tests/quality_tests/__init__.py -> build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests
copying crosscat/tests/quality_tests/test_predictive_confidence.py -> build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests
copying crosscat/tests/quality_tests/quality_test_utils.py -> build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests
creating build/lib.linux-x86_64-2.7/crosscat/tests/component_model_extensions
copying crosscat/tests/component_model_extensions/MultinomialComponentModel.py -> build/lib.linux-x86_64-2.7/crosscat/tests/component_model_extensions
copying crosscat/tests/component_model_extensions/__init__.py -> build/lib.linux-x86_64-2.7/crosscat/tests/component_model_extensions
copying crosscat/tests/component_model_extensions/CyclicComponentModel.py -> build/lib.linux-x86_64-2.7/crosscat/tests/component_model_extensions
copying crosscat/tests/component_model_extensions/ContinuousComponentModel.py -> build/lib.linux-x86_64-2.7/crosscat/tests/component_model_extensions
running build_ext
building 'crosscat.cython_code.CyclicComponentModel' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/crosscat
creating build/temp.linux-x86_64-2.7/crosscat/cython_code
creating build/temp.linux-x86_64-2.7/cpp_code
creating build/temp.linux-x86_64-2.7/cpp_code/src
gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c crosscat/cython_code/CyclicComponentModel.c -o build/temp.linux-x86_64-2.7/crosscat/cython_code/CyclicComponentModel.o
gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/utils.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/utils.o
gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/numerics.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/numerics.o
gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/RandomNumberGenerator.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/RandomNumberGenerator.o
gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/ComponentModel.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/ComponentModel.o
gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/CyclicComponentModel.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/CyclicComponentModel.o
gcc: error: crosscat/cython_code/CyclicComponentModel.c: No such file or directory
gcc: fatal error: no input files
compilation terminated.
error: command 'gcc' failed with exit status 1
----------------------------------------
Command /ahg/regevdata/users/yarden/myenv/bin/python -c "import setuptools;__file__='/tmp/pip-ZYN_sU-build/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-5armZF-record/install-record.txt --single-version-externally-managed --install-headers /ahg/regevdata/users/yarden/myenv/include/site/python2.7 failed with error code 1 in /tmp/pip-ZYN_sU-build
Storing complete log in /home/unix/yarden/.pip/pip.log
Any thoughts on this?
I haven't looked too closely, but it looks like Cython never gets invoked, so the *.c files it should produce, like crosscat/cython_code/CyclicComponentModel.c, are not found.
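One defensive pattern for a setup script is to decide up front which source file each extension will be built from, and to fail with a clear message when neither the .pyx nor a pre-generated file is usable. The sketch below is hypothetical (choose_source is not in CrossCat's setup.py) and only illustrates that decision.

```python
import os

def choose_source(stem, have_cython, generated_ext=".cpp", exists=os.path.exists):
    # Prefer the .pyx when Cython can translate it; otherwise fall back
    # to a pre-generated file, and fail early if neither is usable.
    pyx, generated = stem + ".pyx", stem + generated_ext
    if have_cython and exists(pyx):
        return pyx
    if exists(generated):
        return generated
    raise RuntimeError(
        "No usable source for %s: install Cython or ship %s" % (stem, generated))

try:
    import Cython  # noqa: F401
    HAVE_CYTHON = True
except ImportError:
    HAVE_CYTHON = False
```

A failure at this point would name the missing file directly, rather than surfacing later as a gcc "No such file or directory" error.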
There is currently no way to evaluate P(X=x,Y=y,Z=z) under the joint density of (X,Y,Z). We have
def simple_predictive_probability(self, M_c, X_L, X_D, Y, Q):
:param Q: A list of values to sample. Each value is doublet of (r, d, v):
r is the row index, d is the column index, v is the value
:type Q: list of lists
:returns: list of floats -- probabilities of the values specified by Q
which can only evaluate P(col_d = v | row = r), or the univariate marginal distribution for col_d. If you input multiple columns, then multiple univariate densities are returned. We need Q to take a list of lists of tuples.
Libboost 1.48 is not available for Ubuntu 13.10. I changed the install.sh file to use 1.49 but could not get Libboost 1.49 to install properly. After commenting out the libboost line in the install script, the install seems to be walking all over my system: installing Python 2.7 and Python 3.3, as well as reinstalling IPython, Pandas, and much else that I already had (in dev or newer versions). Who knows what I'll have when this is done?
So I'd suggest that you NOT freeze/pin the requirements but allow newer versions to remain (I forget the spec at the moment).
That is, the install is highly risky for people with already-running Python setups.
Some strong warning should be provided!
It looks like the HadoopEngine isn't currently in an executable state. There are references to settings that have been removed fairly recently. Is the implementation here something that could be made to work?
Crosscat adopts a uniform hyperprior over the parameters of the Normal Gamma prior distribution on the Normals (see footnote 1, p. 8). For the "variance" parameter, this is done by constructing a grid from roughly 0 to sum((x-x̄)^2) in [construct_continuous_specific_hyper_grid](https://github.com/probcomp/crosscat/blob/6dadb9b33f7111449d5daf5683a1eac6365431a4/cpp_code/src/utils.cpp#L434). The largest variance which makes sense for a sample {x} is max((x-x̄)^2), though. Since this is a grid of 31 elements, we're potentially losing a fair bit of precision here, and may be able to tighten up convergence a bit by tightening this bound.
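To make the gap between the two candidate upper bounds concrete, here is a small Python sketch. It only compares the two quantities named above on a toy sample; `variance_grid_bounds` is a hypothetical helper and not a function from the crosscat sources, and it says nothing about how the 31 grid points are spaced.

```python
def variance_grid_bounds(xs):
    """Return (sum of squared deviations, max squared deviation) for xs.

    Hypothetical helper for illustration; not part of crosscat.
    """
    mean = sum(xs) / float(len(xs))
    sq_devs = [(x - mean) ** 2 for x in xs]
    return sum(sq_devs), max(sq_devs)


# Toy sample: mean is 2, squared deviations are 4, 1, 0, 1, 4.
sum_bound, max_bound = variance_grid_bounds([0.0, 1.0, 2.0, 3.0, 4.0])
# How much wider the current (sum-based) upper bound is than the proposed one.
ratio = sum_bound / max_bound
```

The sum-based bound grows with the number of rows while the max-based bound does not, so the ratio tends to be larger on bigger tables.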
Oops.
This may not be so terrible if the probabilities are all close to each other, as in that case the arithmetic and geometric means will be close.
Need a test case that definitely demonstrates the right thing to do.
This failure should not be stochastic because we fixed the seed. But sometimes it fails like this:
=================================== FAILURES ===================================
_________________ test_multiple_col_ensure[130223-False-True] __________________
seed = 130223, dependent = False, analyze = True
@pytest.mark.parametrize("seed, dependent, analyze", single_args)
def test_multiple_col_ensure(seed, dependent, analyze):
rng = random.Random(seed)
T, M_r, M_c, X_L, X_D, engine = quick_le(get_next_seed(rng))
col_pairs = [(c1, c2,) for c1, c2 in it.combinations(range(N_COLS), 2)]
ensure_pairs = random.sample(col_pairs, 4)
dep_constraints = [(c1, c2, dependent) for c1, c2 in ensure_pairs]
X_L, X_D = engine.ensure_col_dep_constraints(M_c, M_r, T, X_L, X_D,
dep_constraints, get_next_seed(rng))
for col1, col2, dep in dep_constraints:
assert engine.assert_col_dep_constraints(X_L, X_D, col1, col2, dep, True)
if analyze:
X_L, X_D = engine.analyze(M_c, T, X_L, X_D, get_next_seed(rng),
n_steps=1000)
for col1, col2, dep in dep_constraints:
> assert engine.assert_col_dep_constraints(X_L, X_D, col1, col2, dep, True)
E assert <bound method LocalEngine.assert_col_dep_constraints of <crosscat.LocalEngine.LocalEngine object at 0x7fc498d4f210>>({'col_ensure': {'dependent': {}, 'independent': {'1': [9], '2': [5], '3': [6], '5': [2], ...}}, 'column_hypers': [{'fi... 'column_names': [5, 6, 7, 8, 9], 'row_partition_model': {'counts': [5, 5], 'hypers': {'alpha': 0.2511886431509581}}}]}, [[0, 1, 1, 0, 0, 0, ...], [0, 1, 1, 1, 1, 0, ...]], 6, 9, False, True)
E + where <bound method LocalEngine.assert_col_dep_constraints of <crosscat.LocalEngine.LocalEngine object at 0x7fc498d4f210>> = <crosscat.LocalEngine.LocalEngine object at 0x7fc498d4f210>.assert_col_dep_constraints
src/tests/unit_tests/test_col_ensure.py:123: AssertionError
===Flaky Test Report===
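One thing worth checking in the listing above: rng is seeded, yet ensure_pairs is drawn with random.sample, which uses Python's module-level generator rather than rng. Whether that is the actual cause of the flakiness is not established here. The sketch below only illustrates the difference; N_COLS = 10 and both function names are assumptions made for the example.

```python
import itertools
import random

N_COLS = 10  # hypothetical column count, for illustration only
COL_PAIRS = list(itertools.combinations(range(N_COLS), 2))


def sample_pairs_global(seed, k=4):
    """Mirrors the listing: rng is seeded, but the draw ignores it."""
    rng = random.Random(seed)  # created but not used for the draw below
    return random.sample(COL_PAIRS, k)  # module-level RNG, not tied to seed


def sample_pairs_seeded(seed, k=4):
    """Draws from the seeded generator, so the result depends only on seed."""
    rng = random.Random(seed)
    return rng.sample(COL_PAIRS, k)


first = sample_pairs_seeded(130223)
second = sample_pairs_seeded(130223)
```

With the seeded variant, two calls with the same seed return the same pairs; the global variant gives no such guarantee unless the module-level generator is seeded elsewhere.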
BOOST_ROOT is not being picked up when you do a "pip install ." on OS X so it isn't finding the include files and subsequently the boost library.
I am building this under a separate conda environment.
Hi there,
Just ran into this cython compile error:
cythoning crosscat/cython_code/State.pyx to crosscat/cython_code/State.cpp
Error compiling Cython file:
------------------------------------------------------------
...
import crosscat.utils.general_utils as gu
# import crosscat.utils.plot_utils as pu
cdef double set_double(double& to_set, double value):
to_set = value
^
------------------------------------------------------------
crosscat/cython_code/State.pyx:35:12: Assignment to reference 'to_set'
I'm building on Arch Linux, using Python 3.3.3, and cython 0.19.2.
Baxter's comment in the source about issues with impute_confidence
for continuous values should be documented somewhere more stable + visible (ie here).
# The confidence in continuous imputation is "the probability that
# there exists a unimodal summary" which is defined as the proportion of
# probability mass in the largest mode of a DPMM inferred from the simulate
# samples. We use crosscat on the samples for a given number of iterations,
# then calculate the proportion of mass in the largest mode.
#
# NOTE: The definition of confidence and its implementation do not agree.
# The probability of a unimodal summary is P(k=1|X), where k is the number
# of components in some infinite mixture model. I would describe the
# current implementation as "Is there a mode with sufficient enough mass
# that we can ignore the other modes". If this second formulation is to be
# used, it means that we need to not use the median of all the samples as
# the imputed value, but the median of the samples of the summary mode,
# because the summary (the imputed value) should come from the summary
# mode.
#
# There are a lot of problems with this second formulation.
#0. SLOW. Like, for real.
#1. Non-deterministic. The answer will be different given the same
# samples.
#2. Inaccurate. Approximate inference about approximate inferences.
# In practice confidences on the same samples could be significantly
# different because the Gibbs sampler that underlies crosscat is
# susceptible to getting stuck in local maxima. Of course, this could be
# mitigated to some extent by using more chains, but things are slow
# enough as it is.
#3. Confidence (interval) has a distinct meaning to the people who will
# be using this software. A unimodal summary does not necessarily mean
# that inferences are within an acceptable range. We are going to need to
# be loud about this. Maybe there should be a notion of tolerance?
#
# An alternative: mutual predictive coverage
# ------------------------------------------
# Divide the number of samples in the intersection of the 90% CI's of each
# component model by the number of samples in the union of the 90% CI's of
# each component model.
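A rough sketch of the "mutual predictive coverage" idea from the comment, under several assumptions that the comment does not settle: samples arrive already grouped by component, each 90% interval is the central empirical interval of that component's samples, and the counts are taken over the pooled samples. All names here are hypothetical.

```python
def central_interval(samples, mass=0.9):
    """Empirical central interval holding roughly `mass` of the samples."""
    ordered = sorted(samples)
    last = len(ordered) - 1
    tail = (1.0 - mass) / 2.0
    low = ordered[int(round(tail * last))]
    high = ordered[int(round((1.0 - tail) * last))]
    return low, high


def mutual_predictive_coverage(samples_by_component, mass=0.9):
    """Samples inside every interval, divided by samples inside any interval."""
    intervals = [central_interval(s, mass) for s in samples_by_component]
    pooled = [x for samples in samples_by_component for x in samples]
    in_all = sum(1 for x in pooled if all(lo <= x <= hi for lo, hi in intervals))
    in_any = sum(1 for x in pooled if any(lo <= x <= hi for lo, hi in intervals))
    return in_all / float(in_any) if in_any else 0.0


# Two overlapping components, two identical ones, and two disjoint ones.
overlapping = mutual_predictive_coverage([list(range(0, 101)), list(range(50, 151))])
identical = mutual_predictive_coverage([list(range(0, 101)), list(range(0, 101))])
disjoint = mutual_predictive_coverage([list(range(0, 11)), list(range(100, 111))])
```

Unlike the DPMM-based confidence, this quantity is deterministic given the samples and their component assignments.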
Should take a single shell command to build Crosscat and run all automatic tests.
simple_predictive_sample_unobserved (crosscat/crosscat/utils/sample_utils.py, lines 547 to 549 in b0a7c1a):
probs = numpy.exp(cluster_logps)
probs /= sum(probs)
draw = numpy.nonzero(numpy.random.multinomial(1, probs))[0][0]
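For comparison, a minimal sketch of the same draw with an explicitly seeded generator. The function name `draw_cluster` and the way the seed is passed in are assumptions; the real function would have to receive its seed from the caller in whatever form the engine uses.

```python
import numpy


def draw_cluster(cluster_logps, seed):
    """Draw one cluster index from normalized exp(cluster_logps), reproducibly."""
    rng = numpy.random.RandomState(seed)  # explicit seed instead of global state
    probs = numpy.exp(cluster_logps)
    probs /= probs.sum()
    return int(numpy.nonzero(rng.multinomial(1, probs))[0][0])


logps = [-1.0, -2.0, -3.0]
first_draw = draw_cluster(logps, seed=7)
second_draw = draw_cluster(logps, seed=7)
```

Calling it twice with the same seed returns the same index, which is the property the unseeded numpy.random.multinomial call does not give.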
In numerics.cpp there is a pair of lines:
// FIXME: should this fail?
assert(rand_u < 1E-10);
I just hit this assertion error. I believe the answer to the question is no, it shouldn't fail. It should just return draw. If that is the wrong return for some reason, then it is at least better than crashing out, right?
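A sketch of the suggested behaviour, written in Python rather than C++ and assuming the surrounding function is an inverse-CDF draw that subtracts each probability from rand_u in turn. That structure is a guess from the two quoted lines, and `draw_index` is a hypothetical name.

```python
def draw_index(probs, rand_u):
    """Return the first index whose cumulative mass exceeds rand_u."""
    remaining = rand_u
    for index, p in enumerate(probs):
        remaining -= p
        if remaining < 0:
            return index
    # Rounding (or probs summing to slightly less than 1) can leave a sliver
    # of rand_u unaccounted for; return the last index instead of asserting.
    return len(probs) - 1


low = draw_index([0.2, 0.3, 0.5], 0.1)
middle = draw_index([0.2, 0.3, 0.5], 0.25)
fallback = draw_index([0.1, 0.1, 0.1], 0.9999)
```

The fallback only triggers when the loop runs off the end, which is exactly the case the assertion currently guards.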
Not extremely important, but on a bare-metal machine either of them may not be installed by default. Adding the required versions of these dependencies would be great. Again, not a very big deal, but dha_example requires a filename parameter to be mentioned in the command line; I guess these are just simple typos.
Currently predictive_probability is computed by invoking the chain rule on the legacy simple_predictive_probability. @axch suggests an alternative implementation:
On second thought, crosscat should have a better algorithm for doing this:
- Group the query columns by view
- For each view that appears
- Compute the cluster logps the way simple_predictive_probability does
- For each cluster
- Compute the sum of the component_model.calc_element_predictive_logp_constrained across relevant component models and add it to the cluster logp
- Return the logsumexp of all the above.
In other words, retain the structure of simple_predictive_probability_unobserved (mutatis mutandis for observed) but expand it to handle multiple columns.
The reason this should be OK is independence of columns given cluster assignments.
The present implementation can be retained as a test, possibly at the Bayeslite level: the logpdf_joint of any metamodel should respect the chain rule exactly as computed here.
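The steps above can be sketched roughly as follows. This is one reading of the proposal (logsumexp within each view, then a sum of log probabilities across views), not crosscat code: `view_of_column`, `cluster_logps_by_view`, and `component_logp` are hypothetical stand-ins for the view assignment, the per-view cluster log weights, and calc_element_predictive_logp_constrained.

```python
import math
from collections import defaultdict


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def joint_logp(query, view_of_column, cluster_logps_by_view, component_logp):
    """Joint log probability of (column, value) pairs for one row."""
    # Group the query columns by view.
    columns_by_view = defaultdict(list)
    for column, value in query:
        columns_by_view[view_of_column[column]].append((column, value))
    total = 0.0
    for view, items in columns_by_view.items():
        terms = []
        for cluster, cluster_logp in enumerate(cluster_logps_by_view[view]):
            # Columns are independent given the cluster, so their logps add.
            element_logp = sum(component_logp(view, cluster, c, v) for c, v in items)
            terms.append(cluster_logp + element_logp)
        total += logsumexp(terms)  # marginalize over clusters within the view
    return total


# Toy model: columns 0 and 1 share view 0 (two clusters), column 2 is in view 1.
VIEW_OF_COLUMN = {0: 0, 1: 0, 2: 1}
CLUSTER_LOGPS = {0: [math.log(0.5), math.log(0.5)], 1: [0.0]}
P_ONE = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.3}  # (view, cluster) -> P(value == 1)


def toy_component_logp(view, cluster, column, value):
    p = P_ONE[(view, cluster)]
    return math.log(p if value == 1 else 1.0 - p)


result = joint_logp([(0, 1), (1, 1), (2, 1)], VIEW_OF_COLUMN, CLUSTER_LOGPS,
                    toy_component_logp)
```

In the toy model the joint probability is (0.5 * 0.81 + 0.5 * 0.01) * 0.3 = 0.123, which differs from the product of the three univariate marginals because columns 0 and 1 share a view.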
the error is
c:\Sander\my_code\crosscat-master>
c:\Sander\my_code\crosscat-master>python examples/dha_example.py www/data/dha.csv --num_chains 2 --num_transitions 2
Traceback (most recent call last):
File "examples/dha_example.py", line 78, in
X_L_list, X_D_list = engine.initialize(M_c, M_r, T, get_next_seed(), initialization='from_the_prior', n_chains=num_chains)
File "C:\Anaconda\lib\site-packages\crosscat\LocalEngine.py", line 110, in initialize
make_get_next_seed(seed),
File "C:\Anaconda\lib\site-packages\crosscat\LocalEngine.py", line 62, in get_initialize_arg_tuples
seeds = [get_next_seed() for seed_idx in range(n_chains)]
File "C:\Anaconda\lib\site-packages\crosscat\LocalEngine.py", line 908, in
return lambda: generator.next()
File "C:\Anaconda\lib\site-packages\crosscat\utils\general_utils.py", line 95, in int_generator
for _ in xrange(2**62):
OverflowError: Python int too large to convert to C long
c:\Sander\my_code\crosscat-master>
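The traceback points at xrange(2**62), which overflows where a C long is 32 bits, as it is for Python 2 on Windows. A sketch of a seed generator that avoids the bounded loop entirely is below. It assumes int_generator only needs to yield an endless stream of integer seeds; `seed_stream` and its upper bound are hypothetical choices, not the crosscat implementation.

```python
import itertools
import random


def seed_stream(seed, upper=2 ** 31 - 1):
    """Yield an unbounded, reproducible stream of integer seeds below upper."""
    rng = random.Random(seed)
    while True:  # no xrange(2**62), so nothing has to fit in a C long
        yield rng.randrange(upper)


first_three = list(itertools.islice(seed_stream(0), 3))
again = list(itertools.islice(seed_stream(0), 3))
```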
Just wanted to share a build success with other Mac OS users:
Edit setup.py, replacing boost_random with boost_random-mt throughout.
BOOST_ROOT=/opt/local CPLUS_INCLUDE_PATH=/opt/local/include python setup.py build
BOOST_ROOT=/opt/local CPLUS_INCLUDE_PATH=/opt/local/include python setup.py install
Currently setup.py assumes numpy and cython are already installed, which requires a workaround in build scripts to explicitly install them before running pip.
There is currently no way to evaluate P(X=x,Y=y,Z=z) under the joint density of (X,Y,Z).
We need to expose joint density evaluation in the interface. My first idea was to change simple_predictive_probability, which takes a list of disjoint univariate queries, so that it interprets the input as a joint query instead. Currently we have:
def simple_predictive_probability(self, M_c, X_L, X_D, Y, Q):
:param Q: A list of values to sample. Each value is doublet of (r, d, v):
r is the row index, d is the column index, v is the value
:type Q: list of lists
:returns: list of floats -- probabilities of the values specified by Q
which can only evaluate P(col_d = v | row = r), or the univariate marginal distribution for col_d. If you input multiple columns, then multiple univariate densities are returned. We need Q to take a list of lists of tuples.
However, it appears there are several tests that invoke simple_predictive_probability, which such a change might break.
The second-best option is to create a joint_predictive_probability function in the interface (and invoke this one through bayeslite).
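As an illustration of what a joint_predictive_probability could compute, here is a chain-rule sketch built on a univariate evaluator. `univariate_logpdf(item, given)` is a hypothetical callable standing in for a call to simple_predictive_probability with `given` as the constraints; the toy joint table exists only to check the arithmetic.

```python
import math


def joint_logpdf_chain_rule(query, constraints, univariate_logpdf):
    """log P(q1, ..., qk | constraints) = sum_i log P(qi | constraints, q1..q(i-1))."""
    total = 0.0
    given = list(constraints)
    for item in query:
        total += univariate_logpdf(item, given)
        given.append(item)  # condition later terms on this value
    return total


# Toy joint over two binary columns, keyed by (value of col 0, value of col 1).
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}


def toy_univariate_logpdf(item, given):
    """log P(column = value | given), computed from the toy table."""
    _, column, value = item
    fixed = {c: v for _, c, v in given}

    def mass(assignment):
        return sum(p for cell, p in JOINT.items()
                   if all(cell[c] == v for c, v in assignment.items()))

    wanted = dict(fixed)
    wanted[column] = value
    return math.log(mass(wanted) / mass(fixed))


# Items follow the (r, d, v) layout from the docstring: row, column, value.
joint = joint_logpdf_chain_rule([(0, 0, 1), (0, 1, 1)], [], toy_univariate_logpdf)
```

Here P(col_0 = 1) = 0.5 and P(col_1 = 1 | col_0 = 1) = 0.6, so the chain rule recovers the table entry 0.3.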
by block proposing the dependent column cliques.
Apparently the current implementation strategy for ENSURE DEPENDENT is to still propose column moves one at a time, but zero out the probability of any that violate the constraints. This means that a column that is DEPENDENT on another can never change views.
The better proposal mechanism is conceptually simple: just propose moving such a collection of columns as a group. Actual implementation difficulty is unknown.
It is being called from your install script... Guess I'll try an independent install. Not sure what the issue is.
And for the record, my IPython notebook is clobbered with "can't import zmq.ipkernel" (sigh).
Dockerfile and output below.
FROM continuumio/anaconda
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y g++ git libboost-dev
RUN conda install conda-build
# Build crosscat package
RUN conda skeleton pypi crosscat && conda build crosscat && rm -rf crosscat
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Icpp_code/include/CrossCat -I/opt/conda/envs/_build/lib/python2.7/site-packages/numpy/core/include -I/opt/conda/envs/_build/include/python2.7 -c cpp_code/src/weakprng.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/weakprng.o
cpp_code/src/weakprng.cpp: In function ‘int crypto_weakprng_selftest()’:
cpp_code/src/weakprng.cpp:314:38: error: ‘UINT64_C’ was not declared in this scope
error: command 'gcc' failed with exit status 1
Currently it uses ~32-bit seeds everywhere, and the Python part of the code still uses the Mersenne twister everywhere.
Unable to log in to the crosscat VM using the login and password on the wiki:
crosscat
crosscat
Trying to initialize a one-feature crosscat state causes a freeze. The interactive Python session below recreates this phenomenon.
>>> import crosscat.utils.data_utils as du
>>> import crosscat.cython_code.State as State
>>> n_rows = 20
>>> n_cols = 1
>>> n_clusters = 1
>>> n_splits = 1
>>> T, M_r, M_c = du.gen_factorial_data_objects(0, n_clusters, n_cols, n_rows, n_splits)
>>> T
[[-5.787782515779506], [-5.826566594944548], [-4.958365620785989], [-5.492487488802081], [-6.224880036852396], [-6.827562219103892], [-5.56692163658829], [-6.292623845594414], [-6.5554462707688455], [-7.177709985746689], [-6.642867927741707], [-7.009385232315293], [-5.2122421142029545], [-6.465641354565031], [-5.991088472905642], [-5.195031202572548], [-4.403616309173517], [-5.86833115423013], [-5.037314522029348], [-6.54195489382407]]
>>> M_r
{'idx_to_name': {'11': 11, '10': 10, '13': 13, '12': 12, '15': 15, '14': 14, '17': 17, '16': 16, '19': 19, '18': 18, '1': 1, '0': 0, '3': 3, '2': 2, '5': 5, '4': 4, '7': 7, '6': 6, '9': 9, '8': 8}, 'name_to_idx': {'11': 11, '10': 10, '13': 13, '12': 12, '15': 15, '14': 14, '17': 17, '16': 16, '19': 19, '18': 18, '1': 1, '0': 0, '3': 3, '2': 2, '5': 5, '4': 4, '7': 7, '6': 6, '9': 9, '8': 8}}
>>> M_c
{'idx_to_name': {'0': 0}, 'column_metadata': [{'code_to_value': {}, 'value_to_code': {}, 'modeltype': 'normal_inverse_gamma'}], 'name_to_idx': {0: 0}}
>>> state = State.p_State(M_c, T)
Then nothing happens.
Dockerfile and output below. It builds fine from branch master with python setup.py build. Maybe the cythonized C++ code has gone stale.
# Dockerfile that builds, installs, and tests bayeslite. It is for development
# only; users should use the python package.
FROM ubuntu:15.10
RUN apt-get update -qq --fix-missing
# Installation dependencies:
RUN apt-get install -y -qq python2.7-dev python-pip git apt-utils pkg-config \
libfreetype6-dev libboost-dev liblapack-dev gfortran git
RUN pip install setuptools virtualenv
WORKDIR /bayesdb
RUN pip -q install pyzmq ipython[notebook]==3.2.1 cython numpy==1.8.2 \
matplotlib==1.4.3 scipy pandas
RUN BOOST_ROOT=/usr/include pip install crosscat
src/cython_code/State.cpp: In function 'int __pyx_pf_8crosscat_11cython_code_5State_7p_State___cinit__(__pyx_obj_8crosscat_11cython_code_5State_p_State*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*)':
src/cython_code/State.cpp:3765:263: error: no matching function for call to 'State::State(boost::numeric::ublas::matrix<double>&, std::vector<std::__cxx11::basic_string<char> >&, std::vector<int>&, std::vector<int>&, std::vector<int>&, std::__cxx11::string&, std::__cxx11::string&, std::vector<double>&, std::vector<double>&, std::vector<double>&, std::vector<double>&, int&, int&, int&)'
If you constrain column 3 to be 42, and sample column 3, Crosscat draws randomly instead of returning 42 as one might expect. This produces strange results in bayesdb, like:
bayeslite> SIMULATE Murder FROM states_cc GIVEN Murder = 1 LIMIT 4;
Murder
-------------
2.96562662884
9.17781629692
4.15232993703
2.88644682395
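The behaviour one might expect instead can be sketched in a few lines. This is only an illustration of the expected semantics: `simulate_column`, the (column, value) constraint format, and the `draw_unconstrained` callback are all hypothetical, not the crosscat API.

```python
def simulate_column(column, constraints, draw_unconstrained):
    """Return the constrained value if the column is constrained, else sample."""
    for constrained_column, value in constraints:
        if constrained_column == column:
            return value  # GIVEN col = v implies every simulated col equals v
    return draw_unconstrained(column)


constrained = simulate_column(3, [(3, 42)], lambda column: -1.0)
unconstrained = simulate_column(2, [(3, 42)], lambda column: -1.0)
```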
Running the commands listed in the readme (plus sudo apt-get install git to get git) results in errors in a fresh virtual machine running in Hyper-V on Windows 8.1 64-bit.
When attempting a build of setup.py, the error "error: command 'x86_64-linux-gnu-gcc' failed with exit status 1\r\nerror trying to exec 'cc1plus': execvp: No such file or directory" occurs, ending the build.
When attempting to install after that, the same error occurs.
When attempting to run the example code, "ImportError: No module named crosscat.settings" occurs.
When attempting the pip install variation (obviously now running sudo apt-get install python-pip before attempting), the install loops through the same warning for many many modules "cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for c++ [enabled by default]".
Seems like it shouldn't be this hard to get this thing to run.
In sample_utils.py, invoking the function
def simple_predictive_probability_unobserved(M_c, X_L, X_D, Y,
query_row, query_columns, elements)
with constraints Y not equal to [] causes an assertion error in the line:
# cluster_logps should logsumexp to log(1)
assert(numpy.abs(logsumexp(cluster_logps)) < .0000001)
Here is an example:
File "/home/riastradh/crosscat/master/build/lib.linux-x86_64-2.7/crosscat/utils/sample_utils.py", line 122, in simple_predictive_probability_unobserved
assert(numpy.abs(logsumexp(cluster_logps)) < .0000001)
AssertionError: assert 4.0103806420029304 < 1e-07
+ where 4.0103806420029304 = <ufunc 'absolute'>(-4.0103806420029304)
+ where <ufunc 'absolute'> = <ufunc 'absolute'>
+ where <ufunc 'absolute'> = numpy.abs
+ and -4.0103806420029304 = logsumexp(array([-4.10965973, -6.86404678, -9.5251692 , -7.78917709, -8.61550711]))
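If the intended invariant is that cluster_logps are normalized after the constraints have been folded in, one way to restore it is to subtract their logsumexp. Whether renormalizing at that point is the right fix, or whether the unnormalized mass is needed elsewhere, is not settled by the traceback; `normalize_logps` is a hypothetical helper shown only to make the arithmetic concrete, using the array printed above.

```python
import numpy


def normalize_logps(cluster_logps):
    """Shift log weights so that they logsumexp to log(1) = 0."""
    logps = numpy.asarray(cluster_logps, dtype=float)
    return logps - numpy.logaddexp.reduce(logps)


# The array printed in the traceback above.
raw = [-4.10965973, -6.86404678, -9.5251692, -7.78917709, -8.61550711]
raw_total = numpy.logaddexp.reduce(numpy.asarray(raw))
normalized = normalize_logps(raw)
```

After the shift, the existing assertion on numpy.abs(logsumexp(cluster_logps)) would pass for this input.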
python crosscat/examples/dha_example.py crosscat/www/data/dha.csv --num_chains 2 --num_transitions 2
Traceback (most recent call last):
File "crosscat/examples/dha_example.py", line 25, in
import crosscat.settings as S
ImportError: No module named settings
Machine: Ubuntu 15.04 x86_64 with Anaconda. Crosscat built from github source (same error with simple pip install)
Readme should be fixed (not that anyone wouldn't be able to figure this out)
Based on the following install error from pip, it appears that crosscat is not Python 3 compatible:
$ pip install git+https://github.com/probcomp/crosscat.git
Collecting git+https://github.com/probcomp/crosscat.git
Cloning https://github.com/probcomp/crosscat.git to /var/folders/7q/rgspm2s915d7nctflyjjgx5r0000gn/T/pip-gu18rboj-build
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/var/folders/7q/rgspm2s915d7nctflyjjgx5r0000gn/T/pip-gu18rboj-build/setup.py", line 53, in <module>
version = get_version()
File "/var/folders/7q/rgspm2s915d7nctflyjjgx5r0000gn/T/pip-gu18rboj-build/setup.py", line 15, in get_version
if version.endswith('+'):
TypeError: endswith first arg must be bytes or a tuple of bytes, not str
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /var/folders/7q/rgspm2s915d7nctflyjjgx5r0000gn/T/pip-gu18rboj-build
fonnescj on Christy.local in ~/Repositories/
Are there plans for compatibility?
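The TypeError says that version is a bytes object on Python 3, which is what you get when the value comes from, for example, subprocess output. A minimal sketch of a guard is below, assuming that is indeed where get_version obtains the string; `normalize_version` is a hypothetical helper and the body of get_version is not shown in the traceback.

```python
def normalize_version(raw_version):
    """Return the version as text on both Python 2 and Python 3."""
    if isinstance(raw_version, bytes):
        raw_version = raw_version.decode("utf-8")
    return raw_version.strip()


# Hypothetical version strings, one as bytes and one as text.
from_bytes = normalize_version(b"0.1.55+\n")
from_text = normalize_version(u"0.1.55+")
```

With the value decoded first, a check such as version.endswith('+') works the same way on both interpreters.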