crosscat's Issues

unused calls to .empty()

Both View::remove_all() and State::remove_all() start with calls to _lookup.empty() whose results are unused.
Should they be calling .clear() instead?

dha_example_multiprocessing

The multiprocessing example for DHA is missing the seed arg in its calls to engine.
E.g. X_L_list, X_D_list = engine.initialize(M_c, M_r, T, n_chains=num_chains) should be
X_L_list, X_D_list = engine.initialize(M_c, M_r, T, get_next_seed(), n_chains=num_chains)

After adding these, the code is still very slow, much slower than the non-multiprocessing version.
Also, top shows that it is only using one core rather than the 4 I specified.
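
A minimal sketch of what a `get_next_seed` helper could look like, assuming it is a closure over a single master seed. The name `make_seed_source` and the 32767 upper bound are illustrative assumptions, not taken from the example script, and the `engine.initialize` call appears only as a comment because its signature comes from the issue text and is not verified here.

```python
import random

def make_seed_source(master_seed, max_seed=32767):
    """Return a get_next_seed() function that yields a reproducible seed stream."""
    rng = random.Random(master_seed)

    def get_next_seed():
        # Every call hands out the next seed in the stream.
        return rng.randint(0, max_seed)

    return get_next_seed

get_next_seed = make_seed_source(0)
seeds = [get_next_seed() for _ in range(3)]

# Each engine call would then receive its own seed, e.g. (per the issue text):
# X_L_list, X_D_list = engine.initialize(M_c, M_r, T, get_next_seed(), n_chains=num_chains)
```

Two sources built from the same master seed produce the same stream, so a run can be repeated exactly.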

State.cpp fatal error: numpy/arrayobject.h not found (Ubuntu12.10/pyenv2.7)

I'm trying to install in an environment managed by the pyenv version manager and am running into a numpy/arrayobject.h problem. FWIW, I notice that I don't have a /usr/lib/python2.7/site-packages directory. I'm somewhat new to Python tools and environments.

$ python2.7 crosscat/setup.py install
running install
running build
running build_py
running build_ext
skipping 'crosscat/cython_code/ContinuousComponentModel.cpp' Cython extension (up-to-date)
skipping 'crosscat/cython_code/MultinomialComponentModel.cpp' Cython extension (up-to-date)
skipping 'crosscat/cython_code/State.cpp' Cython extension (up-to-date)
building 'crosscat.cython_code.State' extension
/usr/bin/gcc -pthread -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/usr/lib/python2.7/site-packages/numpy/core/include/ -fPIC -Icpp_code/include/CrossCat -I/.pyenv/versions/2.7.5/include/python2.7 -c crosscat/cython_code/State.cpp -o build/temp.linux-x86_64-2.7/crosscat/cython_code/State.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
crosscat/cython_code/State.cpp:262:31: fatal error: numpy/arrayobject.h: No such file or directory
compilation terminated.
error: command '/usr/bin/gcc' failed with exit status 1
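
The compiler line above passes -I/usr/lib/python2.7/site-packages/numpy/core/include/, a directory the reporter says does not exist, while the interpreter itself lives under .pyenv. One common way to avoid a hard-coded path is to ask the importable numpy where its headers are. The sketch below assumes numpy is installed in the active environment; `numpy_include_dirs` is a hypothetical helper name, and this is not the project's actual setup.py.

```python
import os

def numpy_include_dirs():
    """Return the header directory of whichever numpy is importable right now."""
    import numpy  # imported lazily so a missing numpy raises ImportError here

    include_dir = numpy.get_include()
    header = os.path.join(include_dir, "numpy", "arrayobject.h")
    if not os.path.isfile(header):
        raise RuntimeError("numpy headers not found at %s" % header)
    return [include_dir]

# In a setup.py, this list would be passed to the extension definition,
# e.g. Extension(..., include_dirs=numpy_include_dirs() + other_dirs).
```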

our PRNG sucks

The Mersenne twister sucks, the seed space we use is tiny, and the /ad hack/ seed management we do makes it even more ridiculous by seeding separate MT instances with sequential integers. All of it, including all uses of boost random and numpy.random, should be replaced by a ChaCha-based PRNG with 256-bit seeds and small states.

Fortunately, most of the work to identify sources of nondeterminism has been done, and some seed parameter is passed in explicitly to every routine that makes random choices, so fixing this is a matter of macheteing your way through the mess you can see, rather than scrutinizing the whole code base to guess where it might be nondeterministic.
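
To illustrate the seed-management half of the complaint, here is a sketch of deriving unrelated 256-bit child seeds from one 256-bit master seed, instead of seeding separate generators with sequential integers. It uses SHA-256 from the standard library purely as a stand-in for the derivation step; it is not a ChaCha implementation, and `derive_seed` is a hypothetical name.

```python
import hashlib

def derive_seed(master_seed, stream_index):
    """Derive a 256-bit child seed from a 256-bit master seed and a stream number."""
    if len(master_seed) != 32:
        raise ValueError("master seed must be 32 bytes (256 bits)")
    # Hash the master seed together with the stream number.
    return hashlib.sha256(master_seed + stream_index.to_bytes(8, "big")).digest()

master = bytes(32)  # all-zero seed, for illustration only
children = [derive_seed(master, i) for i in range(4)]
```

Each child seed is 32 bytes, and consecutive stream numbers give seeds with no simple arithmetic relationship to one another.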

Tests segfault

[probcomp-4 pc/crosscat]$ py.test
================================================================== test session starts ===================================================================
platform linux2 -- Python 2.7.6 -- pytest-2.5.1
plugins: flaky
collecting 0 items / 4 errorsSegmentation fault
[probcomp-4 pc/crosscat]$

hcluster -> scipy?

I ran into some problems during the install of crosscat where it tries to install the hcluster package (see my comment on closed issue #8). I looked into why a simple pip install hcluster wouldn't work, and found that hcluster has been incorporated into scipy.

Specifically, these are the hcluster functions that I can see used in crosscat, and their scipy counterparts:

hcluster      scipy
pdist         spatial.distance.pdist
linkage       cluster.hierarchy.linkage
dendrogram    cluster.hierarchy.dendrogram

Note that I haven't compared them in detail, but I presume they haven't changed much, if at all; their docstrings look very similar.

It seems that a niggly dependency could be eliminated by switching to the scipy versions, so I'm wondering whether you'd accept pull requests implementing that (if so, are there any tests I should run), or have you considered all this and is there a good reason for using the original hcluster versions?

associate random seed with each model

Rather than use a random seed in the `engine', Crosscat should store a random seed in each model, and the state transitions performed by LocalEngine and MultiprocessingEngine should be the same given the same seed in the model. That way, the only source of nondeterminism in Crosscat will be the initial selection of random seeds, which can be provided entirely by the caller, and LocalEngine and MultiprocessingEngine will yield exactly the same results for the same inputs.
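
A toy sketch of the idea, assuming a model is a plain dict that carries its own seed. The names `transition`, `run_serial` and `run_parallel` are hypothetical, and a thread pool stands in for MultiprocessingEngine; none of this is Crosscat's actual engine code.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def transition(model):
    """Advance one model; all randomness comes from the seed stored in the model."""
    rng = random.Random(model["seed"])
    new_value = model["value"] + rng.gauss(0.0, 1.0)
    # Store a fresh seed so the next transition is deterministic too.
    next_seed = rng.getrandbits(64)
    return {"seed": next_seed, "value": new_value}

def run_serial(models):
    return [transition(m) for m in models]

def run_parallel(models, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transition, models))

models = [{"seed": s, "value": 0.0} for s in (11, 22, 33)]
same = run_serial(models) == run_parallel(models)
```

Because no transition touches shared generator state, the order in which workers pick up models cannot affect the result.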

State.pyx:35:12; Assignment to reference 'to_set'

Playing around with install using pyenv and Ubuntu 12.10.
I see a rather joyous Cython compile error.

$ python2.7 crosscat/setup.py install
running install
running build
running build_py
running build_ext
skipping 'crosscat/cython_code/ContinuousComponentModel.cpp' Cython extension (up-to-date)
skipping 'crosscat/cython_code/MultinomialComponentModel.cpp' Cython extension (up-to-date)
cythoning crosscat/cython_code/State.pyx to crosscat/cython_code/State.cpp

Error compiling Cython file:

...
import crosscat.utils.general_utils as gu

import crosscat.utils.plot_utils as pu

cdef double set_double(double& to_set, double value):
    to_set = value

^

crosscat/cython_code/State.pyx:35:12: Assignment to reference 'to_set'
building 'crosscat.cython_code.State' extension
/usr/bin/gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icpp_code/include/CrossCat -I/scratch2/apps/.pyenv/versions/2.7.5/include/python2.7 -c crosscat/cython_code/State.cpp -o build/temp.linux-x86_64-2.7/crosscat/cython_code/State.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
crosscat/cython_code/State.cpp:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.
error: command '/usr/bin/gcc' failed with exit status 1

continuous component model can't handle constant column

This causes construct_continuous_specific_hyper_grid to compute 0 for the sum of squared deviations from the mean, and then to fill s_grid with the smallest positive normal floating-point number, due to a workaround I put into log_linspace a while ago to avoid zeros (which probably had the effect of masking this problem, unbeknownst to ignorant me back then).

The consequences manifest as NaN for marginal logps / logscores when analysis is attempted later. Of course, Crosscat can't conclude anything useful about the column if all its values are the same. But that's no reason for it to barf and choke.

We who are still paying attention to Crosscat ought to sit down and figure out what this hyperparameter gridding business is supposed to accomplish and accomplish it in a way that does not cause mysterious crashes later on.
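
The failure mode can be reproduced in miniature. The sketch below assumes a grid that runs in log space up to the sum of squared deviations, with zero replaced by the smallest positive normal float; the function bodies are illustrative stand-ins, not the real construct_continuous_specific_hyper_grid or log_linspace.

```python
import math
import sys

def sum_squared_deviations(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def log_linspace(a, b, n):
    """n points spaced evenly in log space between a and b (both must be > 0)."""
    lo, hi = math.log(a), math.log(b)
    step = (hi - lo) / (n - 1)
    return [math.exp(lo + i * step) for i in range(n)]

constant_column = [3.0, 3.0, 3.0, 3.0]
ssd = sum_squared_deviations(constant_column)  # zero for a constant column

# The zero-avoiding workaround: clamp to the smallest positive normal float.
floor = sys.float_info.min
grid = log_linspace(floor, max(ssd, floor), 5)
```

Every grid point ends up around 2.2e-308, so any later computation that divides by or takes ratios of these values is working at the edge of floating-point range.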

vonmises test in test_mixture_inference_quality.py fails

def test_vonmises_vonmises_model(self):
>       assert(check_one_feature_mixture(cycmext.p_CyclicComponentModel,
        show_plot=self.show_plot) > .1)
E       AssertionError: assert 0.093436305410546178 > 0.1
E        +  where 0.093436305410546178 = check_one_feature_mixture(<class 'crosscat.tests.component_model_extensions.CyclicComponentModel.p_CyclicComponentModel'>, show_plot=True)
E        +    where <class 'crosscat.tests.component_model_extensions.CyclicComponentModel.p_CyclicComponentModel'> = cycmext.p_CyclicComponentModel
E        +    and   True = <test_mixture_inference_quality.TestComponentModelQuality testMethod=test_vonmises_vonmises_model>.show_plot

Unable to install CrossCat using pip

Hi,

I'm having trouble installing CrossCat using pip in my local virtual environment. I simply cloned the crosscat repository from github, and in that repository ran:

pip install .

The results were:

$ pip install .
Unpacking /ahg/regevdata/users/yarden/software/crosscat
  Running setup.py egg_info for package from file:///ahg/regevdata/users/yarden/software/crosscat
    which: no ccache in (/ahg/regevdata/users/yarden/myenv/bin:/home/unix/yarden/bin:/broad/lsf9/9.1/linux2.6-glibc2.3-x86_64/bin:/ahg/regevdata/users/yarden/myenv/bin:/home/unix/yarden/bin:/ahg/regevdata/users/yarden/myenv/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/emacs_24.2/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/mysql_5.6.20/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/perl_5.14.2/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bioperl_1.6.923/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/sqlite_3.6.19/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/tophat_2.0.10/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bowtie2_2.0.5:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bowtie_0.12.9:/broad/software/free/Linux/redhat_6_x86_64/pkgs/samtools/samtools_0.1.19/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/zlib_1.2.6/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bedtools_version-2.16.1/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/scipy_0.13.0-python-2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/matplotlib_1.3.1-python-2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/db_4.7.25/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/tcltk8.5.9/bin:/broad/lsf9/9.1/linux2.6-glibc2.3-x86_64/etc:/broad/software/free/Linux/redhat_6_x86_64/pkgs/lsf_drmaa-1.1.1/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/r_3.1.0-bioconductor-2.14/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_4.9.0/bin:/home/unix/yarden/bin:/broad/tools/NoArch/pkgs/local:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4.2-ubuntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4
.2-ubuntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4.2-ubuntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/)

Requirement already satisfied (use --upgrade to upgrade): scipy>=0.11.0 in /broad/software/free/Linux/redhat_6_x86_64/pkgs/scipy_0.13.0-python-2.7.1-sqlite3-rtrees/lib/python2.7/site-packages (from CrossCat==0.1)
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.7.0 in /ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages (from CrossCat==0.1)
Installing collected packages: CrossCat
  Running setup.py install for CrossCat
    which: no ccache in (/ahg/regevdata/users/yarden/myenv/bin:/home/unix/yarden/bin:/broad/lsf9/9.1/linux2.6-glibc2.3-x86_64/bin:/ahg/regevdata/users/yarden/myenv/bin:/home/unix/yarden/bin:/ahg/regevdata/users/yarden/myenv/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/emacs_24.2/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/mysql_5.6.20/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/perl_5.14.2/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bioperl_1.6.923/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/sqlite_3.6.19/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/tophat_2.0.10/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bowtie2_2.0.5:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bowtie_0.12.9:/broad/software/free/Linux/redhat_6_x86_64/pkgs/samtools/samtools_0.1.19/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/zlib_1.2.6/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bedtools_version-2.16.1/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/scipy_0.13.0-python-2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/matplotlib_1.3.1-python-2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/db_4.7.25/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/tcltk8.5.9/bin:/broad/lsf9/9.1/linux2.6-glibc2.3-x86_64/etc:/broad/software/free/Linux/redhat_6_x86_64/pkgs/lsf_drmaa-1.1.1/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/r_3.1.0-bioconductor-2.14/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_4.9.0/bin:/home/unix/yarden/bin:/broad/tools/NoArch/pkgs/local:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4.2-ubuntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4
.2-ubuntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4.2-ubuntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/)
    building 'crosscat.cython_code.CyclicComponentModel' extension
    gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c crosscat/cython_code/CyclicComponentModel.c -o build/temp.linux-x86_64-2.7/crosscat/cython_code/CyclicComponentModel.o
    gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/utils.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/utils.o
    gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/numerics.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/numerics.o
    gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/RandomNumberGenerator.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/RandomNumberGenerator.o
    gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/ComponentModel.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/ComponentModel.o
    gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/CyclicComponentModel.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/CyclicComponentModel.o
    gcc: error: crosscat/cython_code/CyclicComponentModel.c: No such file or directory
    gcc: fatal error: no input files
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    Complete output from command /ahg/regevdata/users/yarden/myenv/bin/python -c "import setuptools;__file__='/tmp/pip-ZYN_sU-build/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-5armZF-record/install-record.txt --single-version-externally-managed --install-headers /ahg/regevdata/users/yarden/myenv/include/site/python2.7:
    which: no ccache in (/ahg/regevdata/users/yarden/myenv/bin:/home/unix/yarden/bin:/broad/lsf9/9.1/linux2.6-glibc2.3-x86_64/bin:/ahg/regevdata/users/yarden/myenv/bin:/home/unix/yarden/bin:/ahg/regevdata/users/yarden/myenv/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/emacs_24.2/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/mysql_5.6.20/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/perl_5.14.2/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bioperl_1.6.923/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/sqlite_3.6.19/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/tophat_2.0.10/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bowtie2_2.0.5:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bowtie_0.12.9:/broad/software/free/Linux/redhat_6_x86_64/pkgs/samtools/samtools_0.1.19/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/zlib_1.2.6/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/bedtools_version-2.16.1/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/scipy_0.13.0-python-2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/matplotlib_1.3.1-python-2.7.1-sqlite3-rtrees/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/db_4.7.25/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/tcltk8.5.9/bin:/broad/lsf9/9.1/linux2.6-glibc2.3-x86_64/etc:/broad/software/free/Linux/redhat_6_x86_64/pkgs/lsf_drmaa-1.1.1/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/r_3.1.0-bioconductor-2.14/bin:/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_4.9.0/bin:/home/unix/yarden/bin:/broad/tools/NoArch/pkgs/local:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4.2-ubuntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4
.2-ubuntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/:/home/unix/yarden/regevdata/bedtools2/bin:/home/unix/yarden/regevdata/software/sratoolkit.2.4.2-ubuntu64/bin:/home/unix/yarden/regevdata/software/ncftp-3.2.5/bin:/home/unix/yarden/bin/)

running install

running build

running build_py

creating build

creating build/lib.linux-x86_64-2.7

creating build/lib.linux-x86_64-2.7/crosscat

copying crosscat/JSONRPCEngine.py -> build/lib.linux-x86_64-2.7/crosscat

copying crosscat/CrossCatClient.py -> build/lib.linux-x86_64-2.7/crosscat

copying crosscat/MultiprocessingEngine.py -> build/lib.linux-x86_64-2.7/crosscat

copying crosscat/LocalEngine.py -> build/lib.linux-x86_64-2.7/crosscat

copying crosscat/__init__.py -> build/lib.linux-x86_64-2.7/crosscat

copying crosscat/starcluster_plugin.py -> build/lib.linux-x86_64-2.7/crosscat

copying crosscat/EngineTemplate.py -> build/lib.linux-x86_64-2.7/crosscat

copying crosscat/HadoopEngine.py -> build/lib.linux-x86_64-2.7/crosscat

copying crosscat/IPClusterEngine.py -> build/lib.linux-x86_64-2.7/crosscat

copying crosscat/settings.py -> build/lib.linux-x86_64-2.7/crosscat

creating build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/convergence_test_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/api_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/diagnostic_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/inference_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/xnet_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/hadoop_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/__init__.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/mutual_information_test_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/enumerate_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/timing_test_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/validate_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/plot_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/geweke_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/sample_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/useCase_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/file_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/data_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

copying crosscat/utils/general_utils.py -> build/lib.linux-x86_64-2.7/crosscat/utils

creating build/lib.linux-x86_64-2.7/crosscat/convergence_analysis

copying crosscat/convergence_analysis/parse_convergence_results.py -> build/lib.linux-x86_64-2.7/crosscat/convergence_analysis

copying crosscat/convergence_analysis/automated_convergence_tests.py -> build/lib.linux-x86_64-2.7/crosscat/convergence_analysis

copying crosscat/convergence_analysis/__init__.py -> build/lib.linux-x86_64-2.7/crosscat/convergence_analysis

copying crosscat/convergence_analysis/generate_convergence_script.py -> build/lib.linux-x86_64-2.7/crosscat/convergence_analysis

copying crosscat/convergence_analysis/plot_convergence_results.py -> build/lib.linux-x86_64-2.7/crosscat/convergence_analysis

copying crosscat/convergence_analysis/convergence_test.py -> build/lib.linux-x86_64-2.7/crosscat/convergence_analysis

creating build/lib.linux-x86_64-2.7/crosscat/jsonrpc_http

copying crosscat/jsonrpc_http/test_resume.py -> build/lib.linux-x86_64-2.7/crosscat/jsonrpc_http

copying crosscat/jsonrpc_http/stub_client_jsonrpc.py -> build/lib.linux-x86_64-2.7/crosscat/jsonrpc_http

copying crosscat/jsonrpc_http/__init__.py -> build/lib.linux-x86_64-2.7/crosscat/jsonrpc_http

copying crosscat/jsonrpc_http/test_engine.py -> build/lib.linux-x86_64-2.7/crosscat/jsonrpc_http

copying crosscat/jsonrpc_http/server_jsonrpc.py -> build/lib.linux-x86_64-2.7/crosscat/jsonrpc_http

creating build/lib.linux-x86_64-2.7/crosscat/cython_code

copying crosscat/cython_code/test_pred_prob_and_density.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code

copying crosscat/cython_code/test_multinomial_impute.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code

copying crosscat/cython_code/__init__.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code

copying crosscat/cython_code/setup.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code

copying crosscat/cython_code/test_missing_value.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code

copying crosscat/cython_code/mixed_state_test.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code

copying crosscat/cython_code/state_test.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code

copying crosscat/cython_code/test_sample.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code

copying crosscat/cython_code/continuous_component_model_test.py -> build/lib.linux-x86_64-2.7/crosscat/cython_code

creating build/lib.linux-x86_64-2.7/crosscat/tests

copying crosscat/tests/test_pred_prob_and_density.py -> build/lib.linux-x86_64-2.7/crosscat/tests

copying crosscat/tests/cpp_long_tests.py -> build/lib.linux-x86_64-2.7/crosscat/tests

copying crosscat/tests/test_sampler_enumeration.py -> build/lib.linux-x86_64-2.7/crosscat/tests

copying crosscat/tests/cpp_unit_tests.py -> build/lib.linux-x86_64-2.7/crosscat/tests

copying crosscat/tests/__init__.py -> build/lib.linux-x86_64-2.7/crosscat/tests

copying crosscat/tests/timing_analysis.py -> build/lib.linux-x86_64-2.7/crosscat/tests

copying crosscat/tests/test_log_likelihood.py -> build/lib.linux-x86_64-2.7/crosscat/tests

copying crosscat/tests/geweke_on_schemas.py -> build/lib.linux-x86_64-2.7/crosscat/tests

creating build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests

copying crosscat/tests/quality_tests/synthetic_data_generator.py -> build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests

copying crosscat/tests/quality_tests/test_impute_quality.py -> build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests

copying crosscat/tests/quality_tests/test_mixture_inference_quality.py -> build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests

copying crosscat/tests/quality_tests/test_kl_divergence_quality.py -> build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests

copying crosscat/tests/quality_tests/test_component_model_quality.py -> build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests

copying crosscat/tests/quality_tests/__init__.py -> build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests

copying crosscat/tests/quality_tests/test_predictive_confidence.py -> build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests

copying crosscat/tests/quality_tests/quality_test_utils.py -> build/lib.linux-x86_64-2.7/crosscat/tests/quality_tests

creating build/lib.linux-x86_64-2.7/crosscat/tests/component_model_extensions

copying crosscat/tests/component_model_extensions/MultinomialComponentModel.py -> build/lib.linux-x86_64-2.7/crosscat/tests/component_model_extensions

copying crosscat/tests/component_model_extensions/__init__.py -> build/lib.linux-x86_64-2.7/crosscat/tests/component_model_extensions

copying crosscat/tests/component_model_extensions/CyclicComponentModel.py -> build/lib.linux-x86_64-2.7/crosscat/tests/component_model_extensions

copying crosscat/tests/component_model_extensions/ContinuousComponentModel.py -> build/lib.linux-x86_64-2.7/crosscat/tests/component_model_extensions

running build_ext

building 'crosscat.cython_code.CyclicComponentModel' extension

creating build/temp.linux-x86_64-2.7

creating build/temp.linux-x86_64-2.7/crosscat

creating build/temp.linux-x86_64-2.7/crosscat/cython_code

creating build/temp.linux-x86_64-2.7/cpp_code

creating build/temp.linux-x86_64-2.7/cpp_code/src

gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c crosscat/cython_code/CyclicComponentModel.c -o build/temp.linux-x86_64-2.7/crosscat/cython_code/CyclicComponentModel.o

gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/utils.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/utils.o

gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/numerics.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/numerics.o

gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/RandomNumberGenerator.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/RandomNumberGenerator.o

gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/ComponentModel.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/ComponentModel.o

gcc -pthread -fno-strict-aliasing -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/tcltk8.5.9/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/sqlite_3.7.5/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/db_4.7.25/include -DNDEBUG -fPIC -Icpp_code/include/CrossCat -I/ahg/regevdata/users/yarden/myenv/lib/python2.7/site-packages/numpy/core/include -I/broad/software/free/Linux/redhat_5_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/include/python2.7 -c cpp_code/src/CyclicComponentModel.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/CyclicComponentModel.o

gcc: error: crosscat/cython_code/CyclicComponentModel.c: No such file or directory

gcc: fatal error: no input files

compilation terminated.

error: command 'gcc' failed with exit status 1

----------------------------------------
Command /ahg/regevdata/users/yarden/myenv/bin/python -c "import setuptools;__file__='/tmp/pip-ZYN_sU-build/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-5armZF-record/install-record.txt --single-version-externally-managed --install-headers /ahg/regevdata/users/yarden/myenv/include/site/python2.7 failed with error code 1 in /tmp/pip-ZYN_sU-build
Storing complete log in /home/unix/yarden/.pip/pip.log

Any thoughts on this?

I haven't looked too closely, but it looks like Cython never gets invoked, so the *.c files it should produce (such as crosscat/cython_code/CyclicComponentModel.c) are not found.

simple_predictive_probability should evaluate density for joint distribution

There is currently no way to evaluate P(X=x,Y=y,Z=z) under the joint density of (X,Y,Z). We have

    def simple_predictive_probability(self, M_c, X_L, X_D, Y, Q):
        :param Q: A list of values to sample.  Each value is doublet of (r, d, v):
                  r is the row index, d is the column index, v is the value
        :type Q: list of lists
        :returns: list of floats -- probabilities of the values specified by Q

which can only evaluate P(col_d = v | row = r), i.e. the univariate marginal distribution for col_d. If you input multiple columns, then multiple univariate densities are returned. We need Q to take a list of lists of tuples.

Install requires libboost1.48 and 1.49 is current on Ubuntu 13.10

Libboost 1.48 is not available for Ubuntu 13.10. I changed install.sh to use 1.49 but could not get libboost 1.49 to install properly. After commenting out the libboost line in the installer, the install seems to be walking all over my system: it installs Python 2.7 and Python 3.3 and reinstalls IPython, Pandas, and much else that I already had (in dev or newer versions). Who knows what I'll have when this is done?
So I'd suggest that you NOT freeze/pin the requirements but allow newer versions to remain (I forget the spec at the moment).
As it stands, the install is highly risky for people with an already running Python setup.

A strong warning should be provided!

Hadoop state non-functional

It looks like the HadoopEngine isn't currently in an executable state. There are references to settings that have been removed fairly recently. Is the implementation here something that could be made to work?

Crosscat hyperprior grid on variance parameter is broader than it needs to be

Crosscat adopts a uniform hyperprior over the parameters of the Normal Gamma prior distribution on the Normals (see footnote 1, p. 8). For the "variance" parameter, this is done by constructing a grid from roughly 0 to sum((x-x̄)^2) in [construct_continuous_specific_hyper_grid](https://github.com/probcomp/crosscat/blob/6dadb9b33f7111449d5daf5683a1eac6365431a4/cpp_code/src/utils.cpp#L434). The largest variance which makes sense for a sample {x} is max((x-x̄)^2), though. Since this is a grid of 31 elements, we're potentially losing a fair bit of precision here, and may be able to tighten up convergence a bit by tightening this bound.
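To make the difference between the two bounds concrete, here is a minimal sketch in Python with NumPy. The log-spaced grid, the lower end of the grid, and the sample values are assumptions made for illustration; the grid construction in utils.cpp may differ.

```python
import numpy as np

def variance_grid_bounds(x):
    """Return (current_upper, proposed_upper) for the variance-like grid."""
    x = np.asarray(x, dtype=float)
    sq_dev = (x - x.mean()) ** 2
    # current bound: sum of squared deviations; proposed bound: the largest one
    return sq_dev.sum(), sq_dev.max()

def log_grid(lower, upper, n_grid=31):
    # Assumption: log-spaced grid points; the real spacing may differ.
    return np.exp(np.linspace(np.log(lower), np.log(upper), n_grid))

x = np.array([-5.8, -5.8, -5.0, -5.5, -6.2, -6.8, -5.6, -6.3])  # made-up sample
current_upper, proposed_upper = variance_grid_bounds(x)
lower = 1e-2 * proposed_upper  # hypothetical lower end, for illustration only

wide = log_grid(lower, current_upper)
tight = log_grid(lower, proposed_upper)

# Ratio between adjacent grid points: smaller means finer resolution.
print(wide[1] / wide[0], tight[1] / tight[0])
```

With the same 31 points, the tighter upper bound gives a smaller step between neighbouring grid values, which is the precision gain the issue is pointing at.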

test_multiple_col_ensure.py is stochastic

This failure should not be stochastic because we fixed the seed. But sometimes it fails like this:

=================================== FAILURES ===================================
_________________ test_multiple_col_ensure[130223-False-True] __________________

seed = 130223, dependent = False, analyze = True

@pytest.mark.parametrize("seed, dependent, analyze", single_args)
def test_multiple_col_ensure(seed, dependent, analyze):
    rng = random.Random(seed)
    T, M_r, M_c, X_L, X_D, engine = quick_le(get_next_seed(rng))
    col_pairs = [(c1, c2,) for c1, c2 in it.combinations(range(N_COLS), 2)]
    ensure_pairs = random.sample(col_pairs, 4)
    dep_constraints = [(c1, c2, dependent) for c1, c2 in ensure_pairs]

    X_L, X_D = engine.ensure_col_dep_constraints(M_c, M_r, T, X_L, X_D,
        dep_constraints, get_next_seed(rng))

    for col1, col2, dep in dep_constraints:
        assert engine.assert_col_dep_constraints(X_L, X_D, col1, col2, dep, True)

    if analyze:
        X_L, X_D = engine.analyze(M_c, T, X_L, X_D, get_next_seed(rng),
            n_steps=1000)

    for col1, col2, dep in dep_constraints:
>               assert engine.assert_col_dep_constraints(X_L, X_D, col1, col2, dep, True)
E               assert <bound method LocalEngine.assert_col_dep_constraints of <crosscat.LocalEngine.LocalEngine object at 0x7fc498d4f210>>({'col_ensure': {'dependent': {}, 'independent': {'1': [9], '2': [5], '3': [6], '5': [2], ...}}, 'column_hypers': [{'fi... 'column_names': [5, 6, 7, 8, 9], 'row_partition_model': {'counts': [5, 5], 'hypers': {'alpha': 0.2511886431509581}}}]}, [[0, 1, 1, 0, 0, 0, ...], [0, 1, 1, 1, 1, 0, ...]], 6, 9, False, True)
E                +  where <bound method LocalEngine.assert_col_dep_constraints of <crosscat.LocalEngine.LocalEngine object at 0x7fc498d4f210>> = <crosscat.LocalEngine.LocalEngine object at 0x7fc498d4f210>.assert_col_dep_constraints

src/tests/unit_tests/test_col_ensure.py:123: AssertionError
===Flaky Test Report===
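One thing that may be worth checking in the test above: `ensure_pairs = random.sample(col_pairs, 4)` draws from the module-level `random` state rather than from the seeded `rng`, so the chosen column pairs can differ between runs even with a fixed seed. The sketch below only illustrates that difference; `N_COLS = 10` and the `pick_pairs` helper are assumptions, and this has not been confirmed as the cause of the failure.

```python
import itertools as it
import random

N_COLS = 10  # assumption: stands in for the test module's N_COLS

def pick_pairs(seed, use_seeded_rng):
    """Pick 4 column pairs, either from the seeded rng or the global one."""
    rng = random.Random(seed)
    col_pairs = list(it.combinations(range(N_COLS), 2))
    sampler = rng if use_seeded_rng else random
    return sampler.sample(col_pairs, 4)

# Repeat the draw a few times with the same seed.
seeded_runs = {tuple(pick_pairs(130223, True)) for _ in range(5)}
global_runs = {tuple(pick_pairs(130223, False)) for _ in range(5)}

print(len(seeded_runs))  # 1: same seed, same pairs every time
print(len(global_runs))  # typically more than 1
```

If this is the culprit, replacing `random.sample` with `rng.sample` in the test would make the pair selection reproducible.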

Cython compile error in State.pyx...

Hi there,

Just ran into this cython compile error:

cythoning crosscat/cython_code/State.pyx to crosscat/cython_code/State.cpp

Error compiling Cython file:
------------------------------------------------------------
...
import crosscat.utils.general_utils as gu
# import crosscat.utils.plot_utils as pu


cdef double set_double(double& to_set, double value):
     to_set = value
           ^
------------------------------------------------------------

crosscat/cython_code/State.pyx:35:12: Assignment to reference 'to_set'

I'm building on Arch Linux, using Python 3.3.3, and cython 0.19.2.

Issues with impute_confidence.

Baxter's comment in the source about issues with impute_confidence for continuous values should be documented somewhere more stable and visible (i.e. here).

    # The confidence in continuous imputation is "the probability that
    # there exists a unimodal summary" which is defined as the proportion of
    # probability mass in the largest mode of a DPMM inferred from the simulate
    # samples. We use crosscat on the samples for a given number of iterations,
    # then calculate the proportion of mass in the largest mode.
    #
    # NOTE: The definition of confidence and its implementation do not agree.
    # The probability of a unimodal summary is P(k=1|X), where k is the number
    # of components in some infinite mixture model. I would describe the
    # current implementation as "Is there a mode with sufficient enough mass
    # that we can ignore the other modes". If this second formulation is to be
    # used, it means that we need to not use the median of all the samples as
    # the imputed value, but the median of the samples of the summary mode,
    # because the summary (the imputed value) should come from the summary
    # mode.
    #
    # There are a lot of problems with this second formulation.
    #0. SLOW. Like, for real.
    #1. Non-deterministic. The answer will be different given the same
    #   samples.
    #2. Inaccurate. Approximate inference about approximate inferences.
    #   In practice confidences on the sample samples could be significantly
    #   different because the Gibbs sampler that underlies crosscat is
    #   susceptible to getting stuck in local maximum. Of course, this could be
    #   mitigated to some extent by using more chains, but things are slow
    #   enough as it is.
    #3. Confidence (interval) has a distinct meaning to the people who will
    #   be using this software. A unimodal summary does not necessarily mean
    #   that inferences are within an acceptable range. We are going to need to
    #   be loud about this. Maybe there should be a notion of tolerance?
    #
    # An alternative: mutual predictive coverage
    # ------------------------------------------
    # Divide the number of samples in the intersection of the 90% CI's of each
    # component model by the number of samples in the union of the 90% CI's of
    # each component model.

assertion fail in numerics.cpp

In numerics.cpp there is a pair of lines:

// FIXME: should this fail?
assert(rand_u < 1E-10);

I just hit this assertion error. I believe the answer to the question is no, it shouldn't fail; it should just return draw. If that is the wrong return value for some reason, it is at least better than crashing, right?

Install of boost/gcc/numpy is required for compilation of crosscat

Not extremely important, but on a bare-metal machine any of these may not be installed by default. Listing the required versions of these dependencies would be great. Also not a very big deal: dha_example requires a filename parameter, which should be mentioned in the command line. I guess these are just simple typos.

Revising algorithm for computing joint pdf

Currently predictive_probability is computed by invoking the chain rule on the legacy simple_predictive_probability. @axch suggests an alternative implementation:

On second thought, crosscat should have a better algorithm for doing this:

  • Group the query columns by view
  • For each view that appears
  • Compute the cluster logps the way simple_predictive_probability does
  • For each cluster
    • Compute the sum of the component_model.calc_element_predictive_logp_constrained across relevant component models and add it to the cluster logp
  • Return the logsumexp of all the above.

In other words, retain the structure of simple_predictive_probability_unobserved (mutatis mutandis for observed) but expand it to handle multiple columns.

The reason this should be OK is independence of columns given cluster assignments.

The present implementation can be retained as a test, possibly at the Bayeslite level: the logpdf_joint of any metamodel should respect the chain rule exactly as computed here.
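The algorithm described above can be sketched in Python roughly as follows. This is one reading of the proposal, not crosscat code: it takes a logsumexp over clusters within each view and adds the results across views (views being independent). The names `joint_logpdf`, `view_of_column`, `cluster_logps_by_view`, and `element_logp` are hypothetical; `element_logp` stands in for `component_model.calc_element_predictive_logp_constrained`, and the toy Gaussian model exists only to exercise the function.

```python
import numpy as np

def logsumexp(a):
    a = np.asarray(a, dtype=float)
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def joint_logpdf(view_of_column, cluster_logps_by_view, element_logp, query):
    """query maps column -> value; returns the joint log density."""
    # Group the query columns by view.
    by_view = {}
    for col, val in query.items():
        by_view.setdefault(view_of_column[col], []).append((col, val))
    total = 0.0
    for view, items in by_view.items():
        terms = []
        for k, cluster_logp in enumerate(cluster_logps_by_view[view]):
            # Columns are independent given the cluster, so their logps add.
            col_logp = sum(element_logp(view, k, c, v) for c, v in items)
            terms.append(cluster_logp + col_logp)
        total += logsumexp(terms)  # marginalize over clusters in this view
    return total

def normal_logpdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

# Toy model: view -> column -> per-cluster mean (unit variance everywhere).
means = {0: {0: [0.0, 3.0], 1: [1.0, -1.0]}, 1: {2: [5.0]}}

def toy_element_logp(view, cluster, column, value):
    return normal_logpdf(value, means[view][column][cluster], 1.0)

view_of_column = {0: 0, 1: 0, 2: 1}
cluster_logps = {0: np.log([0.3, 0.7]), 1: np.log([1.0])}
lp = joint_logpdf(view_of_column, cluster_logps, toy_element_logp,
                  {0: 0.5, 1: 0.2, 2: 4.0})
print(lp)
```

A chain-rule implementation should agree with this value up to floating-point error, which is the cross-check suggested in the issue.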

Please help: running on a Windows 10 machine with Anaconda Python 2.7

The error is:

c:\Sander\my_code\crosscat-master>
c:\Sander\my_code\crosscat-master>python examples/dha_example.py www/data/dha.csv --num_chains 2 --num_transitions 2
Traceback (most recent call last):
File "examples/dha_example.py", line 78, in
X_L_list, X_D_list = engine.initialize(M_c, M_r, T, get_next_seed(), initialization='from_the_prior', n_chains=num_chains)
File "C:\Anaconda\lib\site-packages\crosscat\LocalEngine.py", line 110, in initialize
make_get_next_seed(seed),
File "C:\Anaconda\lib\site-packages\crosscat\LocalEngine.py", line 62, in get_initialize_arg_tuples
seeds = [get_next_seed() for seed_idx in range(n_chains)]
File "C:\Anaconda\lib\site-packages\crosscat\LocalEngine.py", line 908, in
return lambda: generator.next()
File "C:\Anaconda\lib\site-packages\crosscat\utils\general_utils.py", line 95, in int_generator
for _ in xrange(2**62):
OverflowError: Python int too large to convert to C long

c:\Sander\my_code\crosscat-master>
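The overflow comes from `xrange(2**62)`: on Windows builds of Python 2 a C long is 32 bits, so that argument does not fit. A minimal sketch of a seed generator that avoids a bounded `xrange` altogether is shown below. The body is an assumption about what `int_generator` is meant to do (yield an endless stream of pseudo-random seeds); the seed range is also an assumption chosen to fit in a 32-bit C long.

```python
import random

def int_generator(start=0):
    """Yield an endless stream of pseudo-random seeds (hypothetical rewrite)."""
    rng = random.Random(start)
    while True:  # no xrange(2**62), so nothing to overflow
        yield rng.randrange(2 ** 31 - 1)

gen = int_generator(0)
seeds = [next(gen) for _ in range(3)]  # next(gen) works on Python 2 and 3
print(seeds)
```

Using `next(gen)` instead of `gen.next()` also keeps the caller portable across Python 2 and 3.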

Build success with Mac OS X

Just wanted to share a build success with other Mac OS users:

  • Install Boost via MacPorts.
  • Set up Python virtualenv.
  • Edit setup.py, replacing boost_random with boost_random-mt throughout.
  • Build with:
BOOST_ROOT=/opt/local CPLUS_INCLUDE_PATH=/opt/local/include python setup.py build
BOOST_ROOT=/opt/local CPLUS_INCLUDE_PATH=/opt/local/include python setup.py install

Evaluate joint density

There is currently no way to evaluate P(X=x,Y=y,Z=z) under the joint density of (X,Y,Z).

We need to expose joint density evaluation in the interface. My first idea was to change simple_predictive_probability which takes a list of disjoint univariate queries and interpret the input instead as joint. Currently we have:

    def simple_predictive_probability(self, M_c, X_L, X_D, Y, Q):
        :param Q: A list of values to sample.  Each value is doublet of (r, d, v):
                  r is the row index, d is the column index, v is the value
        :type Q: list of lists
        :returns: list of floats -- probabilities of the values specified by Q

which can only evaluate P(col_d = v | row = r), i.e. the univariate marginal distribution for col_d. If you input multiple columns, then multiple univariate densities are returned. We need Q to take a list of lists of tuples.

However, it appears there are several tests that invoke simple_predictive_probability, which such a change might break.

The second-best option is to create a joint_predictive_probability function in the interface (and invoke this one through bayeslite).

Speed up inference (and ensure ergodicity) in the presence of ENSURE DEPENDENT

by block proposing the dependent column cliques.

Apparently the current implementation strategy for ENSURE DEPENDENT is to still propose column moves one at a time, but zero out the probability of any that violate the constraints. This means that a column that is DEPENDENT on another can never change views.

  • If there is only one connected component of such columns, this merely slows convergence (I think)
  • If there is more than one, inference becomes non-ergodic, because the two components will get stuck either dependent or independent during initialization, and never be able to change.

The better proposal mechanism is conceptually simple: just propose moving such a collection of columns as a group. Actual implementation difficulty is unknown.
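As a first step toward block proposals, the columns would need to be grouped into the connected components induced by the dependence constraints. The sketch below does that with a small union-find; the function name `dependent_blocks` is hypothetical, and the `(c1, c2, dependent)` constraint format is borrowed from the col_ensure test rather than from the sampler itself.

```python
def dependent_blocks(n_cols, dep_constraints):
    """Group columns into blocks linked by dependent=True constraints."""
    parent = list(range(n_cols))

    def find(c):
        # Follow parents to the root, compressing the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for c1, c2, dependent in dep_constraints:
        if dependent:
            parent[find(c1)] = find(c2)

    blocks = {}
    for c in range(n_cols):
        blocks.setdefault(find(c), []).append(c)
    return sorted(blocks.values())

blocks = dependent_blocks(
    6, [(0, 1, True), (1, 2, True), (3, 4, True), (0, 5, False)])
print(blocks)  # [[0, 1, 2], [3, 4], [5]]
```

Each block would then be proposed to move between views as a unit, with independence constraints still used to rule out invalid target views.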

Crosscat build fails in anaconda environment: "cpp_code/src/weakprng.cpp:314:38: error: ‘UINT64_C’ was not declared in this scope"

Dockerfile and output below.

FROM continuumio/anaconda

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y g++ git libboost-dev
RUN conda install conda-build
# Build crosscat package
RUN conda skeleton pypi crosscat && conda build crosscat && rm -rf crosscat
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Icpp_code/include/CrossCat -I/opt/conda/envs/_build/lib/python2.7/site-packages/numpy/core/include -I/opt/conda/envs/_build/include/python2.7 -c cpp_code/src/weakprng.cpp -o build/temp.linux-x86_64-2.7/cpp_code/src/weakprng.o
cpp_code/src/weakprng.cpp: In function ‘int crypto_weakprng_selftest()’:
cpp_code/src/weakprng.cpp:314:38: error: ‘UINT64_C’ was not declared in this scope
error: command 'gcc' failed with exit status 1

Initializing one-feature state causes freeze

Trying to initialize a one-feature crosscat state causes a freeze. The interactive Python session below reproduces the problem.

>>> import crosscat.utils.data_utils as du
>>> import crosscat.cython_code.State as State
>>> n_rows = 20
>>> n_cols = 1
>>> n_clusters = 1
>>> n_splits = 1
>>> T, M_r, M_c = du.gen_factorial_data_objects(0, n_clusters, n_cols, n_rows, n_splits)
>>> T
[[-5.787782515779506], [-5.826566594944548], [-4.958365620785989], [-5.492487488802081], [-6.224880036852396], [-6.827562219103892], [-5.56692163658829], [-6.292623845594414], [-6.5554462707688455], [-7.177709985746689], [-6.642867927741707], [-7.009385232315293], [-5.2122421142029545], [-6.465641354565031], [-5.991088472905642], [-5.195031202572548], [-4.403616309173517], [-5.86833115423013], [-5.037314522029348], [-6.54195489382407]]
>>> M_r
{'idx_to_name': {'11': 11, '10': 10, '13': 13, '12': 12, '15': 15, '14': 14, '17': 17, '16': 16, '19': 19, '18': 18, '1': 1, '0': 0, '3': 3, '2': 2, '5': 5, '4': 4, '7': 7, '6': 6, '9': 9, '8': 8}, 'name_to_idx': {'11': 11, '10': 10, '13': 13, '12': 12, '15': 15, '14': 14, '17': 17, '16': 16, '19': 19, '18': 18, '1': 1, '0': 0, '3': 3, '2': 2, '5': 5, '4': 4, '7': 7, '6': 6, '9': 9, '8': 8}}
>>> M_c
{'idx_to_name': {'0': 0}, 'column_metadata': [{'code_to_value': {}, 'value_to_code': {}, 'modeltype': 'normal_inverse_gamma'}], 'name_to_idx': {0: 0}}
>>> state = State.p_State(M_c, T)

Then nothing happens.

"pip install crosscat" fails

Dockerfile and output below. It builds fine from branch master with python setup.py build. Maybe the cythonized C++ code has gone stale.

# Dockerfile that builds, installs, and tests bayeslite. It is for development
# only; users should use the python package.

FROM        ubuntu:15.10
RUN         apt-get update -qq --fix-missing

# Installation dependencies:
RUN apt-get install -y -qq python2.7-dev python-pip git apt-utils pkg-config \
    libfreetype6-dev libboost-dev liblapack-dev gfortran git
RUN pip install setuptools virtualenv

WORKDIR /bayesdb
RUN     pip -q install pyzmq ipython[notebook]==3.2.1 cython numpy==1.8.2 \
        matplotlib==1.4.3 scipy pandas
RUN     BOOST_ROOT=/usr/include pip install crosscat
src/cython_code/State.cpp: In function 'int __pyx_pf_8crosscat_11cython_code_5State_7p_State___cinit__(__pyx_obj_8crosscat_11cython_code_5State_p_State*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*, PyObject*)':

src/cython_code/State.cpp:3765:263: error: no matching function for call to 'State::State(boost::numeric::ublas::matrix<double>&, std::vector<std::__cxx11::basic_string<char> >&, std::vector<int>&, std::vector<int>&, std::vector<int>&, std::__cxx11::string&, std::__cxx11::string&, std::vector<double>&, std::vector<double>&, std::vector<double>&, std::vector<double>&, int&, int&, int&)'

simple_predictive_sample_unobserved draws randomly even for constrained columns

If you constrain column 3 to be 42 and then sample column 3, Crosscat draws randomly instead of returning 42 as one might expect. This produces strange results in bayesdb, such as:

bayeslite> SIMULATE Murder FROM states_cc GIVEN Murder = 1 LIMIT 4;
Murder
-------------
2.96562662884
9.17781629692
4.15232993703
2.88644682395
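The expected behaviour can be sketched as a thin wrapper that short-circuits any queried cell that is also constrained. Everything here is hypothetical: the function name, the `draw` callable standing in for the real sampler, and the assumption that constraints are `(row, col, value)` triples and queries are `(row, col)` pairs.

```python
import random

def sample_respecting_constraints(Y, Q, draw):
    """Return one value per query, reusing constrained values where given.

    Y: list of (row, col, value) constraints (assumed format).
    Q: list of (row, col) queries (assumed format).
    draw: hypothetical callable (row, col) -> sampled value.
    """
    fixed = {(r, d): v for r, d, v in Y}
    return [fixed[q] if q in fixed else draw(*q) for q in map(tuple, Q)]

rng = random.Random(0)
draw = lambda row, col: rng.gauss(5.0, 3.0)  # stand-in for the real sampler

sample = sample_respecting_constraints([(50, 3, 42)], [(50, 3), (50, 4)], draw)
print(sample[0])  # 42
```

Only the unconstrained column is actually sampled; the constrained one is returned as given.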

Readme commands don't work with Ubuntu Server 14.04.2 64-bit

Running the commands listed in the readme (plus sudo apt-get install git to get git) results in errors in a fresh virtual machine running in Hyper-V on Windows 8.1 64-bit.

When attempting a build with setup.py, the error "error: command 'x86_64-linux-gnu-gcc' failed with exit status 1\r\nerror trying to exec 'cc1plus': execvp: No such file or directory" occurs, ending the build.

When attempting to install after that, the same error occurs.

When attempting to run the example code, "ImportError: No module named crosscat.settings" occurs.

When attempting the pip install variation (obviously now running sudo apt-get install python-pip before attempting), the install loops through the same warning for many many modules "cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for c++ [enabled by default]".

Seems like it shouldn't be this hard to get this thing to run.

Evaluating predictive probability for unobserved row throws assertion error with constraints

In sample_utils.py invoking the function

def simple_predictive_probability_unobserved(M_c, X_L, X_D, Y,
    query_row, query_columns, elements)

with constraints Y not equal to [] causes an assertion error at the line:

# cluster_logps should logsumexp to log(1)
assert(numpy.abs(logsumexp(cluster_logps)) < .0000001)

Here is an example:

File "/home/riastradh/crosscat/master/build/lib.linux-x86_64-2.7/crosscat/utils/sample_utils.py", line 122, in simple_predictive_probability_unobserved
    assert(numpy.abs(logsumexp(cluster_logps)) < .0000001)
AssertionError: assert 4.0103806420029304 < 1e-07
 +  where 4.0103806420029304 = <ufunc 'absolute'>(-4.0103806420029304)
 +    where <ufunc 'absolute'> = <ufunc 'absolute'>
 +      where <ufunc 'absolute'> = numpy.abs
 +    and   -4.0103806420029304 = logsumexp(array([-4.10965973, -6.86404678, -9.5251692 , -7.78917709, -8.61550711]))
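The numbers in the traceback suggest that, once constraints are folded in, cluster_logps are no longer normalized. One possible remedy, offered as an assumption rather than a confirmed fix, is to renormalize in log space before checking. The sketch below uses the values from the traceback and a NumPy-only `logsumexp`.

```python
import numpy as np

def logsumexp(a):
    a = np.asarray(a, dtype=float)
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

# Values copied from the traceback above.
cluster_logps = np.array(
    [-4.10965973, -6.86404678, -9.5251692, -7.78917709, -8.61550711])

log_norm = logsumexp(cluster_logps)      # about -4.0104, not log(1) = 0
normalized = cluster_logps - log_norm    # renormalize in log space

print(log_norm)
print(abs(logsumexp(normalized)) < 1e-7)  # True
```

Whether renormalizing is the right semantics depends on whether the constraint likelihood is supposed to be conditioned on or carried through to the result.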

Error in running the example (crosscat.settings module)

python crosscat/examples/dha_example.py crosscat/www/data/dha.csv --num_chains 2 --num_transitions 2

Traceback (most recent call last):
File "crosscat/examples/dha_example.py", line 25, in
import crosscat.settings as S
ImportError: No module named settings

Machine: Ubuntu 15.04 x86_64 with Anaconda. Crosscat built from github source (same error with simple pip install)

Python 3 compatibility

Based on the following install error from pip, it appears that crosscat is not Python 3 compatible:

$ pip install git+https://github.com/probcomp/crosscat.git
Collecting git+https://github.com/probcomp/crosscat.git
  Cloning https://github.com/probcomp/crosscat.git to /var/folders/7q/rgspm2s915d7nctflyjjgx5r0000gn/T/pip-gu18rboj-build
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/var/folders/7q/rgspm2s915d7nctflyjjgx5r0000gn/T/pip-gu18rboj-build/setup.py", line 53, in <module>
        version = get_version()
      File "/var/folders/7q/rgspm2s915d7nctflyjjgx5r0000gn/T/pip-gu18rboj-build/setup.py", line 15, in get_version
        if version.endswith('+'):
    TypeError: endswith first arg must be bytes or a tuple of bytes, not str

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /var/folders/7q/rgspm2s915d7nctflyjjgx5r0000gn/T/pip-gu18rboj-build

Are there plans for compatibility?
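The TypeError is what Python 3 raises when `endswith` is called on a bytes object with a str argument, which would happen if `version` comes from something like subprocess output. A minimal sketch of a fix is to decode before comparing; `normalize_version` is a hypothetical helper and the version string below is made up.

```python
def normalize_version(version):
    """Return the version as text, whether it arrived as bytes or str."""
    if isinstance(version, bytes):
        version = version.decode('utf-8')
    return version.strip()

version = normalize_version(b'0.1.55+\n')  # made-up example value
is_dev = version.endswith('+')
print(version, is_dev)
```

This only addresses the failure shown in the traceback; other Python 3 incompatibilities may remain further along in the build.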
