
coniii's Introduction


Convenient Interface to Inverse Ising

ConIII is a Python package for solving maximum entropy problems with a focus on the pairwise maximum entropy model, also known as the inverse Ising problem.

If you use ConIII for your research, please consider citing the following:

Lee, E.D. and Daniels, B.C., 2019. Convenient Interface to Inverse Ising (ConIII): A Python 3 Package for Solving Ising-Type Maximum Entropy Models. Journal of Open Research Software, 7(1), p.3. DOI: http://doi.org/10.5334/jors.217.

The paper also contains an overview of the modules. Code documentation is available online.

Installation

To set up an Anaconda environment called "test" and install from pip, run the following code. The openblas package is only recommended for AMD users.

$ conda create -n test -c conda-forge python=3.10 numpy scipy numba cython jupyter ipython multiprocess boost==1.74 matplotlib mpmath blas=*=openblas
$ pip install coniii

If you have trouble using pip, you can always build this package from source. The following code will download the latest release from GitHub and install the package. Make sure that you are running Python 3.10 and have Boost v1.74.0 installed.

$ git clone https://github.com/eltrompetero/coniii.git
$ cd coniii
$ ./pypi_compile.sh
$ pip install dist/*.whl

Setting up exact solution for systems N > 9

If you would like to use the Enumerate solver for system sizes greater than 9 spins, you must run enumerate.py to write the Ising equation files for that system size yourself. This can be run from the install directory. If you do not know where the installation directory is, you can find it by starting a Python session and running

>>> import coniii
>>> coniii.__path__

Once inside the install directory, you can run in your bash shell

$ python enumerate.py [N] 1

where [N] should be replaced by the size of the system. The trailing 1 specifies that the equations should be written for the {-1,1} basis, which the package uses by default. For more details, see the __main__ block at the end of the file enumerate.py.

For the {0,1} basis, use

$ python enumerate.py [N]
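
Once the equation file for your system size exists, the Enumerate solver can be used on data in the matching basis. Below is a minimal sketch, assuming (as in the MCH examples in the issues further down) that the solver accepts the sample array as its first argument and that the data are in the default {-1,1} basis:

import numpy as np
from coniii import Enumerate

# hypothetical data: 200 samples of a 10-spin system in the {-1,1} basis
X = np.random.choice([-1, 1], size=(200, 10))

solver = Enumerate(X)          # requires the n=10 equation file written as above
multipliers = solver.solve()   # fields followed by pairwise couplings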

Quick guide with Jupyter notebook

A Jupyter notebook with a brief introduction and examples for how to use ConIII is available. The notebook is also installed into your package directory if you used pip.

To use the notebook, install Jupyter (for example, by following the setup instructions above). Then copy the notebook file "usage_guide.ipynb" into a directory outside the "coniii" directory, change to that directory, and run

$ jupyter notebook

This should open the notebook in your default web browser.

Troubleshooting

This package is only maintained for Python 3 and has only been tested with Python 3.10. Check which version of Python you are running in your terminal with

$ python --version

ConIII has been tested on the following systems

  • Ubuntu 20.04.5

Trouble compiling the Boost extension manually? Check whether the Boost headers are on your include path. If they are not, you can add an include directory entry to the EXTRA_COMPILE_ARGS variable in "setup.py" before compiling, as sketched below.
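
As an illustration of the kind of edit meant here (a sketch only, assuming EXTRA_COMPILE_ARGS is a plain list of compiler flag strings near the top of setup.py, and using a hypothetical Boost location):

# in setup.py: append an include directory so the compiler can find the Boost headers
EXTRA_COMPILE_ARGS += ['-I/usr/local/include/boost_1_74_0']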

Support

Please file an issue on GitHub if you have any problems or feature requests. Include a stack trace or other information that would be helpful for debugging, such as your OS, system configuration details, and the results of the unit tests. The unit tests can be run by navigating to the package directory and running

$ pytest -q

The package directory can be found by running the following inside Python:

>>> import coniii
>>> coniii.__path__

You may also need to install pytest.

$ conda install -c conda-forge pytest

Updating

When updating, please read the RELEASE_NOTES. There may be modifications to the interface including parameter names as we make future versions more user friendly.


coniii's People

Contributors

bcdaniels, eltrompetero, mjbommar


coniii's Issues

Boost C++ extension fails to compile with Xcode 11.5

This was working before I last updated Xcode, but now I get the following while compiling:

g++ -bundle -undefined dynamic_lookup -L/Users/eddie/anaconda3/envs/scotus4/lib -arch x86_64 -L/Users/eddie/anaconda3/envs/scotus4/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.7-x86_64-3.7/./cpp/samplers.o build/temp.macosx-10.7-x86_64-3.7/./cpp/py.o -lboost_python37 -lboost_numpy37 -L/usr/local/lib -L/usr/local/lib/boost_1_72_0/stage/lib -o build/lib.macosx-10.7-x86_64-3.7/coniii/samplers_ext.cpython-37m-darwin.so
clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
ld: warning: directory not found for option '-L/usr/local/lib/boost_1_72_0/stage/lib'
ld: library not found for -lstdc++
clang: error: linker command failed with exit code 1 (use -v to see invocation)
*****************************************************
Boost not compiled. See above errors for g++ message.
*****************************************************

RegularizedMeanField sometimes fails

I am trying to run the RMF solver with synthetic data, but sometimes it results in an error. Here is the traceback:

File "python3.7/site-packages/coniii/solvers.py", line 1964, in solve solution = minimize_scalar(func)
File "python3.7/site-packages/scipy/optimize/_minimize.py", line 770, in minimize_scalar return _minimize_scalar_brent(fun, bracket, args, **options)
File "python3.7/site-packages/scipy/optimize/optimize.py", line 2141, in _minimize_scalar_brent brent.optimize()
File "python3.7/site-packages/scipy/optimize/optimize.py", line 1925, in optimize xa, xb, xc, fa, fb, fc, funcalls = self.get_bracket_info()
File "python3.7/site-packages/scipy/optimize/optimize.py", line 1899, in get_bracket_info xa, xb, xc, fa, fb, fc, funcalls = bracket(func, args=args)
File "python3.7/site-packages/scipy/optimize/optimize.py", line 2324, in bracket fa = func(*(xa,) + args)
File "python3.7/site-packages/coniii/solvers.py", line 1935, in func isingSamples = samples(J)
File "python3.7/site-packages/coniii/solvers.py", line 1917, in samples self.multipliers = np.concatenate([J.diagonal(), squareform(mean_field_ising.zeroDiag(-J))])
File "python3.7/site-packages/scipy/spatial/distance.py", line 2193, in squareform is_valid_dm(X, throw=True, name='X')
File "python3.7/site-packages/scipy/spatial/distance.py", line 2269, in is_valid_dm 'symmetric.') % name)
ValueError: Distance matrix 'X' must be symmetric.

Inconsistency in Pseudo with general_case option

Setting general_case=True gives a different result from general_case=False in Pseudo.solve. Is this supposed to be the case?

See usage_guide.ipynb. Results are slightly different there, but there are serious disagreements in some other examples.

Numba: cannot cache function... no locator available for file

Downloaded the master branch and installed with Anaconda Python 3 in my home directory using python3 setup.py install. When I try to import the package I get an error; it seems Numba has a problem caching.

Same problem with the latest release at https://github.com/eltrompetero/coniii/releases

>>> import coniii
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
  File "<...>/anaconda3/lib/python3.6/site-packages/coniii-1.1.4-py3.6.egg/coniii/__init__.py", line 23, in <module>
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
  File "<...>/anaconda3/lib/python3.6/site-packages/coniii-1.1.4-py3.6.egg/coniii/solvers.py", line 32, in <module>
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
  File "<...>/anaconda3/lib/python3.6/site-packages/coniii-1.1.4-py3.6.egg/coniii/utils.py", line 34, in <module>
  File "<...>/anaconda3/lib/python3.6/site-packages/numba/decorators.py", line 191, in wrapper
    disp.enable_caching()
  File "<...>/anaconda3/lib/python3.6/site-packages/numba/dispatcher.py", line 564, in enable_caching
    self._cache = FunctionCache(self.py_func)
  File "<...>/anaconda3/lib/python3.6/site-packages/numba/caching.py", line 614, in __init__
    self._impl = self._impl_class(py_func)
  File "<...>/anaconda3/lib/python3.6/site-packages/numba/caching.py", line 349, in __init__
    "for file %r" % (qualname, source_path))
RuntimeError: cannot cache function 'sub_to_ind': no locator available for file '<...>/anaconda3/lib/python3.6/site-packages/coniii-1.1.4-py3.6.egg/coniii/utils.py'

Ising equation files are a few times slower in v1.1.x

It seems that logsumexp is responsible for the slowdown. As an example, when comparing the two different versions:

In [1]: np.random.seed(0)
In [2]: hJ = np.random.normal(size=15, scale=.3)
In [3]: from coniii.ising_eqn import ising_eqn_5 as ising                
In [4]: %timeit ising.calc_observables(hJ)                               
642 µs ± 6.86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  
In [5]: from coniii.ising_eqn import ising_eqn_5 as ising                
In [6]: %timeit ising.calc_observables(hJ)                               
1.52 ms ± 23.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 

v1.1.2 is the slower one.

GLIBC_2_29 not found when importing coniii

I installed coniii using a conda environment without any errors. Unfortunately when importing the package I got the following error

  File "/home/romuald/miniconda3/envs/ising/lib/python3.10/site-packages/coniii/samplers.py", line 40, in <module>
    from .samplers_ext import BoostIsing, BoostPotts3
ImportError: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by /home/romuald/miniconda3/envs/ising/lib/python3.10/site-packages/coniii/samplers_ext.cpython-310-x86_64-linux-gnu.so)

On our computational server unfortunately we have an older version:

ldd --version
ldd (Ubuntu GLIBC 2.27-3ubuntu1.6) 2.27

Is it possible to install a version compatible with this older glibc, or is there a workaround?
I would be grateful for your help!

Add basic examples using command-line

For those who don't want to use jupyter notebooks, it would be useful to have some straight python scripts and/or command-line examples of typical use cases. E.g. below is a script for loading data and solving the inverse Ising problem using MPF.

# MPF.py
#
# Bryan Daniels
# 11.25.2019
#
# Simple script that uses coniii.MPF to solve an inverse Ising problem
# using the Minimum Probability Flow algorithm.
#

import sys
import numpy as np
import coniii

if len(sys.argv) != 3:
    print("Usage: python MPF.py input_data.txt output_multipliers.txt")
else:
    samples = np.loadtxt(sys.argv[1])
    _, calc_observables, _ = coniii.define_ising_helper_functions()
    solver = coniii.MPF(len(samples[0]), calc_observables=calc_observables, adj=coniii.adj)
    estMultipliers = solver.solve(samples)
    np.savetxt(sys.argv[2], estMultipliers[0])

Pseudo doesn't use Hessian

Pseudo should implement the calculation of the Hessian (as written in the paper), which would also make it faster.

No define_ising_helpers_functions & pseudolikelihood scaling

pip3 install coniii

followed : https://github.com/eltrompetero/coniii/blob/master/ipynb/usage_guide.ipynb

>>> from coniii import *
>>> define_ising_helpers_functions()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'define_ising_helpers_functions' is not defined

The unit tests also use that function: https://github.com/eltrompetero/coniii/blob/py3/coniii/test_solvers.py

What is the correct way to use the pseudolikelihood to fit the Ising model?

Error in MCH solve

Hi developers of Coniii,

I have been trying to use the MCH solver but cannot get it running because I run into this error:
TypeError: No matching definition for argument type(s) array(int8, 2d, C), array(float64, 1d, C)

TraceBack:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-50-ac6cc4f8a810> in <module>
     12 
     13 soln = solver.solve(maxiter=40,
---> 14                     custom_convergence_f=learn_settings, iprint=True)

3 frames
/content/coniii/coniii/solvers.py in solve(self, initial_guess, constraints, tol, tolNorm, n_iters, burn_in, maxiter, custom_convergence_f, iprint, full_output, learn_params_kwargs, generate_kwargs)
    906             if iprint:
    907                 print("Iterating parameters with MCH...")
--> 908             self.learn_parameters_mch(thisConstraints, constraints, **learn_params_kwargs)
    909             if iprint=='detailed':
    910                 print("After MCH step, the parameters are...")

/content/coniii/coniii/solvers.py in learn_parameters_mch(self, estConstraints, constraints, maxdlamda, maxdlamdaNorm, maxLearningSteps, eta)
   1016 
   1017             # Predict distribution with new parameters.
-> 1018             estConstraints = self.mch_approximation(self.model.sample, dlamda)
   1019             distance = np.linalg.norm(estConstraints - constraints)
   1020 

/content/coniii/coniii/utils.py in mch_approximation(samples, dlamda)
    987     def mch_approximation(samples, dlamda):
    988         """Function for making MCH approximation step for Ising model."""
--> 989         dE = calc_e(samples, dlamda)
    990         ZFraction = len(dE) / np.exp(logsumexp(-dE))
    991         predsisj = pair_corr(samples, weights=np.exp(-dE)/len(dE), concat=True) * ZFraction

/usr/local/lib/python3.7/dist-packages/numba/core/dispatcher.py in _explain_matching_error(self, *args, **kws)
    701         msg = ("No matching definition for argument type(s) %s"
    702                % ', '.join(map(str, args)))
--> 703         raise TypeError(msg)
    704 
    705     def _search_new_conversions(self, *args, **kws):

TypeError: No matching definition for argument type(s) array(int8, 2d, C), array(float64, 1d, C)

Below is my code

import numpy as np
from numpy import exp
from coniii import MCH, utils
from coniii.utils import pair_corr

# Define common functions.
calc_e, calc_observables, mchApproximation = utils.define_ising_helper_functions()


sisj = pair_corr(spins, exclude_empty=True, concat=True)

# Declare and call solver.
solver = MCH(np.ascontiguousarray(spins,dtype=np.int8),
             sample_size=1000,
             rng=np.random.RandomState(0),
             n_cpus=1,
             mch_approximation = mchApproximation)

# Define function for changing learning parameters as we converge.
def learn_settings(i):
    """
    Take in the iteration counter and set the maximum change allowed in any given
    parameter (maxdlamda) and the multiplicative factor eta, where
    d(parameter) = (error in observable) * eta.

    Additional option is to also return the sample size for that step by returning a
    tuple. Larger sample sizes are necessary for higher accuracy.
    """
    return {'maxdlamda':exp(-i/5.),'eta':exp(-i/5.)}

soln = solver.solve(maxiter=40,
                    custom_convergence_f=learn_settings, iprint=True)

let me know what I should change or adjust for it to run. Thank you!
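
One hedged guess, based only on the error message above and not a confirmed fix: the Numba-compiled calc_e has no signature registered for int8 arrays, so passing the samples as float64 may avoid the dispatch failure.

import numpy as np

# assumption: the jitted helper functions accept float64 spin arrays
spins_f = np.ascontiguousarray(spins, dtype=np.float64)

solver = MCH(spins_f,
             sample_size=1000,
             rng=np.random.RandomState(0),
             n_cpus=1,
             mch_approximation=mchApproximation)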

Maximum system size?

For the MCH method, I am wondering whether there is a maximum system size. For an n = 35 system with the same MCH parameters, the returned results make sense, but precisely at the n >= 36 barrier the returned couplings behave extremely differently. This occurs for any randomly selected n = 36 (or greater) subset of my data.

MCH does not return same results with fixed rand seed

The following code should always return the same solution, but it does not:

solver = MCH(X,
             sample_size=10_000,
             rng=np.random.RandomState(0),
             calc_observables=calc_observables,
             model=model,
             mch_approximation=mch_approximation)

# Define function for changing learning parameters as we converge.
def learn_settings(i):
    """
    Take in the iteration counter and set the maximum change allowed in any given 
    parameter (maxdlamda) and the multiplicative factor eta, where 
    d(parameter) = (error in observable) * eta.
    
    Additional option is to also return the sample size for that step by returning a 
    tuple. Larger sample sizes are necessary for higher accuracy.
    """
    return {'maxdlamda':exp(-i/5.)*.5,'eta':exp(-i/5.)*.5}

# Run solver.
solver.solve(initial_guess=model.multipliers,
             maxiter=30,
             custom_convergence_f=learn_settings,
             n_iters=500,
             burn_in=1_000);

choosing basis for solvers

Hello, I know the default basis for the ising model in this package is {-1,1}, but is it possible to select a different basis {0,1} when running the solvers?
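
The documentation issue further down notes that the choice of basis cannot currently be set when instantiating solvers, so one rough workaround is simply to map the data between bases before fitting. A small sketch (keeping in mind that the fitted multipliers are parameterized differently in the two bases):

import numpy as np

X01 = np.random.randint(0, 2, size=(100, 5))   # hypothetical data in the {0,1} basis

X_pm = 2 * X01 - 1        # map {0,1} -> {-1,1} before fitting
X_back = (X_pm + 1) // 2  # map {-1,1} -> {0,1} afterwards if needed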

easy Ising sampling

I wanted to do some very basic sampling from a pairwise Ising model—I guess I assumed this would be a one-liner in coniii, but not so (or not yet!).

I think this could be useful to include even if it's not used at a low level (since we will probably want to remain open to more general maximum entropy models).

Here's one implementation. Is this the best way to do it?

import numpy as np
import coniii

def sample_ising(Jflat, n_samples, n_cpus=1, seed=0):
    """
    Jflat            : N individual parameters followed by 
                       N(N-1)/2 pairwise parameters (ordered
                       as in np.triu_indices(N,k=1))
    """
    # set up coniii metropolis sampler
    calc_e_fast, calc_observables, _ = coniii.utils.define_ising_helper_functions()
    rng = np.random.RandomState(seed=seed)
    multipliers = Jflat
    n = 0.5 * (-1 + np.sqrt(1 + 8*len(multipliers)) )
    assert n == int(n),"The length of Jflat does not correspond to an integer number of spins"
    calc_e = lambda s, multipliers : -calc_observables(s).dot(multipliers)
    sampler = coniii.samplers.Metropolis( int(n), multipliers, calc_e,
                                          n_cpus=n_cpus, rng=rng )
    
    # generate samples
    if n_cpus > 1:
        sampler.generate_samples_parallel(n_samples)
    else:
        sampler.generate_samples(n_samples)
    return sampler.samples

Notes:

  • I didn't use calc_e_fast due to typing issues, but maybe I just need to put things into numpy arrays?
  • We may want to catch non-integer n within the Solver class more generally.
  • Running this in the simplest case—e.g. sample_ising([0,0,0],100)—gives poor results because when the energy is always zero every spin is always flipped, meaning you always flip between only the initial state and its inverse. This is of course a problem with the metropolis algorithm generally, but we might avoid this by e.g. starting from multiple starting points by default (I haven't yet thought this through carefully). Running with larger n_cpus already effectively does this.

bug in MCH: requires enumeration

Hello,

I had another question if that's ok. I am trying to use the MCH method to solve an Ising system of size 50. I assumed from the documentation that it is only necessary to run enumerate.py to generate the Ising equations if you are using the Enumerate solver. However, when I try to run the MCH method, I get the error "Python file enumerating the Ising equations for system for size must be written".

Thank you so much,
Shazia


Originally posted by @saynbabul in #25 (comment)

Streamline typical use cases

A typical use case currently takes more steps than maybe it should. E.g., to solve an inverse Ising problem using MPF:

_, calc_observables, _ = coniii.define_ising_helper_functions()
solver = coniii.MPF(len(samples[0]), calc_observables=calc_observables, adj=coniii.adj)
estMultipliers = solver.solve(samples)

Ideally, I would think we could do something like

solver = coniii.MPF(samples)
estMultipliers = solver.solve()

where MPF would default to the inverse Ising problem.

Also, we may want to return the multipliers in a friendlier form by default (e.g. a matrix instead of a flat list); see the sketch below.
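
Until such a friendlier format exists, here is a small sketch of one way to unpack a flat multiplier vector, assuming the first n entries are fields and the remaining n(n-1)/2 entries follow the np.triu_indices(n, k=1) ordering quoted in the sampling issue above:

import numpy as np

def unpack_multipliers(multipliers, n):
    """Split a flat multiplier vector into fields h and an n x n coupling matrix J."""
    h = multipliers[:n]
    J = np.zeros((n, n))
    J[np.triu_indices(n, k=1)] = multipliers[n:]
    J = J + J.T   # symmetric couplings with zero diagonal
    return h, J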

order convention for j_ij matrix

What is the ordering convention for the J_ij coupling matrix? It seems to be flattened to length n*(n-1)/2, but how is it ordered? I have a full n*n J_ij matrix (symmetric, with zero diagonal) that I want to convert to your convention, but I'm not exactly sure how.

@eltrompetero Thanks for the great software package! I'm looking forward to using it.
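
Under the same assumption as in the sampling issue above (couplings flattened in np.triu_indices(n, k=1) order), a sketch of the conversion from a full symmetric matrix with zero diagonal to the flat convention:

import numpy as np

def flatten_couplings(h, J):
    """Concatenate fields h with the upper triangle of a symmetric coupling matrix J."""
    n = len(h)
    return np.concatenate([h, J[np.triu_indices(n, k=1)]])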

Method for chaining together different solvers

Ideally, one would use a heuristic solver (Pseudo or MPF) to get a fast, approximate solution then feed that into a slower, more accurate method (like MCH). Could we automate this using some kind of chaining function?

This would also need to preserve the intermediate states in case users would want to go back and use a previous solution.

calc_observables  — a little more documentation?

Hello genius gurus — 

We're booting up your code to do fun things!

One thing I was hoping to get a little more information on is how to set "calc_observables" to something other than the default.

It would be very helpful (for example!) to see how one might define a different calc_observables function, and pass it to the system to fit. For example, how would I write a function that fixed (for example) a few of the triplet correlations as well?

(In the end, we're planning to use this with some missing data, so we want to redefine calc_observables so that it's matching just the stuff we know — e.g., if we have 500 observations of a 20-spin system, maybe for observation 451 we only have 18 of the spins measured; we want to make sure that the expectation values are correctly calculated.)

Any help very gratefully received! I have a feeling that this might be useful to others.

BTW we are planning right now just to use the MPF — but perhaps if we get calc_observables working correctly, it can be slotted in to the MCH as well?

Thank you for any help,

The Pittsburgh Gang
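
As an editorial sketch rather than an official recipe: a custom calc_observables simply returns, for each sample, the vector of quantities whose averages you want to constrain. Assuming samples are rows of spins, a version that appends a few chosen triplet products to the usual spins and pairwise products might look like the following. Whether a given solver can use it directly depends on the other helper functions it requires (e.g. the energy function), so treat it only as a starting point.

import numpy as np
from itertools import combinations

def make_calc_observables(n, triplets):
    """Return a calc_observables(X) giving, per sample, the spins, all pairwise
    products, and the products of the chosen triplets.

    triplets : list of 3-tuples of spin indices, e.g. [(0, 1, 2), (1, 3, 4)]
    """
    pairs = list(combinations(range(n), 2))

    def calc_observables(X):
        X = np.asarray(X, dtype=float)
        cols = [X]                                                       # spins s_i
        cols.append(np.stack([X[:, i] * X[:, j] for i, j in pairs], axis=1))   # pairs s_i s_j
        if triplets:
            cols.append(np.stack([X[:, i] * X[:, j] * X[:, k]
                                  for i, j, k in triplets], axis=1))     # triplets s_i s_j s_k
        return np.concatenate(cols, axis=1)

    return calc_observables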

utils.pair_corr is slow

This function seems unnecessarily slow when computing pairwise correlations on a matrix with missing entries.

Documentation is unclear about change of basis

A user contacted me about using the {0,1} basis with the Pseudo solver and was unable to make it work. At line 1277 in v1.0.3, the Pseudo._solve_ising() method assumes that all given data are in the {-1,1} basis and converts them to the {0,1} basis.

Documentation could be clearer about which bases are allowed.

Code could be updated to permit choice of basis when instantiating solvers.
