icb-dcm / pyabc
distributed, likelihood-free inference
Home Page: https://pyabc.rtfd.io
License: BSD 3-Clause "New" or "Revised" License
Apparently, building the docs fails because including vector graphics in the LaTeX compilation is no longer supported on the platform.
Allow keeping everything on a log scale and only moving to linear scale at the last point, after summing, normalizing, etc.
This helps avoid numerical issues.
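A minimal sketch of the idea (the function names are hypothetical, not pyabc API): keep weights as log-values and only exponentiate after shifting by the maximum, so that weights which would underflow to zero on the linear scale stay well-defined.

```python
import numpy as np

def logsumexp(log_w):
    """Stable log(sum(exp(log_w))): shift by the maximum before exponentiating."""
    m = np.max(log_w)
    return m + np.log(np.sum(np.exp(log_w - m)))

def normalize_log_weights(log_w):
    """Normalize weights entirely on the log scale; exponentiate only at the end."""
    log_w = np.asarray(log_w, dtype=float)
    return np.exp(log_w - logsumexp(log_w))
```

A naive `np.exp(log_w) / np.exp(log_w).sum()` would return NaN for log-weights around -1000, while the shifted version gives the correct normalized weights.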
When working on the population, e.g. via update_distances() or to_dict(), make sure we do not run into difficulties from shared references. When population data are changed, this should therefore happen only on a copied population, which is then returned. The copying should go only as deep as components are actually changed.
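A sketch of the copy-as-deep-as-changed idea, with a toy `Population` class standing in for pyabc's actual one (names and structure are illustrative assumptions, not the real implementation):

```python
class Population:
    """Toy stand-in for a pyabc population: a list of particle dicts."""
    def __init__(self, particles):
        self.particles = particles

    def update_distances(self, new_distance):
        # Copy only what changes: new particle dicts with new distances,
        # while unchanged values (e.g. parameter dicts) stay shared references.
        particles = [dict(p, distance=new_distance(p["sumstat"]))
                     for p in self.particles]
        return Population(particles)
```

The original population is left untouched, while unchanged sub-objects are shared rather than deep-copied.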
Before being saved into the database, the weights of particles are normalized s.t. all particles belonging to one model have a summed weight of 1. However, there are multiple problems:
Solution:
w = w / sum(w)
Then the analyses should be correct. Maybe it would also make sense not to normalize the weights inserted into the database at all; then it would be easiest to always apply the correct normalization depending on what is needed.
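A small sketch of the per-model normalization described above (a standalone helper, not pyabc's actual storage code): rescale the weights so that, within each model, they sum to 1.

```python
import numpy as np

def normalize_per_model(model_ids, weights):
    """Rescale particle weights so weights within each model sum to 1."""
    model_ids = np.asarray(model_ids)
    weights = np.asarray(weights, dtype=float)
    out = np.empty_like(weights)
    for m in np.unique(model_ids):
        mask = model_ids == m
        out[mask] = weights[mask] / weights[mask].sum()
    return out
```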
In https://github.com/ICB-DCM/pyABC/blob/master/pyabc/sampler/redis_eps/sampler.py, the keys from https://github.com/ICB-DCM/pyABC/blob/master/pyabc/sampler/redis_eps/cmd.py are used. These are the same for ABC runs running in parallel, so they get mixed up; i.e., a server can currently be used for only one ABC run at a time. This problem has come up several times recently.
To solve this, one could prefix all keys in redis with a UUID (https://docs.python.org/3.7/library/uuid.html), either per ABC run or per population (thanks @neuralyzer for explaining).
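A minimal sketch of the UUID prefixing, assuming a thin wrapper around key construction (the wrapper class and the key names used below are hypothetical, not the actual redis_eps identifiers):

```python
import uuid

class PrefixedKeys:
    """Namespace redis keys per ABC run by prepending a run-specific UUID."""
    def __init__(self, run_id=None):
        # A fresh UUID per run keeps parallel runs on one server separate.
        self.run_id = run_id or uuid.uuid4().hex

    def key(self, name):
        return f"{self.run_id}:{name}"
```

Two independent runs then never collide, since each instance generates its own prefix.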
Running the visualization server via abc-server /tmp/test.db fails instantly (up-to-date master branch).
Error message:
Traceback (most recent call last):
  File "/home/yannik/anaconda3/bin/abc-server", line 11, in <module>
    load_entry_point('pyabc==0.8.20', 'console_scripts', 'abc-server')()
  File "/home/yannik/anaconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 570, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/home/yannik/anaconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2751, in load_entry_point
    return ep.load()
  File "/home/yannik/anaconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2405, in load
    return self.resolve()
  File "/home/yannik/anaconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2411, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/yannik/anaconda3/lib/python3.6/site-packages/pyabc/visserver/server.py", line 19, in <module>
    from bkcharts import Line  # noqa: E402
  File "/home/yannik/anaconda3/lib/python3.6/site-packages/bkcharts/__init__.py", line 17, in <module>
    from .builders.histogram_builder import Histogram
  File "/home/yannik/anaconda3/lib/python3.6/site-packages/bkcharts/builders/histogram_builder.py", line 27, in <module>
    from .bar_builder import BarBuilder
  File "/home/yannik/anaconda3/lib/python3.6/site-packages/bkcharts/builders/bar_builder.py", line 21, in <module>
    from bokeh.core.enums import Aggregation
ImportError: cannot import name 'Aggregation'
It looks like in smc._initialize_dist_and_eps (called in new()), a sample from the prior is always constructed to initialize distance and epsilon. To reduce computation overhead, it might be of interest to do this lazily, i.e. only when required by any of the components.
E.g. the adaptive distances are intended to render this pre-calibration unnecessary.
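One way to sketch the lazy evaluation (a generic helper, not pyabc code): wrap the expensive prior sampling in an object that only runs it on first access, so components that never request the calibration sample never trigger it.

```python
class LazyCalibrationSample:
    """Compute the expensive prior sample only on first access."""
    def __init__(self, sample_fn):
        self._sample_fn = sample_fn
        self._sample = None
        self.evaluated = False

    def get(self):
        # Sample at most once; subsequent calls return the cached result.
        if not self.evaluated:
            self._sample = self._sample_fn()
            self.evaluated = True
        return self._sample
```

Distance and epsilon would then receive the wrapper and call get() only if they actually need calibration data.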
At least the Epsilon adaptations do not work correctly when setting more than one simulation per proposed parameter. This feature is rarely used and currently not tested, so it is probably not a major issue, but we should still fix it of course.
Currently, e.g. the Transition and Acceptor are not saved to history, though this would be of interest when recovering past runs. Thus, one should add these to History during the initial database setup.
Attention: backwards compatibility. In particular, the abc-server uses this information.
Kind of similar to Epsilon and DistanceFunction, introduce a class Acceptor or similar, which encodes the acceptance step.
This is to allow for more complex acceptance rules than the simple comparison.
The best solution seems to be to create a new class Acceptor which is passed to ABCSMC and used in the accept() method, so that the user can easily override it.
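A minimal sketch of what such an Acceptor hierarchy could look like (class and method signatures here are illustrative assumptions, not the eventual pyabc API): the base class reproduces the current distance-vs-epsilon comparison, and users subclass it for richer rules.

```python
class Acceptor:
    """Encodes the acceptance step; the default reproduces distance <= eps."""
    def accept(self, distance, eps, x, x_0):
        return distance(x, x_0) <= eps

class ComponentwiseAcceptor(Acceptor):
    """Example override: require every summary-statistic component within eps."""
    def accept(self, distance, eps, x, x_0):
        return all(abs(xi - x0i) <= eps for xi, x0i in zip(x, x_0))
```

ABCSMC would then simply call acceptor.accept(...) instead of hard-coding the comparison.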
The default kernel density estimators apparently sometimes over-smoothen. In particular, the MultivariateNormalTransition used for visualization is unsuited for multi-modal landscapes (see attached picture). In this case, the problem could be solved by reducing the scaling factor.
At least, one should allow the user to specify a KDE here.
Simplify / change the quickstart example to be as short as possible. Show and explain only the necessary basic lines of code that are always needed. Do not use model selection here (move it to a separate example), since this is probably not the most common application, but rather parameter inference.
As done already for pypesto, add codecov and codacy to keep track of changes in the code quality and test coverage.
Also, show badges on the README.md page.
The following warning occurs when running pyabc:
FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
in:
pyabc/storage/history.py:200
pyabc/transition/multivariatenormal.py:64
pyabc/smc.py:729
among others.
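The fix is mechanical: pandas deprecated DataFrame.as_matrix() in favor of the .values attribute, which is a drop-in replacement for these call sites.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# deprecated, emits FutureWarning (removed in later pandas versions):
#   arr = df.as_matrix()
arr = df.values  # drop-in replacement returning the same numpy array
```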
The compilation of the documentation on readthedocs fails. This seems to be due to some name confusion between flask and flask-bootstrap: when searching for flask, flask-bootstrap is found as the closest match, which does not make sense. We have so far not been able to reproduce this error.
In storing and loading dataframes and numpy arrays, there are some work-around functions. In particular, I wonder what the line https://github.com/ICB-DCM/pyABC/blob/master/pyabc/storage/numpy_bytes_storage.py#L50 is supposed to do, and whether it could fail in the special case of size-1 numpy arrays.
Generalize the MedianEpsilon to allow arbitrary quantiles, with both weighted and non-weighted distances.
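A sketch of such a generalization (a standalone helper under assumed semantics, not the MedianEpsilon implementation): an arbitrary-quantile function that reduces to the median at alpha = 0.5 and supports particle weights on the distances.

```python
import numpy as np

def weighted_quantile(values, alpha, weights=None):
    """Quantile of `values` at level alpha in (0, 1], with optional weights."""
    values = np.asarray(values, dtype=float)
    if weights is None:
        weights = np.ones_like(values)
    weights = np.asarray(weights, dtype=float)
    # Sort values and accumulate normalized weights, then pick the first
    # value whose cumulative weight reaches alpha.
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cum = np.cumsum(weights) / np.sum(weights)
    return values[np.searchsorted(cum, alpha)]
```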
In order to automatically make sure all tests are still working, it would be good to run all as part of the travis tests (maybe except ion_channels which might be a bit difficult to set up).
Pre-defined / updated as max. over all previous / guessed + updated?
Location: Comparator?
This feature is, to my knowledge, nowhere really used. It causes trouble with weighting particles vs. summary statistics normalized to 1, and if multiple samples per parameter are desired, this is very easy to implement on the user side in a much more flexible way.
If there are no objections, this feature will thus be discontinued.
This should be possible without changing the database format, though that is not completely clear yet.
The seaborn.PairGrid, as used in pyabc.visualization.plot_kde_matrix, seems to have changed its API. Namely, one runs into TypeError: cannot concatenate object of type "<class 'numpy.ndarray'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid, triggered by pyabc/visualization.py line 310, df = pd.concat((x,), axis=1). The reason seems to be that seaborn since version v0.9.0 (July 2018) converts the data to a numpy array before passing it on to the plotting function (https://github.com/mwaskom/seaborn/blob/master/seaborn/axisgrid.py#L1390).
Suggestion: We need the column name, so check whether it is passed somewhere else, or just drop seaborn.
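One possible workaround, assuming the column name is still available to the plotting callback from another argument: re-wrap the numpy array seaborn hands over into a named pandas Series before concatenating (the helper name is hypothetical).

```python
import numpy as np
import pandas as pd

def as_named_series(x, name):
    """Re-attach the column name seaborn drops when converting to numpy."""
    if isinstance(x, np.ndarray):
        return pd.Series(x, name=name)
    return x
```

With the name restored, the existing pd.concat((x,), axis=1) call works again.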
To improve readability when running the provided jupyter notebook examples, not only when viewing them on readthedocs.io, move especially all mathematical expressions to markdown text instead of reST, because the latter cannot be interpreted by jupyter.
Links to classes etc. will need to remain reST to allow relative linking.
Set up a new branching and commit system so that developers always merge into develop (which is also monitored on travis, rtd, ...), and only fully-fledged stable versions are merged into master, getting a new version number. See also https://nvie.com/posts/a-successful-git-branching-model/.
Also, for every commit to master, automatically (via travis) upload the new version to pypi and create a tag (https://git-scm.com/book/de/v1/Git-Grundlagen-Tags).
Make samplers process and return only required summary statistics, in order to avoid communication overhead if distance functions only need the accepted summary statistics, but not all sampled summary statistics.
Create a new way of passing additional information to samplers (e.g. as by now, to record also non-accepted particles or a strategy of which particles to record), generically. Also, reduce code duplication in the different samplers.
Apart from the visualization, it would be good to also offer some functions that compute percentiles, a-posteriori expectations and related things, to summarize the a-posteriori distribution encoded in the sample population.
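A sketch of one such summary function (a standalone helper under assumed semantics, not pyabc API): the weighted a-posteriori expectation of an arbitrary function of the parameter, computed from an accepted, weighted population.

```python
import numpy as np

def posterior_expectation(samples, weights, f=lambda x: x):
    """Weighted a-posteriori expectation E[f(theta)] from a particle population."""
    samples = np.asarray(samples, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize, in case weights are raw
    return float(np.sum(weights * f(samples)))
```

With f = identity this gives the posterior mean; other choices of f yield e.g. second moments for variance estimates.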
For the diagonal, the y axis label is wrong because it is taken as the range of the y parameter.
The rtd build is at the limit of the allowed 15 min time frame. This is mainly because many dependencies like r-base are installed. I think these are not necessary if we run all notebooks in advance, so that nbsphinx does not need to do that anymore.
After a sample round has finished, check whether any assumption was violated (e.g. a distance found that is above the previously assumed maximum distance), re-evaluate the population, and possibly run some more samples.
Iterate this until sufficiently many samples are accepted.
Allow to (probably from the samples) compute credible intervals for the parameters in the approximate posterior distribution.
Sometimes the redis server reportedly does not get notified when one of its workers is killed, and then waits forever for this worker to come back.
When exactly this happens has not been reproduced yet.
Implement comfort functions that allow logging to console or file, e.g. log_to_console(level=logging.DEBUG). Also, use __name__ in logging.getLogger() in the various loggers, to make it easier to trace messages back to their respective modules, and have a pyabc super-logger.
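A sketch of what such a comfort function could look like (the function is hypothetical, built only on the standard logging module): attach a console handler to the "pyabc" super-logger, to which all module loggers created via logging.getLogger(__name__) automatically propagate.

```python
import logging

def log_to_console(level=logging.DEBUG):
    """Hypothetical comfort function: route all pyabc.* logs to the console."""
    logger = logging.getLogger("pyabc")  # super-logger for the package
    logger.setLevel(level)
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(name)s %(levelname)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

Because module loggers like logging.getLogger("pyabc.transition") are children of "pyabc", their records roll up to this one handler, and the %(name)s field identifies the originating module.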
Both on github and in the documentation on readthedocs, the correct paper for pyabc in its final version should be linked, as well as a BibTeX entry.
Implement the possibility to have distances adapt weights according to the data, based on [Prangle. Adapting the ABC distance function. 2015] and adapted for pyabc.
It is not yet very clear from the documentation how to handle anaconda.
Make more explicit how to set the paths, and explain how to use source activate.
From a theoretical point of view, convergence of the approximate posterior to the real posterior can be proven provided bounded eccentricity of the acceptance regions. Transferred to the adaptive distances, this implies that the ratio of maximum and minimum weight should be bounded in order to obtain the theoretical confirmation that the convergence is correct.
In practice, this seems to be of minor importance, yet it might be good to add a field to the implemented adaptive distance functions to allow the user to set a maximum eccentricity.
I tried to run the using_R example and got the error that the package external is not found.
I came up with a solution, but I do not yet feel confident enough with git to try to fix it.
This can be easily fixed by adding the following line in __init__.py:
from .external import R
and then including the entry "External" in the __all__ array.
This makes the example run smoothly.
In addition to #48, since recently the readthedocs build does not start at all. It stops at conda install ... using the environment.yml file, without any output.
Possible problem: the R dependencies cannot be installed. Do we really need those on rtd?
In Epsilon.__call__(), always pass a valid population (maybe taken from history?). The SMC class should always make sure it provides the correct population. A problem to be checked is whether this works reliably when the ABC run is resumed.
For predicting the acceptance rate, use weighted distances instead of unweighted ones.
When exporting, e.g. via abc-export --db results.db --out exported.feather --format feather, sometimes the result table does not include all columns, e.g. not the parameter value ones. This seems to happen when exporting only a single generation (everything is fine with --generation=all), because then the tidy functionality in https://github.com/ICB-DCM/pyABC/blob/master/pyabc/storage/history.py#L826 apparently messes things up a bit.
Make the epsmixin sampler fully object-oriented.
Also, there seem to be race conditions occurring sometimes and leading to errors, the reasons for which have not been discovered yet.
We ran two cases of the same model, once with 1000-dimensional summary statistics, once with 6-dimensional ones. The number of samples was almost the same for both cases (200-400 for 100 acceptances), but the first one took several times as long. When we tried accessing the database during the run, this was not possible because it was locked: pyabc was apparently writing into it over a long time. This indicates that writing high-dimensional summary statistics to the database takes non-negligible time.
Possible solutions would be a flag to not store summary statistics, or moving the file system operations to another thread (if available), so that pyabc can continue meanwhile. Maybe there is also a more efficient data format, but the SQL database is already quite nice and readable, so I would not want to change that.
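The second suggestion can be sketched generically (a standalone helper, not pyabc code): a background thread drains a queue of write jobs, so that submitting a write returns immediately and sampling can continue while the storage operation runs.

```python
import queue
import threading

class AsyncWriter:
    """Offload blocking storage writes to a background thread via a queue."""
    def __init__(self, write_fn):
        self._queue = queue.Queue()
        self._write_fn = write_fn
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: stop the writer
                break
            self._write_fn(item)

    def submit(self, item):
        self._queue.put(item)  # returns immediately

    def close(self):
        # Flush remaining writes, then stop the thread.
        self._queue.put(None)
        self._thread.join()
```

Note that SQLite's locking still applies per write; this only decouples the sampler from the write latency, it does not shorten the writes themselves.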
import pyabc
fails when git is not installed, due to an error in the gitpython package.
Steps to reproduce:
import pyabc
In the notebook, there are some minor inaccuracies: the Y component is observed, not X.
At some point (not now) it might be interesting to adapt Algorithm 5 in [Prangle. Adapting the ABC distance function. 2015], since it gives an even improved guess of the distance weights. However, it is not so easily integrable with the pyABC framework (and might lead to increased sampling times when the weights are rather homogeneous?).
Small error in the documentation on this page: http://pyabc.readthedocs.io/en/latest/what.html
It reads:
What you don't need
the likelihood function: p(parameter|data) is not required.
Shouldn't it rather be:
the likelihood function: p(data|parameter) is not required.
P.S. I really like this library! Just what I needed and very easy to use so far.
When an ABCSMC run is continued (e.g. to run a few more populations because the estimate is not yet satisfying), the smc.load() method is called. Here, not all parameters adjusted during the ABC run can be re-initialized (e.g. distance weights, epsilon, etc.). It must be checked in detail what is done here, and e.g. the method adapted to perform some history-based initialization (using the last population instead of the prior predictive distribution).
But it works well enough so far.