molpopgen / fwdpy11 Goto Github PK
View Code? Open in Web Editor NEWForward-time simulation in Python using fwdpp
Home Page: https://molpopgen.github.io/fwdpy11
License: GNU General Public License v3.0
Forward-time simulation in Python using fwdpp
Home Page: https://molpopgen.github.io/fwdpy11
License: GNU General Public License v3.0
So, the new parent_data branch is seg faulting on GNU/Linux but not OS X. I have attached a MRE.
centos gcc:
$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
os x "gcc"
$ gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin15.6.0
Here's an MRE:
import fwdpy11 as fp11
import fwdpy11.multilocus as fp11ml
from fwdpy11.model_params import MlocusParamsQ
import fwdpy11.wright_fisher_qtrait as fp11qt
import fwdpy11.trait_values as fp11tv
import fwdpy11.wright_fisher_qtrait as wfq
import math
import numpy as np
N, theta, rho, trait_mu, s, h, repid, seed = 100, 10.0, 10.0, 1e-4, 0.1, 0.5, 0, 0
nloci = 2
rng = fp11.GSLrng(seed)
class ExponentialFitnessShift:
def __init__(self, shift_gen):
self.s = 0
self.shift_gen = shift_gen
def update(self, pop):
if pop.generation >= self.shift_gen:
self.s = 1
def __call__(self, g, e):
return math.exp(g*self.s)
pdict = dict(demography=np.array([N]*(10*N)),
nregions=[[fp11.Region(0, 1, 1)], [fp11.Region(2, 3, 1)]],
sregions=[[], [fp11.regions.ConstantS(2, 3, 1, s=s, h=h)]],
recregions=[[fp11.Region(0, 1, 1)], [fp11.Region(2, 3, 1)]],
recrates=[rho/float(4*N), rho/float(4*N)],
mutrates_n=[theta/float(4*N), theta/float(4*N)],
mutrates_s=[0.0, trait_mu],
interlocus=fp11ml.binomial_rec([0.5]),
agg=fp11ml.AggAddTrait(),
gvalue=fp11ml.MultiLocusGeneticValue([fp11tv.SlocusAdditiveTrait(2.0)]*nloci),
trait2w = ExponentialFitnessShift(10*N),
prune_selected=True, # new dev feature, sets the simulations to take all
# fixations out of the gametes entirely. When False, they are copied
# into pop.fixations but kept in gametes (according to KT).
)
pop = fp11.MlocusPop(N, nloci, [(0.0, 1.0), (2.0, 3.0)])
params = MlocusParamsQ(**pdict)
pops = wfq.evolve(rng, pop, params)
There's a warning re: pruning selected fixations in one of the unit tests.
A C++ exception is raised when running the doc tests. This was discovered after fixing #23.
This script, courtesy of @vsbuffalo, shows that fwdpy11 0.1.2 will segfault if rates > 0 are applied to "loci" containing no "regions". This needs to be checked in the model params classes.
import fwdpy11 as fp11
import fwdpy11.wright_fisher_qtrait as wfq
import fwdpy11.model_params
import fwdpy11.trait_values as fp11tv
# import fwdpy11.sampling
import fwdpy11.wright_fisher_qtrait as fp11qt
import fwdpy11.multilocus as fp11ml
import numpy as np
rng = fp11.GSLrng(42)
N = 1000
theta = 100
rho = 100
trait_mu, trait_sd = 1e-4, 0.1
nloci = 2
loci = {'sregions': [[fp11.GaussianS(0, 1, trait_mu, trait_sd)], []], 'nregions': [[], [fp11.Region(2, 3, 1)]],
'recregions': [[fp11.Region(0, 1, 1)], [fp11.Region(2, 3, 1)]]}
locus_boundaries = [(0, 1), (2, 3)]
moving_optima = fp11qt.GSSmo([(0, 0, 1)])
pdict = {'demography': np.array([N]*N, dtype=np.uint32),
'agg': fp11ml.AggAddTrait(),
'gvalue': fp11ml.MultiLocusGeneticValue([fp11tv.SlocusAdditiveTrait(2.0)]*nloci),
'trait2w': moving_optima,
#'mutrates_s': [trait_mu/float(nloci)]*nloci,
'mutrates_s': [trait_mu/float(nloci),0.0],
#'mutrates_n': [float(nloci)*theta/float(4*N)]*nloci,
'mutrates_n': [0.,float(nloci)*theta/float(4*N)],
'recrates': [float(nloci)*rho/float(4*N)]*nloci,
'interlocus': fp11ml.binomial_rec(rng, [0.5]*(nloci-1))
}
pdict = {**pdict, **loci}
params = fp11.model_params.MlocusParamsQ(**pdict)
pop = fp11.MlocusPop(N, nloci, locus_boundaries)
#This function returns nothing:
wfq.evolve(rng, pop, params)
Tests of fixation properties are skipped on Travis because they takes minutes to run the simulations. However, the concepts can be tested using populations created on-demand and then evolved for a single generation.
Depending on what happens upstream with molpopgen/fwdpp#130, we may need to update the version of update_mutations
implemented for this project.
2.2.0 breaks fwdpy11. The most likely culprit is the new handling of opaque containers and bind_vector behavior in 2.2.0.
When testing out the new compilers in a clean env on Linux, the main package built fine but the unit tests did not. They could not find GSL headers. While this clearly seems like a compiler config issue, the following can be added to the mako headers to make sure that GSL's headers are found:
# Requires Python >= 3.5
import subprocess
GSL=subprocess.run(['gsl-config','--cflags'],stdout=subprocess.PIPE).stdout.strip(b'-I').decode().strip()
cfg['include_dirs'].extend([GSL, fp11.get_includes(), fp11.get_fwdpp_includes() ])
Curiously, the unit tests executed, suggesting no run-time linker issues.
The field order should be refactored so that the elements of a numpy array can be used to construct a new mutation instance.
cc @DL42
There are probably several use cases for this. @vsbuffalo has one, for example, when changing mutation effect sizes.
This could conflict with having regions auto-fill this field, but that would simply need to be documented, etc.
This is now possible via molpopgen/fwdpp#115
Currently, there are ways to have an MlocusPop with empty locus boundaries. This cause API headaches on both the C++ side and on the Python side.
The simulations store fixations sorted according to position. I believe the approach taken is naive and will lead to a problem where fixation times are mis-entered vis-a-vis the mutation that fixes. One solution is to use std::upper_bound instead of std::lower_bound. Another is to refactor the internal storage of fixations, which should be done upstream in fwdpp
Related: Issue #8
The current Sregions models distributions on "s" itself. However, it would be useful to allow the distribution to be on Ns, 2Ns, 4Ns, or whatever.
Hi Kevin!
We are thinking about using fwdpy11 for some forward simulations of Neanderthal demography. I was wondering if there is support for multiple populations with migration? I saw that it's on the long term todo (http://fwdpy11.readthedocs.io/en/dev/pages/todo.html), but Jeff mentioned there may be something on a dev branch that supports this.
Thanks!
Arun
This function is meant to assist pop creation, but the containers are not declared as opaque, meaning that it is working on a copy.
The simulations should release the GIL and re-acquire it only as needed.
Ideally, this would happen by 0.2.0.
Redoing these classes to use @property/@x.setter will help with error checking and reducing copy/paste of code.
Due to molpopgen/fwdpp#90, we will have to break pickle format compatibility when we update to what is currently fwdpp/dev.
We will need to document this, too.
Unit tests are failing under 3.6 due to changes to fwdpy11.regions from #21. The most likely cause is due to storing a callable in Sregion.
There's quite a bit of wrapper code in use to interface between Python, the end user, and the fwdpp back-end. It also seems a bit silly to have two additive functions, etc., that only differ in whether they're used for fitness or for trait value.
Ideally, this stuff would all get moved into a single module and some of the redundancy mentioned above gets abstracted away.
Upstream changes in molpopgen/fwdpp#79 should make a lot of this possible.
In order to prevent a huge increase in the code base, we should replace the use of fwdpp's popgenmut
with an equivalent type that also allows for multiple effects per mutation.
Once fwdpp/0.6 is used as the base, we can make this change in a way that is transparent to users.
The recent merge of #74 was in anticipation of this change.
The mutation type is modifiable in Python code. It should not be.
Thanks to @vsbuffalo for catching this.
The current development branch fails to build on RTD due to excessive memory use. The culprit is fwdpy11.fwdpy11_types
, which is massive. The memory use comes in large part from excessive template instantiation. Some of it is also unnecessary header inclusion.
fwdpp/sugar.hpp
from the module.It would be useful if these objects can be pickled, so that they can be sent to other process by, for e.g., concurrent.futures.
The limitation seems to be that some of the C++ types used to parameterize a simulation are non-pickleable.
Ideally, we'd be able to both pickle and copy.deepcopy these objects.
Thanks to @vsbuffalo for pointing this out.
The implementation of these classes is quite messy and reflects a lack of understanding of Python's OO idioms and how to properly use class properties.
A do-over is needed.
The current sampling member functions are implemented only for the Python classes. However, ideas like #113 would be more useful if the C++ side of a population also had this functionality. The Python function would then dispatch to the relevant C++ functions.
fwdpy11.wright_fisher_qtrait.evolve_regions_sampler_fitness is removing fixed variants affecting trait values. This is incorrect and will be fixed in 0.1.1.
The DES callbacks are implemented in terms of std::bind
. They should be re-implemented in terms of lambda expressions.
Issues like #48 illustrate the problems with build systems based on distutils, where managing the source/header dependency relationship properly is hard or impossible.
The solution is to subclass the build class to run ./configure and make via a subprocess.
It would be cleaner and clearer if add_mutation
and change_effect_size
were members of classes rather than standalone functions.
It would be nice if a mutation would return (pos, origin, esize) as a tuple.
Instead of struct foo {...};
and foo make_foo(...)
, we can replace all make_foo
functions with brace initialization.
If we allow diploids/mutations/gametes to be constructed by the user, then we're most of the way there to being able to seed a forward sim from the output of tools like msprime
This will depend on molpopgen/fwdpp#28. Once that's ready, we can update the submodule.
We currently rely on GSL's default error handling. Doing so is bad, and will crash the interpreter when an error occurs within GSL. As we're about to start adding more features depending on linear algebra, we need to fix this, which requires a small module that will be loaded from init.py.
Ideally by 0.2.0 or so.
When a mutation is flagged for recycling, all we do is set its count to zero. However, when a user is doing something like tracking a variant's frequency over time, this will lead to a zero value immediately after fixation. This happens because the mutation object is still present, meaning that its "key" still exists and is find-able. To prevent these oddities, we can set the position to max double, which will create a non-discoverable key.
Note: orignally, the thought was to use NaN, but that causes operator==
to start failing for mutations.
We should get caught up with the latest, described here. Allows C++14, etc.!
Different types of simulations store fixations either sorted or unsorted (according to position). This should be fixed, and be sorted, too.
Thanks to @vsbuffalo for suffering through this.
This issue is a reminder to look into the efficiency of this function. When fixations are happening rapidly, it is possible that this function is bogging things down.
This should be done. Interaction with #9 should be noted in the docs when this is addressed.
pybind11 >= 2.2.0 has deprecated the new of placement new, but the warning only shows up in debug mode.
The relevant dox are here.
In numpy-1.14, dtype field lookup has been changed to be by position/offset rather than by field name. If that version is installed, an ImportError
gets raised when importing fwdpy11
because dtype fields are declared in a different order than how they appear in the structs they reflect.
Another user of pybind11
has run into this: pybind/pybind11#1274.
This should be fixed ASAP and a new release of 0.13 should follow.
Currently, Population.mut_lookup
returns a list of (position, index) tuples for each mutation. It may be preferable to return a dict with (position, list of indices) key/value pairs.
Additionally, it may be useful to have separate function returning a list of keys for mutations at a given position. The function should return None if no extant keys exist for a given position.
cc @DL42
The C++ and Python classes contain popdata
and popdata_user
. Both are instances of pbybind11::object
. The latter is redundant with the concept of a temporal sampler and should be removed. The former should be replaced by something like unique_ptr<ModelData>
where ModelData
is an abstract class exposed to Python as an ABC.
The reason to make these changes is that holding Python objects on the C++ side is a barrier to releasing the GiL during simulations. (See #50)
A minimal API for ModelData
could look like:
struct ModelData
{
virtual std::string serialize() const = 0;
virtual void deserialize(std::istream &) = 0;
virtual unique_ptr<ModelData> clone() const = 0;
};
The intent of such structures is to allow tracking of more complex model features. For example, the details of a landscape. However, one could argue that this violates the single responsibility criterion, and thus these types should simply be eliminated. (For e.g., the features of a landscape are properties of the landscape and not the population.) Such removal would indeed be simpler...
The __init__ calls super(SlocusParams,self) rather than super(SlocusParamsQ,self).
We should allow for "power users" to write such samplers entirely in C++. This is easy in principle, but brings up supporting C++ and Python samplers transparently. For "evolve" functions, we'd need to set up code paths overloaded to separate out these two concepts.
This change could allow simulations to be run with the GiL released (#50) and C++ samplers could still grab the GiL to modify and Python objects.
I've been trying to do some profiling, and the lack of debug symbols is causing some unexpected havoc with Linux perf. I don't really want to enable the assertions, since I would like to get an accurate measurement of the code in production.
Unless the debugging symbols make the shared objects very large indeed, it would be better for me if they were left in by default. That way I can watch a live execution with 'perf top' and see exactly where the time is being spent.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.