molpopgen / fwdpy11

Forward-time simulation in Python using fwdpp

Home Page: https://molpopgen.github.io/fwdpy11

License: GNU General Public License v3.0

Python 47.02% C++ 51.21% Shell 0.23% CMake 0.97% Dockerfile 0.11% C 0.01% Rust 0.44%

genetics genomics population-genetics simulation tree-sequences

fwdpy11's Issues

origin/parent_data segfault on CentOS 7.3.1611, but works on OS X 10.11.6

So, the new parent_data branch is segfaulting on GNU/Linux but not OS X. I have attached an MRE.

CentOS gcc:

$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)

OS X "gcc":

$ gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin15.6.0

Here's an MRE:

import fwdpy11 as fp11
import fwdpy11.multilocus as fp11ml
from fwdpy11.model_params import MlocusParamsQ

import fwdpy11.wright_fisher_qtrait as fp11qt
import fwdpy11.trait_values as fp11tv
import fwdpy11.wright_fisher_qtrait as wfq
import math
import numpy as np

N, theta, rho, trait_mu, s, h, repid, seed = 100, 10.0, 10.0, 1e-4, 0.1, 0.5, 0, 0
nloci = 2
rng = fp11.GSLrng(seed)

class ExponentialFitnessShift:
    def __init__(self, shift_gen):
        self.s = 0
        self.shift_gen = shift_gen
    def update(self, pop):
        if pop.generation >= self.shift_gen:
            self.s = 1
    def __call__(self, g, e):
        return math.exp(g*self.s)

pdict = dict(demography=np.array([N]*(10*N)),
             nregions=[[fp11.Region(0, 1, 1)], [fp11.Region(2, 3, 1)]],
             sregions=[[], [fp11.regions.ConstantS(2, 3, 1, s=s, h=h)]],
             recregions=[[fp11.Region(0, 1, 1)], [fp11.Region(2, 3, 1)]],
             recrates=[rho/float(4*N), rho/float(4*N)],
             mutrates_n=[theta/float(4*N), theta/float(4*N)],
             mutrates_s=[0.0, trait_mu],
             interlocus=fp11ml.binomial_rec([0.5]),
             agg=fp11ml.AggAddTrait(),
             gvalue=fp11ml.MultiLocusGeneticValue([fp11tv.SlocusAdditiveTrait(2.0)]*nloci),
             trait2w = ExponentialFitnessShift(10*N),
             prune_selected=True,  # new dev feature, sets the simulations to take all
             # fixations out of the gametes entirely. When False, they are copied
             # into pop.fixations but kept in gametes (according to KT).
             )

pop = fp11.MlocusPop(N, nloci,  [(0.0, 1.0), (2.0, 3.0)])
params = MlocusParamsQ(**pdict)
pops = wfq.evolve(rng, pop, params)

Fix warning in unit tests

There's a warning re: pruning selected fixations in one of the unit tests.

Bad function call when GaussianNoise is used in single-locus qtrait sims.

A C++ exception is raised when running the doc tests. This was discovered after fixing #23.

fwdpy11.model_params needs to check that empty "loci" correspond to rates == 0.0

This script, courtesy of @vsbuffalo, shows that fwdpy11 0.1.2 will segfault if rates > 0 are applied to "loci" containing no "regions". This needs to be checked in the model params classes.

import fwdpy11 as fp11
import fwdpy11.wright_fisher_qtrait as wfq
import fwdpy11.model_params
import fwdpy11.trait_values as fp11tv
# import fwdpy11.sampling
import fwdpy11.wright_fisher_qtrait as fp11qt
import fwdpy11.multilocus as fp11ml
import numpy as np

rng = fp11.GSLrng(42)
N = 1000
theta = 100
rho = 100
trait_mu, trait_sd = 1e-4, 0.1

nloci = 2

loci = {'sregions': [[fp11.GaussianS(0, 1, trait_mu, trait_sd)], []], 'nregions': [[], [fp11.Region(2, 3, 1)]],
        'recregions': [[fp11.Region(0, 1, 1)], [fp11.Region(2, 3, 1)]]}
locus_boundaries = [(0, 1), (2, 3)]

moving_optima = fp11qt.GSSmo([(0, 0, 1)])
pdict = {'demography': np.array([N]*N, dtype=np.uint32),
         'agg': fp11ml.AggAddTrait(),
         'gvalue': fp11ml.MultiLocusGeneticValue([fp11tv.SlocusAdditiveTrait(2.0)]*nloci),
         'trait2w': moving_optima,
         #'mutrates_s': [trait_mu/float(nloci)]*nloci,
         'mutrates_s': [trait_mu/float(nloci),0.0],
         #'mutrates_n': [float(nloci)*theta/float(4*N)]*nloci,
         'mutrates_n': [0.,float(nloci)*theta/float(4*N)],
         'recrates': [float(nloci)*rho/float(4*N)]*nloci,
         'interlocus': fp11ml.binomial_rec(rng, [0.5]*(nloci-1))
         }

pdict = {**pdict, **loci}
params = fp11.model_params.MlocusParamsQ(**pdict)
pop = fp11.MlocusPop(N, nloci, locus_boundaries)
#This function returns nothing:
wfq.evolve(rng, pop, params)
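
The check requested in this issue could look roughly like the sketch below. It is only an illustration: `check_rates_match_regions` is a hypothetical helper, not part of fwdpy11, and it assumes the per-locus regions arrive as a list of lists with one rate per locus.

```python
def check_rates_match_regions(regions, rates, label):
    """Hypothetical validator (not fwdpy11 API).

    regions: one list of region objects per locus
    rates:   one float per locus
    label:   parameter name used in error messages
    """
    if len(regions) != len(rates):
        raise ValueError(
            "{}: {} loci but {} rates".format(label, len(regions), len(rates)))
    for locus, (locus_regions, rate) in enumerate(zip(regions, rates)):
        if rate < 0.0:
            raise ValueError(
                "{}: negative rate at locus {}".format(label, locus))
        # A positive rate with nothing to apply it to is the case
        # reported to segfault above.
        if len(locus_regions) == 0 and rate > 0.0:
            raise ValueError(
                "{}: locus {} has no regions but rate {} > 0".format(
                    label, locus, rate))


# Plain strings stand in for region objects in this sketch.
check_rates_match_regions([["region"], []], [1e-4, 0.0], "mutrates_s")
try:
    check_rates_match_regions([["region"], []], [1e-4, 1e-4], "mutrates_s")
except ValueError as err:
    print(err)
```

Raising a Python exception from the params class would turn the segfault into an error the user can act on before any C++ code runs.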

Tests of fixation properties are skipped on Travis because the simulations take minutes to run. However, the concepts can be tested using populations created on demand and then evolved for a single generation.

Track molpopgen/fwdpp#130

Depending on what happens upstream with molpopgen/fwdpp#130, we may need to update the version of update_mutations implemented for this project.

Compatibility with pybind11 2.2.0

2.2.0 breaks fwdpy11. The most likely culprit is the new handling of opaque containers and bind_vector behavior in 2.2.0.

Potential cppimport issue with new Anaconda compilers

When testing out the new compilers in a clean env on Linux, the main package built fine but the unit tests did not. They could not find GSL headers. While this clearly seems like a compiler config issue, the following can be added to the mako headers to make sure that GSL's headers are found:

# Requires Python >= 3.5
import subprocess
GSL=subprocess.run(['gsl-config','--cflags'],stdout=subprocess.PIPE).stdout.strip(b'-I').decode().strip()
cfg['include_dirs'].extend([GSL, fp11.get_includes(), fp11.get_fwdpp_includes() ])

Curiously, the unit tests executed, suggesting no run-time linker issues.

flattened_popgenmut field order is not ideal

The field order should be refactored so that the elements of a numpy array can be used to construct a new mutation instance.

cc @DL42

Make mutation "flag" (fwdpp's popgenmut::xtra) writeable

There are probably several use cases for this. @vsbuffalo has one, for example, when changing mutation effect sizes.

This could conflict with having regions auto-fill this field, but that would simply need to be documented, etc.

Create proper Python class hierarchy for populations.

This is now possible via molpopgen/fwdpp#115

Require MlocusPop.locus_boundaries to be filled

Currently, there are ways to have an MlocusPop with empty locus boundaries. This causes API headaches on both the C++ side and the Python side.

Sorting of fixations

The simulations store fixations sorted according to position. I believe the approach taken is naive and will lead to a problem where fixation times are mis-entered vis-a-vis the mutation that fixes. One solution is to use std::upper_bound instead of std::lower_bound. Another is to refactor the internal storage of fixations, which should be done upstream in fwdpp.
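
The difference between the two searches can be illustrated in Python, using bisect.bisect_left and bisect.bisect_right as analogues of std::lower_bound and std::upper_bound. This is a sketch with hypothetical names (`record_fixation`, `positions`, `times`), not the fwdpy11 code; it only shows how ties on position are ordered when two parallel containers are updated with the same index.

```python
import bisect


def record_fixation(positions, times, pos, time, search=bisect.bisect_right):
    """Insert (pos, time), keeping `positions` sorted and `times` parallel."""
    i = search(positions, pos)
    positions.insert(i, pos)
    times.insert(i, time)


fixations = [(0.5, 10), (0.2, 12), (0.5, 15)]

# upper_bound analogue: a tie is placed after the existing entry.
positions, times = [], []
for pos, time in fixations:
    record_fixation(positions, times, pos, time)
print(positions, times)  # [0.2, 0.5, 0.5] [12, 10, 15]

# lower_bound analogue: a tie is placed before the existing entry.
positions_lb, times_lb = [], []
for pos, time in fixations:
    record_fixation(positions_lb, times_lb, pos, time, search=bisect.bisect_left)
print(positions_lb, times_lb)  # [0.2, 0.5, 0.5] [12, 15, 10]
```

With the upper_bound analogue, entries at the same position stay in order of fixation, which makes it easier to pair each time with the right mutation.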

Related: Issue #8

Allow DFE for selection coefficients to be scaled with respect to a population size.

The current Sregions model distributions on "s" itself. However, it would be useful to allow the distribution to be on Ns, 2Ns, 4Ns, or whatever.
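
One way to read this request is that a value is drawn on the scaled axis and then divided by the scaling factor times N. The sketch below assumes exactly that; `scaled_to_s` and its `scaling` argument are hypothetical names, and the exponential distribution is just an example.

```python
import random


def scaled_to_s(scaled_value, N, scaling):
    """Convert a draw of scaling*N*s (e.g. scaling=2 for 2Ns) back to s."""
    if N <= 0 or scaling <= 0:
        raise ValueError("N and scaling must be positive")
    return scaled_value / (scaling * N)


rng = random.Random(42)
N = 1000

# Draw 2Ns from an exponential distribution with mean 10,
# then convert to the per-mutation selection coefficient s.
two_Ns = rng.expovariate(1.0 / 10.0)
s = scaled_to_s(two_Ns, N, scaling=2)
print(two_Ns, s)
```

A design question left open here is which N to use when the population size changes over time.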

Multiple populations

Hi Kevin!

We are thinking about using fwdpy11 for some forward simulations of Neanderthal demography. I was wondering if there is support for multiple populations with migration? I saw that it's on the long term todo (http://fwdpy11.readthedocs.io/en/dev/pages/todo.html), but Jeff mentioned there may be something on a dev branch that supports this.

Thanks!
Arun

fwdpy11.util.sort_gamete_keys doesn't work

This function is meant to assist pop creation, but the containers are not declared as opaque, meaning that it is working on a copy.

Release GIL during simulation.

The simulations should release the GIL and re-acquire it only as needed.

Remove deprecated versions of poisson_rec and binomial_rec.

Ideally, this would happen by 0.2.0.

fwdpy11.regions needs @property, etc.

Redoing these classes to use @property/@x.setter will help with error checking and reducing copy/paste of code.
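
A minimal sketch of the pattern, assuming a simplified region with `beg`, `end` and `weight` attributes. The class below is a hypothetical stand-in, not the fwdpy11.regions implementation; the point is that each check is written once, in the setter, and runs on construction and on later assignment alike.

```python
import math


class Region:
    """Hypothetical stand-in for a region type; not the fwdpy11 class."""

    def __init__(self, beg, end, weight):
        # These assignments go through the setters below, so the
        # validation lives in exactly one place per attribute.
        self.beg = beg
        self.end = end
        self.weight = weight

    @staticmethod
    def _finite(name, value):
        value = float(value)
        if not math.isfinite(value):
            raise ValueError("{} must be finite, got {}".format(name, value))
        return value

    @property
    def beg(self):
        return self._beg

    @beg.setter
    def beg(self, value):
        self._beg = self._finite("beg", value)

    @property
    def end(self):
        return self._end

    @end.setter
    def end(self, value):
        self._end = self._finite("end", value)

    @property
    def weight(self):
        return self._weight

    @weight.setter
    def weight(self, value):
        value = self._finite("weight", value)
        if value < 0.0:
            raise ValueError("weight must be >= 0, got {}".format(value))
        self._weight = value


r = Region(0, 1, 1)
r.weight = 2.5       # validated on assignment
try:
    r.weight = -1.0  # rejected by the setter
except ValueError as err:
    print(err)
```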

Warning re: pickling formats

Due to molpopgen/fwdpp#90, we will have to break pickle format compatibility when we update to what is currently fwdpp/dev.

We will need to document this, too.

Pickling of Region types in Python 3.6

Unit tests are failing under 3.6 due to changes to fwdpy11.regions from #21. The most likely cause is storing a callable in Sregion.

Simplify back-end for genetic value calculations

There's quite a bit of wrapper code in use to interface between Python, the end user, and the fwdpp back-end. It also seems a bit silly to have two additive functions, etc., that only differ in whether they're used for fitness or for trait value.

Ideally, this stuff would all get moved into a single module and some of the redundancy mentioned above gets abstracted away.

Upstream changes in molpopgen/fwdpp#79 should make a lot of this possible.

New mutation type

In order to prevent a huge increase in the code base, we should replace the use of fwdpp's popgenmut with an equivalent type that also allows for multiple effects per mutation.

Once fwdpp/0.6 is used as the base, we can make this change in a way that is transparent to users.

The recent merge of #74 was in anticipation of this change.

Mutation types are declared read/write

The mutation type is modifiable in Python code. It should not be.

Thanks to @vsbuffalo for catching this.

Memory use during compilation

The current development branch fails to build on RTD due to excessive memory use. The culprit is fwdpy11.fwdpy11_types, which is massive. The memory use comes in large part from excessive template instantiation. Some of it is also unnecessary header inclusion.

Remove global include of fwdpp/sugar.hpp from the module.

fwdpy11.model_params objects are non-pickleable

It would be useful if these objects could be pickled, so that they can be sent to other processes by, e.g., concurrent.futures.

The limitation seems to be that some of the C++ types used to parameterize a simulation are non-pickleable.

Ideally, we'd be able to both pickle and copy.deepcopy these objects.

Thanks to @vsbuffalo for pointing this out.

fwdpy11.model_params is non-idiomatic

The implementation of these classes is quite messy and reflects a lack of understanding of Python's OO idioms and how to properly use class properties.

A do-over is needed.

Add low-level sampling operations to the C++ back-end of a Population

The current sampling member functions are implemented only for the Python classes. However, ideas like #113 would be more useful if the C++ side of a population also had this functionality. The Python function would then dispatch to the relevant C++ functions.

Fixations in simulations of quantitative traits

fwdpy11.wright_fisher_qtrait.evolve_regions_sampler_fitness is removing fixed variants affecting trait values. This is incorrect and will be fixed in 0.1.1.

Remove std::bind from fwdpp_extensions

The DES callbacks are implemented in terms of std::bind. They should be re-implemented in terms of lambda expressions.

Refactor build system to use Makefile

Issues like #48 illustrate the problems with build systems based on distutils, where managing the source/header dependency relationship properly is hard or impossible.

The solution is to subclass the build class to run ./configure and make via a subprocess.

Make some fwdpy11.util functions member functions of population objects

It would be cleaner and clearer if add_mutation and change_effect_size were members of classes rather than standalone functions.

Add properties to mutations

It would be nice if a mutation would return (pos, origin, esize) as a tuple.

Use brace initialization of structs for numpy dtypes

Instead of struct foo {...}; and foo make_foo(...), we can replace all make_foo functions with brace initialization.

Allow low-level types to be constructed directly.

If we allow diploids/mutations/gametes to be constructed by the user, then we're most of the way to being able to seed a forward sim from the output of tools like msprime.

This will depend on molpopgen/fwdpp#28. Once that's ready, we can update the submodule.

Need an initialization module for GSL error handling

We currently rely on GSL's default error handling. Doing so is bad, and will crash the interpreter when an error occurs within GSL. As we're about to start adding more features depending on linear algebra, we need to fix this, which requires a small module that will be loaded from __init__.py.

Remove deprecated property names from fwdpy11.model_params

Ideally by 0.2.0 or so.

Set mutation position to max double when flagging for recycling

When a mutation is flagged for recycling, all we do is set its count to zero. However, when a user is doing something like tracking a variant's frequency over time, this will lead to a zero value immediately after fixation. This happens because the mutation object is still present, meaning that its "key" still exists and is find-able. To prevent these oddities, we can set the position to max double, which will create a non-discoverable key.

Note: originally, the thought was to use NaN, but that causes operator== to start failing for mutations.
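
The NaN problem can be seen from Python as well, since IEEE 754 NaN never compares equal to itself. In the sketch below, `Mutation` is a hypothetical stand-in with a field-wise `__eq__`, meant to mimic a C++ operator== that compares positions.

```python
import sys


class Mutation:
    """Hypothetical stand-in with field-wise equality."""

    def __init__(self, pos, s):
        self.pos = pos
        self.s = s

    def __eq__(self, other):
        return self.pos == other.pos and self.s == other.s


recycled_nan = Mutation(float("nan"), 0.1)
recycled_max = Mutation(sys.float_info.max, 0.1)

print(recycled_nan == recycled_nan)  # False: NaN != NaN breaks equality
print(recycled_max == recycled_max)  # True: max double compares normally

# A search for a real position never matches the recycled entry.
mutations = [Mutation(0.25, 0.01), recycled_max]
print(any(m.pos == 0.25 for m in mutations[1:]))  # False
```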

Get Travis CI set up with latest Anaconda compilers

We should get caught up with the latest, described here. Allows C++14, etc.!

Fixation storage is inconsistent

Different types of simulations store fixations either sorted or unsorted (according to position). This should be fixed, and be sorted, too.

Thanks to @vsbuffalo for suffering through this.

fwdpy11::update_mutations may be a performance bottleneck

This issue is a reminder to look into the efficiency of this function. When fixations are happening rapidly, it is possible that this function is bogging things down.

Implement regions filling xtra field.

This should be done. Interaction with #9 should be noted in the docs when this is addressed.

Get rid of placement new

pybind11 >= 2.2.0 has deprecated the use of placement new, but the warning only shows up in debug mode.

The relevant dox are here.

Fix registration of NumPy dtypes

In numpy-1.14, dtype field lookup has been changed to be by position/offset rather than by field name. If that version is installed, an ImportError gets raised when importing fwdpy11 because dtype fields are declared in a different order than how they appear in the structs they reflect.

Another user of pybind11 has run into this: pybind/pybind11#1274.

This should be fixed ASAP and a new release of 0.13 should follow.

cc @jeromekelleher

Implementation of mut_lookup property?

Currently, Population.mut_lookup returns a list of (position, index) tuples for each mutation. It may be preferable to return a dict with (position, list of indices) key/value pairs.

Additionally, it may be useful to have a separate function returning a list of keys for mutations at a given position. The function should return None if no extant keys exist for a given position.
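
A sketch of both ideas, starting from the list of (position, index) tuples described above. The function names `build_mut_lookup` and `keys_at` are hypothetical, not existing fwdpy11 API.

```python
from collections import defaultdict


def build_mut_lookup(pairs):
    """Group (position, index) tuples into {position: [indices]}."""
    table = defaultdict(list)
    for pos, idx in pairs:
        table[pos].append(idx)
    return dict(table)


def keys_at(table, pos):
    """Return the list of indices at `pos`, or None if there are none."""
    return table.get(pos)


pairs = [(0.1, 0), (0.7, 1), (0.1, 5)]
lookup = build_mut_lookup(pairs)
print(lookup)                # {0.1: [0, 5], 0.7: [1]}
print(keys_at(lookup, 0.7))  # [1]
print(keys_at(lookup, 0.3))  # None
```

The dict form makes multiple mutations at one position explicit, which the flat list of tuples leaves to the caller to work out.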

cc @DL42

Remove Python data types from Population objects.

The C++ and Python classes contain popdata and popdata_user. Both are instances of pybind11::object. The latter is redundant with the concept of a temporal sampler and should be removed. The former should be replaced by something like std::unique_ptr<ModelData> where ModelData is an abstract class exposed to Python as an ABC.

The reason to make these changes is that holding Python objects on the C++ side is a barrier to releasing the GIL during simulations. (See #50)

A minimal API for ModelData could look like:

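#include <iosfwd>
#include <memory>
#include <string>

struct ModelData
{
    // Virtual destructor so derived types are destroyed correctly via unique_ptr<ModelData>
    virtual ~ModelData() = default;
    virtual std::string serialize() const = 0;
    virtual void deserialize(std::istream &) = 0;
    virtual std::unique_ptr<ModelData> clone() const = 0;
};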

The intent of such structures is to allow tracking of more complex model features, for example the details of a landscape. However, one could argue that this violates the single responsibility criterion, and thus these types should simply be eliminated. (E.g., the features of a landscape are properties of the landscape and not the population.) Such removal would indeed be simpler...
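
On the Python side, the same interface could be mirrored with the abc module. The sketch below is an assumption about how such an ABC might look, not an existing fwdpy11 class: `Landscape` is a hypothetical subclass, and JSON strings replace the std::istream-based deserialization for simplicity.

```python
import abc
import copy
import json


class ModelData(abc.ABC):
    """Hypothetical Python counterpart of the C++ interface."""

    @abc.abstractmethod
    def serialize(self):
        """Return a string representation of the model data."""

    @abc.abstractmethod
    def deserialize(self, data):
        """Restore state from a string produced by serialize()."""

    @abc.abstractmethod
    def clone(self):
        """Return an independent copy."""


class Landscape(ModelData):
    """Hypothetical example: a list of trait optima."""

    def __init__(self, optima=None):
        self.optima = list(optima or [])

    def serialize(self):
        return json.dumps({"optima": self.optima})

    def deserialize(self, data):
        self.optima = json.loads(data)["optima"]

    def clone(self):
        return copy.deepcopy(self)


original = Landscape([0.0, 1.0])
restored = Landscape()
restored.deserialize(original.serialize())
print(restored.optima)  # [0.0, 1.0]
```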

SlocusParamsQ calls super incorrectly

The __init__ calls super(SlocusParams,self) rather than super(SlocusParamsQ,self).
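
The effect of the wrong first argument can be shown with stand-in classes. The class names below only mimic the ones in the issue and are not the fwdpy11 implementations. Passing the parent class to super() starts the method lookup after the parent in the MRO, so the parent's __init__ never runs.

```python
class SlocusParams:
    """Stand-in for the base class."""

    def __init__(self, **kwargs):
        self.base_initialized = True


class WrongQ(SlocusParams):
    def __init__(self, **kwargs):
        # Bug: the lookup starts *after* SlocusParams in the MRO, so this
        # calls object.__init__ and skips SlocusParams.__init__.
        super(SlocusParams, self).__init__()


class RightQ(SlocusParams):
    def __init__(self, **kwargs):
        # Correct: name the class being defined.
        super(RightQ, self).__init__(**kwargs)


print(hasattr(WrongQ(), "base_initialized"))  # False
print(hasattr(RightQ(), "base_initialized"))  # True
```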

Allow for C++ implementations of temporal samplers.

We should allow "power users" to write such samplers entirely in C++. This is easy in principle, but raises the question of how to support C++ and Python samplers transparently. For "evolve" functions, we'd need to set up code paths overloaded to separate out these two concepts.

This change could allow simulations to be run with the GIL released (#50), and C++ samplers could still grab the GIL to modify any Python objects.

Add deme data field to diploid.

Consider removing '-g0' from default build

I've been trying to do some profiling, and the lack of debug symbols is causing some unexpected havoc with Linux perf. I don't really want to enable the assertions, since I would like to get an accurate measurement of the code in production.

Unless the debugging symbols make the shared objects very large indeed, it would be better for me if they were left in by default. That way I can watch a live execution with 'perf top' and see exactly where the time is being spent.