fwdpy11's People

Contributors

apragsdale, dependabot[bot], kevin-meyers, molpopgen

fwdpy11's Issues

origin/parent_data segfault on CentOS 7.3.1611, but works on OS X 10.11.6

The new parent_data branch segfaults on GNU/Linux but not on OS X. Compiler versions and an MRE are below.

centos gcc:

$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)

os x "gcc"

$ gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin15.6.0

Here's an MRE:

import fwdpy11 as fp11
import fwdpy11.multilocus as fp11ml
from fwdpy11.model_params import MlocusParamsQ

import fwdpy11.wright_fisher_qtrait as fp11qt
import fwdpy11.trait_values as fp11tv
import fwdpy11.wright_fisher_qtrait as wfq
import math
import numpy as np

N, theta, rho, trait_mu, s, h, repid, seed = 100, 10.0, 10.0, 1e-4, 0.1, 0.5, 0, 0
nloci = 2
rng = fp11.GSLrng(seed)

class ExponentialFitnessShift:
    def __init__(self, shift_gen):
        self.s = 0
        self.shift_gen = shift_gen
    def update(self, pop):
        if pop.generation >= self.shift_gen:
            self.s = 1
    def __call__(self, g, e):
        return math.exp(g*self.s)

pdict = dict(demography=np.array([N]*(10*N)),
             nregions=[[fp11.Region(0, 1, 1)], [fp11.Region(2, 3, 1)]],
             sregions=[[], [fp11.regions.ConstantS(2, 3, 1, s=s, h=h)]],
             recregions=[[fp11.Region(0, 1, 1)], [fp11.Region(2, 3, 1)]],
             recrates=[rho/float(4*N), rho/float(4*N)],
             mutrates_n=[theta/float(4*N), theta/float(4*N)],
             mutrates_s=[0.0, trait_mu],
             interlocus=fp11ml.binomial_rec([0.5]),
             agg=fp11ml.AggAddTrait(),
             gvalue=fp11ml.MultiLocusGeneticValue([fp11tv.SlocusAdditiveTrait(2.0)]*nloci),
             trait2w = ExponentialFitnessShift(10*N),
             prune_selected=True,  # new dev feature, sets the simulations to take all
             # fixations out of the gametes entirely. When False, they are copied
             # into pop.fixations but kept in gametes (according to KT).
             )

pop = fp11.MlocusPop(N, nloci,  [(0.0, 1.0), (2.0, 3.0)])
params = MlocusParamsQ(**pdict)
pops = wfq.evolve(rng, pop, params)

fwdpy11.model_params needs to check that empty "loci" correspond to rates == 0.0

This script, courtesy of @vsbuffalo, shows that fwdpy11 0.1.2 will segfault if rates > 0 are applied to "loci" containing no "regions". This needs to be checked in the model params classes.

import fwdpy11 as fp11
import fwdpy11.wright_fisher_qtrait as wfq
import fwdpy11.model_params
import fwdpy11.trait_values as fp11tv
# import fwdpy11.sampling
import fwdpy11.wright_fisher_qtrait as fp11qt
import fwdpy11.multilocus as fp11ml
import numpy as np

rng = fp11.GSLrng(42)
N = 1000
theta = 100
rho = 100
trait_mu, trait_sd = 1e-4, 0.1

nloci = 2

loci = {'sregions': [[fp11.GaussianS(0, 1, trait_mu, trait_sd)], []], 'nregions': [[], [fp11.Region(2, 3, 1)]],
        'recregions': [[fp11.Region(0, 1, 1)], [fp11.Region(2, 3, 1)]]}
locus_boundaries = [(0, 1), (2, 3)]

moving_optima = fp11qt.GSSmo([(0, 0, 1)])
pdict = {'demography': np.array([N]*N, dtype=np.uint32),
         'agg': fp11ml.AggAddTrait(),
         'gvalue': fp11ml.MultiLocusGeneticValue([fp11tv.SlocusAdditiveTrait(2.0)]*nloci),
         'trait2w': moving_optima,
         #'mutrates_s': [trait_mu/float(nloci)]*nloci,
         'mutrates_s': [trait_mu/float(nloci),0.0],
         #'mutrates_n': [float(nloci)*theta/float(4*N)]*nloci,
         'mutrates_n': [0.,float(nloci)*theta/float(4*N)],
         'recrates': [float(nloci)*rho/float(4*N)]*nloci,
         'interlocus': fp11ml.binomial_rec(rng, [0.5]*(nloci-1))
         }

pdict = {**pdict, **loci}
params = fp11.model_params.MlocusParamsQ(**pdict)
pop = fp11.MlocusPop(N, nloci, locus_boundaries)
#This function returns nothing:
wfq.evolve(rng, pop, params)
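
A check of this kind could be sketched as follows. This is a minimal, hypothetical example and not the fwdpy11 implementation: the function name check_empty_loci and its arguments are assumptions made for illustration, and regions are represented by plain Python lists rather than fwdpy11 region objects.

```python
def check_empty_loci(regions, rates, label):
    """Raise ValueError if a locus with no regions has a rate > 0.

    regions: one list of regions per locus (a stand-in for nregions,
             sregions, or recregions).
    rates:   one rate per locus (a stand-in for mutrates_n, mutrates_s,
             or recrates).
    label:   name used in the error message.
    """
    if len(regions) != len(rates):
        raise ValueError(f"{label}: {len(regions)} loci but {len(rates)} rates")
    for locus, (locus_regions, rate) in enumerate(zip(regions, rates)):
        if len(locus_regions) == 0 and rate > 0.0:
            raise ValueError(
                f"{label}: locus {locus} has no regions but rate {rate} > 0"
            )


# Consistent input: the empty second locus has a rate of 0.0.
check_empty_loci([["region"], []], [5e-5, 0.0], "sregions/mutrates_s")

# Inconsistent input: the empty second locus has a positive rate.
try:
    check_empty_loci([["region"], []], [5e-5, 1e-4], "sregions/mutrates_s")
except ValueError as err:
    message = str(err)
```

Raising a Python exception at parameter-construction time would turn the segfault into an error that the user can act on.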

Refactor long-running tests

Tests of fixation properties are skipped on Travis because the simulations take minutes to run. However, the same concepts can be tested using populations created on demand and then evolved for a single generation.

Potential cppimport issue with new Anaconda compilers

When testing out the new compilers in a clean env on Linux, the main package built fine but the unit tests did not. They could not find GSL headers. While this clearly seems like a compiler config issue, the following can be added to the mako headers to make sure that GSL's headers are found:

# Requires Python >= 3.5
import subprocess
GSL=subprocess.run(['gsl-config','--cflags'],stdout=subprocess.PIPE).stdout.strip(b'-I').decode().strip()
cfg['include_dirs'].extend([GSL, fp11.get_includes(), fp11.get_fwdpp_includes() ])

Curiously, the unit tests executed, suggesting no run-time linker issues.
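
One caveat about the snippet above: bytes.strip(b'-I') removes any leading or trailing '-' and 'I' characters rather than the literal "-I" prefix, and it assumes that gsl-config prints a single flag. A more defensive way to extract the include directories might look like the sketch below. The helper name include_dirs_from_cflags is hypothetical, and the example flag string is made up rather than real gsl-config output.

```python
import shlex


def include_dirs_from_cflags(cflags):
    """Return the directories named by -I flags in a compiler flag string."""
    dirs = []
    tokens = iter(shlex.split(cflags))
    for token in tokens:
        if token == "-I":
            # Form "-I /path": the directory is the next token, if any.
            path = next(tokens, None)
            if path is not None:
                dirs.append(path)
        elif token.startswith("-I"):
            # Form "-I/path": drop the two-character prefix.
            dirs.append(token[2:])
    return dirs


# In a mako header the string would come from something like:
#   subprocess.run(["gsl-config", "--cflags"], stdout=subprocess.PIPE,
#                  check=True).stdout.decode()
example = "-I/opt/gsl/include -DHAVE_INLINE\n"
print(include_dirs_from_cflags(example))  # ['/opt/gsl/include']
```

The returned list can be passed to cfg['include_dirs'].extend(...) in place of the single GSL string.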

Sorting of fixations

The simulations store fixations sorted according to position. I believe the approach taken is naive and will lead to a problem where fixation times are mis-entered vis-a-vis the mutation that fixes. One solution is to use std::upper_bound instead of std::lower_bound. Another is to refactor the internal storage of fixations, which should be done upstream in fwdpp.

Related: Issue #8
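
The two search functions differ only when positions tie. The sketch below illustrates this with Python's bisect_left and bisect_right, which behave like std::lower_bound and std::upper_bound. Storing positions and fixation times in two parallel lists is an assumption made for the example, not a description of how fwdpy11 or fwdpp stores fixations.

```python
from bisect import bisect_left, bisect_right


def insert_fixation(positions, times, pos, time, finder):
    """Insert pos into the sorted positions list and time at the same index."""
    idx = finder(positions, pos)
    positions.insert(idx, pos)
    times.insert(idx, time)
    return idx


# Two mutations at the same position (0.5) fix at generations 10 and 20.
pos_lower, times_lower = [0.1, 0.9], [5, 7]
for generation in (10, 20):
    insert_fixation(pos_lower, times_lower, 0.5, generation, bisect_left)

pos_upper, times_upper = [0.1, 0.9], [5, 7]
for generation in (10, 20):
    insert_fixation(pos_upper, times_upper, 0.5, generation, bisect_right)

print(times_lower)  # [5, 20, 10, 7]: the later fixation lands first
print(times_upper)  # [5, 10, 20, 7]: ties keep their order of fixation
```

With the lower-bound search, tied positions end up in reverse order of fixation, so any code that assumes ties are stored in the order they fixed would pair a mutation with the wrong time.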

Multiple populations

Hi Kevin!

We are thinking about using fwdpy11 for some forward simulations of Neanderthal demography. I was wondering if there is support for multiple populations with migration? I saw that it's on the long term todo (http://fwdpy11.readthedocs.io/en/dev/pages/todo.html), but Jeff mentioned there may be something on a dev branch that supports this.

Thanks!
Arun

Simplify back-end for genetic value calculations

There's quite a bit of wrapper code in use to interface between Python, the end user, and the fwdpp back-end. It also seems a bit silly to have two additive functions, etc., that only differ in whether they're used for fitness or for trait value.

Ideally, this stuff would all get moved into a single module and some of the redundancy mentioned above gets abstracted away.

Upstream changes in molpopgen/fwdpp#79 should make a lot of this possible.

New mutation type

In order to prevent a huge increase in the code base, we should replace the use of fwdpp's popgenmut with an equivalent type that also allows for multiple effects per mutation.

Once fwdpp/0.6 is used as the base, we can make this change in a way that is transparent to users.

The recent merge of #74 was in anticipation of this change.

Memory use during compilation

The current development branch fails to build on RTD due to excessive memory use. The culprit is fwdpy11.fwdpy11_types, which is massive. The memory use comes in large part from excessive template instantiation. Some of it is also unnecessary header inclusion.

  • Remove global include of fwdpp/sugar.hpp from the module.

fwdpy11.model_params objects are non-pickleable

It would be useful if these objects could be pickled, so that they can be sent to other processes by, e.g., concurrent.futures.

The limitation seems to be that some of the C++ types used to parameterize a simulation are non-pickleable.

Ideally, we'd be able to both pickle and copy.deepcopy these objects.
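
One common pattern for this is to define __getstate__ and __setstate__ so that only plain data is stored and the non-pickleable member is rebuilt on load. The sketch below is hypothetical: the class PicklableParams and its attributes are invented for illustration, and a lambda stands in for a C++-backed object that cannot be pickled directly.

```python
import copy
import pickle


class PicklableParams:
    """Hypothetical parameter holder whose saved state is plain Python data."""

    def __init__(self, mutrate, recrate, optimum):
        self.mutrate = mutrate
        self.recrate = recrate
        self._optimum = optimum
        # Stand-in for a C++-backed object that cannot be pickled directly.
        self._fitness = lambda trait: -((trait - optimum) ** 2)

    def fitness(self, trait):
        return self._fitness(trait)

    def __getstate__(self):
        # Store only what is needed to rebuild the object.
        return {
            "mutrate": self.mutrate,
            "recrate": self.recrate,
            "optimum": self._optimum,
        }

    def __setstate__(self, state):
        # Rebuild the non-pickleable member from the saved parameters.
        self.__init__(state["mutrate"], state["recrate"], state["optimum"])


params = PicklableParams(1e-3, 1e-2, 0.5)
clone = copy.deepcopy(params)  # goes through __getstate__/__setstate__
payload = pickle.dumps(params.__getstate__())  # the state is plain data
```

pickle.dumps(params) uses the same two hooks, provided the class is importable from a module in the receiving process, which is what concurrent.futures.ProcessPoolExecutor requires.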

Thanks to @vsbuffalo for pointing this out.

fwdpy11.model_params is non-idiomatic

The implementation of these classes is quite messy and reflects a lack of understanding of Python's OO idioms and how to properly use class properties.

A do-over is needed.

Refactor build system to use Makefile

Issues like #48 illustrate the problems with build systems based on distutils, where managing the source/header dependency relationship properly is hard or impossible.

The solution is to subclass the build class to run ./configure and make via a subprocess.
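
A rough sketch of that idea is below. It assumes a ./configure script and a Makefile exist in the source directory. The names configure_and_make and ConfigureMakeMixin are hypothetical, and the mixin is meant to be combined with a real build command class such as setuptools.command.build_ext.build_ext and registered through the cmdclass argument of setup().

```python
import subprocess


def configure_and_make(source_dir, runner=subprocess.check_call):
    """Run ./configure and then make inside source_dir.

    runner is injectable so that the command sequence can be tested
    without invoking a real build.
    """
    for command in (["./configure"], ["make"]):
        runner(command, cwd=source_dir)


class ConfigureMakeMixin:
    """Mix into a build command so that make runs before the normal build.

    Hypothetical use:
        class build_ext(ConfigureMakeMixin,
                        setuptools.command.build_ext.build_ext):
            source_dir = "."
    """

    source_dir = "."

    def run(self):
        configure_and_make(self.source_dir)
        super().run()
```

Delegating to make lets the Makefile track source/header dependencies, which is the part that is hard to express with distutils.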

Need an initialization module for GSL error handling

We currently rely on GSL's default error handling. Doing so is bad, because it will crash the interpreter when an error occurs within GSL. As we're about to start adding more features depending on linear algebra, we need to fix this, which requires a small module that will be loaded from __init__.py.

Set mutation position to max double when flagging for recycling

When a mutation is flagged for recycling, all we do is set its count to zero. However, when a user is doing something like tracking a variant's frequency over time, this will lead to a zero value immediately after fixation. This happens because the mutation object is still present, meaning that its "key" still exists and is find-able. To prevent these oddities, we can set the position to max double, which will create a non-discoverable key.

Note: originally, the thought was to use NaN, but that causes operator== to start failing for mutations.
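
The difference between the two sentinel values can be shown with a toy example. The Mutation class below is a stand-in written for this illustration and is not fwdpp's mutation type. sys.float_info.max plays the role of the C++ maximum double.

```python
import sys


class Mutation:
    """Toy mutation with a position and a count (not fwdpp's type)."""

    def __init__(self, pos, count):
        self.pos = pos
        self.count = count

    def __eq__(self, other):
        # Field-by-field comparison, as a C++ operator== might do.
        return self.pos == other.pos and self.count == other.count


MAX_DOUBLE = sys.float_info.max

recycled_max = Mutation(MAX_DOUBLE, 0)
recycled_nan = Mutation(float("nan"), 0)

print(recycled_max == Mutation(MAX_DOUBLE, 0))    # True
print(recycled_nan == Mutation(float("nan"), 0))  # False: NaN != NaN
print(recycled_nan == recycled_nan)               # False, even against itself
```

The maximum double still compares equal to itself, so equality keeps working, while no simulated mutation is expected to have that position, so a lookup by the original position no longer finds the recycled object.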

Fixation storage is inconsistent

Different types of simulations store fixations either sorted or unsorted (according to position). This should be made consistent, with fixations sorted in all cases.

Thanks to @vsbuffalo for suffering through this.

Get rid of placement new

pybind11 >= 2.2.0 has deprecated the use of placement new, but the warning only shows up in debug mode.

The relevant dox are here.

Fix registration of NumPy dtypes

In numpy-1.14, dtype field lookup has been changed to be by position/offset rather than by field name. If that version is installed, an ImportError gets raised when importing fwdpy11 because dtype fields are declared in a different order than how they appear in the structs they reflect.

Another user of pybind11 has run into this: pybind/pybind11#1274.
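
The effect of declaration order on field offsets can be seen directly in NumPy. In the sketch below, the field names pos and idx are hypothetical and do not correspond to a specific fwdpy11 struct.

```python
import numpy as np

# Field order as it might appear in a C++ struct: double pos; uint32 idx.
struct_order = np.dtype([("pos", np.float64), ("idx", np.uint32)])

# The same field names declared in the opposite order.
declared_order = np.dtype([("idx", np.uint32), ("pos", np.float64)])

# dtype.fields maps each name to (dtype, byte offset).
for name in ("pos", "idx"):
    print(name, struct_order.fields[name][1], declared_order.fields[name][1])
# pos 0 4
# idx 8 0
```

The names match but the byte offsets do not, which is the kind of mismatch that a position-based lookup would reject. Declaring the fields in the same order as the struct members avoids it.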

This should be fixed ASAP and a new release of 0.13 should follow.

cc @jeromekelleher

Implementation of mut_lookup property?

Currently, Population.mut_lookup returns a list of (position, index) tuples for each mutation. It may be preferable to return a dict with (position, list of indices) key/value pairs.

Additionally, it may be useful to have a separate function returning a list of keys for mutations at a given position. The function should return None if no extant keys exist for a given position.
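
The proposed behavior could be sketched in pure Python as follows. The function names build_mut_lookup and keys_at_position are hypothetical, and the input is the list of (position, index) tuples that the property currently returns.

```python
from collections import defaultdict


def build_mut_lookup(pairs):
    """Group (position, index) tuples into a {position: [indices]} dict."""
    lookup = defaultdict(list)
    for position, index in pairs:
        lookup[position].append(index)
    return dict(lookup)


def keys_at_position(lookup, position):
    """Return the list of indices at position, or None if there are none."""
    return lookup.get(position)


pairs = [(0.1, 0), (0.5, 3), (0.1, 7)]
lookup = build_mut_lookup(pairs)
print(lookup)                         # {0.1: [0, 7], 0.5: [3]}
print(keys_at_position(lookup, 0.1))  # [0, 7]
print(keys_at_position(lookup, 0.9))  # None
```

A dict makes the multiple-mutations-per-position case explicit, whereas the flat list of tuples leaves the grouping to the caller.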

cc @DL42

Remove Python data types from Population objects.

The C++ and Python classes contain popdata and popdata_user. Both are instances of pybind11::object. The latter is redundant with the concept of a temporal sampler and should be removed. The former should be replaced by something like std::unique_ptr<ModelData>, where ModelData is an abstract class exposed to Python as an ABC.

The reason to make these changes is that holding Python objects on the C++ side is a barrier to releasing the GIL during simulations. (See #50)

A minimal API for ModelData could look like:

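#include <istream>
#include <memory>
#include <string>

struct ModelData
{
    virtual ~ModelData() = default; // allow deletion through a base pointer
    virtual std::string serialize() const = 0;
    virtual void deserialize(std::istream &) = 0;
    virtual std::unique_ptr<ModelData> clone() const = 0;
};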

The intent of such structures is to allow tracking of more complex model features. For example, the details of a landscape. However, one could argue that this violates the single responsibility criterion, and thus these types should simply be eliminated. (For e.g., the features of a landscape are properties of the landscape and not the population.) Such removal would indeed be simpler...
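
On the Python side, the corresponding ABC might look like the sketch below. It is written in pure Python for illustration and is not generated by pybind11. The Landscape subclass and its optimum attribute are hypothetical, and a string replaces the std::istream argument of the C++ version.

```python
import copy
import json
from abc import ABC, abstractmethod


class ModelData(ABC):
    """Abstract interface mirroring the C++ struct."""

    @abstractmethod
    def serialize(self) -> str:
        """Return the object's state as a string."""

    @abstractmethod
    def deserialize(self, data: str) -> None:
        """Restore the object's state from a string."""

    @abstractmethod
    def clone(self) -> "ModelData":
        """Return an independent copy."""


class Landscape(ModelData):
    """Toy model data: a landscape described by a single optimum."""

    def __init__(self, optimum=0.0):
        self.optimum = optimum

    def serialize(self):
        return json.dumps({"optimum": self.optimum})

    def deserialize(self, data):
        self.optimum = json.loads(data)["optimum"]

    def clone(self):
        return copy.deepcopy(self)


original = Landscape(1.5)
restored = Landscape()
restored.deserialize(original.serialize())
print(restored.optimum)  # 1.5
```

Because the base class is abstract, instantiating ModelData directly raises TypeError, so only complete implementations can be attached to a population.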

Allow for C++ implementations of temporal samplers.

We should allow for "power users" to write such samplers entirely in C++. This is easy in principle, but brings up supporting C++ and Python samplers transparently. For "evolve" functions, we'd need to set up code paths overloaded to separate out these two concepts.

This change could allow simulations to be run with the GIL released (#50), and C++ samplers could still grab the GIL to modify any Python objects.

Consider removing '-g0' from default build

I've been trying to do some profiling, and the lack of debug symbols is causing some unexpected havoc with Linux perf. I don't really want to enable the assertions, since I would like to get an accurate measurement of the code in production.

Unless the debugging symbols make the shared objects very large indeed, it would be better for me if they were left in by default. That way I can watch a live execution with 'perf top' and see exactly where the time is being spent.
