PAPI

PAPI is a tool for inferring the admixture proportions and admixture times for the parents of an admixed sample given unphased local ancestry tracts.

Installation

We recommend using Anaconda to create a virtual environment from the included file papi_spec-file.txt:

conda create --name papi --file papi_spec-file.txt

Usage

Clone the repository and navigate to the papi directory:

cd src
source activate papi
usage: inference.py [-h] --inputfile INPUTFILE --ind IND [--tracefile [TRACEFILE]] --outfile OUTFILE [--mode [MODE]] [--typ [TYP]] [--err]

optional arguments:
  -h, --help            show this help message and exit
  --inputfile INPUTFILE, -i INPUTFILE
                        input tracts file
  --ind IND, -ind IND   individual on which to run papi
  --tracefile [TRACEFILE], -t [TRACEFILE]
                        optional argument, used to store trace output when run in mcmc mode. If not provided, MCMC solver will find MAP estimate
  --outfile OUTFILE, -o OUTFILE
                        required default output file
  --mode [MODE], -m [MODE]
                        inference mode-'pymc' or 'scipy-optimize'
  --typ [TYP], -typ [TYP]
                        model to use-'bino','hmm', or 'full'
  --err, -err
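
For instance, a run in MCMC ('pymc') mode that also stores the trace might look like the following (an illustrative invocation; pymc is the mode used when --mode is not given, as the runs quoted under Issues below show, and example.trace/OUTFILE are placeholder names):

python src/inference.py --inputfile examples/tracts.txt --ind 1 --tracefile example.trace --outfile OUTFILE --typ full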

Input

An example input tracts file is provided in examples/tracts.txt; it has the following structure:

[[('10', 0.004568105), ('00', 0.46384804), ('10', 42.318381695), ('00', 27.1541), ... ], ...]
[[('10', 33.363797840000004), ('11', 18.6777), ('10', 8.2969), ('11', 14.64119999999999), ... ], ...]
...
...
...

Each line represents the tracts of an individual as a nested list of lists; each nested list corresponds to a chromosome. The first element of each tuple, e.g. ('10', 0.004568105), gives the ancestry state of a single tract, which in this case is heterozygous for the two ancestry states 1 and 0, while the second element gives the length of the tract in centiMorgans.
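
Since each line is a Python-style literal, one way to load the file is with ast.literal_eval (a minimal sketch, assuming each line in the actual file is a complete literal; the lines shown above are truncated with '...'):

import ast

# each non-empty line holds one individual: a list of chromosomes,
# where each chromosome is a list of (ancestry_state, length_in_cM) tuples
with open("examples/tracts.txt") as f:
    individuals = [ast.literal_eval(line) for line in f if line.strip()]

print(len(individuals), "individuals;", len(individuals[0]), "chromosomes in the first")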

Example

Using the included tracts file, the simplest way of running PAPI is as follows:

python src/inference.py -i examples/tracts.txt -ind 1 -o test -m scipy-optimize -typ full

which will output GD estimates for the first line of the tracts file, corresponding to the first individual, to test.scipy.map. The parameters under which these tracts were simulated can be found in the corresponding headers.txt file. Note that the hyperparameter tau can optionally be specified with -tau; it defaults to 7 if unspecified.
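
For example, the same run with the default tau passed explicitly (the -tau flag does not appear in the usage string above but is accepted, as the commands quoted under Issues below also show):

python src/inference.py -i examples/tracts.txt -ind 1 -o test -m scipy-optimize -typ full -tau 7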

Output

The output is a text file with a single line that looks like this:

5.899999999999999689e-01 4.199999999999999845e-01 1.000000000000000000e+00 7.000000000000000000e+00

The first two floats are the admixture proportion estimates, while the latter two correspond to the admixture time estimates.
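
As a minimal sketch of reading these values back (test.scipy.map is the output file from the example run above; the names p1, p2, t1, t2 mirror the parameter names in the MCMC output quoted below, and the column order follows the description above):

# read the single output line: two proportions followed by two times
with open("test.scipy.map") as f:
    p1, p2, t1, t2 = (float(x) for x in f.readline().split())

print(f"proportions: {p1}, {p2}; times: {t1}, {t2}")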

Issues

AssertionError running PAPI

Hello,

I'm running PAPI on a custom tracts input file and I'm getting the following error:

Running inference in scipy-optimize mode
Traceback (most recent call last):
  File "/tigress/VONHOLDT/Simona/bin/papi/src/inference.py", line 262, in <module>
    MAPestimate_scipy = estimate_MAP(d_tracts,typ=args.typ,err=args.err,tau=args.tau)
  File "/tigress/VONHOLDT/Simona/bin/papi/src/inference.py", line 217, in estimate_MAP
    res = scipy.optimize.minimize(
  File "/tigress/VONHOLDT/Simona/miniconda3/envs/papi/lib/python3.8/site-packages/scipy/optimize/_minimize.py", line 617, in minimize
    return _minimize_lbfgsb(fun, x0, args, jac, bounds,
  File "/tigress/VONHOLDT/Simona/miniconda3/envs/papi/lib/python3.8/site-packages/scipy/optimize/lbfgsb.py", line 306, in _minimize_lbfgsb
    sf = _prepare_scalar_function(fun, x0, jac=jac, args=args, epsilon=eps,
  File "/tigress/VONHOLDT/Simona/miniconda3/envs/papi/lib/python3.8/site-packages/scipy/optimize/optimize.py", line 261, in _prepare_scalar_function
    sf = ScalarFunction(fun, x0, args, grad, hess,
  File "/tigress/VONHOLDT/Simona/miniconda3/envs/papi/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 76, in __init__
    self._update_fun()
  File "/tigress/VONHOLDT/Simona/miniconda3/envs/papi/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 166, in _update_fun
    self._update_fun_impl()
  File "/tigress/VONHOLDT/Simona/miniconda3/envs/papi/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 73, in update_fun
    self.f = fun_wrapped(self.x)
  File "/tigress/VONHOLDT/Simona/miniconda3/envs/papi/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 70, in fun_wrapped
    return fun(x, *args)
  File "/tigress/VONHOLDT/Simona/bin/papi/src/inference.py", line 218, in <lambda>
    fun=lambda params, D: -lik_func(params,d_tracts,tau),
  File "/tigress/VONHOLDT/Simona/bin/papi/src/inference.py", line 77, in lik_func
    l = md.computeLoglikelihood_binomial(D_flat,params[:2],alpha) + md.computeLoglikelihood_cnsPM(D_dicts,params)
  File "/projects/VONHOLDT/Simona/bin/papi/src/models.py", line 176, in computeLoglikelihood_cnsPM
    assert len(D) == 22
AssertionError

Command used:
python src/inference.py --inputfile ../3_PAPI_inputs/PAPI_input_tracts.txt -ind 1 --outfile ind1_papi_output --mode scipy-optimize --typ full -tau 7

First line of the input file (ind1):
ind1_tracts.txt

Could someone help me understand what the error means?

Moreover, I have to specify -tau even though it is written that it will automatically be set to 7 if not specified; otherwise I get another error:

Running inference in scipy-optimize mode
Traceback (most recent call last):
  File "/tigress/VONHOLDT/Simona/bin/papi/src/inference.py", line 262, in <module>
    MAPestimate_scipy = estimate_MAP(d_tracts,typ=args.typ,err=args.err,tau=args.tau)
  File "/tigress/VONHOLDT/Simona/bin/papi/src/inference.py", line 217, in estimate_MAP
    res = scipy.optimize.minimize(
  File "/tigress/VONHOLDT/Simona/miniconda3/envs/papi/lib/python3.8/site-packages/scipy/optimize/_minimize.py", line 617, in minimize
    return _minimize_lbfgsb(fun, x0, args, jac, bounds,
  File "/tigress/VONHOLDT/Simona/miniconda3/envs/papi/lib/python3.8/site-packages/scipy/optimize/lbfgsb.py", line 306, in _minimize_lbfgsb
    sf = _prepare_scalar_function(fun, x0, jac=jac, args=args, epsilon=eps,
  File "/tigress/VONHOLDT/Simona/miniconda3/envs/papi/lib/python3.8/site-packages/scipy/optimize/optimize.py", line 261, in _prepare_scalar_function
    sf = ScalarFunction(fun, x0, args, grad, hess,
  File "/tigress/VONHOLDT/Simona/miniconda3/envs/papi/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 76, in __init__
    self._update_fun()
  File "/tigress/VONHOLDT/Simona/miniconda3/envs/papi/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 166, in _update_fun
    self._update_fun_impl()
  File "/tigress/VONHOLDT/Simona/miniconda3/envs/papi/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 73, in update_fun
    self.f = fun_wrapped(self.x)
  File "/tigress/VONHOLDT/Simona/miniconda3/envs/papi/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 70, in fun_wrapped
    return fun(x, *args)
  File "/tigress/VONHOLDT/Simona/bin/papi/src/inference.py", line 218, in <lambda>
    fun=lambda params, D: -lik_func(params,d_tracts,tau),
  File "/tigress/VONHOLDT/Simona/bin/papi/src/inference.py", line 76, in lik_func
    alpha=100/tau
TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'

Many thanks!!

Trouble running examples

I'm trying to get PAPI running on the examples:

I was able to create a conda environment with:
conda create --name papi --file ./papi/papi_spec-file.txt

But I am running into a number of different errors. Sorry to combine them into one issue, but it seems like these commands may have worked in the past but not anymore.

If I run:
python src/inference.py --inputfile ./examples/tracts.txt --ind 1 --tracefile ./example.trace --outfile OUTFILE --typ bin

I get the output:
Running inference in pymc mode
Warning: gradient not available.(E.g. vars contains discrete variables). MAP estimates may not be accurate for the default parameters. Defaulting to non-gradient minimization 'Powell'.
logp = -249.34: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 46/46 [00:00<00:00, 3164.68it/s]
Only 100 samples in chain.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Initializing NUTS failed. Falling back to elementwise auto-assignment.
Multiprocess sampling (4 chains in 4 jobs)
CompoundStep

Slice: [t2]
Slice: [t1]
Slice: [p2]
Slice: [p1]
Sampling 4 chains, 0 divergences: 100%|███████████████████████████████████████████████████████████████████| 2400/2400 [00:05<00:00, 475.56draws/s]
The number of effective samples is smaller than 25% for some parameters.
Traceback (most recent call last):
  File "src/inference.py", line 310, in <module>
    trace['t1']=trace['t1']-1
TypeError: 'MultiTrace' object does not support item assignment

If I run:
python src/inference.py --inputfile ./examples/tracts.txt --ind 1 --tracefile ./example.trace --outfile OUTFILE --typ full

I get the output:
Running inference in pymc mode
Traceback (most recent call last):
  File "src/inference.py", line 301, in <module>
    pm.DensityDist('likelihood', lambda v: logl(v), observed={'v': theta})
  File "/home/kele/mambaforge/envs/papi/lib/python3.8/site-packages/pymc3/distributions/distribution.py", line 47, in __new__
    return model.Var(name, dist, data, total_size)
  File "/home/kele/mambaforge/envs/papi/lib/python3.8/site-packages/pymc3/model.py", line 940, in Var
    var = MultiObservedRV(name=name, data=data, distribution=dist,
  File "/home/kele/mambaforge/envs/papi/lib/python3.8/site-packages/pymc3/model.py", line 1543, in __init__
    self.logp_elemwiset = distribution.logp(**self.data)
  File "src/inference.py", line 301, in <lambda>
    pm.DensityDist('likelihood', lambda v: logl(v), observed={'v': theta})
  File "/home/kele/mambaforge/envs/papi/lib/python3.8/site-packages/theano/gof/op.py", line 674, in __call__
    required = thunk()
  File "/home/kele/mambaforge/envs/papi/lib/python3.8/site-packages/theano/gof/op.py", line 892, in rval
    r = p(n, [x[0] for x in i], o)
  File "src/inference.py", line 49, in perform
    logl = self.likelihood(theta,self.data)
TypeError: lik_func() missing 1 required positional argument: 'tau'

and with:
python src/inference.py --inputfile ./examples/tracts.txt --ind 1 --tracefile ./example.trace --outfile OUTFILE --typ mrkv

Running inference in pymc mode
Warning: gradient not available.(E.g. vars contains discrete variables). MAP estimates may not be accurate for the default parameters. Defaulting to non-gradient minimization 'Powell'.
logp = 141.33: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 136/136 [00:03<00:00, 36.25it/s]
Only 100 samples in chain.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Initializing NUTS failed. Falling back to elementwise auto-assignment.
Multiprocess sampling (4 chains in 4 jobs)
CompoundStep

Slice: [t2]
Slice: [t1]
Slice: [p2]
Slice: [p1]
Sampling 4 chains, 0 divergences: 100%|████████████████████████████████████████████████████████████████████| 2400/2400 [09:25<00:00, 4.25draws/s]
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The number of effective samples is smaller than 10% for some parameters.
Traceback (most recent call last):
  File "src/inference.py", line 310, in <module>
    trace['t1']=trace['t1']-1
TypeError: 'MultiTrace' object does not support item assignment

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.