ctlab / gadma Goto Github PK

View Code? Open in Web Editor NEW

46.0 46.0 13.0 372.66 MB

Genetic Algorithm for Demographic Model Analysis

License: Other

Python 99.98% Shell 0.02%

gadma's People

Contributors

Stargazers

Watchers

Forkers

isheshukov tw7649116 noscode n8ron dingearteom sidorinanton iam28th stanislavzlp iruuunechka cafuneandchill estsov computerscienceiscool quangsang52sh

gadma's Issues

facilitating 3 population model convergence

Hi, Ekaterina -

Thanks again for this incredibly useful program!

I had a great experience using for 2-population models. Ultimately, I want to implement for a 3-population model, but GADMA has not been able to print any models yet (192 individuals total, and 6.7million SNPs). To facilitate convergence, I changed my final structure to match the initial structure: [1,1,1], but GADMA has been running 30 processes for a month now and hasn't been able to print a model. Do you recommend that I implement the Bayesian optimization ensemble that is shown in the example of inference with four and five populations? Or perhaps I should consider downprojecting my data? Might you have other recommendations for how to assist in convergence?

Thanks so much!

Installing gadma on Macs with the M1 chip

Hello,

I was wondering if anyone has successfully installed gadma on a mac with an M1 (or M1 Ultra) chip? I have had no luck even after creating an conda environment.

Right now it seems I am stuck with this error after executing gadma:

File "/Users/frankburbrink/miniconda3/envs/gadma_try/lib/python3.10/site-packages/moments/init.py", line 13, in
from timestamp import Timestamp
ImportError: cannot import name 'Timestamp' from 'timestamp' (unknown location)

Any help would be appreciated!

Thanks!

Frank

Jupyter notebooks are shown incorrectly in the documentation

Some cells are missed in rst versions of example notebooks.

dadi code for 3 population model

Hello @noscode

I have another question regarding GADMA, sorry for being such a high-maintenance user. The results are sort of making sense so I'm actually quite optimistic when looking at the plot for the best-performing model! I've successfully finished an initial run for a 3-population model but the dadi code generated by GADMA is unfortunately not working. But the results are starting to make sense so I'm actually quite optimistic when looking at the plot for the best-performing model! I've attached the model code (changed to .txt as github wouldn't let me attach it otherwise)
best_logLL_model_dadi_code.py.txt

unfortunately, when I try to run this model code, it runs into an undefined variable _Nanc_size. Obviously this should be the ancestral population size according to the variable name. I've tried adding it using the value for Nanc at the end of the script but then the model crashes with a masked array when I run it like this.

I ran the model with split fractions enabled. Should I try rerunning it with this set to false? Or could this be a bug in the parsing of the dadi model code?

Cheers

Sam

Folded SFS

Hi,

Thank you for developing GADMA! I have a few questions about using folded SFS.

I am confused about whether I should input an already folded SFS and also use the Outgroup: False option in the param file or if I can provide an unfolded SFS (e.g. polarized by reference/alternative instead of ancestral/derived), and the option Outgroup: False will do the job of folding the sfs. In the example param file it says "# If outgroup is False then SFS will be folded." so I understood the second option would be ok, but I wanted to confirm that.

Also, when I try to provide an already folded SFS, I get this warning:
WARNING:Spectrum_mod:Creating Spectrum with data_folded = True, but mask is not True for all entries which are nonsensical for a folded Spectrum.
I think this is because I am not masking any of the sfs elements, and I should be masking the folded second half of the SFS, right? I am generating the SFS from ANGSD, and converting it to .fs format with a custom script, so my folded SFS just had the same number of elements as the unfolded, but half the elements were zeros. Should I mask those zeros with a "1" in the third line of the file? Or can I ignore this warning?

Sorry for a lot of questions together, but I think they are all related. Thanks again for a great tool!

example Error

hi, when i run gadma ,i got a error
"$ gadma -i data.txt -o gadma_out

Warning: Theta0 is not specified. It would be 1.0.

Warning: Time for one generation is not specified. Time will be in genetic units.

Traceback (most recent call last):
File "/public1/home/Sh1ne/anaconda2/bin/gadma", line 11, in
load_entry_point('gadma==1.0.0', 'console_scripts', 'gadma')()
File "/public1/home/Sh1ne/anaconda2/lib/python2.7/site-packages/gadma-1.0.0-py2.7.egg/gadma/core.py", line 150, in main
params = options.parse_args()
File "/public1/home/Sh1ne/anaconda2/lib/python2.7/site-packages/gadma-1.0.0-py2.7.egg/gadma/options.py", line 911, in parse_args
options_storage.check()
File "/public1/home/Sh1ne/anaconda2/lib/python2.7/site-packages/gadma-1.0.0-py2.7.egg/gadma/options.py", line 509, in check
self.input_file, self.ns, self.pop_labels)
File "/public1/home/Sh1ne/anaconda2/lib/python2.7/site-packages/gadma-1.0.0-py2.7.egg/gadma/support.py", line 250, in load_spectrum
return READ_ALLOWED_EXTENSIONS[ext](filename, proj, pop_labels)
File "/public1/home/Sh1ne/anaconda2/lib/python2.7/site-packages/gadma-1.0.0-py2.7.egg/gadma/support.py", line 219, in read_dadi_file
for x in xrange(number_of_pops)
ValueError: invalid literal for int() with base 10: 'ASW'"

how to sovle this problem?

sciPy error: dimension mismatch

It is actually a moments issue, but I face it when tried to run GADMA.

With scipy == 1.5.1 (default version) it returns an error
ValueError: dimension mismatch

Downgrading scipy to the version 1.4.1 solved the problem.
Please, update the requirements file.

Error running gadma

Hello! Love the idea of the software but I'm struggling to get gadma running.

After getting an error with my own data, I've downloaded the data from Portik et al from their github here.

I've set up a params file following this and just adapting the population names (North and South instead of CVLN, CVLS)

I am getting the same error as with my own data:
Traceback (most recent call last): File "/home/ubuntu/anaconda3/bin/gadma", line 8, in <module> sys.exit(main()) File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/gadma/core/core.py", line 145, in main print_runs_summary(start_time, shared_dict, settings_storage) File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/gadma/core/draw_and_generate_code.py", line 257, in print_runs_summary x_translated = engine.model.translate_values( File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/gadma/models/demographic_model.py", line 143, in translate_values tr_value = var.translate_value_into(units=units, File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/gadma/utils/variables.py", line 280, in translate_value_into raise ValueError("Set Nanc value for translation") ValueError: Set Nanc value for translation

Do you have any pointers for me to debug this? Thank you very much in advance for this very promising software!

Estimating Theta0?

Hi, Ekaterina Noskova,
I have a problem when I using gadma, in the manual, Theta0 = 4uL, but I don't know how to set L in config file, can u give some advices or get this value through other ways ?

Yours,
ChenSh1ne

IndexError: string index out of range

Hey,

I am new to GADMA. I have installed GADMA2 and trying to run it on vcf file but I get the error pasted bellow. Could you please help. My parameter file is attached too.

UserWarning: Parameters will be in genetic units (Relative parameters). Engines dadi and moments require mutation rate and sequence length for unit translation (/usr/local/lib/python3.6/site-packages/gadma/cli/settings_storage.py:1352)
Data reading
False
Traceback (most recent call last):
File "/usr/local/bin/gadma", line 11, in
sys.exit(main())
File "/usr/local/lib/python3.6/site-packages/gadma/core/core.py", line 69, in main
settings_storage.read_data()
File "/usr/local/lib/python3.6/site-packages/gadma/cli/settings_storage.py", line 891, in read_data
engine.data = self.data_holder
File "/usr/local/lib/python3.6/site-packages/gadma/engines/engine.py", line 209, in data
self.set_data(new_data)
File "/usr/local/lib/python3.6/site-packages/gadma/engines/engine.py", line 191, in set_data
self.inner_data = self.read_data(data)
File "/usr/local/lib/python3.6/site-packages/gadma/engines/engine.py", line 121, in read_data
return cls._read_data(data_holder)
File "/usr/local/lib/python3.6/site-packages/gadma/engines/dadi_moments_common.py", line 58, in _read_data
data = read_vcf_data(cls.base_module, data_holder)
File "/usr/local/lib/python3.6/site-packages/gadma/engines/dadi_moments_common.py", line 733, in read_vcf_data
flanking_info=[None, None],
File "/usr/local/lib64/python3.6/site-packages/moments/Misc.py", line 562, in make_data_dict_vcf
g1, g2 = gt[0], gt[2]
IndexError: string index out of range
GADMA_param_HiC_scaffold_1.txt

Final estimates for parameters

Hello Ekaterina,

I am trying to understand where the final values for each parameter is stored at the end of the run and the correct way to convert them. I read your documentation but I want to confirm.

Because GADMA plots the best AIC in moments, then I suspect they must be in the best_aic_model_moments_code.py file.

The values appear to be in genetic units, but when I convert them to physical units they don't seem to match the figure generated in moments (attached).

For example, the output of best_aic_model_moments_code.py looks like this:

_import moments
import numpy as np

def model_func(params, ns):
nu_1, nu_2, t1, nu11, nu12, m1_12, m1_21, nu12_1, nu12_2, t2, nu21, nu22, nu23, m2_12, m2_13, m2_21, m2_23, m2_31, m2_32 = params
_Nanc_size = 1.0 # This value can be used in splits with fractions
sts = moments.LinearSystem_1D.steady_state_1D(np.sum(ns))
fs = moments.Spectrum(sts)
fs = moments.Manips.split_1D_to_2D(fs, ns[0], ns[1] + ns[2])
migs = np.array([[0, m1_12], [m1_21, 0]])
fs.integrate(tf=t1, Npop=[nu11, nu12], m=migs, dt_fac=0.01)
fs = moments.Manips.split_2D_to_3D_2(fs, ns[1], ns[2])
nu1_func = lambda t: nu11 + (nu21 - nu11) * (t / t2)
nu2_func = lambda t: nu12_1 * (nu22 / nu12_1) ** (t / t2)
nu3_func = lambda t: nu12_2 + (nu23 - nu12_2) * (t / t2)
migs = np.array([[0, m2_12, m2_13], [m2_21, 0, m2_23], [m2_31, m2_32, 0]])
fs.integrate(tf=t2, Npop=lambda t: [nu1_func(t), nu2_func(t), nu3_func(t)], m=migs, dt_fac=0.01)
return fs

data = moments.Spectrum.from_file('/mendel-nas1/fburbrink/gadma_runs/FL-E-CW_down_project_2.sfs')
ns = data.sample_sizes

p0 = [0.07493031299805837, 0.46806203498649623, 3.4646636202407444, 0.01, 0.01, 7.444716354383534, 7.121960644951036, 0.020336124180594915, 0.01, 0.1973464288033633, 4.696024261710003, 100.0, 1.8363622668199782, 1.3979258635425715, 0.1751065891038577, 0.20829024608677718, 0.46111620479204896, 0.0, 0.8243559947229655]
lower_bound = [0.01, 0.01, 1e-15, 0.01, 0.01, 0.0, 0.0, 0.01, 0.01, 1e-15, 0.01, 0.01, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
upper_bound = [100.0, 100.0, 5.0, 100.0, 100.0, 10.0, 10.0, 100.0, 100.0, 5.0, 100.0, 100.0, 100.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
model = model_func(p0, ns)
ll_model = moments.Inference.ll_multinom(model, data)
print('Model log likelihood (LL(model, data)): {0}'.format(ll_model))

theta = moments.Inference.optimal_sfs_scaling(model, data)
print('Optimal value of theta: {0}'.format(theta))

Nanc = 98499.36371461207
mu = 1e-09
L = 819436
theta0 = 4 * mu * L
Nanc = int(theta / theta0)
print('Size of ancestral population: {0}'.format(Nanc))

plot_ns = [4 for _ in ns] # small sizes for fast drawing
gen_mod = moments.ModelPlot.generate_model(model_func,
p0, plot_ns)
moments.ModelPlot.plot_model(gen_mod,
save_file='model_from_GADMA.png',
fig_title='Demographic model from GADMA',
draw_scale=True,
pop_labels=['FL', 'E', 'CW'],
nref=98499,
gen_time=0.002,
gen_time_units='thousand years',
reverse_timeline=True)_

Combining the params with p0 I get this:

params vals
1 nu_1 0.07493031
2 nu_2 0.46806203
3 t1 3.46466362
4 nu11 0.01000000
5 nu12 0.01000000
6 m1_12 7.44471635
7 m1_21 7.12196064
8 nu12_1 0.02033612
9 nu12_2 0.01000000
10 t2 0.19734643
11 nu21 4.69602426
12 nu22 100.00000000
13 nu23 1.83636227
14 m2_12 1.39792586
15 m2_13 0.17510659
16 m2_21 0.20829025
17 m2_23 0.46111620
18 m2_31 0.00000000
19 m2_32 0.82435599

To check this and compare it to the model that gadma draws from moments, I use your conversion from the model for time:

Time | T | Generations divided by 2 * Nref

Therefore, real time should be T * 2 * Nref

And using the value of nref above and t1 then I convert these genetic units into physical units by:
(2*98499)*3.4646636202407444 = 682531.8.

However, the model for the earliest split looks to be 1443, thousands of years ago, or 1,443,00 years ago (see attached figure, note I may need to reproject the SFS).

Is this off because I need to assume 4Ne, for diploid, and multiply by 4 (that's a bit closer)?

There must be something I am doing wrong in these conversions or I am using the wrong starting values.

I will also need to correct the migration and Ne values as well.

Any help is much appreciated.

Thanks!!

Frank

MomentsLD — ValueError: Can specify only recombination or bp bins, not both

Hey Ekaterina,

I'm trying to run some two-population demographic using the momentsLD engine and am getting the following traceback:

Traceback (most recent call last):                                                                                                                                                 
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/multiprocessing/pool.py", line 125, in worker                                                                               
    result = (True, func(*args, **kwds))
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/site-packages/gadma/engines/moments_ld_engine.py", line 30, in _read_data_one_job
    moments.LD.Parsing.compute_ld_statistics(
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/site-packages/moments/LD/Parsing.py", line 1259, in compute_ld_statistics
    raise ValueError("Can specify only recombination or bp bins, not both")
ValueError: Can specify only recombination or bp bins, not both
"""                                         

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/santang3/.conda/envs/gadma/bin/gadma", line 10, in <module>
    sys.exit(main())                        
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/site-packages/gadma/core/core.py", line 69, in main
    settings_storage.read_data()
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/site-packages/gadma/cli/settings_storage.py", line 891, in read_data
    engine.data = self.data_holder
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/site-packages/gadma/engines/engine.py", line 209, in data
    self.set_data(new_data)
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/site-packages/gadma/engines/engine.py", line 191, in set_data
    self.inner_data = self.read_data(data)
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/site-packages/gadma/engines/engine.py", line 121, in read_data
    return cls._read_data(data_holder)
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/site-packages/gadma/engines/moments_ld_engine.py", line 185, in _read_data
    region_stats = cls._get_region_stats(data_holder)
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/site-packages/gadma/engines/moments_ld_engine.py", line 171, in _get_region_stats
    result = pool.map(_read_data_one_job, all_kwargs)
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value                       
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/site-packages/gadma/engines/moments_ld_engine.py", line 30, in _read_data_one_job
    moments.LD.Parsing.compute_ld_statistics(
  File "/home/santang3/.conda/envs/gadma/lib/python3.8/site-packages/moments/LD/Parsing.py", line 1259, in compute_ld_statistics
    raise ValueError("Can specify only recombination or bp bins, not both")

Any idea what might be going on here? Here is my parameters file, which I modified from the example given here on on GitHub:

Input data : ./CM019101.1_4fold.vcf, ./pops.txt
Engine : momentsLD
initial_structure: 2,1
Output directory : gadma_momentsLD_results
ancestral_size_as_parameter: True
recombination_maps: ./maps
Mutation rate: 1.2e-8
Number of repeats : 10
Number of processes : 10
draw_models_every_n_iteration: 100
region_len: 2000000
ld_kwargs: {"report": True}
Sequence_length: 11067652

For context, I'm testing this out using a single chromosome with ~34K 4-fold sites. I have a genetic map for the chromosome in the ./maps folder, the first few lines of which look like:

pos Map(cM)
16995   0
17012   3.139587207101613e-5
17063   1.2558348829472266e-4
19371   0.004388034826302345
19415   0.004469294730494511

Thanks for your help!

James

Error when trying to plot from best_logLL_model_moments_code.py

Using python3 and the command python3 best_logLL_model_moments_code.py I get the following error:

'File " best_logLL_model_moments_code.py", line 7
def generated_model((nu21, nu31, nu41, nu51, t1, t2, t3, t4), ns):
^
SyntaxError: invalid syntax

File contents:

#current best params = [4286.26466398839, 428626.4663984032, 2925.898563085724, 30281.262262553322, 60.27017335837271, 11270.983729593085, 4578.233459954076, 402.4881470236583, 62.209760218362085]
import matplotlib
matplotlib.use("Agg")
import moments
import numpy as np

def generated_model((nu21, nu31, nu41, nu51, t1, t2, t3, t4), ns):
	theta1 = 1
	sts = moments.LinearSystem_1D.steady_state_1D(sum(ns), theta=theta1)
	fs = moments.Spectrum(sts)

	T = t1
	after = [nu21]
	growth_funcs = [lambda t: after[0]]
	list_growth_funcs = lambda t: [ f(t) for f in growth_funcs]
	fs.integrate(Npop=list_growth_funcs, tf=T, dt_fac=0.1, theta=theta1)

	before = after
	T = t2
	after = [nu31]
	growth_funcs = [lambda t: before[0] * (after[0] / before[0]) ** (t / T)]
	list_growth_funcs = lambda t: [ f(t) for f in growth_funcs]
	fs.integrate(Npop=list_growth_funcs, tf=T, dt_fac=0.1, theta=theta1)

	before = after
	T = t3
	after = [nu41]
	growth_funcs = [lambda t: before[0] + (after[0] - before[0]) * (t / T)]
	list_growth_funcs = lambda t: [ f(t) for f in growth_funcs]
	fs.integrate(Npop=list_growth_funcs, tf=T, dt_fac=0.1, theta=theta1)

	before = after
	T = t4
	after = [nu51]
	growth_funcs = [lambda t: before[0] * (after[0] / before[0]) ** (t / T)]
	list_growth_funcs = lambda t: [ f(t) for f in growth_funcs]
	fs.integrate(Npop=list_growth_funcs, tf=T, dt_fac=0.1, theta=theta1)

	return fs
data = moments.Spectrum.from_file('/Users/ryanc/Documents/Pink_pigeon/Population_genetics/Demography/GADMA/data/output_MK25_25prune/dadi/pop1-75.sfs')

popt = [99.99999999989834, 0.6826220013122478, 7.064720598558761, 0.014061234684068955, 2.6295585114675073, 1.068117304658926, 0.09390184194765547, 0.014513746839066071]
ns = [75]
model = generated_model(popt, ns)
N_A = 4286.264664
ll_model = moments.Inference.ll(N_A * model, data)
print('Model log likelihood (LL(model, data)): {0}'.format(ll_model))
print('Drawing model to model_from_GADMA.png')
model = moments.ModelPlot.generate_model(generated_model, popt, ns)
moments.ModelPlot.plot_model(model, 
	save_file='model_from_GADMA.png',
	fig_title='Demographic model from GADMA',
	pop_labels=['pop1'],
	nref=None,
	draw_scale=False,
	gen_time=5.6,
	gen_time_units='Years',
	reverse_timeline=True)

Add custom model specification with GADMA as a library

GADMA reads custom models for different engines. For example, one could take model specified for dadi and ask GADMA to find its parameters. GADMA has its own interface for model specification. It will be great to add opportunity to use this specification for custom models as then one custom model could be used for several engines at once.

Using linear extrapolation in dadi

Dear Ekaterina,

I'm running GADMA on my dataset using dadi as engine, and I have this error popping up:

WARNING:Numerics:Extrapolation may have failed. Check resulting frequency spectrum for unexpected results.

I previously ran dadi without GADMA and this problem can be fixed by using a linear (that is more stable) rather than a logarithmic extrapolation, however, I don't see were I can specify this option in GADMA. Is there a way to add it in the param_file ?

Thank you for your help and for this amazing tool !

Aude

Error - Set Nanc value for translation

Hi again,

I am trying to use gadma with a new dataset but I have this error when the program starts to find models.

I really appreciate your help.

[000:01:00]
All best by log-likelihood models
Number log-likelihood Model
UserWarning: Additional evaluation for theta. Nothing to worry if this warning is seldom. (/usr/local/lib/python3.8/dist-packages/gadma/engines/dadi_moments_common.py:138)
Traceback (most recent call last):
File "/usr/local/bin/gadma", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/gadma/core/core.py", line 145, in main
print_runs_summary(start_time, shared_dict, settings_storage)
File "/usr/local/lib/python3.8/dist-packages/gadma/core/draw_and_generate_code.py", line 257, in print_runs_summary
x_translated = engine.model.translate_values(
File "/usr/local/lib/python3.8/dist-packages/gadma/models/demographic_model.py", line 143, in translate_values
tr_value = var.translate_value_into(units=units,
File "/usr/local/lib/python3.8/dist-packages/gadma/utils/variables.py", line 280, in translate_value_into
raise ValueError("Set Nanc value for translation")
ValueError: Set Nanc value for translation

Custom Model Upper Lower Bounds Issue

Hello,

I am having an issue using a custom model with dadi. I specified the custom model in a python file with the function name model_func(params, ns, pts).

When I try to run it, it gives me the following error:

Error: Lengths of lower and upper bounds should be equal.

However, as far as I can tell, the length of my upper and lower bounds are of equal lengths (both 8 parameters) in the parameters file:

Lower bounds : [0.01, 0.01, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0]
Upper bounds : [10, 10, 10, 5.0, 5.0, 1.0, 10.0, 10.0]

I also tried using the parameter identifiers instead and it gives me a different error:

Parameter identifiers : ['n','n','n','t','t','s','m','m']
TypeError: object of type 'NoneType' has no len()

I have attached my custom model and parameters file in case you need to see them.

Could you please look into this issue?

By the way, thank you for making this software package. It greatly simplifies using dadi and moments for users, and its functionality is amazing.

Thanks,

-Bradley

params_test.txt
Models_3D_gadma.zip

running test on installation issue (yaml?)

Hello
Ive been trying to install gadma several different ways using conda or pip and am running into this issue that i can't quite figure out. In all cases I get to the point where i receive the following errors
File "~/software/gadma_env/lib/python3.8/site-packages/gadma/core/core.py", line 51, in main
settings_storage, args = arg_parser.get_settings()

File "~/software/gadma_env/lib/python3.8/site-packages/gadma/cli/arg_parser.py", line 73, in test_args
settings_storage = SettingsStorage.from_file(TEST_SETTINGS)

several of these types of errors with the final AttributeError:
"load()" has been removed, use

yaml = YAML(typ='rt')
yaml.load(...)

Any suggestions?

IndexError when No migrations : True

Disabling migration results in the following error:
Traceback (most recent call last):
File "/anaconda3/envs/gadma/lib/python2.7/site-packages/gadma-1.0.0-py2.7.egg/gadma/core.py", line 44, in run_genetic_algorithm
ga_instance.run(shared_dict=shared_dict)
File "/anaconda3/envs/gadma/lib/python2.7/site-packages/gadma-1.0.0-py2.7.egg/gadma/genetic_algorithm.py", line 719, in run
self.print_and_draw_best_model()
File "/anaconda3/envs/gadma/lib/python2.7/site-packages/gadma-1.0.0-py2.7.egg/gadma/genetic_algorithm.py", line 470, in print_and_draw_best_model
support.print_model_code(self.out_dir, model, self.params, prefix=prefix + 'best_' + best_by.lower() + '_model')
File "/anaconda3/envs/gadma/lib/python2.7/site-packages/gadma-1.0.0-py2.7.egg/gadma/support.py", line 403, in print_model_code
os.path.join(moments_out_dir, prefix + '_moments_code' + suffix + '.py'))
File "/anaconda3/envs/gadma/lib/python2.7/site-packages/gadma-1.0.0-py2.7.egg/gadma/demographic_model.py", line 2204, in moments_code_to_file
par_labels[ms_index + 1] + ', 0]])\n')
IndexError: list index out of range

Error: GA number 1: list index out of range

Estimating current Ne population

Dear Ekaterina,

I as able to run GADMA successfully and even if the generated plot give me a global estimate of the current Ne of my population, I would like to have access to the exact number. However, I checked in the manual and I didn't find any explanation on how to use my estimates to compute the current Ne values (outside of the plot).

For instance here, my best model gave the following parameters:

Run 5 -2700.30 [Nanc = 617187] [ [ 1 pop split 3.67% (s1) [0.037(s1*_Nanc_size), 0.963((1-s1)*_Nanc_size)] ], [ 1000.0(t1), [61718746.147(nu11), 24096350.787(nu12)], [[0, 0.00e+00(m1_12)], [4.93e-10(m1_21), 0]], [Exp(dyn11), Sud(dyn12)] ] ] f (theta = 32419.38)

And µ and L were :

mu = 1.2e-08
L = 1094325

Thank you for your help,

Aude

Automatically check for args option when gadma.optimize_ga is using

Sometimes gadma.optimize_ga is using the following way:

popt = gadma.Inference.optimize_ga(
                                   data,
                                   func_ex,
                                   engine='dadi',
                                   args=pts,
                                   p_ids = p_ids,
                                   X_init= [p0],
                                   lower_bound=lower_bound,
                                   upper_bound=upper_bound,
                                   verbose=1)

In that case there is an error:
UserWarning: Error occurred during caching: evaluate() takes 3 positional arguments but 5 were given.

The correct way of usage is: args=(pts,),. It would be great to add an extra check when args are misspecified.

The error for custom model and relative parameters set to True

When custom model is given there is an error when GADMA is run. It happens in parent process that prints current progress. We need to add additional checks for custom model to prevent translation of the units. If relative parameters is set to False then everything works fine.

Units of time out by factor of 10

Hi @noscode

I noticed that the timescale in drawings didn't seem right. After looking at the units in these lines of code I think there is a factor of 10 issue. Ie .. 'thousand years' should be 0.001 not 0.01

GADMA/gadma/cli/settings_storage.py

Lines 402 to 403 in 20924a7

    
           d = {'generations': 1, 'years': 1, 'thousand years': 0.01, 
        
                'thousands of years': 0.01, 'kya': 0.01}

Specifying the number of time periods (final structure) for a single population

Hi,

I am running GADMA (with dadi) to estimate changes in population size for a single population. So far, I have used this intial and final structure and have obtained realistic looking results:

Initial structure: [1]
Final structure: [7]

However, I chose to model 7 time periods somewhat arbitrarily. Is there a rule of thumb of what Final structure to input based on the data? e.g. effective sequence length, number of samples etc. ?

If there are too many time intervals then I assume there is a risk of overfitting? Or would inputting a large final structure value not matter as models with an excess number of time periods would not be selected?

Apologies if this is mentioned somewhere in the documentation or elsewhere.

Many thanks,
Mike

Installation problem

Hello,

I am trying to install gadma in a conda env using:

conda install -c bioconda gadma

I keep getting this error:

feature:/osx-64::__osx==10.16=0
feature:|@/osx-64::__osx==10.16=0
gadma -> matplotlib-base -> __osx[version='>=10.11|>=10.12']

Your installed version is: 10.16

I am unsure how to solve this, but I am using osX 11.7 .
My version of matplotlib-base is 3.6.3 (py310he725631_0; conda-forge).

Any help would be appreciated.

Thanks.

Frank

Lower bound of split in inference of model with structure

Hello,

I've been trying to run GADMA2 to infer the demographic history of two closely related species.

I was looking for a way of setting a lower bound for the divergence time that is higher than the default. I see that there is an option for the upper bound (Upper bound of first split), but not one for the lower bound of this parameter, which I guess is the general one for time intervals (=1e-15?).

I have tried adding to the parameter file something like:

Lower bound: 10
Parameter identifiers: T

and to the extra parameter file something like:

min_T: 10

But I don't see any change in the inferred times, as I guess these options only work for custom model inference.

Is there a way I could work around this issue? How could I set a lower bound higher than the default for the split time?

Thank you in advance for your answer and please forgive me if I've missed a very obvious way of solving this issue.

All the best,

Enrico

Finding theta and calculating theta (determining L)

Dear @noscode,

I have two questions that I hope you can help with, please.

Where in the GADMA output can I find the estimated theta if I ran a model with "theta0 : None"? I then want to use this calculated theta to rescale the output of the model, but I am not sure where the theta calculated by GADMA is. I attach my parameters file and output files.
If I wanted to calculate theta independently, to then provide this theta value to GADMA, which of the following would be the correct value of L to use? I expand on this below:

I have low-coverage WGS data for 40 African buffalo (genome size ~2.7 Gb) from three different populations. The aligned reads cover a genome territory of ~2.6 Gb (calculated from the BAM files using CollecWgsMetrics in Picard). I then used ANGSD with various filters and produced an input file (via the realSFS dadi command) for realSFS2dadi.pl, which produced a dadi format SNP file with ~86 million sites (86 Mb) for use in GADMA. However, many of these sites would of course be linked, so I thinned the SNPs by selecting every 1000th site, so that I can make the assumption of unlinked loci. I therefore ended up with ~86 thousand sites (86 Kb) that I used as input for GADMA.

So getting back to my question, would the value for L to use in the formula theta = 4uL, be 2.6 Gb, or 86 Mb? Or something else entirely?

Kind regards
Deon

GADMA.log.txt
best_logLL_model_moments_code.py.txt
111_suddenonly_nomig_params.txt

mapping values are not allowed here

Hey,

My parameter file is:

Output directory: GADMA/Output_sort_all_seq
Input data: input.vcf, pop_order_new.txt
Outgroup: False
Sequence length:{HiC_scaffold_1: 100107381, HiC_scaffold_2: 105422710, HiC_scaffold_3: 84653302, HiC_scaffold_4: 71730761, HiC_scaffold_5: 82087087, HiC_scaffold_6: 72648741, HiC_scaffold_7: 206147042, HiC_scaffold_8: 75724504, HiC_scaffold_9: 42998131, HiC_scaffold_10: 56992115, HiC_scaffold_11: 68020579, HiC_scaffold_12: 69753810, HiC_scaffold_13: 174872996, HiC_scaffold_14: 125785404, HiC_scaffold_15: 95956831, HiC_scaffold_16: 92867347, HiC_scaffold_17: 25658408, HiC_scaffold_18: 72251902, HiC_scaffold_19: 70704222, HiC_scaffold_20: 58087477, HiC_scaffold_21: 66824428, HiC_scaffold_22: 65113277, HiC_scaffold_23: 99334673, HiC_scaffold_24: 206517395, HiC_scaffold_25: 119051098, HiC_scaffold_26: 169401614, HiC_scaffold_27: 144118974}

but I get the error:

File "/usr/local/bin/gadma", line 11, in
sys.exit(main())
File "/usr/local/lib/python3.6/site-packages/gadma/core/core.py", line 51, in main
settings_storage, args = arg_parser.get_settings()
File "/usr/local/lib/python3.6/site-packages/gadma/cli/arg_parser.py", line 127, in get_settings
settings_storage = SettingsStorage.from_file(args.params, args.extra)
File "/usr/local/lib/python3.6/site-packages/gadma/cli/settings_storage.py", line 1017, in from_file
return obj.update_from_file(param_file, extra_param_file)
File "/usr/local/lib/python3.6/site-packages/gadma/cli/settings_storage.py", line 972, in update_from_file
ruamel.yaml.RoundTripLoader)
File "/usr/local/lib/python3.6/site-packages/ruamel/yaml/main.py", line 951, in load
return loader._constructor.get_single_data()
File "/usr/local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 111, in get_single_data
node = self.composer.get_single_node()
File "/usr/local/lib/python3.6/site-packages/ruamel/yaml/composer.py", line 78, in get_single_node
document = self.compose_document()
File "/usr/local/lib/python3.6/site-packages/ruamel/yaml/composer.py", line 101, in compose_document
node = self.compose_node(None, None)
File "/usr/local/lib/python3.6/site-packages/ruamel/yaml/composer.py", line 138, in compose_node
node = self.compose_mapping_node(anchor)
File "/usr/local/lib/python3.6/site-packages/ruamel/yaml/composer.py", line 218, in compose_mapping_node
item_value = self.compose_node(node, item_key)
File "/usr/local/lib/python3.6/site-packages/ruamel/yaml/composer.py", line 111, in compose_node
if self.parser.check_event(AliasEvent):
File "/usr/local/lib/python3.6/site-packages/ruamel/yaml/parser.py", line 140, in check_event
self.current_event = self.state()
File "/usr/local/lib/python3.6/site-packages/ruamel/yaml/parser.py", line 603, in parse_block_mapping_value
if self.scanner.check_token(ValueToken):
File "/usr/local/lib/python3.6/site-packages/ruamel/yaml/scanner.py", line 1764, in check_token
self._gather_comments()
File "/usr/local/lib/python3.6/site-packages/ruamel/yaml/scanner.py", line 1806, in _gather_comments
self.fetch_more_tokens()
File "/usr/local/lib/python3.6/site-packages/ruamel/yaml/scanner.py", line 283, in fetch_more_tokens
return self.fetch_value()
File "/usr/local/lib/python3.6/site-packages/ruamel/yaml/scanner.py", line 656, in fetch_value
self.reader.get_mark(),
ruamel.yaml.scanner.ScannerError: mapping values are not allowed here
in "GADMA_param_sort_all1.par", line 4, column 59

Could you please help?

Thanks
Anubhab

Error when using BO

Hello Ekaterina,

I am getting the following error when I try to run GADMA using the moments engine with 5 populations using a custom model.

  File "/scrfs/storage/btm002/home/nmt/4.gadma/.gadma2/lib/python3.9/site-packages/gadma/core/core.py", line 51, in main
    settings_storage, args = arg_parser.get_settings()
  File "/scrfs/storage/btm002/home/nmt/4.gadma/.gadma2/lib/python3.9/site-packages/gadma/cli/arg_parser.py", line 127, in get_settings
    settings_storage = SettingsStorage.from_file(args.params, args.extra)
  File "/scrfs/storage/btm002/home/nmt/4.gadma/.gadma2/lib/python3.9/site-packages/gadma/cli/settings_storage.py", line 1043, in from_file
    return obj.update_from_file(param_file, extra_param_file)
  File "/scrfs/storage/btm002/home/nmt/4.gadma/.gadma2/lib/python3.9/site-packages/gadma/cli/settings_storage.py", line 1031, in update_from_file
    self.__setattr__(attr_name, loaded_dict[key])
  File "/scrfs/storage/btm002/home/nmt/4.gadma/.gadma2/lib/python3.9/site-packages/gadma/cli/settings_storage.py", line 546, in __setattr__
    get_global_optimizer(value)
  File "/scrfs/storage/btm002/home/nmt/4.gadma/.gadma2/lib/python3.9/site-packages/gadma/optimizers/global_optimizer.py", line 313, in get_global_optimizer
    raise ValueError(f"Optimizer '{id}' is not registered")
ValueError: Optimizer 'SMAC_BO_combination' is not registered

I am using the following GADMA version:

gadma --version
GADMA version 2.0.0rc26 by Ekaterina Noskova

Do you know what the issue could be? I have attached my parameters file here.

Thank you for your time.

-Bradley

nmt_gadma_params_K5.txt

Add citations at the end or at the begining of GADMA run

There are a plenty of papers that should be cited when one is using GADMA. They can be found in the documentation but it will be more useful to put this information into GADMA itself.

global optimizer falls with an error

Global optimizer that I get with get_global_optimizer('Genetic_algorithm') falls when I call opt.optimize in place where function is cached.
Link to a notebook with an issue:
https://disk.yandex.ru/d/w6KyNvY11JKtbw

ZeroDivisionError

Hello,
When I try to run GADMA, there is an error occurd:

My parameter file is :

Wrong output for one population with [1] structure

In case of demographic inference of one population with structure [1] the wrong value of fitness function if given to the parent process that prints current success. The value of the likelihood is multiplied by -1 and if the structure is increased everything is not showing.

We have to check the callbacks of optimizations when number of variables is equal to 0.

Mistake in manual re migration rates?

Hello,
In the GADMA manual on page 16 it says:
mij — migration rate from population i to population j.
I believe this is incorrect, in the moments manual page 11 it says the opposite:
The migration rates are gathered in matrix m, m[i, j] being the rate of migration from
pop 2 into pop 1.

Please correct me if I am mistaken
Thanks,
James

Keyerror

Hello,

I'm trying to run GADMA with dadi engine on a sf file of 2 pops and GADMA pops this error that I don't understand :

gadma -p param_file_Toronto_downsampled -o ./test
UserWarning: Parameters will be in genetic units (Relative parameters). Engines dadi and moments require mutation rate and sequence length for unit translation (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/cli/settings_storage.py:1360)
UserWarning: Code for momentsLD will not be generated as: VCF input data is required. (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/cli/settings_storage.py:1477)
Data reading
False
WARNING:Spectrum_mod:Creating Spectrum with data_folded = True, but data has non-zero values in entries which are nonsensical for a folded Spectrum.
WARNING:Spectrum_mod:Creating Spectrum with data_folded = True, but mask is not True for all entries which are nonsensical for a folded Spectrum.
UserWarning: Spectrum file /scratch/projects/trifolium/glue/demography/glue_demography/results/gadma/Toronto/Toronto_4fold_r_u_dadi_format_good.fs is in an old format - without population labels, so they will be taken from the corresponding parameter: RURAL, URBAN. (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/engines/dadi_moments_common.py:348)
Number of populations: 2
Projections: [30, 30]
Population labels: ['RURAL', 'URBAN']
Outgroup: False
--Successful data reading--

--Successful arguments parsing--

Parameters of launch are saved in output directory: /scratch/projects/trifolium/glue/demography/glue_demography/results/gadma/test/params_file
All output is saved in output directory: /scratch/projects/trifolium/glue/demography/glue_demography/results/gadma/test/GADMA.log

--Start pipeline--
Run launch number 1
RuntimeWarning: Mean of empty slice. (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3474)
UserWarning: Additional evaluation for theta. Nothing to worry if this warning is seldom. (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/engines/dadi_moments_common.py:138)

[000:01:00]
All best by log-likelihood models
Number  log-likelihood  Model
UserWarning: Additional evaluation for theta. Nothing to worry if this warning is seldom. (/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/engines/dadi_moments_common.py:138)
Traceback (most recent call last):
  File "/home/caizergu/.conda/envs/gadma_test2/bin/gadma", line 8, in <module>
    sys.exit(main())
  File "/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/core/core.py", line 145, in main
    print_runs_summary(start_time, shared_dict, settings_storage)
  File "/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/core/draw_and_generate_code.py", line 242, in print_runs_summary
    theta = engine.get_theta(x, *settings.get_engine_args())
  File "/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/engines/dadi_engine.py", line 169, in get_theta
    return super(DadiEngine, self).get_theta(values, pts)
  File "/home/caizergu/.conda/envs/gadma_test2/lib/python3.10/site-packages/gadma/engines/dadi_moments_common.py", line 141, in get_theta
    theta = self.saved_add_info[key]
KeyError: ((0.4519932843626896, 0.0004799626451119927, 6.59285705597225, 5.5977869812679915, 'Sud', 'Sud', 0, 0), (30, 40, 50))

I join the files I used to this message (not that I had to add the .txt extension so that github would let me attach them).

Thank you for your help,

Aude

[param_file_Toronto_downsampled.txt](https://github.com/ctlab/GADMA/files/9270591/param_fil
Toronto_4fold_r_u_dadi_format_good.fs.txt
e_Toronto_downsampled.txt)

Add documentation about sequence length as dict

For momentsLD sequence length should be set as dictionary with length for each chromosome. There is not enough documentation about it (Issue #56). We need to add it.

Mask migration error

Hi @noscode,

I'm trying to run a 3-population model with migration after the first split, but no migration between any of the populations after the second split. Between the manual and this post: #24, I came up with the following set-up for the Migration masks flag:

Migration masks: [ [[1, 1], [1, 1]], [[0, 0, 0], [0, 0, 0], [0, 0, 0]] ]

However, when I try to run this model, I get several "Run X failed due to the following exception:" messages, which then means the main run fails:

�[91mMain run failed due to following exception:�[0m multiprocessing.pool.RemoteTraceback: """ Traceback (most recent call last): File "/home/djager/miniconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker result = (True, func(*args, **kwds)) File "/home/djager/miniconda3/lib/python3.7/site-packages/gadma/core/core.py", line 42, in job raise e File "/home/djager/miniconda3/lib/python3.7/site-packages/gadma/core/core.py", line 37, in job obj.run(settings.get_optimizers_init_kwargs()) File "/home/djager/miniconda3/lib/python3.7/site-packages/gadma/core/core_run.py", line 575, in run result = self.run_with_increase(initial_kwargs) File "/home/djager/miniconda3/lib/python3.7/site-packages/gadma/core/core_run.py", line 416, in run_with_increase restore_file, struct, only_models, x_transform = next(options) StopIteration """

The model works with and without migration and only throws this error when I add the Migration masks parameter. I'm not sure whether this is because the notation is incorrect, or whether is there an issue with my set-up. I attach my parameters file, the log file and the *.err file where PBSpro outputs error messages.

I'm using GADMA version 2.0.0rc10. I got a warning during installation that the ruaml.yaml version I have (0.15.46) might be a problem and I should update to version >= 0.15.48, but I was unable to do the update. However, it just seems to give a warning about arrays and the other models without migration masks are running fine as far as I can tell.

Any insight you might be able to give would be much appreciated!

211_migMask.err.txt
211_migMask_params.txt
GADMA.log

When to use Split fractions = False?

Hello,
I'm evaluating a two population split (best model pasted below), and I want to make sure I'm interpreting my results correctly. I believe this model suggests a small ancestral group with ~42k individuals that split ~2k years ago into two groups with sudden size changes (pop1= 59k and pop2 > 4million). Since then, pop1 has been exponentially decreasing in size.

Run 4 -5103.00 [Nanc = 42518] [ [ 1 pop split [4251867.877(nu_1), 183918.37(nu_2)] ], [ 2271.654(t1), [59348.457(nu11), 4251867.877(nu12)], [[0, 0.00e+00(m1_12)], [1.18e-04(m1_21), 0]], [Exp(dyn11), Sud(dyn12)] ] ] f (theta = 3314.47)

This is strange to me because how common is it that a population explodes like that suddenly? Is sudden growth after divergence a realistic conclusion? After looking into this, I noticed that the default Split fractions = False. So I tried rerunning with Split fractions = True to see if this made more logical sense (new best model pasted below).

Run 3 -5103.17 [Nanc = 42537] [ [ 1 pop split 22.21% (s1) [0.222(s1*_Nanc_size), 0.778((1-s1)*_Nanc_size)] ], [ 2261.231(t1), [252097.377(nu11), 4253773.808(nu12)], [[0, 0.00e+00(m1_12)], [1.18e-04(m1_21), 0]], [Sud(dyn11), Sud(dyn12)] ] ] f (theta = 3315.95)

So now, the ancestral group with ~42k individuals splits into 2 populations that make sense (pop1=22% and pop2=78% of ancestral group). However, now pop1 is not exponentially decreasing, rather it is increasing (from ~9k individuals to ~252k individuals). I tried to rerun the models a few more times, but it always shows that with Split fractions = True, my pop1 is huge after divergence then dramatically decreases. But with Split fractions = False, my pop1 starts out small then increases.

So my question is- when is it a realistic scenario to use the default Split fractions = False? It seems odd to think about an ancestral population diverging and then both new populations starting out suddenly big. Or is there an explanation for this? And for my scenario, would you recommend to make conclusions for pop1 based on Spit fractions = True?

Thank you in advance!

Model with no population split?

Hello,
Does GADMA consider models with no population split?

I am comparing two fish populations from distant locations that show little genetic differentiation, and I'm not actually sure if they are divergent populations. When I run GADMA, the best model shows a recent split with migration. I've tried running with only Initial structure [1,1], and I tried with both Initial structure [1,1] and Final structure[2,1], and both outputs show a recent split with migration. I'm wondering if a model with no split was considered? Or is my best model better than one with no split?

Thanks in advance!

how to modify the plot?

Hi everyone

at the end of the Gadma run I got this image (see attachment) where population B and C seems lumped, and it is not possible to see the arrows of the gene flow. I would like to kindly ask if there is some way to change this image (maybe modifying the .py code file) to solve this issue?

best regards

Nicolas

Missed update on the main output for demographic inference of one population with structure [1]

A special case of demographic model with single parameter of N_anc and moments engine. It should be one iteration with implicit evaluation of N_anc but there is a full number of iterations. Everything works fine but there is no update in main output.

TypeError: unsupported operand type(s) for /: 'NoneType' and 'float'

Hi Ekaterina,

I am sorry for bothering you so much. However, I ran into another error when trying to run the moments engine with 3 populations.

TypeError: unsupported operand type(s) for /: 'NoneType' and 'float'

The full traceback can be found at the bottom of the attached *.err.txt file, and my params file and fs file are also attached. Do you know what could be wrong? It seems like the error is occurring during the search when either theta or theta0 is NoneType.

The model seems to run for a while before encountering this error. I have tried using both the optimize_powell and BFGS optimizers.

Thank you for your time.

Here is the end of the traceback:

    return super(MomentsEngine, self).get_N_ancestral(values, dt_fac)
  File "/scrfs/storage/btm002/home/nmt/4.gadma/.gadma2/lib/python3.9/site-packages/gadma/engines/dadi_moments_common.py", >
    return self.get_N_ancestral_from_theta(theta)
  File "/scrfs/storage/btm002/home/nmt/4.gadma/.gadma2/lib/python3.9/site-packages/gadma/engines/dadi_moments_common.py", >
    return theta / theta0
TypeError: unsupported operand type(s) for /: 'NoneType' and 'float'

nmt_gadma_params_K3.txt
R-nmt_gadma2_K3_2.217345.err.txt
nmt_K3.fs.txt

Check why tests fail with new moments engine

moments was updated and now some tests fail. For example, fsc file is read and mask is now different. Should check that and update tests.

bootstrap for linked SNPs

Hello,

I am a bit confused about how to work with the bootstrap data.

I am generating a 3dsfs from whole genome resequencing data in angsd/winsfs in both the dadi format (using realsfs dadi) and moments format using a conversion script.

Would it be ok to generate the bootstrap data using the command below in angsd:
realSFS pop.saf.idx pop.saf.idx pop.saf.idx -bootstrap 100

Then convert all the boostraps to the moments format and include them in the directory for bootstrap data?

Would you recommend this or generating the 3dsfs from a set of unlinked SNPs or selecting a set of unlinked SNP in the dadi SNP file? or would either be fine?

Thank you,

Marie

Theta0, popt, and divergence Times

Hello,

I am having an issue plotting the GADMA results. Mainly, the divergence times are pretty far off where I expected them to be. I am not 100% sure that I am doing it correctly, so I have explained what I did below. Could you please let me know if I am doing something incorrectly?

I ran GADMA with the default theta0 (1.0). Per the GADMA manual, I then tried to re-scale popt based on the new theta0. However, theta0 seems really small compared to the examples in the manual. My calculated theta0 (4 * u * L) is:

theta0 = 0.00041795, where u = 6.93972E-10 and L = 150564.2833
I calculated u per the GADMA manual as well, where the estimated divergence time is ~18.6 million years, generation time is 10 years, and the sequence divergence (Dxy, averaged among all ddRAD loci, calculated using DNAsp) = 0.002581689.
The parameters for the best GADMA model (3 populations) are:

s0, nu21, nu22, nu31, nu32, s1, nu41, nu42, nu43, t1, t2, t3, m1_12, m1_21, m2_12, m2_21, m3_12, m3_13, m3_21, m3_23, m3_31, m3_32 = params

I have tried using the popt parameters as found in the best_aic_model_moments_code.py file (with theta0 not changed and = 1.0), and I got a divergence time that is too long (~39 million years) when using popt with theta0 = 1.0.
However, if I try to re-scale the popt parameters based on theta0 = ~0.0004, I get impossible divergence times. I tried re-scaling popt like so:

s0 and s1 stayed the same  
nu and t parameters got divided by 0.0004 (alpha = 0.0004 / 1.0 = 0.0004)
m parameters got multiplied by 0.0004 (alpha = 0.0004 / 1.0 = 0.0004)

The divergence time was then >800 million years, which is not possible.

When plotting the results, I calculated Nref by doing:

theta = moments.Inference.optimal_sfs_scaling(model, data)
Nref = theta / theta0
I then plotted using moments.ModelPlot.plot_model() with the above Nref (=693039.355637993).

Is there anything that I am doing wrong or missing?

My params, extra_params, GADMA.log, python, and input FS files are attached in case you need them.

output_files.zip

DS-MX-ON.zip

Thanks for your time.

-Bradley

Add documentation about migration rates

There is no information about physical units of the migration rates in GADMA. Issue #78
We need to add this information in the documentation.

failed to generate code

Hi @noscode ,
I am having problems when working with vcf files and popmaps.
Every time that I run gadma I return:

Run 1 warning: failed to generate code due to the following exception: 'generator' object has no attribute 'remove'

You can find the picture of the best model in the output directory.

Also, when checking the output directory, the parameters files (params_file and extra_params_file) never change

Any insight you might be able to give would be much appreciated!

Greetings,
Alex

restrict the number of migration

Hi Ekaterina,

The GADMA is really helpful to infer the best model. But sometimes, I want to restrict the number of migration, and the time interval of the migration. Is there any why to do that?
For example, I have there population. And I want only one migration event. And It should be happend after the second split. And furthermore, It should happend between the second and the third population.
Is there anyway to do that (both the three condition or one of them)?

Best,

Yudong Cai

	d = {'generations': 1, 'years': 1, 'thousand years': 0.01,
	'thousands of years': 0.01, 'kya': 0.01}

ctlab / gadma Goto Github PK

gadma's People

Contributors

Stargazers

Watchers

Forkers

gadma's Issues

Recommend Projects

Recommend Topics

Recommend Org