angsd / ngsrelate



License: GNU General Public License v2.0

C++ 90.44% Makefile 1.26% R 6.34% Shell 1.95%

ngsrelate's Introduction

Program for analysing NGS data.

http://www.popgen.dk/angsd

Installation:

Using a local folder containing htslib

#download htslib
git clone --recurse-submodules https://github.com/samtools/htslib.git;
#download angsd
git clone https://github.com/angsd/angsd.git;

#install htslib
cd htslib
make

#install angsd
cd ../angsd
make HTSSRC=../htslib

System-wide installation of htslib

git clone https://github.com/angsd/angsd.git;
cd angsd; make HTSSRC=systemwide

Using htslib submodule

git clone https://github.com/angsd/angsd.git;
cd angsd; make

Notes

I've switched over to using htslib for parsing single reads (to allow for CRAM reading while avoiding having to write my own CRAM parser). I'm still using my own readpools. Users should therefore also download and install htslib.
If you are on a Mac and the compilation process complains about a missing crypto library, then do 'make CRYPTOLIB=""'

Program has a paper

http://www.biomedcentral.com/1471-2105/15/356/abstract

ngsrelate's People

Contributors

Stargazers

Watchers

Forkers

rareseas aalbrechtsen idamoltke genomicsiter nicklohr tayyub-png niki3502 stephenturner difiore novapyth abigailramsoe

ngsrelate's Issues

minor request to update readme

Hi,
Thank you for maintaining this software. In my first attempt at running the program I received an error when trying to use a .vcf file. After searching through the issues posts I came across this thread which resolved my problem: if invoking the -T parameter to read from the GT field of the .vcf file, you also need to invoke the -c parameter.
Perhaps I'm mistaken with these parameters, but if this is in fact true, it would be helpful to make a minor update to the readme/documentation when using vcf files to remind users to use those flags (if needed). Maybe something like this?:

./ngsrelate  -h my.VCF.gz -O vcf.res
./ngsrelate  -h my.VCF.gz -O vcf.res -T GT -c    ## if specifying values from GT field in .vcf

Cheers,
Devon

Problem reading full chunk error

Hi,

I am using ngsRelate version 2, and getting an error that hasn't already been raised in issues. I followed the template on the GitHub page. My code:

angsd -b samples4relatedness.txt -out angsd4relatedness -gl 2 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3 -P 15

zcat angsd4relatedness.mafs.gz | cut -f5 | sed 1d > freq.txt

ngsRelate -g angsd4relatedness.glf.gz -n 100 -p 15 -f freq.txt -O output.res

I get the following error:

        -> Frequency file: 'freq.txt' contain 1458673 number of sites
        -> Problem reading full chunk

Thanks in advance,
Adam

Allow GL PL choosing together with AF or recalculated freq and minfreq

ngsrelate memory-issue?

Hello,
I am trying to run ngsRelate on some large files and cannot get the run to complete because it runs out of RAM.

files:
73G test_gwas.glf.gz
10G freq

command:

/home/mmoser/NgsRelate/ngsRelate -g test_gwas.glf.gz -n 22 -f freq > gl.res

error:

 -> Frequency file: 'freq' contain 1177564412 number of sites
/opt/sge/default/spool/binfservas08/job_scripts/283373: line 3: 124295 Killed                  /home/mmoser/NgsRelate/ngsRelate -g test_gwas.glf.gz -n 22 -f freq > gl.res

Is there any way to reduce memory usage in ngsRelate?

Thanks,
Michel
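Here is a rough estimate of the memory this run needs. It rests on an assumption we have not checked against the source: that ngsRelate keeps every genotype likelihood in memory as an 8-byte double (three per individual per site), plus one double per site for the allele frequencies. The function names are illustrative only.

```python
BYTES_PER_DOUBLE = 8
GLS_PER_IND_PER_SITE = 3  # one likelihood per genotype (assumed in-memory layout)


def gl_memory_bytes(n_sites: int, n_ind: int) -> int:
    """Bytes needed to hold all genotype likelihoods as doubles (assumed layout)."""
    return n_sites * n_ind * GLS_PER_IND_PER_SITE * BYTES_PER_DOUBLE


def freq_memory_bytes(n_sites: int) -> int:
    """Bytes needed to hold one allele frequency per site as a double."""
    return n_sites * BYTES_PER_DOUBLE


if __name__ == "__main__":
    n_sites = 1_177_564_412  # from the "Frequency file" line in the log above
    n_ind = 22               # from -n 22
    print(f"genotype likelihoods: {gl_memory_bytes(n_sites, n_ind) / 1e9:.1f} GB")
    print(f"allele frequencies:   {freq_memory_bytes(n_sites) / 1e9:.1f} GB")
```

Under that assumption, the likelihoods alone need about 620 GB. The requirement grows linearly with both the number of sites and the number of individuals, so reducing the number of sites passed to ngsRelate would be the main lever.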

use SM tag if using vcf/bcf

running program with -L information, without MAF info

Hi Kristian,

Similar to issue #20 I'm having a related problem getting ngsRelate to run with a beagle file as the input data type. When I run:

./ngsRelate -g my.beagle.gz -n 184 -L 1831344 -O myluNGSrelate.res -p 22

I receive the following error:

        -> Seed is: 1972650776
        -> Allele frequencies file (-f) is not provided. Only summary statistitics based on 2dsfs will be reported
        -> Problem reading full chunk

Unlike the earlier post, I didn't import the mafs.gz information from the ANGSD output - I was just supplying the number of sites with the -L argument. There are 555 columns in the beagle.gz file, which I believe is what I'd expect with 184 individuals. Just to confirm: the value I used as input in the -L argument (1831344) represents the number of sites, which should be one less than the total number of lines in the .beagle.gz file (1831345), correct?

The same .beagle file has been used in other applications in the ANGSD family (ngsAdmix and PCAngsd), so I do not suspect that the file is corrupt.

Thanks for any troubleshooting advice you can offer.

Devon
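To check the column arithmetic: a beagle file has three leading columns (marker, allele1, allele2) and three likelihood columns per individual. For 184 individuals that gives 3 + 3 × 184 = 555 columns, which matches the count above. The minimal sketch below (the function name is hypothetical) reports both dimensions of a beagle.gz file. It also stops at any line whose column count differs from the header, which is one way to spot a truncated or ragged file.

```python
import gzip


def beagle_dimensions(path):
    """Return (n_sites, n_ind) for a beagle.gz file.

    Assumes the usual beagle layout: one header line, then one line per site
    with three leading columns (marker, allele1, allele2) followed by three
    genotype likelihoods per individual.
    """
    with gzip.open(path, "rt") as fh:
        n_cols = len(fh.readline().split())
        if n_cols < 3 or (n_cols - 3) % 3 != 0:
            raise ValueError(f"{n_cols} header columns is not 3 + 3*nInd")
        n_ind = (n_cols - 3) // 3
        n_sites = 0
        for lineno, line in enumerate(fh, start=2):
            n_fields = len(line.split())
            if n_fields != n_cols:
                raise ValueError(f"line {lineno}: {n_fields} columns, expected {n_cols}")
            n_sites += 1  # every line after the header is one site
    return n_sites, n_ind
```

The number of sites it returns is the line count minus the header line, which is the value the question above proposes to pass with -L.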

Problem with guess in emStep: -nan -nan -nan -nan

I am receiving the following error when I run ngsRelate with the glf.gz file as input:

Problem with guess in emStep: -nan -nan -nan -nan

The command that I am using is:
./ngsRelate -g ~/genotyping/angsdmPCR_noMAF.glf.gz -p 10 -n 284 -f ~/genotyping/freqmPCR_noMAF -O res_bams_mPCR_noMAF

I have attached a copy of the glf file. Please let me know if you have any idea what the issue could be.

Many thanks

angsdmPCR_noMAF.glf.gz

make exit code if all sites are missing.

"Problem with guess in emStep"

Hello,

I ran the following command:

/usr/local/softw/NgsRelate/ngsRelate -h infile.vcf -T GT -O play.res

Got the following output:

-> Seed is: 101258788
-> Will use TAG: 'GT' from the VCF file
-> Will use TAG: 'AFngsrelate' in the VCF file as allele frequency if present. Otherwise allele frequencies are estimated from the data
-> readbcfvcf seek:(null) nind:2
-> [file='1000G_CEU_MAF0.45_200K_merged_CEPH.vcf'][chr='(null)'] Read 82626 records 82437 of which were SNPs number of sites with data:82437
-> nind:104 overall_number_of_sites:82437
-> Done reading data from file: 2.38 3.00
-> Starting analysis now
-> length of joblist:5356

Problem with guess in emStep: 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

I looked at the source code where the error is raised but could not figure out the cause. Can you please help?

Online help is not the same as the help printed in the terminal for version 23

Some key options are missing, e.g. -G, so perhaps the online manual on GitHub should be updated?

allow for basic filtering with bcf input (single chromosomes etc)

rmTrans option

I'm quite interested in using the remove transitions option mentioned in the options list, but I cannot seem to get it to work. I tried various configurations, e.g. :

./ngsrelate extract_freq .mafs.gz .pos.glf.gz -rmTrans
./ngsrelate extract_freq .mafs.gz .pos.glf.gz -rmTrans 1
./ngsrelate -f .freq -g .pos.glf.gz -rmTrans
./ngsrelate -f .freq -g .pos.glf.gz -rmTrans 1

However, they all appear to give the same results as running without -rmTrans (e.g. an identical number of sites per pairwise comparison and essentially identical values for the metrics). Have I missed something? I can rerun ANGSD with -rmTrans, but it would be quite handy to have this option here as well.

Cheers,
Nathan

Wrong results?

I ran this command:
/usr/local/softw/NgsRelate/ngsRelate -h input.vcf -T GT -O results.res -c 1

Here is the relevant part of the result

a b nSites J9 J8 J7 J6 J5 J4 J3 J2 J1 rab Fa Fb theta inbred_relatedness_1_2 inbred_relatedness_2_1 fraternity identity zygosity 2of3_IDB FDiff loglh nIter coverage 2dsfs R0 R1 KING 2dsfs_loglike 2dsfsf_niter

100 101 26688 0.154319 0.125469 0.000000 0.016806 0.092420 0.000582 0.102905 0.000000 0.507499 0.716727 0.610986 0.616725 0.636528 0.558951 0.553709 0.000000 0.507499 0.507499 0.774252 -0.008112 -44505.357314 5000 0.323738 5.113872e-01,5.571375e-02,1.177190e-02,5.602682e-02,6.669632e-02,6.594131e-02,1.065887e-02,6.341072e-02,1.583932e-01 0.336312 0.253095 0.058306 -42959.275380 10

Individuals 100 and 101 are siblings. Theoretically, for a sibling pair I would expect k0=0.25, k1=0.5, k2=0.25.

However, that is not what I see. The same goes for theta, which should be 0.25 but is estimated as 0.636528.

Can you please help?

I tried -i 10000 but didn't help.

Thanks a lot.
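For reference, the expected values quoted above follow from the standard identities linking the IBD probabilities k0, k1, k2 to the kinship coefficient θ and the relatedness r. These identities assume neither individual is inbred:

```latex
\theta = \tfrac{1}{4}k_1 + \tfrac{1}{2}k_2, \qquad r = 2\theta = \tfrac{1}{2}k_1 + k_2
% full siblings: k_0 = 0.25,\ k_1 = 0.5,\ k_2 = 0.25
\theta_{\mathrm{sib}} = \tfrac{1}{4}(0.5) + \tfrac{1}{2}(0.25) = 0.25, \qquad r_{\mathrm{sib}} = 0.5
```

With inbreeding, all nine Jacquard coefficients are needed, and ngsRelate reports them as J1 to J9. The row above reports Fa and Fb of about 0.61 and 0.62, so these no-inbreeding formulas would not be expected to match it directly.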

nIter = -1? Actually bestoptimll = -1?

I think ngsRelate 2 may have been updated but the tutorial has not. The tutorial explains why "nIter" might be -1, but in my results I see a lot of "bestoptimll" = -1 results.

If there are a lot of -1 results, does that say anything about the relatedness results? Is there any reason to change the -i or -t options to extend the number of iterations or tighten the tolerance? For that matter, what are the default values for those options? Thanks!

Runtime

Hello,

I am running the following using a vcf file output from ANGSD (output was bcf and converted to vcf.gz). AF was calculated from ANGSD.

I have ~11 million sites total, 7 million above 1% MAF, and 6 million above 5% MAF.

NgsRelate-2.0/ngsRelate -h GPs/Filt6b_Post.vcf.gz -A AF -O Filt6b_Post.vcf.res

What would you recommend in order to have a reasonable runtime?

[Question] Merging ANGSD Outputs

Hi everyone,

Firstly, thanks for this software, it works amazingly.

And to my question:
I was wondering if it was possible to merge GLF files from two different ANGSD runs, with identical settings, and then run this merged GLF through NGSRelate?

I want to do this because it seems inefficient and costly (running in the cloud) to re-run ~100 samples plus whatever new dataset I want to check relatedness with.

The ANGSD command in question.

./angsd -b Embryos.beagle-test.bamslist -out BeagleTest-embryos -gl 2 -domajorminor 3 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 2 -nThreads 6 -nLines 140 -checkBamHeaders 0 -sites 1240K_hg38_sites.txt

Let's say I have two sets of data.

References.glf.gz
References.mafs.gz

NewSamples.glf.gz
NewSamples.mafs.gz

Is this as trivial as:

zcat References.glf.gz NewSamples.glf.gz .... | gzip - >merge.glf.gz

Can I avoid the frequency files by just passing -L ?
Also, I will be testing this with the beagle format...

Any feedback would be much appreciated!
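Concatenating with zcat appends the second file's records after the first file's. The result therefore describes more sites for the same individuals, not more individuals at the same sites, and with text beagle files it also repeats the header line. Merging samples instead means joining the two files site by site and appending the new individuals' columns.

Here is a minimal sketch for beagle.gz input (the function name is hypothetical). It assumes both runs produced exactly the same sites, in the same order, with the same allele coding. With -snp_pval and -minmaf applied separately in each run, that is not guaranteed even when -sites is shared. The sketch does not recompute allele frequencies, so a .mafs.gz from either single run would have been estimated from only part of the merged sample.

```python
import gzip
from itertools import zip_longest


def merge_beagle(path_a, path_b, out_path):
    """Merge two beagle.gz files column-wise: same sites, different individuals.

    Assumes both files list the same markers, in the same order, with the same
    allele1/allele2 coding; raises ValueError otherwise. Allele frequencies are
    NOT recomputed. Returns the number of sites written.
    """
    with gzip.open(path_a, "rt") as fa, gzip.open(path_b, "rt") as fb, \
            gzip.open(out_path, "wt") as out:
        head_a, head_b = fa.readline().split(), fb.readline().split()
        n_ind = (len(head_a) - 3) // 3 + (len(head_b) - 3) // 3
        # Renumber individuals Ind0..Ind{n_ind-1}, three columns each.
        ind_cols = [f"Ind{i}" for i in range(n_ind) for _ in range(3)]
        out.write("\t".join(head_a[:3] + ind_cols) + "\n")
        n_sites = 0
        for line_a, line_b in zip_longest(fa, fb):
            if line_a is None or line_b is None:
                raise ValueError("files have different numbers of sites")
            a, b = line_a.split(), line_b.split()
            if a[:3] != b[:3]:  # marker, allele1, allele2 must agree
                raise ValueError(f"site mismatch: {a[:3]} vs {b[:3]}")
            out.write("\t".join(a + b[3:]) + "\n")
            n_sites += 1
    return n_sites
```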

PL field in VCF

Hi,

I would like to apply NgsRelate to a VCF obtained with ATLAS (task=call method=MLE). I am able to process this VCF with bcftools, and the NgsRelate analysis using the GT tag works. However, when I try parsing PL instead, I get this error:

Problem with guess in emStep: nan nan nan nan nan nan nan nan nan

This is how PL is reported in my vcf

##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Phred-scaled normalized genotype likelihoods">
1 10234 . C T . . DP=3;AC=2;AN=6 GT:GQ:AD:GL:PL:AB:AI ./.:.:.:.:.:.:. 1/1:3:0,1:-10,-0.30103,0:103,3,0:0:0.5 0/0:3:1,0:0,-0.300987,-4:0,3,40:1:0.5

Could you please tell me if this is the correct format required by NgsRelate?
Thanks for your help.

BW,
Maria Angela
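For reference, the VCF specification defines PL as the Phred-scaled genotype likelihood, PL = round(-10 × log10 L). Linear likelihoods are therefore recovered as 10^(-PL/10), and GL carries the same information as log10 L.

The record above shows three cases:
- The third sample is consistent: GL 0,-0.300987,-4 and PL 0,3,40.
- For the second sample, GL = -10 would correspond to PL = 100, not the 103 reported.
- The first sample has every FORMAT value missing, PL included.

The sketch below pulls PL out of one sample column and converts it. The helper names are hypothetical, and this is not how ngsRelate itself parses VCFs.

```python
def sample_field(format_col, sample_col, key):
    """Return the value of FORMAT key `key` for one sample column, or None if absent."""
    keys = format_col.split(":")
    values = sample_col.split(":")
    if key not in keys:
        return None
    i = keys.index(key)
    # Trailing FORMAT fields may be dropped from a sample column.
    return values[i] if i < len(values) else None


def pl_to_likelihoods(pl_field):
    """Convert a PL string such as "0,3,40" to linear-scale genotype likelihoods.

    PL = round(-10 * log10(likelihood)), so likelihood = 10 ** (-PL / 10).
    Returns None when PL is missing ("." or empty).
    """
    if pl_field is None or pl_field in (".", ""):
        return None
    return [10 ** (-int(v) / 10) for v in pl_field.split(",")]
```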

extract_freq_bim error

Hi,

I would like to use NgsRelate on a set of 8 samples, with allele frequencies from a population in PLINK format.
I used the latest github package, and followed guidelines reported here http://www.popgen.dk/software/index.php?title=NgsRelate&oldid=694#Run_example_2:_using_NGS_data_with_population_frequencies_estimated_from_genetic_data_from_PLINK_files

This is the last command line I launched
/home/bin/NgsRelate/ngsRelate extract_freq_bim angsd_out.glf.pos.gz plink.out.bim plink.out.frq > freq

I got this error message

posfile:angsd_out.glf.pos.gz bimfile:plink.out.bim ffile:plink.out.frq
ngsRelate: filereaders.cpp:212: posMap getBim(char*, char*): Assertion `rMap.find(rs)==rMap.end()' failed.
Aborted (core dumped)

Could you please help me understand what's wrong?
Thanks for your attention
Best wishes,
Maria Angela
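The failed assertion `rMap.find(rs)==rMap.end()` reads as a check that a key has not been seen before. If the key is the variant ID from the .bim file, as the name `rs` suggests, then a duplicated ID would trigger it (for instance, several variants all named "."). That reading is an assumption. The minimal sketch below (function name hypothetical) tests it by listing IDs in column 2 of a PLINK .bim file that occur more than once.

```python
from collections import Counter


def duplicate_bim_ids(bim_lines):
    """Return the variant IDs (column 2 of a PLINK .bim file) seen more than once."""
    counts = Counter(line.split()[1] for line in bim_lines if line.strip())
    return sorted(vid for vid, n in counts.items() if n > 1)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as fh:  # e.g. plink.out.bim
        dups = duplicate_bim_ids(fh)
    print(f"{len(dups)} duplicated IDs")
    for vid in dups[:10]:
        print(vid)
```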

Problem with guess in emStep

Hello
I am trying to bootstrap my relatedness estimations.
I am using the command: ~/software/NgsRelate/ngsRelate -f freq -g angsdput.glf.gz -O test -L 521328 -n 81 -a ind1 -b ind2 -B 100

For most of the files (I am looping over ~200 individual pair combinations which are a subset of the whole callset contained in the .glf and .maf files) I get this error:

Problem with guess in emStep: -nan -nan -nan -nan -nan -nan -nan -nan -nan

I get the error when I loop over a list, and also when I specify individuals as a single command.

For 5 of the comparisons it seems to work fine (estimation has been made 101 times, as I guess the bootstrapping is zero indexed...). Can't work out what is going on.

Please could I trouble you for some assistance?

I have attached a Drive link to a compressed folder with the data and the analysis script (bash wrapper):
https://drive.google.com/open?id=1eraeCDVdOa-idliPg00TgMDkeRLM8zTz

Thank you!

Tristan

Interpreting the inbreeding-only output

Hello,

I am analyzing a population of 72 individuals (18 adults and 54 of their offspring), and I am trying to determine whether there was any inbreeding before moving on to the rest of my analysis. I used the inbreeding-only option "-F 1", but I am a little confused about how to interpret the results. Do Z0 and Z1 values of 1 and 0 indicate no inbreeding? If so, what do other values mean? An explanation of the other output columns would also be appreciated.

Thanks

Make option to print IDs

Hi Thorfinn. A note so we remember it: it would be great if it could also print individual IDs (if the user provides them) :) Ida

check that the output is formatted correctly and in the correct order

Coverage

Hello,

NgsRelate reports a "coverage" statistic in its output.
Could you please tell me more about how it is calculated?
Is it the proportion of sites with data in both samples of the pair?

Thanks!

Muriel
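The coverage values printed elsewhere on this page are consistent with a simple ratio of sites with data to overall_number_of_sites. In the inbreeding-only output of a 6-individual run, individual 5 has 8393569 sites out of 8397285, and 8393569 / 8397285 ≈ 0.999557, the reported coverage. In a beagle-based run, individual 0 gives 9925209 / 14460747 ≈ 0.686355.

This definition is inferred from those numbers only, not from the source code. For a pair, the analogous count would be the sites where both individuals have data. In the sketch below, how an individual is judged to "have data" at a site is left open, so the per-site flags and function names are hypothetical.

```python
def pair_sites_with_data(has_data_a, has_data_b):
    """Count sites where both individuals have data (hypothetical per-site flags)."""
    return sum(1 for a, b in zip(has_data_a, has_data_b) if a and b)


def coverage(n_sites_with_data, n_sites_total):
    """Fraction of all analysed sites that have data; 0.0 if there are no sites."""
    return n_sites_with_data / n_sites_total if n_sites_total else 0.0
```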

verbose output

Thank you for continuing to update the program! I was exploring some parameters and found that I couldn't get the verbose option to work. Using -v gives an "invalid option" message, so it would be great if you could check that. I also tried adding an integer (-v 1), but it produced the same message.

Thanks!
Nathan

unexpected high theta values

Hi,

I am testing NgsRelate2 on some related samples from the 1000 Genomes Project, specifying allele frequencies from the European 1000 Genomes population. It worked very well.
Then I generated simulated samples with coverage ranging from 1 to 5X, and in all cases only KING correctly predicted the degree of relationship. The theta values, on the other hand, were very high (ranging from 0.76 to 0.82), suggesting identical individuals.
Could you please help me to understand the reason I am getting these results?

Thank you,
BW

Maria Angela

elevated KING relatedness

Hello,

I was looking at my output file from ngsRelate. Going by my KING values, more than 3/4 of my samples are half-sibs or closer. I plotted R0 against R1 and R1 against KING.

The plots look very strange to me. When I ran ANGSD, my MAF frequencies ended up in column 6 (below). Publications that use ngsRelate don't describe their parameters well or say which relatedness value they are using. Do you have any insight into my problem? I think I might be running the script incorrectly, but I have no way of testing it.

I know this isn't an issue with the program, but appreciate any advice.

--Josh Hallas

ANGSD script:

angsd -nThreads 16 -bam md.list -ref finalmuledeergenome_filtered.fasta -out md_jul18 \
      -uniqueOnly 1 -remove_bads 1 -only_proper_pairs 1 -C 50 -baq 1 \
      -minMapQ 20 -minQ 20 -minInd 74 -setMaxDepth 400 -doCounts 1 \
      -GL 2 -doGlf 2 -doMajorMinor 1 -doMaf 1 -minMaf 0.05 -SNP_pval 1e-6 \
      -doGeno 16 -doPost 1 -doDepth 1 -dumpCounts 2

MAF file:

chromo  position        major   minor   ref     knownEM pK-EM   nInd
HiCscaffold1pilon       53902   G       A       G       0.087058        0.000000e+00    160
HiCscaffold1pilon       53926   A       G       A       0.112653        0.000000e+00    160
HiCscaffold1pilon       104909  C       T       C       0.108280        0.000000e+00    160
HiCscaffold1pilon       105737  C       T       C       0.089310        0.000000e+00    160
HiCscaffold1pilon       105746  C       T       C       0.167795        0.000000e+00    160
HiCscaffold1pilon       105753  G       T       G       0.110993        0.000000e+00    160

ngsRelate script:

NGSRELATE -G md_jul18.beagle.gz -f relatedness_md.freq -z relatedness_sample_ID.list -n 160 -O out.ld -p 4 -i 50000
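The MAF header above shows why the frequency column is easy to get wrong. With a ref column present, knownEM is the sixth field, whereas some commands elsewhere on this page cut the fifth. Selecting the column by its header name avoids depending on position.

The sketch below does that; the function name is hypothetical. Its default column name, knownEM, is taken from the header above, and other ANGSD settings may write a differently named frequency column.

```python
import gzip


def extract_freq(mafs_path, out_path, column="knownEM"):
    """Write one allele frequency per line from an ANGSD .mafs.gz file.

    The column is chosen by header name rather than position, because its
    position depends on which optional columns (e.g. ref) are present.
    Returns the number of frequencies written.
    """
    with gzip.open(mafs_path, "rt") as fh, open(out_path, "w") as out:
        header = fh.readline().split()
        idx = header.index(column)  # raises ValueError if the column is absent
        n = 0
        for line in fh:
            fields = line.split()
            if fields:
                out.write(fields[idx] + "\n")
                n += 1
    return n
```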

Interpretation of relatedness (rab) coefficient

Hi,

I used your software following your guidelines. I also used the R script you provided to plot results. Please find attached the relatedness plot (these results were quite expected from a previous analysis).
NgsRelate_relatedness.pdf

Could you please tell me if the first range on the right, from 0.5 to 1.0, corresponds to first degree relationship or monozygotic twins?

Thank you.
Best,
Maria Angela

possible to convert beagle to binary beagle (doGlf 2 to doGlf 3) or use beagle file as input?

Hello!
I am using NgsRelate as part of a large project, and I was wondering if there is a way for me to use the beagle files I have already generated rather than running ANGSD again just to get a "binary beagle" (output of -doGlf 3) file for NgsRelate input. The other things I am doing tend to use the beagle output from -doGlf 2, so I already have these made.

I know I can just have ANGSD calculate genotype likelihoods again and output them in the -doGlf 3 format, but my project uses many species with full whole genomes and this will add up to a lot of time and computation to repeat calculations that have already been done.

Based on the NgsRelate documentation it seems like the "binary beagle" format is the only one that will work, is that correct?

If that's the case, is there a way to convert a beagle file (output of -doGlf 2 in ANGSD) to a binary beagle file (output of -doGlf 3 in ANGSD)? The two file formats carry the same information, it's just that one is binary, right? (As far as I can tell from the ANGSD documentation about the -doGlf 3 option, which is very brief).

Thanks!
-Teresa
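ngsRelate can also read beagle text files directly with -G, as in commands used elsewhere on this page (e.g. `-G md_jul18.beagle.gz`), which may make a conversion unnecessary. If a binary file is still wanted, a conversion could be sketched as below.

This rests on an assumption not confirmed against the ANGSD source: that -doGlf 3 writes, for each site in order, three natural-log genotype likelihoods per individual as native 8-byte doubles. The two formats also do not carry exactly the same information. The beagle file holds normalized likelihoods rounded to six decimals, so values printed as 0.000000 must be clamped (here to a hypothetical floor of 1e-300) before taking the log. The function name is hypothetical.

```python
import gzip
import math
from array import array

# Hypothetical floor for likelihoods printed as 0.000000 in the text file;
# log(0) is undefined, so such values are clamped before taking the log.
MIN_LIKELIHOOD = 1e-300


def beagle_to_binary_glf(beagle_path, out_path):
    """Convert a text beagle.gz (-doGlf 2) into a gzipped stream of log-likelihoods.

    ASSUMPTION: the -doGlf 3 layout is three natural-log likelihoods per
    individual per site, stored as native-endian 8-byte doubles, sites in
    order. Check against a real -doGlf 3 file before relying on the output.
    Returns (n_sites, n_ind).
    """
    n_sites = 0
    with gzip.open(beagle_path, "rt") as fh, gzip.open(out_path, "wb") as out:
        n_ind = (len(fh.readline().split()) - 3) // 3  # skip marker/allele columns
        for line in fh:
            fields = line.split()
            if not fields:
                continue
            gls = [math.log(max(float(x), MIN_LIKELIHOOD)) for x in fields[3:]]
            if len(gls) != 3 * n_ind:
                raise ValueError(f"site {n_sites + 1}: expected {3 * n_ind} values")
            out.write(array("d", gls).tobytes())
            n_sites += 1
    return n_sites, n_ind
```

Rescaling the three likelihoods of one individual at one site by a common constant does not change their ratios, so the normalization itself is harmless. The six-decimal rounding, however, cannot be undone.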

make threadpool to facilitate high number of cores

Issue with glf file - too many sites

Hi,
I am trying to make a kinship matrix from 47 individuals. I am running the following code:

angsd -b bams_qst_adults -gl 2 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3 -out kinMAT
zcat kinMAT.mafs.gz | cut -f5 |sed 1d > freq
./ngsRelate -g kinMAT.glf.gz -n 47 -f freq -O kinMAT

This produces the following error output:

./ngsRelate -g kinMAT.glf.gz -n 47 -f freq -O kinMAT
-> Seed is: 1211251308
-> Frequency file: 'freq' contain 139458 number of sites
-> Too many sites in glf file. Looks out of sync, or make sure you supplied correct number of individuals (-n)
-> Or that the number of sites provided (-L) it is correct

I double checked that my file list contains 47 individuals. Any assistance you can provide would be greatly appreciated.
Thanks!
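The error message points to a mismatch between the glf file, -n, and the number of sites. Here is a quick consistency check, with hypothetical function names. It assumes (not verified against the ANGSD or ngsRelate source) that a -doGlf 3 file holds three 8-byte doubles per individual per site. Under that assumption, the decompressed size should be sites × individuals × 24 bytes, with the number of sites taken from the frequency file.

```python
import gzip

# ASSUMPTION: -doGlf 3 stores three 8-byte doubles per individual per site.
BYTES_PER_IND_PER_SITE = 3 * 8


def uncompressed_size(path, chunk_size=1 << 20):
    """Stream through a .gz file and return its decompressed size in bytes."""
    total = 0
    with gzip.open(path, "rb") as fh:
        while block := fh.read(chunk_size):
            total += len(block)
    return total


def check_glf(n_bytes, n_ind, n_sites):
    """Compare a binary glf size with -n and the number of sites (n_sites > 0).

    Returns (consistent, implied_n_ind). A non-integer implied_n_ind suggests
    the glf and frequency files describe different sets of sites.
    """
    consistent = n_bytes == n_sites * n_ind * BYTES_PER_IND_PER_SITE
    implied_n_ind = n_bytes / (n_sites * BYTES_PER_IND_PER_SITE)
    return consistent, implied_n_ind


if __name__ == "__main__":
    import sys

    glf_path, freq_path, n_ind = sys.argv[1], sys.argv[2], int(sys.argv[3])
    with open(freq_path) as fh:
        n_sites = sum(1 for line in fh if line.strip())
    print(check_glf(uncompressed_size(glf_path), n_ind, n_sites))
```

Saved as a script, it could be run with the glf file, the freq file, and 47 as arguments.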

How does NgsRelate handle "missing data?"

Hi, I was wondering how NgsRelate handles missing data. I am interested in the pairwise SFS's that NgsRelate creates. In the beagle file input, there may be some loci where both individuals of a pair have missing data, i.e. that the genotype likelihoods are 0.33/0.33/0.33 for both individuals.
How does this influence the result? Could missing data end up making individuals look more similar to each other (because they have the "same" genotype at many places) than they would if they had higher coverage?
NgsDist has an option "--pairwise_del: pairwise deletion of missing data." to get rid of sites where the three genotypes are the same. Is there something similar that can be done with NgsRelate?
Thanks!
-Teresa
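The question describes ngsDist's --pairwise_del as dropping sites where the three genotype likelihoods are the same. The sketch below applies that idea to one pair of individuals. It illustrates pairwise deletion only and does not describe what ngsRelate does internally; the function names and the tolerance are hypothetical.

```python
def is_uninformative(gl, tol=1e-6):
    """True if the three genotype likelihoods are (near) equal, e.g. 0.333333 x 3."""
    return max(gl) - min(gl) <= tol


def pairwise_informative_sites(gls_a, gls_b, tol=1e-6):
    """Indices of sites where neither individual has flat (missing-data) likelihoods."""
    return [
        i
        for i, (a, b) in enumerate(zip(gls_a, gls_b))
        if not is_uninformative(a, tol) and not is_uninformative(b, tol)
    ]
```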

No output

autocorrelation?

Hello and thank you for this nifty program! I'm using it to determine relatedness among three sympatric populations (within ~5mi radius) across two years and find an extremely high level of relatedness among all individuals. I'm using the r0, r1 and king-robust values for visualization but I'm concerned at the seemingly perfect correlation between these values among the individuals. Below is the angsd code used to generate the likelihoods. The reference genome is large at >2Gb, so I mapped the low coverage reads (1-3x) to the first half of the genome.

#map fastqs to reference:
bwa-mem2 mem -t32 noCont.fasta "$in1" "$in2" | samtools view -bS - | samtools sort - > sample."$z".bam

# did not mark duplicates, could this drive the difference?

#angsd.
angsd -b 176.temp -gl 2 -nInd 176 -minInd 158 -doCounts 1 -setMaxDepth 1580 -rf chrom5.rf.list -setMinDepth 320 -setMinDepthInd 1 -setMaxDepthInd 10 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3 -nThreads 8 -out chrom5_90pc

#ngsrelate.
ngsRelate -g chrom5_90pc.glf.gz -n 176 -z 176.20201.list -f 179_chrom5.freq -O 179_chrom5_out

fix makefile

Problem with guess in emStep: 0.350616 0.350616 ... (unrelated to previous posts)

Hi,

I am trying to run ngsRelate on a population of 6 individuals using the ANGSD -doGlf 3 output, i.e. the glf.gz and mafs.gz files.

Unfortunately, when running
zcat WES_All.mafs.gz | cut -f6 | sed 1d > freq
ngsRelate -g WES_All.glf.gz -n 6 -f freq -O WES.ngsRelate

ngsRelate -g WES_All.glf.gz -n 6 -L 8397285 -O WES.ngsRelate

I receive the output (or something along the lines of):

-> Seed is: 1908108041
-> Frequency file: 'freq' contain 8397285 number of sites
-> nind:6 overall_number_of_sites:8397285
-> Done reading data from file: 10.68 11.00
-> Starting analysis now
-> length of joblist:15
Problem with guess in emStep: 0.351857 0.351857 0.351857 0.351857 0.351857 0.351857 0.351857 0.351857 0.351857

Here, the values are all the same and not "nan" or "0", as previous users have reported.

Also, when I run the first of the above commands with -F 1, everything completes

-> Seed is: 103441962
-> Frequency file: 'freq' contain 8397285 number of sites
-> nind:6 overall_number_of_sites:8397285
-> Done reading data from file: 10.91 11.00
-> Starting analysis now
-> length of joblist:6
[ALL done] cpu-time used = 68.76 sec (filereading took: 10.91 sec)
[ALL done] walltime used = 41.00 sec (filereading took: 11.00 sec)

but all inbreeding values are 0.

Ind Z=0 Z=1 loglh nIter coverage sites
5 1.000000 0.000000 -6179642.753674 22 0.999557 8393569
4 1.000000 0.000000 -6080941.970638 -1 0.999600 8393929
0 1.000000 0.000000 -6069533.186939 -1 0.999605 8393972
1 1.000000 0.000000 -6159287.329214 -1 0.999566 8393640
2 1.000000 0.000000 -6116124.318296 -1 0.999586 8393811
3 1.000000 0.000000 -6105983.497060 -1 0.999584 8393792

I was wondering what is going wrong here (I don't think it is a bug...) or whether you have encountered such results before.

Best

Add a LICENSE file

The original and v2 paper both note that ngsRelate is GPL.

Something wrong with freq

Hi,

I am trying to run the following command: ./ngsRelate -g der.glf.gz -n 32 -f derfreq -O masters

I have followed the instructions from angsd to convert the necessary files and installed ngsRelate, however, when I run the command I get the following error:

-> Seed is: 1469169591

something is wrong with derfreq

There is no other information provided. Why would I be getting this error?
Thank you

Bad results starting from beagle files?

Hello,
I am trying to run ngsRelate starting from a beagle file (option -G) obtained with ANGSD and a bit more than 14 M SNPs.
Here is the head of the beagle file (first 8 SNPs, 10 individuals):

marker allele1 allele2 Ind0 Ind0 Ind0 Ind1 Ind1 Ind1 Ind2 Ind2 Ind2 Ind3 Ind3 Ind3 Ind4 Ind4 Ind4 Ind5 Ind5 Ind5 Ind6 Ind6 Ind6 Ind7 Ind7 Ind7 Ind8 Ind8 Ind8 Ind9 Ind9 Ind9
pcla8_s000007_103 1 0 0.999494 0.000506 0.000000 0.000708 0.999292 0.000000 0.000000 1.000000 0.000000 0.999491 0.000509 0.000000 0.999992 0.000008 0.000000 0.004439 0.995561 0.000000 0.999871 0.000129 0.000000 0.000000 1.000000 0.000000 0.998993 0.001007 0.000000 0.999746 0.000254 0.000000
pcla8_s000007_131 0 3 0.999746 0.000254 0.000000 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 0.999873 0.000127 0.000000 0.999998 0.000002 0.000000 0.000060 0.999940 0.000000 0.999491 0.000509 0.000000 0.000000 1.000000 0.000000 0.999489 0.000511 0.000000 0.999746 0.000254 0.000000
pcla8_s000007_140 3 1 0.999936 0.000064 0.000000 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 0.999747 0.000253 0.000000 0.999984 0.000016 0.000000 0.000000 1.000000 0.000000 0.998991 0.001009 0.000000 0.000000 1.000000 0.000000 0.984350 0.015650 0.000000 0.999873 0.000127 0.000000
pcla8_s000007_224 0 3 0.998000 0.002000 0.000000 0.997995 0.002005 0.000000 0.999968 0.000032 0.000000 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 0.000005 0.999995 0.000000 0.998000 0.002000 0.000000 0.000000 0.999998 0.000002 0.000000 0.000506 0.999494
pcla8_s000007_248 1 0 0.969147 0.030853 0.000000 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 0.999992 0.000008 0.000000 0.999998 0.000002 0.000000 0.000000 1.000000 0.000000 0.995953 0.004047 0.000000 0.000038 0.999962 0.000000 0.995985 0.004015 0.000000 0.999490 0.000510 0.000000
pcla8_s000007_257 1 2 0.984251 0.015749 0.000000 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 0.999984 0.000016 0.000000 0.999984 0.000016 0.000000 0.000001 0.999999 0.000000 0.995953 0.004047 0.000000 0.017529 0.982471 0.000000 0.995985 0.004015 0.000000 0.999489 0.000511 0.000000
pcla8_s000007_258 0 1 0.992033 0.007967 0.000000 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 0.999984 0.000016 0.000000 0.999984 0.000016 0.000000 0.000000 1.000000 0.000000 0.995981 0.004019 0.000000 0.017524 0.982476 0.000000 0.995989 0.004011 0.000000 0.999492 0.000508 0.000000
pcla8_s000007_275 2 3 0.984213 0.015787 0.000000 0.000006 0.999994 0.000000 0.000000 1.000000 0.000000 0.998990 0.001010 0.000000 0.999984 0.000016 0.000000 0.000000 1.000000 0.000000 0.991990 0.008010 0.000000 0.001770 0.998230 0.000000 0.999743 0.000257 0.000000 0.999872 0.000128 0.000000

here is the freq file obtained from .mafs.gz for the same sites
0.199546 0.200035 0.200432 0.350231 0.201001 0.198874 0.198675 0.200459

The log file looks like this:

-> Seed is: 396728323
-> Frequency file: '/mnt/lustre/scratch/jbledoux/ANGSD/25-10-2022/Altare/Altare_freq' contain 14460747 number of sites
-> Beagle - Reading from: /mnt/lustre/scratch/jbledoux/ANGSD/07-10-2022/Altare/1_Altare_SNP-no-missing_filtering-SITES.beagle.gz. Assuming 10 Ind and 14460747 sites
-> Beagle - done processing 14460747 sites
-> nind:10 overall_number_of_sites:14460747
-> Done reading data from file: 119.44 120.00
-> Starting analysis now
-> length of joblist:10
[ALL done] cpu-time used = 232.02 sec (filereading took: 119.44 sec)
[ALL done] walltime used = 166.00 sec (filereading took: 120.00 sec)

and the results, for instance when estimating inbreeding only, are:

Ind Z=0 Z=1 loglh nIter coverage sites
0 1.000000 0.000000 -7134469.657723 -1 0.686355 9925209
1 1.000000 0.000000 -7033090.765785 -1 0.683340 9881608
2 1.000000 0.000000 -7169492.084132 -1 0.692961 10020737
3 1.000000 0.000000 -7247286.059557 -1 0.696883 10077449
4 1.000000 0.000000 -7132969.390517 21 0.694411 10041702
5 1.000000 0.000000 -7292105.232651 18 0.697278 10083157
6 1.000000 0.000000 -7451324.151384 -1 0.693961 10035196
7 1.000000 0.000000 -7224298.675603 -1 0.692677 10016624
8 1.000000 0.000000 -7045421.842850 -1 0.685920 9918913
9 1.000000 0.000000 -7471602.681979 -1 0.693180 10023897

It looks as if the job does not really start, and I cannot figure out why. I ran a test and everything seemed OK.
Any suggestion is very welcome.
Thank you
Jean-Baptiste

please describe the new output

Hi folks - New ngsRelate version from June 2018 generates a lot more output than the previous one - looks super fun but I could not find any description of what all that stuff is... thanks a lot in advance!
Misha

output progress with many threads

clean up spill files

correct inbreeding and relatedness estimate using fst

It is possible to correct inbreeding and relatedness estimates in structured populations using fst estimates. This should be implemented and documented.

add an extra \n at end of program to the stdout

check that ngsrelatev1 with only the 3 Jacquard coefficients still works

make proper documentation about which version is running

fix output if there are no sites in the pairwise analysis

Problem with guess in emStep: -nan

Hi, I've seen this issue come up several times, but none of the existing responses seem relevant to my case. I'm running on ANGSD output for one chromosome across 548 individuals.

runing ngsrelate
        -> Seed is: 1858380949
        -> Frequency file: '.../relatedness_scaf8_autosomes_freq' contain 206165 number of sites
        -> nind:548 overall_number_of_sites:206165
        -> Done reading data from file: 12.78 13.00
        -> Starting analysis now
        -> length of joblist:149878
        ->Sites with both 129 and 268 having data: 0
Problem with guess in emStep:  -nan  -nan  -nan  -nan  -nan  -nan  -nan  -nan  -nan

My command is
ngsRelate -g ${angsd_dir}/${prefix}.rad.${loc}_${prefix}.glf.gz -f $angsd_dir/${prefix}_${loc}_freq -p $nt -n 548 -O $angsd_dir/${prefix}_${loc}_newres

Please let me know what other information you require as I am unsure where to start, or where to send my freq and glf file, as it is confidential.

Thank you so much!
