Comments (10)
Not a bad idea to prune, but in my experience it doesn't really matter
unless your SNP density is very high (for example, >500K SNPs).
Nick
On Tue, Aug 30, 2016 at 12:17 AM, biozzq wrote:
Hi,
My SNP data came from whole-genome re-sequencing. Before running the
D-test and F4ratioTest, do I need to prune SNPs in high LD? Thanks!
from admixtools.
Hi Nick,
Thanks. My SNP data contain more than 40M sites, but the D-test used only
about 15M sites (as reported by #_of_SNPs_that_all_populations_have_data)
to calculate the D-statistic and Z. I just wonder what filtering strategy
qpDstat applies internally.
Best
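A minimal sketch of what a count like #_of_SNPs_that_all_populations_have_data could mean, assuming (this is not confirmed in the thread) that a SNP is counted only when every population in the statistic has at least one non-missing genotype; qpDstat may apply other filters as well. The in-memory layout and the function name snps_all_pops_have_data are hypothetical. Under this assumption, a SNP that passes a 10% overall missingness filter can still be dropped if all samples of a small population are missing at that site.

```python
from typing import Dict, List, Optional

# Hypothetical layout: genotypes[snp][sample] is 0, 1 or 2 copies of an allele,
# or None for a missing call.
Genotype = Optional[int]


def snps_all_pops_have_data(
    genotypes: List[List[Genotype]],
    pop_of_sample: List[str],
    pops_in_test: List[str],
) -> int:
    """Count SNPs where every population in the test has >= 1 called genotype."""
    # Map each population in the test to the indices of its samples.
    members: Dict[str, List[int]] = {p: [] for p in pops_in_test}
    for i, pop in enumerate(pop_of_sample):
        if pop in members:
            members[pop].append(i)

    count = 0
    for row in genotypes:
        # Skip the SNP if any population is missing at all of its samples.
        if all(any(row[i] is not None for i in idx) for idx in members.values()):
            count += 1
    return count


genotypes = [
    [0, 1, None, 2],     # both populations have data
    [None, None, 1, 0],  # population A entirely missing
    [2, None, None, 1],  # both populations have data
]
print(snps_all_pops_have_data(genotypes, ["A", "A", "B", "B"], ["A", "B"]))  # 2
```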
You definitely need to prune. The results will improve and the program will run
much faster.
In my experience with human data, you get diminishing returns above about 100K
SNPs. I would aim for 500K,
which is more than enough. convertf has some pruning parameters, as does
PLINK.
Nick
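To illustrate what LD pruning does, here is a simplified greedy sketch: a SNP is kept unless its r² with an already-kept SNP among the previous `window` SNPs exceeds a threshold. This is not the actual algorithm used by PLINK or convertf; the function names, the window of 50 SNPs and the r² cutoff of 0.2 are illustrative assumptions, and missing genotypes are not handled.

```python
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP has undefined r; treat it as unlinked
    return sxy * sxy / (sxx * syy)


def ld_prune(dosages: List[List[int]], window: int = 50, r2_max: float = 0.2) -> List[int]:
    """Return indices of SNPs kept by greedy pruning: a SNP is dropped if its r^2
    with any already-kept SNP among the previous `window` SNPs exceeds r2_max."""
    kept: List[int] = []
    for j, snp in enumerate(dosages):
        recent = [k for k in kept if j - k <= window]
        if all(r_squared(dosages[k], snp) <= r2_max for k in recent):
            kept.append(j)
    return kept


dosages = [
    [0, 1, 2, 0, 1],
    [0, 1, 2, 0, 1],  # identical to SNP 0 -> pruned
    [2, 1, 0, 2, 1],  # perfectly anti-correlated with SNP 0 -> pruned
    [0, 0, 1, 2, 2],  # uncorrelated with SNP 0 -> kept
]
print(ld_prune(dosages))  # [0, 3]
```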
On Tue, Aug 30, 2016 at 10:12 AM, biozzq wrote:
Hi Nick,
Thanks. My SNP data contain more than 40M sites, but the D-test used only
about 15M sites (as reported by #_of_SNPs_that_all_populations_have_data) to
calculate the D-statistic and Z. I just wonder what filtering strategy
qpDstat applies internally.
Best
Hi Nick,
Many thanks. Yes, convertf has many parameters for filtering SNP data. I removed non-biallelic markers, as well as markers with more than 10% missing data or a minor allele frequency below 0.05, which produced the input file for qpDstat (P.S. I will prune my data for LD using PLINK later, as you suggested). The input file for qpDstat contains more than 40M markers, but only about 16M markers (in the #_of_SNPs_that_all_populations_have_data column) were used in the calculation. So I am very curious about why so many markers (about 24M) were removed.
In addition, I would like to know whether the convertf parameters, such as numchrom, maxmissfracsnp and maxmissfracind, also apply in qpDstat. If they do, can I just specify them in the parfile when running qpDstat?
Further, does the default maxmissfracsnp mean one individual or one percent?
Best
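The per-SNP criteria described above (at most 10% missing calls and a minor allele frequency of at least 0.05) can be sketched as a single predicate. The function passes_filters and the 0/1/2/None coding are hypothetical, this is not how convertf or PLINK implement their filters, and biallelic status is assumed to have been checked beforehand.

```python
from typing import Optional, Sequence


def passes_filters(
    calls: Sequence[Optional[int]],
    max_missing: float = 0.10,
    min_maf: float = 0.05,
) -> bool:
    """True if a biallelic SNP has at most `max_missing` missing calls and a
    minor allele frequency of at least `min_maf`.

    `calls` holds 0, 1 or 2 copies of one allele per individual, or None if missing.
    """
    n = len(calls)
    called = [g for g in calls if g is not None]
    if not called:
        return False  # no data at all
    if (n - len(called)) / n > max_missing:
        return False  # too much missing data
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(freq, 1 - freq) >= min_maf


print(passes_filters([0, 1, 0, 0, 1, 2, 0, 0, 0, None]))     # True: 10% missing, MAF ~0.22
print(passes_filters([0, 1, 0, 0, 1, 2, 0, 0, None, None]))  # False: 20% missing
print(passes_filters([0] * 10))                               # False: monomorphic
```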
Hi Nick,
I found that the convertf parameters also work in qpDstat. If maxmissfracsnp meant one individual, I think this threshold would remove far more than 24M markers, so I think it means one percent. I hope you can help me use ADMIXTOOLS correctly.
Best
I recommend that you use convertf to make a file with far
fewer SNPs.
decimate: 40 will retain 1/40 of the SNPs and might be a good start.
I much prefer to run ADMIXTOOLS without fancy options, so I know
what I am getting (and the programs will run much faster on smaller
datasets).
N
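A sketch of the arithmetic behind retaining 1/40 of the SNPs (40M / 40 = 1M markers). The helper below, with the hypothetical name thin_every_kth, simply keeps every k-th SNP; convertf's decimate may choose which SNPs to keep differently, so treat this only as an illustration of the resulting count.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def thin_every_kth(snps: Sequence[T], k: int) -> List[T]:
    """Keep about 1/k of the SNPs by taking every k-th one, starting with the first."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return list(snps[::k])


# 400 hypothetical SNP IDs thinned 1-in-40 leaves 10 of them.
ids = [f"snp{i}" for i in range(400)]
print(len(thin_every_kth(ids, 40)))  # 10
```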
On Thu, Sep 1, 2016 at 12:09 AM, biozzq wrote:
Hi Nick,
I found that the convertf parameters also work in qpDstat. If
maxmissfracsnp meant one individual, I think this threshold would remove
far more than 24M markers, so I think it means one percent. I hope you can
help me use ADMIXTOOLS correctly.
Best
Hi @bumblenick,
I also tried filtering my data with convertf, but no matter which value I specify for maxmissfracsnp (such as 0, 1, or 1000), it gives me the same results. So I want to know how I can tell whether the convertf parameters actually take effect.
Many thanks.
Hi @bumblenick,
I am still confused by the convertf parameters. I did a test run and the results did not change at all.
The par file:
indivname: test.pedind
snpname: test.pedsnp
genotypename: test.bed
outputformat: PACKEDANCESTRYMAP
genooutfilename: example.packedancestrymapgeno
snpoutfilename: example.snp
indoutfilename: example.ind
numchrom: 20
maxmissfracsnp: 1
zerodistance: YES
Output of the Linux command tail test.pedsnp:
29 29:52623758 0 52623758 A G
29 29:52623804 0 52623804 T C
29 29:52623846 0 52623846 C T
29 29:52624636 0 52624636 G T
29 29:52624674 0 52624674 A G
29 29:52624726 0 52624726 G C
29 29:52624739 0 52624739 A G
29 29:52624827 0 52624827 G A
29 29:52624969 0 52624969 C T
29 29:52625377 0 52625377 A G
Output of the Linux command tail example.snp:
29:52623758 29 0 52623758 A G
29:52623804 29 0 52623804 T C
29:52623846 29 0 52623846 C T
29:52624636 29 0 52624636 G T
29:52624674 29 0 52624674 A G
29:52624726 29 0 52624726 G C
29:52624739 29 0 52624739 A G
29:52624827 29 0 52624827 G A
29:52624969 29 0 52624969 C T
29:52625377 29 0 52625377 A G
From the above, I found that numchrom did not remove any markers.
I need your help!
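Comparing only the last ten lines of each file cannot show whether markers were removed. A minimal sketch, assuming the column layouts visible in the two tail outputs above (SNP ID in the second column of test.pedsnp and in the first column of example.snp), counts the markers before and after conversion. The function names are hypothetical.

```python
from typing import Dict, List, Set


def snp_ids(lines: List[str], id_column: int) -> Set[str]:
    """Collect SNP IDs from whitespace-separated lines, skipping blank lines."""
    return {line.split()[id_column] for line in lines if line.strip()}


def compare_snp_sets(pedsnp_lines: List[str], snp_lines: List[str]) -> Dict[str, int]:
    """Count markers before and after conversion. Layouts as in the thread:
    the .pedsnp ID is in column 2, the .snp ID in column 1."""
    before = snp_ids(pedsnp_lines, 1)
    after = snp_ids(snp_lines, 0)
    return {
        "input": len(before),
        "output": len(after),
        "dropped": len(before - after),  # IDs present only in the input
        "added": len(after - before),    # IDs present only in the output
    }


ped = ["29 29:52623758 0 52623758 A G", "29 29:52623804 0 52623804 T C"]
snp = ["29:52623758 29 0 52623758 A G"]
print(compare_snp_sets(ped, snp))
# {'input': 2, 'output': 1, 'dropped': 1, 'added': 0}
```

With real files, the lists can come from reading test.pedsnp and example.snp line by line; a nonzero "dropped" count would show that convertf removed markers.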
Hi @bumblenick,
Can you give me some suggestions? I am still unsure about these parameters.
Thanks.
- numchrom has nothing to do with removing markers; it is for non-human
organisms (it specifies the number of autosomes).
- The simplest way of removing markers in convertf is decimate: 80
(40M markers -> ~500K). Also, badsnpname: lets you specify SNPs you don't want.
- And if you don't like the convertf parameters, PLINK has sophisticated
pruning facilities too.
Nick
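The factor of 80 follows from 40M / 500K = 80. A small helper with the hypothetical name decimation_factor computes the smallest keep-1-in-k factor that brings a marker count down to at most a target, using integer arithmetic so large counts are not affected by floating-point rounding.

```python
def decimation_factor(n_snps: int, target: int) -> int:
    """Smallest k such that keeping 1 SNP in every k leaves at most `target` SNPs."""
    if n_snps <= 0 or target <= 0:
        raise ValueError("counts must be positive")
    return -(-n_snps // target)  # integer ceiling of n_snps / target


print(decimation_factor(40_000_000, 500_000))    # 80
print(decimation_factor(40_000_000, 1_000_000))  # 40
```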
On Mon, Sep 12, 2016 at 6:02 AM, biozzq wrote:
Hi @bumblenick,
Can you give me some suggestions? I am still unsure about these parameters.
Thanks.
Related Issues (20)
- Warning while using mergeit
- examples/qpWave.log is missing
- Convertf error "fatalx: no valid samples!"
- Using f3 to test gene flow from ghost species
- Negative outgroup f3 statistics
- zsh: segmentation fault (core dumped)
- qpadm with single sample - is it possible to run?
- Issue with directories with spaces during install
- qpF4Ration
- warning: bad chrom
- Compilation issues on the latest macOS Monterey (M1 Mac)
- qpAdm: command not found
- qpWave - Segmentation fault (core dumped)
- qpDstat Segmentation fault (core dumped)
- What‘s mean the "best" in the result from the qpDstat?
- something about qpGraph
- qpAdm - "pop: ??? has sample size 1 and inbreed set" error message
- Not enough RAM for qpfstats?
- only 71 lines and truncated
- Nothing happens when using convertf for PACKEDPED to PACKEDANCESTRYMAP