Comments (10)

bumblenick commented on July 20, 2024

Pruning is not a bad idea, but in my experience it doesn't really matter
unless your SNP density is very high (more than 500K SNPs, for example).

Nick

On Tue, Aug 30, 2016 at 12:17 AM, biozzq wrote:

Hi,

My SNP data come from whole-genome resequencing. Before running the
D-test and F4ratioTest, do I need to prune SNPs in high LD?

Thanks!



from admixtools.

biozzq commented on July 20, 2024

Hi Nick,

Thanks. My SNP data contain more than 40M sites, but the D-test used only about 15M of them (as reported in the #_of_SNPs_that_all_populations_have_data column) to calculate the D-statistic and Z-score. I would like to know what filtering qpDstat applies.

Best
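
One possible reading of that column, going by its name alone, is that a SNP is counted only when each of the four populations in a test has at least one non-missing genotype. The sketch below illustrates that idea on toy data. It is an assumption, not qpDstat's documented behaviour, and the population labels, SNP names and genotype values are made up. The only format detail relied on is that EIGENSTRAT genotype files code a missing call as 9.

```python
# Toy illustration only; this is not qpDstat's actual code.
MISSING = 9  # EIGENSTRAT genotype files code a missing genotype as 9


def snp_has_data_in_all_pops(genotypes, pop_of_sample, pops):
    """Return True if every population in `pops` has at least one called genotype."""
    seen = set()
    for genotype, pop in zip(genotypes, pop_of_sample):
        if genotype != MISSING:
            seen.add(pop)
    return all(pop in seen for pop in pops)


# Hypothetical sample-to-population assignment for six samples.
pop_of_sample = ["W", "W", "X", "X", "Y", "Z"]
pops = ["W", "X", "Y", "Z"]

snps = {
    "snp1": [0, 1, 2, 9, 1, 0],  # every population has data
    "snp2": [0, 1, 9, 9, 1, 0],  # population X is entirely missing
    "snp3": [9, 9, 2, 1, 9, 0],  # populations W and Y are entirely missing
}

usable = [name for name, g in snps.items()
          if snp_has_data_in_all_pops(g, pop_of_sample, pops)]
print(usable)
```

Under this reading, a population with few samples could lose many sites even when the overall missing rate per SNP is low.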

bumblenick commented on July 20, 2024

You definitely need to prune. The results will improve and the program will run
much faster.
In my experience with human data, you get diminishing returns beyond about 100K
SNPs. I would aim for 500K,
which is more than enough. convertf has some pruning parameters, as does
PLINK.

Nick
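
A minimal sketch of the PLINK route mentioned above, assuming PLINK 1.9 is installed as plink and the data are in a binary fileset. It only builds the two command lines (find an LD-pruned subset with --indep-pairwise, then extract it) and does not run them. The window, step and r^2 values, the file prefixes and the autosome count are illustrative placeholders, not recommendations from this thread.

```python
import shlex


def plink_ld_prune_commands(bfile, out_prefix, window=50, step=10, r2=0.2, chr_set=None):
    """Build (but do not run) the two PLINK 1.9 commands for LD pruning."""
    base = ["plink", "--bfile", bfile]
    if chr_set is not None:
        # Non-human data: tell PLINK how many autosomes to expect.
        base += ["--chr-set", str(chr_set)]
    # Step 1: writes <out_prefix>.prune.in and <out_prefix>.prune.out
    find = base + ["--indep-pairwise", str(window), str(step), str(r2),
                   "--out", out_prefix]
    # Step 2: keep only the SNPs listed in <out_prefix>.prune.in
    keep = base + ["--extract", out_prefix + ".prune.in",
                   "--make-bed", "--out", out_prefix + "_pruned"]
    return find, keep


# chr_set=29 is an assumption for illustration; use your organism's autosome count.
find_cmd, keep_cmd = plink_ld_prune_commands("test", "test_ld", chr_set=29)
print(shlex.join(find_cmd))
print(shlex.join(keep_cmd))
```

The pruned fileset (test_ld_pruned.bed/.bim/.fam in this sketch) could then be passed to convertf in place of the original files.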

biozzq commented on July 20, 2024

Hi Nick,

Many thanks. Yes, convertf has many parameters for filtering SNP data. I have removed non-biallelic markers, markers with more than 10% missing data, and markers with a minor allele frequency below 0.05; the result is my input file for qpDstat. (P.S. I will prune my data for LD using PLINK later, as you suggest.) The input file contains more than 40M markers, but only about 16M (the #_of_SNPs_that_all_populations_have_data column) were used in the calculation. I am curious why so many markers (about 24M) were removed.

In addition, do the convertf parameters, such as numchrom, maxmissfracsnp and maxmissfracind, also apply in qpDstat? If so, can I simply specify them in the parfile when running qpDstat?

Also, does the default maxmissfracsnp refer to one individual or to one percent?

Best
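
The filters described above (more than 10% missing data, minor allele frequency below 0.05) can be sketched per SNP as follows. This is a toy illustration, assuming genotypes are coded as 0, 1 or 2 copies of one allele with 9 for missing, as in EIGENSTRAT files. The function names are made up, and this is not how convertf or qpDstat implement their own filters.

```python
MISSING = 9  # EIGENSTRAT code for a missing genotype


def missing_fraction(genotypes):
    """Fraction of samples with no genotype call at this SNP."""
    return sum(g == MISSING for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from diploid allele counts (0, 1, 2), ignoring missing calls."""
    called = [g for g in genotypes if g != MISSING]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def passes_filters(genotypes, max_missing=0.10, min_maf=0.05):
    """Keep a SNP unless it exceeds the missing-data cap or falls below the MAF floor."""
    return (missing_fraction(genotypes) <= max_missing
            and minor_allele_frequency(genotypes) >= min_maf)


well_typed = [0, 1, 2, 1, 0, 0, 1, 2, 9, 0]   # 10% missing, polymorphic
monomorphic = [0] * 10                         # MAF is 0
sparse = [9, 9, 0, 1, 2, 1, 0, 1, 2, 0]       # 20% missing

for name, g in [("well_typed", well_typed), ("monomorphic", monomorphic), ("sparse", sparse)]:
    print(name, passes_filters(g))
```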

biozzq commented on July 20, 2024

Hi Nick,

I found that the convertf parameters also work in qpDstat. If maxmissfracsnp meant one individual, I think that threshold would remove far more than 24M markers, so I think it means one percent. I hope you can help me use ADMIXTOOLS correctly.

Best

bumblenick commented on July 20, 2024

I recommend that you use convertf to make a file with far
fewer SNPs.
decimate: 40 will retain 1/40 of the SNPs and might be a good start.

I much prefer to run ADMIXTOOLS without fancy options, so I know
what I am getting (and the programs will run much faster on smaller
datasets).
N
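
As a rough check on the numbers, and a sketch of what such a parfile might look like: the snippet below assumes decimate keeps about one SNP in N (the thread does not say whether that is every Nth SNP or a random subset), reuses parameter names that appear elsewhere in this thread, and uses made-up output file names.

```python
def expected_snps_after_decimate(n_snps, decimate):
    """Approximate SNP count if roughly 1 in `decimate` SNPs is retained."""
    return n_snps // decimate


def render_parfile(params):
    """Render a dict as 'key: value' lines, the layout used by convertf parfiles."""
    return "".join(f"{key}: {value}\n" for key, value in params.items())


params = {
    "genotypename": "test.bed",
    "snpname": "test.pedsnp",
    "indivname": "test.pedind",
    "outputformat": "PACKEDANCESTRYMAP",
    "genooutfilename": "thinned.geno",  # hypothetical output names
    "snpoutfilename": "thinned.snp",
    "indoutfilename": "thinned.ind",
    "decimate": 40,
}

parfile_text = render_parfile(params)
print(parfile_text)

# 40M input SNPs with two candidate decimate values
print(expected_snps_after_decimate(40_000_000, 40))  # 1000000
print(expected_snps_after_decimate(40_000_000, 80))  # 500000
```

Writing parfile_text to a file and running convertf -p on it would be the next step; that part is left out because it depends on the local installation.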

biozzq commented on July 20, 2024

Hi @bumblenick

I also tried filtering my data with convertf, but whatever value I specify for maxmissfracsnp (0, 1, or 1000), it gives the same results. How can I check whether the convertf parameters are actually taking effect?

Many thanks.

biozzq commented on July 20, 2024

Hi @bumblenick

I am still confused about the convertf parameters. I did a test run and the results did not change.
The par file:
indivname: test.pedind
snpname: test.pedsnp
genotypename: test.bed
outputformat: PACKEDANCESTRYMAP
genooutfilename: example.packedancestrymapgeno
snpoutfilename: example.snp
indoutfilename: example.ind
numchrom: 20
maxmissfracsnp: 1
zerodistance: YES

Output of tail test.pedsnp:
29 29:52623758 0 52623758 A G
29 29:52623804 0 52623804 T C
29 29:52623846 0 52623846 C T
29 29:52624636 0 52624636 G T
29 29:52624674 0 52624674 A G
29 29:52624726 0 52624726 G C
29 29:52624739 0 52624739 A G
29 29:52624827 0 52624827 G A
29 29:52624969 0 52624969 C T
29 29:52625377 0 52625377 A G

Output of tail example.snp:
29:52623758 29 0 52623758 A G
29:52623804 29 0 52623804 T C
29:52623846 29 0 52623846 C T
29:52624636 29 0 52624636 G T
29:52624674 29 0 52624674 A G
29:52624726 29 0 52624726 G C
29:52624739 29 0 52624739 A G
29:52624827 29 0 52624827 G A
29:52624969 29 0 52624969 C T
29:52625377 29 0 52625377 A G

From the above, I found that numchrom did not remove any markers.

I need your help!
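
Comparing only the last ten lines of each file cannot show whether SNPs were dropped elsewhere. A quick sketch for comparing the full SNP lists before and after convertf is below. It assumes the column layout visible in the listings above (SNP ID in the second column of test.pedsnp and in the first column of example.snp). The helper names are made up, and the demo writes two tiny stand-in files, with one SNP deliberately left out of the second, rather than reading real data.

```python
import os
import tempfile


def read_snp_ids(path, id_column):
    """Read SNP IDs from a whitespace-delimited SNP file (0-based column index)."""
    ids = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                ids.append(fields[id_column])
    return ids


def compare_snp_lists(before_ids, after_ids):
    """Return (count before, count after, IDs present before but not after)."""
    kept = set(after_ids)
    removed = [snp for snp in before_ids if snp not in kept]
    return len(before_ids), len(after_ids), removed


with tempfile.TemporaryDirectory() as tmp:
    before_path = os.path.join(tmp, "test.pedsnp")
    after_path = os.path.join(tmp, "example.snp")
    # Stand-in input: two SNPs, ID in the second column.
    with open(before_path, "w") as f:
        f.write("29 29:52623758 0 52623758 A G\n"
                "29 29:52623804 0 52623804 T C\n")
    # Stand-in output: one SNP left out on purpose, ID in the first column.
    with open(after_path, "w") as f:
        f.write("29:52623758 29 0 52623758 A G\n")

    before = read_snp_ids(before_path, id_column=1)
    after = read_snp_ids(after_path, id_column=0)
    n_before, n_after, removed = compare_snp_lists(before, after)

print(n_before, n_after, removed)
```

If the two counts match on the real files and the removed list is empty, the parameter being tested did not filter anything.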

biozzq commented on July 20, 2024

Hi @bumblenick,

Can you give me some suggestions? I am still unsure about these parameters.

Thanks.

bumblenick commented on July 20, 2024

  1. numchrom has nothing to do with removing markers; it is for non-human
    organisms (it specifies the number of autosomes).
  2. The simplest way of removing markers in convertf is decimate: 80
    (40M markers -> ~500K).
  3. Also, badsnpname: lets you specify SNPs you don't want.
  4. If you don't like the convertf parameters, PLINK has sophisticated
    pruning facilities too.

Nick
