
sofos's Introduction

SoFoS

Rescale and smooth genetic polymorphism data to match a common sample size.


Usage

sofos [OPTION]... [FILE] > [OUTPUT]

With no FILE or when FILE is -, read standard input.

Input format is a VCF/BCF file with genotypes.

Argument                Description
-a number, -b number    shape parameters of the beta prior
-n integer              number of gene copies in the posterior resample
-f, -u                  generate (f)olded or (u)nfolded distributions
-t, -r                  use the AA (t)ag or the (r)eference allele as ancestral
-e number               probability of ancestral allele misassignment
-p, -pp                 use the GP tag to estimate allele frequencies
-z number               add extra invariant sites to manage ascertainment bias
-P number               average ploidy of samples (used with -z)
-q, -v                  (q)uiet progress info or be (v)erbose
-h                      print usage information
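As a rough illustration of what the -f/-u and -e options refer to, the Python sketch below folds an unfolded spectrum (indexed by derived-allele count 0..n) and applies one common model of ancestral misassignment, in which a site with i derived copies is recorded as having n - i with probability e. The function names are hypothetical, and this is not necessarily how SoFoS implements either step.

```python
from typing import List


def fold_sfs(unfolded: List[float]) -> List[float]:
    """Fold a spectrum indexed by derived-allele count 0..n.

    Folded class i is the minor-allele count, so it combines unfolded
    classes i and n - i (the middle class, when n is even, is counted once).
    """
    n = len(unfolded) - 1
    folded = [0.0] * (n // 2 + 1)
    for i, value in enumerate(unfolded):
        folded[min(i, n - i)] += value
    return folded


def misassign(unfolded: List[float], e: float) -> List[float]:
    """Expected observed unfolded spectrum when the ancestral allele is
    mislabeled with probability e (hypothetical model for illustration):
    a site with i derived copies is recorded as n - i with probability e."""
    n = len(unfolded) - 1
    return [(1.0 - e) * unfolded[i] + e * unfolded[n - i] for i in range(n + 1)]
```

Under this model, folding the misassigned spectrum gives back the original folded spectrum, which is consistent with -e only mattering for unfolded output.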

Defaults: sofos -u -a 1.0 -b 1.0 -n 10 -P 2
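The sketch below illustrates the beta-binomial idea behind -a, -b and -n under a stated assumption: a site with k derived copies among m observed gene copies gets the posterior Beta(a + k, b + m - k), and its resampled count among n copies follows the matching beta-binomial distribution. All function names are hypothetical, and SoFoS's Posterior column may be computed differently. With the defaults a = b = 1 and n = 10 the prior is uniform over 0..10, so 60 sites give 60/11 ≈ 5.4545 per class, which matches the Prior column in the "-r parameter" issue further down this page.

```python
from math import comb, exp, lgamma
from typing import List, Sequence, Tuple


def log_beta(x: float, y: float) -> float:
    """Natural log of the beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def beta_binomial_pmf(n: int, a: float, b: float) -> List[float]:
    """P(j copies out of n) for j = 0..n when the frequency is Beta(a, b)."""
    return [comb(n, j) * exp(log_beta(j + a, n - j + b) - log_beta(a, b))
            for j in range(n + 1)]


def prior_sfs(num_sites: int, n: int, a: float, b: float) -> List[float]:
    """Expected spectrum of num_sites sites under the prior alone."""
    return [num_sites * p for p in beta_binomial_pmf(n, a, b)]


def posterior_sfs(sites: Sequence[Tuple[int, int]], n: int,
                  a: float, b: float) -> List[float]:
    """Sum per-site posterior predictive distributions for a resample of n.

    Each site is (k, m): k derived copies among m observed gene copies.
    Assumption: the frequency posterior is Beta(a + k, b + m - k).
    """
    total = [0.0] * (n + 1)
    for k, m in sites:
        for j, p in enumerate(beta_binomial_pmf(n, a + k, b + m - k)):
            total[j] += p
    return total
```

Each site contributes a distribution summing to 1, so the posterior spectrum always sums to the number of sites, whatever their original sample sizes.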

Notes

  • Unless otherwise specified, -f enables -r and -u enables -t.
  • -p specifies that GP contains probabilities between 0 and 1.
  • -pp specifies that GP contains phred-scaled probabilities.
  • -e is only used when generating unfolded spectra.
  • -n is the resampled population size.
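One natural way to use GP, sketched below as an assumption rather than SoFoS's documented method, is to convert phred-scaled values back to probabilities (p = 10^(-Q/10), the -pp case) and take the expected number of ALT copies per diploid genotype. The function names are hypothetical, and the (0/0, 0/1, 1/1) ordering applies only to biallelic diploid sites.

```python
from typing import Sequence


def phred_to_prob(q: float) -> float:
    """Convert a phred-scaled value Q = -10 * log10(p) back to p."""
    return 10.0 ** (-q / 10.0)


def expected_alt_copies(gp: Sequence[float], phred: bool = False) -> float:
    """Expected ALT copies at a biallelic diploid site from GP values.

    GP is assumed to be ordered (0/0, 0/1, 1/1). Values are renormalised
    to sum to 1, since phred rounding can leave a small discrepancy.
    """
    probs = [phred_to_prob(g) for g in gp] if phred else list(gp)
    total = sum(probs)
    if total <= 0:
        raise ValueError("GP values must not all be zero")
    _p00, p01, p11 = (p / total for p in probs)
    return p01 + 2.0 * p11
```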

sofos's People

Contributors

reedacartwright, jporton, jgarciamesa, rossibarra


sofos's Issues

multiallelic data

Hello,

I am trying to apply SoFoS to triallelic data and calculate the folded SFS for a designated allele. I ran v2.0 on such data and encountered an error:

#SoFoS v2.0
#date=2021-12-15T17:41:07-0600
#epoch=1639611667475
#path=synLTR.triAllelic.vcf
#alpha=1
#beta=1
#size=10
#folded=0
#refalt=1
[E::bcf_calc_ac] Incorrect allele ("2") in Ki11 at 1:1

If all genotypes at these triallelic loci are homozygous, i.e., 0/0 (ref), 1/1, or 2/2, and I want to estimate the folded SFS for allele 1, is there a way to do so? Is it equivalent to converting 2s to 0s and estimating the folded SFS from the 0s and 1s?

Thanks,
Shujun
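A minimal Python sketch of the recoding proposed in this issue, relabelling allele 2 as allele 0 in GT strings. It only touches genotype strings: a real VCF record would also need its ALT field and any per-allele INFO/FORMAT values (for example AC) updated, and whether the result is statistically equivalent is exactly the open question here. The function name is hypothetical.

```python
import re


def collapse_allele(gt: str, source: str = "2", target: str = "0") -> str:
    """Relabel one allele index in a GT string such as '2/2' or '1|2'.

    Separators ('/' or '|') and missing alleles ('.') are preserved, and
    multi-digit indices such as '12' are compared as whole tokens.
    """
    parts = re.split(r"([/|])", gt)  # the capture group keeps separators
    return "".join(target if p == source else p for p in parts)
```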

Ascertainment Bias

SoFoS assumes that invariant sites are included in the input. Does a beta prior still fit when the user provides only polymorphic sites?

Multidimensional SFS with SoFoS

Hi,
I was wondering whether SoFoS can produce a multidimensional SFS, and what data it would require. Can you give me some suggestions?

Very many thanks.

The meaning of the -n parameter

Hello,

Thank you for developing SoFoS, which is very helpful for controlling population size, especially for small samples!

I just want to note that the -n parameter also means the resampled population size (especially for diploids). For example, if pop 1 has 37 samples and pop 2 has 100 samples, setting n = 30 gives both populations a resampled size of 30. It took me some reading, including all posted issues, a manuscript draft, and some Twitter discussions, to realize the meaning of -n. Hopefully this note will help others.

Best,
Shujun

CircleCI

  • Add .circleci/config.yml to repo.

-r parameter is ignored when "AC=" is present in the VCF INFO field

Hello @jporton,

I used -r to specify REF as the ancestral state, and found that SoFoS ignored this parameter on some occasions. This seems to be related to the INFO column of the VCF file, which contains AC (allele count) information. When I remove or replace the AC information, -r takes effect again.

This test.vcf.zip can reproduce the issue:

sofos -n 10 -u -r -a 1.0 -b 1.0 test.vcf 
#SoFoS v2.0
#date=2021-12-18T16:33:38-0600
#epoch=1639866818307
#path=test.vcf
#alpha=1
#beta=1
#size=10
#folded=0
#refalt=1
#
## [0.0006s elapsed] 60 sites processed
Number,Prior,Observed,Posterior
0,5.4545454545454959,0,0.69853641197197858
1,5.4545454545454959,1,1.3048755954406486
2,5.4545454545454959,1,1.8488491162524154
3,5.4545454545454959,1,2.4702060322842092
4,5.4545454545454959,0,3.2635682519285871
5,5.4545454545454959,1,4.2688676722197485
6,5.4545454545454959,0,5.4985717806044025
7,5.4545454545454959,0,6.9696393737318694
8,5.4545454545454959,0,8.7375634335076597
9,5.4545454545454959,0,10.95262220018968
10,5.4545454545454959,56,13.986700131869155

sed 's/AC=/AA=/' test.vcf | sofos -n 10 -u -r -a 1.0 -b 1.0 - 
#SoFoS v2.0
#date=2021-12-18T16:34:02-0600
#epoch=1639866842418
#path=-
#alpha=1
#beta=1
#size=10
#folded=0
#refalt=1
#
[W::vcf_parse] INFO 'AA' is not defined in the header, assuming Type=String
## [0.0006s elapsed] 60 sites processed
Number,Prior,Observed,Posterior
0,5.4545454545455394,0,16.557384357674806
1,5.4545454545455394,55,18.025665060544522
2,5.4545454545455394,2,12.323574756722911
3,5.4545454545455394,1,6.569310900606677
4,5.4545454545455394,0,2.9275361416968226
5,5.4545454545455394,0,1.1301463637682869
6,5.4545454545455394,0,0.41433400150816324
7,5.4545454545455394,0,0.22850857449687428
8,5.4545454545455394,0,0.30356657899338751
9,5.4545454545455394,1,0.54636205470466404
10,5.4545454545455394,1,0.97361120928371347

Any ideas? Thanks!

Shujun
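A possible workaround, mirroring the sed experiment above rather than fixing the underlying behaviour, is to remove the AC key from the INFO column before piping the file to SoFoS. The Python filter below is a hypothetical helper; `bcftools annotate -x INFO/AC` should achieve the same with standard tools.

```python
from typing import Iterable, TextIO


def drop_info_key(line: str, key: str = "AC") -> str:
    """Remove one key from the INFO column (8th field) of a VCF data line.

    Header lines and malformed lines are returned unchanged. If INFO ends
    up empty it is written as '.', as the VCF format requires.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return line
    kept = [kv for kv in fields[7].split(";") if kv.split("=", 1)[0] != key]
    fields[7] = ";".join(kept) if kept else "."
    return "\t".join(fields) + "\n"


def filter_stream(lines: Iterable[str], out: TextIO, key: str = "AC") -> None:
    """Copy a VCF stream, dropping `key` from every data line's INFO."""
    for line in lines:
        out.write(drop_info_key(line, key))
```

For example, call filter_stream(sys.stdin, sys.stdout) from a small script (hypothetically drop_ac.py) and run `python drop_ac.py < test.vcf | sofos -n 10 -u -r -a 1.0 -b 1.0 -`.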

How to install

Hi,
I downloaded SoFoS from GitHub, but I don't know how to install it. Can you give me some suggestions?

Thanks.

Missing Data

Should sites with 100% missing data be skipped or included in the posterior based only on prior?
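To make the "based only on prior" option concrete, assume each site's allele-frequency posterior is Beta(a + k, b + m - k), where k is the derived count among m observed gene copies (an assumption for illustration, not a statement of SoFoS's internals). A site with 100% missing data has m = k = 0, so it contributes exactly the prior's beta-binomial distribution over the n resampled copies:

```latex
m = k = 0 \;\Rightarrow\; \mathrm{Beta}(a + k,\; b + m - k) = \mathrm{Beta}(a, b),
\qquad
\Pr(J = j) = \binom{n}{j}\,\frac{B(j + a,\; n - j + b)}{B(a, b)},
\quad j = 0, \dots, n .
```

With a = b = 1 this is uniform, adding 1/(n + 1) to every class, so including such sites flattens the posterior spectrum toward the prior.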
