
Comments (9)

ANGSD commented on June 26, 2024

The default qscore filter is 13. You can see all default options in the .arg file that is generated at runtime. Let me know if this doesn't help.

from angsd.

mlucenaperez commented on June 26, 2024

Thank you very much for your quick answer.
I know that the default is 13, so I was expecting the .qs file to start at 13 in the first test, as it does in the example on your web page (http://popgen.dk/angsd/index.php/Alleles_counts), and at 20 in the second one, since those are my thresholds in each case. It does seem to filter something, but I still have bases with qscores of 0, 1, 2, ..., which puzzles me.


ANGSD commented on June 26, 2024

Ok, got it. I hadn't added -minQ as a filter in -doQsDist. I was thinking that users who wanted the qscore distribution wouldn't want any filtering, but I agree this is counterintuitive. In the downstream analyses, -minQ is implemented by setting the bases with qscore < minQ to N, in effect discarding them, so this would not have affected anything but -doQsDist. Thanks for taking the time to report this. I'm closing this issue; feel free to reopen if needed.
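
To make the masking described above concrete, here is a minimal Python sketch. It is not ANGSD's code: the function names `mask_low_quality` and `qscore_distribution` are hypothetical, and the toy read is invented for illustration. It only shows the idea that bases below the threshold become N, and that a qscore histogram differs depending on whether the same threshold is applied to it.

```python
from collections import Counter


def mask_low_quality(bases, quals, min_q=13):
    """Replace every base whose qscore is below min_q with 'N'."""
    return "".join(b if q >= min_q else "N" for b, q in zip(bases, quals))


def qscore_distribution(quals, min_q=0):
    """Count how often each qscore occurs, ignoring qscores below min_q."""
    return Counter(q for q in quals if q >= min_q)


# Toy read, invented for illustration only.
bases = "ACGTAC"
quals = [2, 13, 19, 20, 35, 12]

print(mask_low_quality(bases, quals, min_q=20))               # NNNTAN
print(sorted(qscore_distribution(quals).items()))             # unfiltered histogram
print(sorted(qscore_distribution(quals, min_q=20).items()))   # histogram starting at 20
```

With `min_q=0` the histogram still contains qscores 2, 12, 13 and 19, which mirrors the originally reported behaviour of a .qs file that did not start at the threshold.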


mlucenaperez commented on June 26, 2024

Thank you very much, that is great, but I still have some doubts. The two examples above use exactly the same parameters except for the minQ filter, and they run over the same files. So, if I understood correctly, the counts should be the same, but this is not the case.

default Q filter:
qscore counts perc
0 238923500 0.000933898301529085
1 17250164 0.0010013253183805
...

minQ 20:
qscore counts perc
0 232658276 0.000909539560816795
1 17142450 0.000976555171490038
...

It seems that it is filtering some positions. I have checked everything I could think of, but I can't find an explanation.
Any idea?
I am working with angsd version: 0.911-25-g3fbb94f (htslib: 1.3-64-g5285dc0) build(Apr 20 2016 15:29:14).


ANGSD commented on June 26, 2024

Yes, something wasn't properly defined in the previous version. I tested the latest version, and there it gave consistent results:

./angsd -i ../smallBam/smallNA12761.mapped.ILLUMINA.bwa.CEU.low_coverage.20111114.bam -docounts 1 -doqsdist 1 -minQ 0 -out q0
./angsd -i ../smallBam/smallNA12761.mapped.ILLUMINA.bwa.CEU.low_coverage.20111114.bam -docounts 1 -doqsdist 1 -minQ 20 -out q20

paste <(tail -n32 q0.qs) q20.qs
19 63887 qscore counts
20 76837 20 76837
21 87523 21 87523
22 107979 22 107979
23 124504 23 124504
24 150261 24 150261
25 182710 25 182710
26 215713 26 215713
27 286981 27 286981
28 356611 28 356611
29 482008 29 482008
30 641792 30 641792
31 831939 31 831939
32 975929 32 975929
33 1004887 33 1004887
34 880805 34 880805
35 848191 35 848191
36 766193 36 766193
37 339670 37 339670
38 93653 38 93653
39 27678 39 27678
40 10538 40 10538
41 7040 41 7040
42 3547 42 3547
43 2614 43 2614
44 1164 44 1164
45 240 45 240
46 88 46 88
47 8 47 8
48 0 48 0
49 0 49 0
50 3 50 3

Please try the newest version.
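
The `paste` comparison above can also be done programmatically. The following is a rough sketch, assuming only that a .qs file has whitespace-separated `qscore counts` columns with an optional header, as in the outputs pasted in this thread. The helper names `parse_qs` and `compare_qs` are hypothetical, and the sample rows are copied from the tables in this thread.

```python
def parse_qs(text):
    """Parse 'qscore counts [perc]' rows into {qscore: counts}, skipping headers."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything that is not a data row.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        table[int(fields[0])] = int(fields[1])
    return table


def compare_qs(a, b):
    """Return {qscore: (count_a, count_b)} for shared qscores whose counts differ."""
    return {q: (a[q], b[q]) for q in sorted(a.keys() & b.keys()) if a[q] != b[q]}


# A few rows copied from the q0/q20 comparison above.
q0 = parse_qs("19 63887\n20 76837\n21 87523\n")
q20 = parse_qs("qscore counts\n20 76837\n21 87523\n")
print(compare_qs(q0, q20))  # empty dict: counts agree wherever both files have a row
```

An empty result means the two runs agree on every qscore they share, which is what the consistent output above shows for qscores 20 and up.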


mlucenaperez commented on June 26, 2024

Sorry for the delay; I was running some other (quite heavy) analyses and didn't want to overload my server. Anyway, I still have the same problem.

The only difference between the tests is the minQ filter:

Test 4:
/opt/angsd/angsd -b my_bamlist.bamlist -ref myref_genome.fa -out test_4.qc -uniqueOnly 1 -remove_bads 1 -only_proper_pairs 1 -baq 1 -C 50 -minMapQ 20 -minQ 20 -minInd 10 -doQsDist 1 -doDepth 1 -doCounts 1

Test 5:
/opt/angsd/angsd -b my_bamlist.bamlist -ref myref_genome.fa -out test_5.qc -uniqueOnly 1 -remove_bads 1 -only_proper_pairs 1 -baq 1 -C 50 -minMapQ 20 -minInd 10 -doQsDist 1 -doDepth 1 -doCounts 1

tail -n18 test_4.qc >tail_test_4
tail -n18 test_5.qc >tail_test_5

However I get different counts...

paste tail_test_4 tail_test_5
20 870242210 20 871126635
21 1085098460 21 1086066241
22 1451075433 22 1451773493
23 2200543571 23 2201277526
24 2351279079 24 2351913369
25 2832592548 25 2832976237
26 4404551408 26 4405280569
27 7154630120 27 7155142714
28 13337770090 28 13338473918
29 24792360040 29 24793478649
30 40253999875 30 40255135801
31 56024030134 31 56025276812
32 52544597326 32 52545623732
33 27902517850 33 27902958736
34 9638071465 34 9638182002
35 2259010528 35 2259030756
36 253580791 36 253582723
37 22 37 22

Is there anything very obvious that I am misunderstanding?
Any idea?

Thanks in advance.

PS: the qs file now starts where it should ;).


ANGSD commented on June 26, 2024

If you disable the -minInd filter, you should get the same counts.
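
One way to see why -minInd matters here is the toy model below. It rests on two assumptions that are not stated in this thread: that a site is kept only when at least minInd individuals still have a base after the minQ filter, and that the qscore distribution is tallied only over kept sites. The function names `qs_dist` and `high_quality` and the two example sites are hypothetical.

```python
from collections import Counter


def qs_dist(sites, min_q, min_ind):
    """Tally qscores over sites that pass a minInd-style filter.

    sites: list of sites; each site is a list with one qscore list per individual.
    """
    dist = Counter()
    for site in sites:
        # Apply the per-base quality filter first.
        kept = [[q for q in ind if q >= min_q] for ind in site]
        # Assumed rule: drop the site if too few individuals have data left.
        if sum(1 for ind in kept if ind) < min_ind:
            continue
        for ind in kept:
            dist.update(ind)
    return dist


def high_quality(dist, threshold=20):
    """Restrict a distribution to qscores at or above threshold."""
    return {q: n for q, n in dist.items() if q >= threshold}


sites = [
    [[30], [25], [40]],  # every individual has a base at Q20 or above
    [[30], [15], [22]],  # the second individual only has a Q15 base
]

# With min_ind=3, raising min_q from 13 to 20 drops the second site entirely,
# so its Q30 and Q22 bases vanish from the tally as well.
print(high_quality(qs_dist(sites, 13, 3)) == high_quality(qs_dist(sites, 20, 3)))  # False
# Without the minInd-style filter the tallies at Q20 and above agree.
print(high_quality(qs_dist(sites, 13, 0)) == high_quality(qs_dist(sites, 20, 0)))  # True
```

In this model the stricter minQ run always has counts at or below those of the looser run for qscores of 20 and up, which matches the direction of the difference between test_4 and test_5.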

(Sent by email in reply to Maria Lucena Perez's comment of 25 Jul 2016, 13:23.)


mlucenaperez commented on June 26, 2024

Now I get it. Perfect, thank you very much.

paste tail_test_6 tail_test_7
20 872857218 20 872857218
21 1088228539 21 1088228539
22 1453975234 22 1453975234
23 2204108211 23 2204108211
24 2354610003 24 2354610003
25 2835526041 25 2835526041
26 4409761678 26 4409761678
27 7161122193 27 7161122193
28 13349012950 28 13349012950
29 24811315496 29 24811315496
30 40280518766 30 40280518766
31 56053569574 31 56053569574
32 52564309911 32 52564309911
33 27909956546 33 27909956546
34 9639996231 34 9639996231
35 2259368112 35 2259368112
36 253615271 36 253615271
37 35 37 35


ANGSD commented on June 26, 2024

Super, I'm closing this issue; feel free to reopen if needed.

