spip's People

Contributors

raphaelleman, thibautlavole

spip's Issues

GenomeAssenbly

Hi,

There is a spelling error in GenomeAssenbly; it should be GenomeAssembly.

Bug with a "big" allele

Hello,

We encountered a bug while running SPiP on an ES VCF.

The error message was not very explicit:

2023-03-27 14:44:29 Score Calculation...
Erreur dans data.frame(..., check.names = FALSE) :
les arguments impliquent des nombres de lignes différents : 2429, 2430
Appels : cbind -> cbind -> data.frame
De plus : Il y a eu 50 avis ou plus (utilisez warnings() pour voir les 50 premiers)
Exécution arrêtée
spip exit code : 1

Sorry for the French R messages, but I know you can read them ;) (In English, the key line is: "arguments imply differing number of rows: 2429, 2430".)
We had some trouble identifying why it stopped like that, but we did identify the VCF entry that caused the issue:

The original VCF line was:

chr6 32517766 rs143960829 G A,GTGTTGTTTTCAGACATGGCTCTACTAACAGCTTCTTTCCCCCTCTTTCAGGGACTCAGATGAAAGCAGTACAGGAAGAAGAAAAACAAGTTGCTAAGTCTCCCTGAGCCAATACTACTGCAGAATCCTGCAGAGACACAA 22250.88 SnpCluster AC=2,1;AF=0.250,0.125;AN=8;BATCH_AC=12,1;BATCH_AF=0.261,0.022;BATCH_AN=46;BATCH_GTC=1/2:1|0/0:15|1/1:4|0/1:3;BATCH_GTN=23;BATCH_VARC=8,1;BaseQRankSum=0.722;CTRL_AC=36,0;CTRL_AF=0.180,0.000;CTRL_AN=200;CTRL_GTC=1/1:13|0/0:77|0/1:10;CTRL_GTN=100;CTRL_VARC=23,0;ClippingRankSum=0.00;DB;DP=100;ExcessHet=0.0002;FS=1.448;InbreedingCoeff=0.4592;MQ=43.30;MQRankSum=-1.000e+00;QD=23.94;ReadPosRankSum=0.927;SOR=0.847;ANN=A|intron_variant|MODIFIER|HLA-DRB5|HLA-DRB5|transcript|NM_002125|protein_coding|5/5|c.788-14C>T||||||,GTGTTGTTTTCAGACATGGCTCTACTAACAGCTTCTTTCCCCCTCTTTCAGGGACTCAGATGAAAGCAGTACAGGAAGAAGAAAAACAAGTTGCTAAGTCTCCCTGAGCCAATACTACTGCAGAATCCTGCAGAGACACAA|intron_variant|MODIFIER|HLA-DRB5|HLA-DRB5|transcript|NM_002125|protein_coding|5/5|c.788-15_788-14insTTGTGTCTCTGCAGGATTCTGCAGTAGTATTGGCTCAGGGAGACTTAGCAACTTGTTTTTCTTCTTCCTGTACTGCTTTCATCTGAGTCCCTGAAAGAGGGGGAAAGAAGCTGTTAGTAGAGCCATGTCTGAAAACAACA||||||;gnomADex_AC=.,.;gnomADex_AN=.;gnomADex_nhomalt=.,.;gnomADge_AC=823,.;gnomADge_AN=36652;gnomADge_nhomalt=2,. GT:AD:DP:GQ:PGT:PID:PL 1/2:0,29,0:36:99:.:.:2265,879,780,1195,0,1115 0/1:5,25,0:30:99:0|1:32517690_A_C:1077,0,132,1092,210,1302 0/0:17,17,17:34:.:.:.:0,0,0,0,0,0 0/0:0,0,0:0:.:.:.:0,0,0,0,0,0

A big one for sure, so we simplified it to a version that still fails but is reproducible and readable:

chr6 32517766 rs143960829 G A,GTGTTGTTTTCAGACATGGCTCTACTAACAGCTTCTTTCCCCCTCTTTCAGGGACTCAGATGAAAGCAGTACAGGAAGAAGAAAAACAAGTTGCTAAGTCTCCCTGAGCCAATACTACTGCAGAATCCTGCAGAGACACAA 22250.88 SnpCluster AC=2,1;AF=0.250,0.125 GT 1/2 0/1 0/0 0/0

After some tests, we manually shortened the second alt allele (just for testing), obtaining:

chr6 32517766 rs143960829 G A,GTGTTGTTTTCAGACATGGCTCTACTAACAGCTTCTTTCCCCCTCTTTCAGGGACTCAGATGAAAGCAGTACAGGAAGAAGAAAAACAAGTTGCTAAGTCTCCCTGAGCCAATACTACTGCAGAATCCTGCAGAGACA 22250.88 SnpCluster AC=2,1;AF=0.250,0.125 GT 1/2 0/1 0/0 0/0

And this finally worked, with a difference of only 3 bases.

So is there a defined limit on the alt allele size? Is there also an issue with multiallelic sites?
If the limit is known, it would be nice to check the alt allele length before running the process, rather than crashing the whole job.
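
The kind of pre-flight check suggested above could look roughly like the following. This is only a sketch: the function name `find_long_alt_alleles` and the threshold `max_alt_len` are hypothetical, and the actual limit in SPiP (if there is one) is not known from this report. It assumes a tab-delimited VCF and treats each allele of a multiallelic site separately.

```python
from typing import Iterable, Iterator, Tuple


def find_long_alt_alleles(
    vcf_lines: Iterable[str], max_alt_len: int
) -> Iterator[Tuple[str, int, str, int]]:
    """Yield (chrom, pos, alt, length) for each ALT allele longer than max_alt_len.

    max_alt_len is a hypothetical threshold chosen by the caller; it is not
    a documented SPiP limit.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # too few columns to be a VCF data line
        chrom, pos, alts = fields[0], int(fields[1]), fields[4]
        # Multiallelic sites list their ALT alleles separated by commas.
        for alt in alts.split(","):
            if len(alt) > max_alt_len:
                yield chrom, pos, alt, len(alt)
```

Running this over the input before launching the annotation would let a pipeline report or set aside the offending records instead of losing the whole job.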

Thanks for your work.

variant not scored

Hello,

I'm trying to use SPiP, and I'm wondering why some variants are not scored even though the software finds a transcript for them:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CKD54 varID Interpretation InterConfident chr strand gNomen varType ntChange ExonInfo transcript gene NearestSS DistSS RegType seqPhysio seqMutated SPiCEproba SPiCEinter_2thr deltaMES mutInPBarea deltaESRscore posCryptMut sstypeCryptMut probaCryptMut classProbaCryptMut nearestSStoCrypt nearestPosSStoCrypt nearestDistSStoCrypt posCryptWT probaCryptWT classProbaCryptWT posSSPhysio probaSSPhysio classProbaSSPhysio probaSSPhysioMut classProbaSSPhysioMut
chr1 13656 . CAG C . . . . 0/0 NR_046018:g.13656_13658:delinsC NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Thank you for your help!
Fabienne

Problem in v2 with VCF-format submission for certain variants

Error in v2 for certain variants submitted in VCF format. Some work perfectly, others do not.
My command line is:
Rscript /ngs/programs/SPiP/SPiPv2.0_main.r -I test.hg19.txt -O ./output_test_txt.v2.txt --GenomeAssenbly hg19 --transcriptome /ngs/bio-databases/SPiP/transcriptome_hg19.RData

Input VCF:
#CHROM POS ID REF ALT QUAL FILTER INFO
chr7 44298527 NM_172079:c.221-2A>G T C . . .

SPiP v2 output:
CHROM POS ID REF ALT QUAL FILTER INFO varID Interpretation InterConfident SPiPscore strand gNomen varType ntChange ExonInfo exonSize transcript gene NearestSS DistSS RegType SPiCEproba SPiCEinter_2thr deltaMES BP mutInPBarea deltaESRscore posCryptMut sstypeCryptMut probaCryptMut classProbaCryptMut nearestSStoCrypt nearestPosSStoCrypt nearestDistSStoCrypt posCryptWT probaCryptWT classProbaCryptWT posSSPhysio probaSSPhysio classProbaSSPhysio probaSSPhysioMut classProbaSSPhysioMut
chr7 44298527 NM_172079:c.221-2A>G T C . . . no sequence:44298527 NA -1 -1 NA NA NA NA NA 0 NA NA NA -1 NA NA NA -1 NA NA -1 NA NA NA NA NA NA NA NA NA NA NA 0 NA NA NA

SPiP v1.1 output:
#CHROM POS ID REF ALT QUAL FILTER INFO varID Interpretation InterConfident chr strand gNomen varType ntChange ExonInfo transcript gene NearestSS DistSS RegType seqPhysio seqMutated SPiCEproba SPiCEinter_2thr deltaMES mutInPBarea deltaESRscore posCryptMut sstypeCryptMut probaCryptMut classProbaCryptMut nearestSStoCrypt nearestPosSStoCrypt nearestDistSStoCrypt posCryptWT probaCryptWT classProbaCryptWT posSSPhysio probaSSPhysio classProbaSSPhysio probaSSPhysioMut classProbaSSPhysioMut
chr7 44298527 NM_172079:c.221-2A>G T C . . .1 NM_001220:g.44298527:A>G Alter by SPiCE 98.67 % [96.17 % - 99.55 %] chr7 - 44298527 substitution A>G Intron 3 (4078) NM_001220 CAMK2B acceptor -2 IntronCons CTCTGGCCTGCCCACCCGGGCCCTACCCAGCCCGATCCTCTGGGCAGCCCTAGGGCTTACACCGCTGGTGGTGTGCCTGGACAGGTACGGAGGCAGGCAGGAGGTGTGTCCTGGGCTATTGCCAGCCCCTCATGGCTTCTCTGTCCCCACAGTGCGTCTCCACGACAGCATCTCCGAGGAGGGCTTCCACTACCTGGTCTTCGATCTGTAAGTTCCAGAGCTGGGGACTCTCGCTGCACTCACTCCCAGCCTTGGCTCAGGGTGGGATCTGCAGCCTCCCCAGCCCCAGGGAATAGTCCCT CTCTGGCCTGCCCACCCGGGCCCTACCCAGCCCGATCCTCTGGGCAGCCCTAGGGCTTACACCGCTGGTGGTGTGCCTGGACAGGTACGGAGGCAGGCAGGAGGTGTGTCCTGGGCTATTGCCAGCCCCTCATGGCTTCTCTGTCCCCACGGTGCGTCTCCACGACAGCATCTCCGAGGAGGGCTTCCACTACCTGGTCTTCGATCTGTAAGTTCCAGAGCTGGGGACTCTCGCTGCACTCACTCCCAGCCTTGGCTCAGGGTGGGATCTGCAGCCTCCCCAGCCCCAGGGAATAGTCCCT 0.99979 high 0 No NA 0 No site 0 No No site 0 0 0 0 No 0 0 No 0 No

SPiP v1.3 output:
varID Interpretation InterConfident SPiPscore chr strand gNomen varType ntChange ExonInfo exonSize transcript gene NearestSS DistSS RegType seqPhysio seqMutated SPiCEproba SPiCEinter_2thr deltaMES BP mutInPBarea deltaESRscore posCryptMut sstypeCryptMut probaCryptMut classProbaCryptMut nearestSStoCrypt nearestPosSStoCrypt nearestDistSStoCrypt posCryptWT probaCryptWT classProbaCryptWT posSSPhysio probaSSPhysio classProbaSSPhysio probaSSPhysioMut classProbaSSPhysioMut
NM_001220:g.44298527:A>G Alter by SPiCE 98.54 % [94.83 % - 99.82 %] 0.860 chr7 - 44298527 substitution A>G Intron 3 4078 NM_001220 CAMK2B acceptor -2 IntronCons CTCTGGCCTGCCCACCCGGGCCCTACCCAGCCCGATCCTCTGGGCAGCCCTAGGGCTTACACCGCTGGTGGTGTGCCTGGACAGGTACGGAGGCAGGCAGGAGGTGTGTCCTGGGCTATTGCCAGCCCCTCATGGCTTCTCTGTCCCCACAGTGCGTCTCCACGACAGCATCTCCGAGGAGGGCTTCCACTACCTGGTCTTCGATCTGTAAGTTCCAGAGCTGGGGACTCTCGCTGCACTCACTCCCAGCCTTGGCTCAGGGTGGGATCTGCAGCCTCCCCAGCCCCAGGGAATAGTCCCT CTCTGGCCTGCCCACCCGGGCCCTACCCAGCCCGATCCTCTGGGCAGCCCTAGGGCTTACACCGCTGGTGGTGTGCCTGGACAGGTACGGAGGCAGGCAGGAGGTGTGTCCTGGGCTATTGCCAGCCCCTCATGGCTTCTCTGTCCCCACGGTGCGTCTCCACGACAGCATCTCCGAGGAGGGCTTCCACTACCTGGTCTTCGATCTGTAAGTTCCAGAGCTGGGGACTCTCGCTGCACTCACTCCCAGCCTTGGCTCAGGGTGGGATCTGCAGCCTCCCCAGCCCCAGGGAATAGTCCCT 0.99979 high 0 0 No 10 0 No site 0 No No site 0 0 0 0 No 0 0 No 0 No

If the same variant is submitted in txt format in v2:
Input txt:
gene varID
CAMK2B NM_172079:c.221-2A>G

SPiP v2 output:
gene varID Interpretation InterConfident SPiPscore strand gNomen varType ntChange ExonInfo exonSize transcript gene NearestSS DistSS RegType SPiCEproba SPiCEinter_2thr deltaMES BP mutInPBarea deltaESRscore posCryptMut sstypeCryptMut probaCryptMut classProbaCryptMut nearestSStoCrypt nearestPosSStoCrypt nearestDistSStoCrypt posCryptWT probaCryptWT classProbaCryptWT posSSPhysio probaSSPhysio classProbaSSPhysio probaSSPhysioMut classProbaSSPhysioMut
CAMK2B NM_172079:c.221-2A>G Alter by SPiCE 98.41 % [91.47 % - 99.96 %] 1 - 44298527 substitution A>G Intron 3 4078 NM_172079 CAMK2B acceptor -2 IntronCons 0.99979 high 0 0 No 10 0 No site 0 No No site 0 0 0 0 No 0 0 No 0 No

Thanks for your feedback.

INFO field reset by SPIP

In the latest version (1.0), if we use a VCF input file, the INFO column is reset and we lose all the previous INFO fields. Only the SPiP results are printed.

Is it possible to keep this information (the INFO column) and append the SPiP results to it (as in v0.5)?
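
To illustrate the requested behaviour, here is a minimal sketch of appending to an existing INFO value instead of overwriting it. The helper name `append_info` and the `SPiP` key used in the usage example are hypothetical and are not taken from the SPiP code. It relies only on the VCF conventions that INFO entries are separated by semicolons and that a lone "." means the field is empty.

```python
def append_info(info: str, extra: dict) -> str:
    """Return the INFO string with extra key=value pairs appended.

    Hypothetical helper: existing INFO entries are preserved and the new
    ones are added at the end, separated by semicolons.
    """
    new_items = ";".join(f"{key}={value}" for key, value in extra.items())
    if not new_items:
        return info  # nothing to add
    # "." (or an empty string) marks a missing INFO value in VCF.
    if info in (".", ""):
        return new_items
    return f"{info};{new_items}"
```

For example, `append_info("AC=2,1;AF=0.250,0.125", {"SPiP": "x"})` returns `AC=2,1;AF=0.250,0.125;SPiP=x`. Values added this way must not themselves contain spaces or semicolons, since those would break the INFO column.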

Add an option to resume an ongoing annotation

Dear Raphael,

Is it possible to implement a way to resume an interrupted annotation? When annotating large VCF files, we sometimes have to stop the process and restart it later. It would be great not to have to restart from scratch, and instead to look at the existing output to find where we stopped (for both .txt and .vcf outputs).

Could you please implement that?
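
One possible way to do this outside SPiP is sketched below. It rests on assumptions that would need checking against the real outputs: that the output contains exactly one data line per input record, in the same order, and that header lines start with "#" (a .txt output whose header has no "#" would need its header line skipped separately). Both function names are hypothetical.

```python
from typing import Iterable, Iterator


def count_done(output_lines: Iterable[str]) -> int:
    """Count complete data records in a previous, partial output.

    A final line without a trailing newline is treated as truncated
    (the process may have been stopped mid-write) and is not counted.
    """
    return sum(
        1
        for line in output_lines
        if line.endswith("\n") and line.strip() and not line.startswith("#")
    )


def remaining_records(input_lines: Iterable[str], n_done: int) -> Iterator[str]:
    """Yield the input data records that still need annotating."""
    seen = 0
    for line in input_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        seen += 1
        if seen > n_done:
            yield line
```

The remaining records could then be written to a new input file (with the original header) and annotated separately. Any truncated last line in the old output would have to be dropped before the two outputs are concatenated.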

Thanks.
