edg1983 / green-varan

Annotate non-coding regulatory variants using our GREEN-DB, prediction scores, conservation and population AF

License: MIT License

Python 7.32% Shell 2.69% Nim 24.63% Nextflow 13.21% R 51.41% Dockerfile 0.74%

green-varan's Issues

Download index file

The file
https://zenodo.org/record/5636209/files/GREEN-DB_v2.5.db.gz.csi
does not exist on Zenodo, so "nextflow workflow/download.nf" fails with this error:
HTTP request sent, awaiting response... 301 MOVED PERMANENTLY
Location: /records/5636209/files/GREEN-DB_v2.5.db.gz.csi [following]
--2023-12-13 12:41:37-- https://zenodo.org/records/5636209/files/GREEN-DB_v2.5.db.gz.csi
Reusing existing connection to zenodo.org:443.
HTTP request sent, awaiting response... 404 NOT FOUND
2023-12-13 12:41:37 ERROR 404: NOT FOUND.
How can I work around this?
Thanks

Annotation completed: 0 variants

Hello!
Thank you so much for GREEN-VARAN. I think it will be perfect for what I need.
I ran the test script on the test data using:

python GREEN-VARAN.py -i test/VCF/GRCh38.test.snpEff.vcf.gz -o test/out/test_standard.vcf --AF_file /PATH/to/gnomad/gnomad.genomes.r3.0.sites.vcf.bgz -b GRCh38 -m annotate -s ReMM -s NCBoost -s LinSight --threads 4

The run finishes with the final message "Annotation completed: 0 variants written in 0:00:00" and gives no clear error. Do you have any idea why that could be happening?
Thank you very much!

GRCh38 resources needed for GRCh37 annotation

Dear Edoardo,

I have opened a new issue because I believe this is not related to our earlier discussion. I am now using the GREEN-VARAN workflow configuration. While analyzing data from build GRCh37 with all scores and regions enabled, the program tries to annotate the VCF file using GRCh38 resources.

Can you advise? See the error message below.

N E X T F L O W ~ version 21.10.6
Launching `./GREEN-VARAN/workflow/main.nf` [lethal_majorana] - revision: 539422f1a2

GREEN-VARAN annotation - N F P I P E L I N E

input file : ./XX.vcf.gz
build : GRCh37
output : ./results
greenvaran config : ./GREEN-VARAN//config/prioritize_smallvars.json
greenvaran dbschema : ./GREEN-VARAN//config/greendb_schema_v2.5.json
resource folder : ./GREEN-VARAN/workflow/../resources

ACTIVE ANNOTATIONS:
Scores : all
Regions : all
AF : true

WARN: Nextflow version 21.10.6 does not match workflow required version: 20.10.0 -- Execution will continue, but things may break!
[- ] process > WRITE_SCORE_TOML -
[- ] process > DOWNLOAD_REGION -
[- ] process > WRITE_REGION_TOML -
[- ] process > WRITE_AF_TOML -
[- ] process > concat_toml -
[- ] process > ANNOTATE:annotate_vcf -
[- ] process > ANNOTATE:index_vcf -
[- ] process > green_varan -
TAD not found at ./GREEN-VARAN/resources/GRCh37/GRCh38_TAD.bed.gz

In addition, a possible cause is that the GRCh37 TAD BED file is missing from my resources folder. The file could not be downloaded and seems to be missing from the download server. Two other files could not be downloaded either:
GRCh37_dbSuper | regions | https://zenodo.org/record/5705936/files/GGRCh37_dbSuper.bed.gz.csi
GRCh37_TAD | regions | https://zenodo.org/record/5705936/files/GRCh37_TAD.bed.gz
GRCh37_TAD | regions | https://zenodo.org/record/5705936/files/GRCh37_TAD.bed.gz.csi

SV is not annotated with some regions despite overlap

TL;DR

Why is the SV chr14_73,085,269_73,141,832_INV not annotated with the promoter 580535_pro (chr14_73,135,539_73,138,139) despite the promoter being fully within the SV region?

Thanks for this helpful tool!

Problem

I am having trouble understanding why a particular SV is not annotated with a particular region despite the SV region containing the regulatory region.

The SV I am trying to annotate is an inversion (chr14_73,085,269_73,141,832_INV).

chr14 | 73085269 | MantaINV:113901:0:2:0:0:0 | G | <INV> | 389 | PASS | END=73141832;SVTYPE=INV;SVLEN=56563;CIPOS=0,25;CIEND=-25,0;HOMLEN=25;HOMSEQ=GCCTCCCAAAGTGCTGGGATTACAG;INV3;SVCALLER=MANTA;INV5 | GT:FT:GQ:PL:PR:SR

I checked GREEN-DB, and the promoter 580535_pro (chr14_73,135,539_73,138,139) is fully contained within the inverted region.

580535_pro | chr14 | 73135539 | 73138139 | active_promoter,promoter | promoter | BENGI,DECRES,FOCS,ENCODE-HMM | deep_learning,HMM_prediction,roadmap | 0.129 | 0.620575 | PSEN1 | PSEN1 | 0 | PSEN1 | 0 | GM12878,HMEC,HUVEC,HelaS3,HepG2,K562,H1-hESC,HSMM,NHEK,NHLF | Sporadic,Bilateral tonic-clonic seizure,Primitive reflex,Alexia,Mutism,Abulia,Abnormality of vision,Memory impairment,Restlessness,Seizure,Language impairment,Cerebral cortical atrophy,Inappropriate laughter,Dysphasia,Senile plaques,Abnormality of the cerebral white matter,Visual agnosia,Neurofibrillary tangles,Abnormal lower motor neuron morphology,Hypertonia,Parkinsonism,Abnormality of neutrophils,Brain atrophy,Depressivity,Abnormal brain FDG positron emission tomography,Grammar-specific speech disorder,Aphasia,Anomia,Dyslexia,Anxiety,Motor aphasia,Hyperorality,Congestive heart failure,Lack of insight,Poor speech,Elevated serum creatine kinase,Abnormality of extrapyramidal motor function,Agnosia,Spoken Word Recognition Deficit,EEG with continuous slow activity,Dysgraphia,Spastic tetraparesis,Alzheimer disease,EMG abnormality,Frontotemporal dementia,Neuronal loss in central nervous system,Restrictive behavior,Semantic dementia,Fasciculations,Thickened nuchal skin fold,Frontotemporal cerebral atrophy,Echolalia,Inappropriate behavior,Palmoplantar keratoderma,Astrocytosis,Babinski sign,Hyperreflexia,Abnormal social behavior,Apraxia,Intellectual disability,Personality changes,Agitation,Dilated cardiomyopathy,Perifolliculitis,Optic ataxia,Irritability,Acne inversa,Frontal lobe dementia,Temporal cortical atrophy,Adult onset,Polyphagia,Gait disturbance,Dementia,Heterogeneous,Myopathy,Confusion,Finger agnosia,Lipoatrophy,Perseveration,Psychosis,Loss of speech,Upper motor neuron dysfunction,Chronic furunculosis,Syncope,Dystonia,Amyotrophic lateral sclerosis,Dyscalculia,Collectionism,Deposits immunoreactive to beta-amyloid protein,Dysphagia,Disinhibition,Recurrent cutaneous abscess formation,Myoclonus,Sensorineural hearing impairment,Hallucinations,Stereotypy,Oculomotor apraxia,Ataxia,Emotional blunting,Inappropriate sexual behavior,Lower limb hyperreflexia,Autosomal dominant inheritance,Apathy,Rapidly progressive,Gliosis,Dysarthria,Aggressive behavior

However, the SV is not annotated with this promoter. There are some other regulatory regions contained in the inverted region (e.g., 145783_pro at chr14:73,137,653-73,137,980), but these are also not added to the annotations.

The SV is only annotated with 417710_enh (chr14:73,084,745-73,092,455).

Command

greenvaran sv \
-i input.vcf \
-o output.vcf \
-d GRCh38_GREEN-DB.bed.gz \
-s greendb_schema_v2.5.json \
-g gene_list.txt \
-p 1
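The coordinates quoted in this issue can be sanity-checked independently of GREEN-VARAN. The sketch below is only an arithmetic check, assuming the positions are closed, 1-based intervals exactly as written in the post; the helper names `overlap_bp` and `is_contained` are hypothetical, and the code says nothing about how greenvaran itself decides which regions to report.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of bases shared by two closed intervals (0 if they are disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def is_contained(outer_start, outer_end, inner_start, inner_end):
    """True when the inner interval lies fully inside the outer one."""
    return outer_start <= inner_start and inner_end <= outer_end


# Coordinates copied from the issue text (all on chr14).
sv = (73085269, 73141832)
regions = {
    "580535_pro": (73135539, 73138139),
    "145783_pro": (73137653, 73137980),
    "417710_enh": (73084745, 73092455),
}

for region_id, (start, end) in regions.items():
    shared = overlap_bp(sv[0], sv[1], start, end)
    fraction = shared / (end - start + 1)
    print(region_id, is_contained(sv[0], sv[1], start, end), shared, round(fraction, 3))
```

Under these assumptions both promoters fall entirely inside the inversion, while the one region that was annotated only partially overlaps it.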

confused with the annotated results

I am confused by the annotated results. Here is an example that is labeled greendb_level=3, but I don't know which three criteria satisfy the thresholds. Are they these three: MAF < 1%; overlap with DNase and TFBS; greendb_constraint=0.722409 > 0.7? Do ReMM=0.538 and ncER=85.5791 not satisfy the FDR50 threshold?
Thank you for your kind help.

AC=1;AF=0.5;AN=2;BaseQRankSum=0.12;ClippingRankSum=-0;DP=9;ExcessHet=3.0103;FS=7.782;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=-0;QD=10.64;ReadPosRankSum=-0;SOR=3.611;VQSLOD=22.3209;culprit=MQ;FATHMM_MKLNC=0.1036;ReMM=0.538;ncER=85.5791;DNase;TFBS;gnomAD_AF=0.005553;gnomAD_AF_afr=0.0053394;gnomAD_AF_amr=0.0048544;gnomAD_AF_nfe=0.0046707;greendb_id=561490_pro,2_biv,505745_pro;greendb_stdtype=bivalent,promoter;greendb_dbsource=FOCS,SegWey,ENCODE-HMM,BENGI,EnsemblRegBuild,FANTOM5;greendb_genes=RBP7;greendb_constraint=0.722409;greendb_level=3;ANN=C|promoter|MODIFIER|RBP7||||||||||||,C|bivalent|MODIFIER|RBP7||||||||||||
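To see at a glance which fields are present in a record like the one above, the INFO column can be split into a dictionary. This is a minimal sketch using a shortened copy of the INFO string from this issue; `parse_info` is a hypothetical helper, and the code only lists values. It does not reproduce GREEN-VARAN's level logic or its thresholds.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag fields (no '=') map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# Shortened INFO string taken from the record in this issue.
example = (
    "FATHMM_MKLNC=0.1036;ReMM=0.538;ncER=85.5791;DNase;TFBS;"
    "gnomAD_AF=0.005553;greendb_constraint=0.722409;greendb_level=3"
)

info = parse_info(example)
for key in ("gnomAD_AF", "DNase", "TFBS", "UCNE", "FATHMM_MKLNC",
            "ReMM", "ncER", "greendb_constraint", "greendb_level"):
    print(key, info.get(key, "absent"))
```

Listing the fields next to the conditions defined in the prioritization config makes it easier to ask which ones contributed to the reported level.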

Tissue annotation

Dear Edoardo,

How can we add and use tissue expression data in the annotated VCF file?

Regards,
Najeeb

No 'chr' in front of chromosome names in vcf

Hi,

Thank you for providing such a wonderful tool, combining so many tools to help in prioritizing non-coding variants.
I am trying to run GREEN-VARAN on a GRCh37 dataset. However, our reference genome labels the chromosomes as 1, 2, 3, etc.
GREEN-VARAN seems to expect chr1, chr2, chr3, etc. Is there an option to change this? I found the utils.nim file listing STDCHROMS, but changing the list there didn't change anything.

In addition, is there also a GRCh37 test vcf available?

Regards,
Lennart Johansson
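One possible workaround, independent of any option inside GREEN-VARAN, is to rename the contigs in the VCF before annotation. `bcftools annotate --rename-chrs` does this with a mapping file; the sketch below shows the same idea in plain Python on uncompressed VCF lines. The helper names and the choice to map `MT` to `chrM` are assumptions made for illustration, and whether the renamed file then matches the GRCh37 resource files has not been verified here.

```python
# Assumption: GRCh37-style names ("1", "X", "MT") should become
# UCSC-style names ("chr1", "chrX", "chrM").
def to_chr_name(name):
    if name.startswith("chr"):
        return name
    return "chrM" if name == "MT" else "chr" + name


def add_chr_prefix(line):
    """Rename the contig in a ##contig header line or in a variant record."""
    prefix = "##contig=<ID="
    if line.startswith(prefix):
        rest = line[len(prefix):]
        # The contig ID ends at the first "," or ">" after the prefix.
        ends = [i for i in (rest.find(","), rest.find(">")) if i != -1]
        if not ends:
            return line
        end = min(ends)
        return prefix + to_chr_name(rest[:end]) + rest[end:]
    if line.startswith("#") or not line.strip():
        return line  # other header lines and blank lines are left untouched
    chrom, sep, rest = line.partition("\t")
    return to_chr_name(chrom) + sep + rest
```

Applying `add_chr_prefix` to every line of a decompressed VCF and then recompressing and reindexing the result gives a file with `chr`-style names.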

workflow green_varan failed

Dear Edoardo,
I want to use the GREEN-VARAN workflow to prioritize variants.
I tried running the GREEN-VARAN workflow on the test VCF file "GRCh38.test.smallvars.tmp.vcf.gz".

/home/moon9319/nextflow /home/moon9319/GREEN-VARAN/workflow/main.nf
-profile local
--input $DataPath$VCF_FILE
--build GRCh38
--out /home/moon9319/SNV/02.WORKFLOW/
--scores best
--regions best
--AF
--greenvaran_config /home/moon9319/GREEN-VARAN/config/prioritize_smallvars.json
--greenvaran_dbschema /home/moon9319/GREEN-VARAN/config/greendb_schema_v2.5.json

############################

executor > local (11)
[99/1943af] process > WRITE_SCORE_TOML (2) [100%] 3 of 3 ✔
[f7/37a2bf] process > WRITE_REGION_TOML (2) [100%] 3 of 3 ✔
[69/207667] process > WRITE_AF_TOML (1) [100%] 1 of 1 ✔
[c6/ec0a86] process > concat_toml [100%] 1 of 1 ✔
[ea/e32acb] process > ANNOTATE:annotate_vcf (1) [100%] 1 of 1 ✔
[5f/8128e9] process > ANNOTATE:index_vcf (1) [100%] 1 of 1 ✔
[0e/609915] process > green_varan (1) [100%] 1 of 1, failed: 1 ✘

[2022-06-13T10:38:41] - INFO: Reading config from file: prioritize_smallvars.json
[2022-06-13T10:38:41] - INFO: N selected chromosomes: 25
[2022-06-13T10:38:41] - INFO: N selected genes: 0
[2022-06-13T10:38:41] - INFO: Update existing gene annotations: true
[2022-06-13T10:38:41] - INFO: Filter mode active: false
[2022-06-13T10:38:41] - INFO: === Start processing VCF ===

Command error:
[E::bcf_hdr_read] Input is not detected as bcf or vcf format
/project/alfredo/GAU_tools/GREEN-VARAN/src/greenvaran.nim(40) greenvaran
/project/alfredo/GAU_tools/GREEN-VARAN/src/greenvaran.nim(37) main
/project/alfredo/GAU_tools/GREEN-VARAN/src/greenvaran/smallvars.nim(99) main
/project/alfredo/software/nim_packages/pkgs/hts-0.3.21/hts/vcf.nim(238) open
Error: unhandled exception: [hts-nim/vcf] error reading VCF header from 'GRCh38.test.smallvars.tmp.vcf.gz' [OSError]

########################################

However, I think there is a problem in the ANNOTATE step that does not produce an error message.
The tmp VCF file (the input of green_varan) produced by ANNOTATE:annotate_vcf is empty.
What should I do to fix it?

Thank you!
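A quick way to confirm that an intermediate file is empty, as described above, is to count its header lines and records. This is a minimal sketch using only the Python standard library; `summarize_vcf` is a hypothetical helper, and the path to check would be the tmp VCF in the Nextflow work directory of the `ANNOTATE:annotate_vcf` task.

```python
import gzip


def summarize_vcf(path):
    """Return (header_lines, records) for a plain or gzip/bgzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    header_lines = 0
    records = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                header_lines += 1
            elif line.strip():
                records += 1
    return header_lines, records
```

A result of `(0, 0)` means the file has no header at all, which would be consistent with the "Input is not detected as bcf or vcf format" message; a header with zero records points instead at the annotation step dropping every variant.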

=== GNOMAD AF === Cannot get property 'file' on null object

Hi,

I got the error below when I ran workflow/main.nf. Do you know what has gone wrong?

N E X T F L O W ~ version 22.10.2
Launching /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/workflow/main.nf [irreverent_swanson] DSL2 - revision: ad29e3fa07
Available datasets. Star indicate corresponding file is available locally
=== SCORES ===
CADD: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_CADD.tsv.gz
DANN: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_DANN.tsv.gz
Eigen: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_Eigen.tsv.gz
ExPECTO: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_ExPECTO.tsv.gz
FATHMM_MKLNC: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_FATHMM-MKL_NC.tsv.gz
FATHMM_XF: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_FATHMM-XF_NC.tsv.gz
FIRE: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_FIRE.tsv.gz
GWAVA: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_gwava.bed.gz
LinSight: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_LinSight.bed.gz
NCBoost: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_NCBoost.tsv.gz
ncER: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_ncER_perc.bed.gz
PhyloP100: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_PhyloP100.bed.gz
ReMM: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_ReMM.tsv.gz
=== REGIONS ===
TFBS: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_TFBS.merged.bed.gz
DNase: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_DNase.merged.bed.gz
UCNE: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_UCNE.bed.gz
dbSuper: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/resources/GRCh37/GRCh37/GRCh37_dbSuper.bed.gz
=== GNOMAD AF ===
Cannot get property 'file' on null object

Access to undefined parameter

Dear Edoardo,

I have been attempting to run the Nextflow workflow on a VCF file (and on the test file provided within the package). However, regardless of the Nextflow version (I have used both v20.10.0 and 23.04.3), I receive the following errors:

WARN: Access to undefined parameter annotations -- Initialise it to a default value eg. params.annotations = some_value
Cannot get property 'GRCh38' on null object

I am not sure where I need to amend the code to fix this issue (I used the same code given in the example workflow). I checked one of the other issues raised on GitHub, but the command provided in the "workflow green_varan failed #8" issue thread did not run for me either.

I would appreciate any support with this matter.

Thank you,
Safaa

greendb_level issue

I want to use the GREEN-VARAN workflow to annotate regulatory variants. A rare variant (population AF < 1%) overlapping one of the GREEN-DB regions is defined as greendb_level=1, yet in my VCF file a variant with population AF > 1% was also assigned greendb_level=1. Has GREEN-VARAN changed compared to the version described in your paper?

This is one example:
AC=1;AF=0.125;AN=2;BaseQRankSum=-1.981;DP=18;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.125;MQ=40;MQRankSum=0;QD=6.73;ReadPosRankSum=0;SOR=0.892;ReMM=0.486;ncER=74.0135;FATHMM_MKLNC=0.9039;TFBS;gnomAD_AF=0.7122;gnomAD_AF_afr=0.3236;gnomAD_AF_amr=0.7951;gnomAD_AF_nfe=0.8539;greendb_id=42036_pro;greendb_stdtype=promoter;greendb_dbsource=ENCODE-HMM;greendb_genes=AL669831.3,OR4F16,CICP3;greendb_level=1;ANN=A|promoter|MODIFIER|CICP3||||||||||||,A|promoter|MODIFIER|AL669831.3||||||||||||,A|promoter|MODIFIER|OR4F16||||||||||||

test run succeed but actual run failed

Dear developer,

I have successfully configured everything and run the test file, but the run on my own file failed. Do you know what has gone wrong? Many thanks in advance!

test run
[2023-11-15T12:59:21] - INFO: Reading config from file: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/config/prioritize_smallvars.json
[2023-11-15T12:59:21] - INFO: N selected chromosomes: 25
[2023-11-15T12:59:21] - INFO: N selected genes: 23
[2023-11-15T12:59:21] - INFO: Update existing gene annotations: true
[2023-11-15T12:59:21] - INFO: Output to: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/test/out/test_smallvars.vcf
[2023-11-15T12:59:21] - INFO: Filter mode active: false
[2023-11-15T12:59:21] - INFO: === Start processing VCF ===
[2023-11-15T12:59:21] - WARN: gnomAD_AF field defined in config not present in the VCF header
[2023-11-15T12:59:21] - WARN: gnomAD_AF_nfe field defined in config not present in the VCF header
[2023-11-15T12:59:21] - WARN: TFBS field defined in config not present in the VCF header
[2023-11-15T12:59:21] - WARN: DNase field defined in config not present in the VCF header
[2023-11-15T12:59:21] - WARN: UCNE field defined in config not present in the VCF header
[2023-11-15T12:59:21] - WARN: FATHMM_MKLNC field defined in config not present in the VCF header
[2023-11-15T12:59:21] - WARN: ncER field defined in config not present in the VCF header
[2023-11-15T12:59:21] - WARN: ReMM field defined in config not present in the VCF header
[2023-11-15T12:59:21] - WARN: Prioritize is not active and all variants will get level zero
[2023-11-15T12:59:47] - INFO: 10000 vars analyzed, last batch in 0.0 min 25.39 sec
[2023-11-15T13:00:14] - INFO: 20000 vars analyzed, last batch in 0.0 min 27.15 sec
[2023-11-15T13:00:21] - INFO: 12497 variants annotated with greendb information
[2023-11-15T13:00:21] - INFO: 35 vars of interest based on the input gene list if any
[2023-11-15T13:00:21] - INFO: 24000 variants written to output
[2023-11-15T13:00:21] - INFO: All done - Completed in 0.0 min 59.84 sec

actual run
[2023-11-15T12:48:29] - INFO: Reading config from file: /proj/sens2023551/wgs_nes/GREEN-VARAN-1.2/config/prioritize_smallvars.json
[2023-11-15T12:48:29] - INFO: N selected chromosomes: 25
[2023-11-15T12:48:29] - INFO: N selected genes: 1
[2023-11-15T12:48:29] - INFO: Update existing gene annotations: true
[2023-11-15T12:48:29] - INFO: Output to: /proj/sens2023551/wgs_nes/Green_annotation/All_G1A_gatkcomb_rhocall_vt_af_frqf_cadd_vep_parsed_ranked.selected.green.vcf.gz
[2023-11-15T12:48:29] - INFO: Filter mode active: false
[2023-11-15T12:48:29] - INFO: === Start processing VCF ===
[2023-11-15T12:48:29] - WARN: gnomAD_AF field defined in config not present in the VCF header
[2023-11-15T12:48:29] - WARN: gnomAD_AF_nfe field defined in config not present in the VCF header
[2023-11-15T12:48:29] - WARN: TFBS field defined in config not present in the VCF header
[2023-11-15T12:48:29] - WARN: DNase field defined in config not present in the VCF header
[2023-11-15T12:48:29] - WARN: UCNE field defined in config not present in the VCF header
[2023-11-15T12:48:29] - WARN: FATHMM_MKLNC field defined in config not present in the VCF header
[2023-11-15T12:48:29] - WARN: ncER field defined in config not present in the VCF header
[2023-11-15T12:48:29] - WARN: ReMM field defined in config not present in the VCF header
[2023-11-15T12:48:29] - WARN: Prioritize is not active and all variants will get level zero
[2023-11-15T12:48:29] - WARN: No ANN or BCSQ field detected in header so ANN will be created
[2023-11-15T12:48:29] - INFO: 0 variants annotated with greendb information
[2023-11-15T12:48:29] - INFO: 0 vars of interest based on the input gene list if any
[2023-11-15T12:48:29] - INFO: 0 variants written to output
[2023-11-15T12:48:29] - INFO: All done - Completed in 0.0 min 0.07 sec
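The log does not say why zero variants were processed. One hypothesis worth ruling out is a contig-naming mismatch between the input VCF and the chromosomes the tool selects. The sketch below tallies the CHROM values in a VCF and lists those outside an expected set; `contig_report` is a hypothetical helper, and the assumption that the 25 selected chromosomes are `chr1` to `chr22`, `chrX`, `chrY` and `chrM` is not confirmed by the log.

```python
from collections import Counter

# Assumption: the 25 selected chromosomes are chr1..chr22, chrX, chrY, chrM.
EXPECTED = {f"chr{n}" for n in range(1, 23)} | {"chrX", "chrY", "chrM"}


def contig_report(lines):
    """Count records per CHROM value and list the names outside EXPECTED."""
    counts = Counter(
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    )
    unexpected = sorted(set(counts) - EXPECTED)
    return counts, unexpected
```

Passing the lines of the decompressed input VCF to `contig_report` shows whether any records exist at all and whether their contig names differ from the expected ones.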

Only 2 annotated variants in the output of the test vcf

Hello! Thank you for developing GREEN-VARAN.
I tried to run the workflow with the test data using:

main.nf -profile local --input ./test/VCF/GRCh38.test.smallvars.vcf.gz
--build GRCh38
--out results --scores best --regions best --AF
--greenvaran_config config/prioritize_smallvars.json
--greenvaran_dbschema config/greendb_schema_v2.5.json

It seems to work (no error message) but when I look at the annotated vcf I've got only two annotated variants whereas the initial vcf contains several thousands of variants.
I've also tried with a vcf I want to annotate and the output vcf is empty (it contains only the header).

Do you have an idea of what goes wrong?
Thank you

test error

Hi Edoardo,
I've downloaded the binaries and all the GRCh38 files.
When I try to run the test, I get the following error:
`$ ./greenvaran smallvars -i test/VCF/GRCh38.test.smallvars.vcf.gz -o test/out/test_smallvars.vcf --db resources/GRCh38/GRCh38_GREEN-DB.bed.gz --dbschema config/greendb_schema_v2.5.json --config config/prioritize_smallvars.json --genes test/VCF/genes_list_example.txt
[2023-12-18T13:10:55] - INFO: Reading config from file: config/prioritize_smallvars.json
[2023-12-18T13:10:55] - INFO: N selected chromosomes: 25
[2023-12-18T13:10:55] - INFO: N selected genes: 23
[2023-12-18T13:10:55] - INFO: Update existing gene annotations: true
[2023-12-18T13:10:55] - INFO: Filter mode active: false
[2023-12-18T13:10:55] - INFO: === Start processing VCF ===
[2023-12-18T13:10:55] - WARN: gnomAD_AF field defined in config not present in the VCF header
[2023-12-18T13:10:55] - WARN: gnomAD_AF_nfe field defined in config not present in the VCF header
[2023-12-18T13:10:55] - WARN: TFBS field defined in config not present in the VCF header
[2023-12-18T13:10:55] - WARN: DNase field defined in config not present in the VCF header
[2023-12-18T13:10:55] - WARN: UCNE field defined in config not present in the VCF header
[2023-12-18T13:10:55] - WARN: FATHMM_MKLNC field defined in config not present in the VCF header
[2023-12-18T13:10:55] - WARN: ncER field defined in config not present in the VCF header
[2023-12-18T13:10:55] - WARN: ReMM field defined in config not present in the VCF header
[2023-12-18T13:10:55] - WARN: Prioritize is not active and all variants will get level zero
strutils.nim(1159) parseFloat
Error: unhandled exception: invalid float: FO704657.1 [ValueError]
`

What is going wrong?
Thanks

[FEATURE REQUEST] Option to limit size of SVs which are annotated

Hi @edg1983, I was wondering if you could consider adding a parameter to set the maximum size of SV to annotate?

Some callers report extremely large SVs which are likely to be spurious (e.g., an inversion with SVLEN=153641285). When these large SVs get annotated with all overlapping regulatory regions, the output file can be quite cumbersome. It would be nice if I could choose to annotate only SVs less than 1 Mb, for example.

What do you think?
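Until such a parameter exists, the input can be pre-filtered by size before annotation. The following is a minimal sketch of that idea on uncompressed VCF lines, not a feature of greenvaran: `sv_size` and `keep_small_svs` are hypothetical helpers, the size is taken from `SVLEN` when present (first value only) and otherwise from `END - POS`, and records with neither field are kept.

```python
import re

SVLEN_RE = re.compile(r"(?:^|;)SVLEN=(-?\d+)")
END_RE = re.compile(r"(?:^|;)END=(\d+)")  # the anchor avoids matching CIEND


def sv_size(line):
    """Return the SV size in bp, or None when it cannot be determined."""
    cols = line.rstrip("\n").split("\t")
    pos, info = int(cols[1]), cols[7]
    match = SVLEN_RE.search(info)
    if match:
        return abs(int(match.group(1)))  # deletions may carry a negative SVLEN
    match = END_RE.search(info)
    if match:
        return int(match.group(1)) - pos
    return None


def keep_small_svs(lines, max_size=1_000_000):
    """Yield header lines and records whose size is below max_size."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        size = sv_size(line)
        if size is None or size < max_size:
            yield line
```

With the default `max_size`, an inversion with `SVLEN=153641285` is dropped while the 56,563 bp inversion from the earlier issue is kept.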