Hello, I installed spliceAI but I got the below error when I ran the

running error about spliceai

illumina commented on August 14, 2024

running error

Comments (7)

kishorejaganathan commented on August 14, 2024

Could you open python, do the following and let me know the output?

import pyfasta
# Load the reference FASTA and list its record names (keys)
fasta = pyfasta.Fasta('examples/human_g1k_v37.fasta')
print(fasta.keys())

xuguorong2016 commented on August 14, 2024

Yes, there was no error when I printed the fasta keys. Please see the output below:

import pyfasta
fasta = pyfasta.Fasta('examples/human_g1k_v37.fasta')
print fasta.keys()
['1 dna:chromosome chromosome:GRCh37:1:1:249250621:1', 'GL000192.1 dna:supercontig supercontig::GL000192.1:1:547496:1', 'GL000239.1 dna:supercontig supercontig::GL000239.1:1:33824:1', 'GL000207.1 dna:supercontig supercontig::GL000207.1:1:4262:1', '16 dna:chromosome chromosome:GRCh37:16:1:90354753:1', 'GL000235.1 dna:supercontig supercontig::GL000235.1:1:34474:1', '2 dna:chromosome chromosome:GRCh37:2:1:243199373:1', '13 dna:chromosome chromosome:GRCh37:13:1:115169878:1', 'GL000210.1 dna:supercontig supercontig::GL000210.1:1:27682:1', 'GL000224.1 dna:supercontig supercontig::GL000224.1:1:179693:1', '4 dna:chromosome chromosome:GRCh37:4:1:191154276:1', '18 dna:chromosome chromosome:GRCh37:18:1:78077248:1', 'GL000241.1 dna:supercontig supercontig::GL000241.1:1:42152:1', 'GL000248.1 dna:supercontig supercontig::GL000248.1:1:39786:1', 'GL000208.1 dna:supercontig supercontig::GL000208.1:1:92689:1', 'GL000243.1 dna:supercontig supercontig::GL000243.1:1:43341:1', 'GL000198.1 dna:supercontig supercontig::GL000198.1:1:90085:1', 'GL000238.1 dna:supercontig supercontig::GL000238.1:1:39939:1', '3 dna:chromosome chromosome:GRCh37:3:1:198022430:1', '8 dna:chromosome chromosome:GRCh37:8:1:146364022:1', '7 dna:chromosome chromosome:GRCh37:7:1:159138663:1', '22 dna:chromosome chromosome:GRCh37:22:1:51304566:1', 'GL000219.1 dna:supercontig supercontig::GL000219.1:1:179198:1', 'GL000211.1 dna:supercontig supercontig::GL000211.1:1:166566:1', 'GL000194.1 dna:supercontig supercontig::GL000194.1:1:191469:1', 'GL000246.1 dna:supercontig supercontig::GL000246.1:1:38154:1', 'GL000236.1 dna:supercontig supercontig::GL000236.1:1:41934:1', 'GL000196.1 dna:supercontig supercontig::GL000196.1:1:38914:1', 'MT gi|251831106|ref|NC_012920.1| Homo sapiens mitochondrion, complete genome', 'GL000232.1 dna:supercontig supercontig::GL000232.1:1:40652:1', 'GL000221.1 dna:supercontig supercontig::GL000221.1:1:155397:1', 'GL000216.1 dna:supercontig supercontig::GL000216.1:1:172294:1', 'GL000245.1 
dna:supercontig supercontig::GL000245.1:1:36651:1', 'GL000191.1 dna:supercontig supercontig::GL000191.1:1:106433:1', 'GL000209.1 dna:supercontig supercontig::GL000209.1:1:159169:1', '12 dna:chromosome chromosome:GRCh37:12:1:133851895:1', 'GL000220.1 dna:supercontig supercontig::GL000220.1:1:161802:1', 'GL000217.1 dna:supercontig supercontig::GL000217.1:1:172149:1', '5 dna:chromosome chromosome:GRCh37:5:1:180915260:1', '21 dna:chromosome chromosome:GRCh37:21:1:48129895:1', 'GL000203.1 dna:supercontig supercontig::GL000203.1:1:37498:1', 'GL000225.1 dna:supercontig supercontig::GL000225.1:1:211173:1', 'GL000195.1 dna:supercontig supercontig::GL000195.1:1:182896:1', 'GL000240.1 dna:supercontig supercontig::GL000240.1:1:41933:1', 'GL000242.1 dna:supercontig supercontig::GL000242.1:1:43523:1', 'GL000223.1 dna:supercontig supercontig::GL000223.1:1:180455:1', 'GL000200.1 dna:supercontig supercontig::GL000200.1:1:187035:1', '6 dna:chromosome chromosome:GRCh37:6:1:171115067:1', 'GL000247.1 dna:supercontig supercontig::GL000247.1:1:36422:1', 'GL000202.1 dna:supercontig supercontig::GL000202.1:1:40103:1', 'GL000193.1 dna:supercontig supercontig::GL000193.1:1:189789:1', '10 dna:chromosome chromosome:GRCh37:10:1:135534747:1', '20 dna:chromosome chromosome:GRCh37:20:1:63025520:1', 'GL000197.1 dna:supercontig supercontig::GL000197.1:1:37175:1', 'GL000237.1 dna:supercontig supercontig::GL000237.1:1:45867:1', 'Y dna:chromosome chromosome:GRCh37:Y:2649521:59034049:1', 'GL000213.1 dna:supercontig supercontig::GL000213.1:1:164239:1', 'GL000215.1 dna:supercontig supercontig::GL000215.1:1:172545:1', '11 dna:chromosome chromosome:GRCh37:11:1:135006516:1', 'GL000205.1 dna:supercontig supercontig::GL000205.1:1:174588:1', 'GL000222.1 dna:supercontig supercontig::GL000222.1:1:186861:1', '15 dna:chromosome chromosome:GRCh37:15:1:102531392:1', 'GL000199.1 dna:supercontig supercontig::GL000199.1:1:169874:1', 'GL000249.1 dna:supercontig supercontig::GL000249.1:1:38502:1', 'GL000227.1 
dna:supercontig supercontig::GL000227.1:1:128374:1', 'GL000218.1 dna:supercontig supercontig::GL000218.1:1:161147:1', '17 dna:chromosome chromosome:GRCh37:17:1:81195210:1', 'GL000212.1 dna:supercontig supercontig::GL000212.1:1:186858:1', 'GL000226.1 dna:supercontig supercontig::GL000226.1:1:15008:1', 'GL000234.1 dna:supercontig supercontig::GL000234.1:1:40531:1', 'GL000214.1 dna:supercontig supercontig::GL000214.1:1:137718:1', 'GL000233.1 dna:supercontig supercontig::GL000233.1:1:45941:1', 'GL000206.1 dna:supercontig supercontig::GL000206.1:1:41001:1', '19 dna:chromosome chromosome:GRCh37:19:1:59128983:1', 'GL000230.1 dna:supercontig supercontig::GL000230.1:1:43691:1', '9 dna:chromosome chromosome:GRCh37:9:1:141213431:1', 'GL000244.1 dna:supercontig supercontig::GL000244.1:1:39929:1', 'X dna:chromosome chromosome:GRCh37:X:1:155270560:1', 'GL000204.1 dna:supercontig supercontig::GL000204.1:1:81310:1', 'GL000231.1 dna:supercontig supercontig::GL000231.1:1:27386:1', '14 dna:chromosome chromosome:GRCh37:14:1:107349540:1', 'GL000201.1 dna:supercontig supercontig::GL000201.1:1:36148:1', 'GL000229.1 dna:supercontig supercontig::GL000229.1:1:19913:1', 'GL000228.1 dna:supercontig supercontig::GL000228.1:1:129120:1']

kishorejaganathan commented on August 14, 2024

When the software is run on examples/input.vcf, it assumes that the keys of the fasta file are either ['chrX', 'chrY', 'chr1', 'chr2', and so on] or ['X', 'Y', '1', '2', and so on]. Could you try using a more standard fasta file (like the one available in the UCSC genome browser)?
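The keys printed earlier in this thread carry extra description fields after the contig name (for example '1 dna:chromosome chromosome:GRCh37:1:1:249250621:1'), so they match neither of the two expected conventions. Below is a minimal sketch in plain Python of how one might check a set of FASTA headers for this. The helper names `contig_name` and `naming_style` are hypothetical and are not part of spliceai or pyfasta.

```python
def contig_name(header):
    """Return the first whitespace-delimited token of a FASTA header."""
    return header.split()[0]


def naming_style(headers):
    """Classify contig names as 'chr-prefixed', 'bare', or 'mixed'."""
    names = [contig_name(h) for h in headers]
    prefixed = [name.startswith("chr") for name in names]
    if all(prefixed):
        return "chr-prefixed"
    if not any(prefixed):
        return "bare"
    return "mixed"


# Two of the headers reported earlier in this thread.
headers = [
    "1 dna:chromosome chromosome:GRCh37:1:1:249250621:1",
    "GL000192.1 dna:supercontig supercontig::GL000192.1:1:547496:1",
]
print([contig_name(h) for h in headers])  # ['1', 'GL000192.1']
print(naming_style(headers))              # bare
```

If the full header strings are used as keys, a lookup by '1' or 'chr1' will not find anything, which is consistent with the failure described above.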

bw2 commented on August 14, 2024

I'm also having issues with 'chr' prefixes and getting spliceai to run.
I installed tensorflow and spliceai on MacOSX using pip, then downloaded hg38.fa.gz from
http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz

I then tried running on whole_genome_filtered_spliceai_scores.vcf.gz as a simple test (even though it's an hg19 vcf):

$ python -m spliceai -R hg38.fa.gz -I whole_genome_filtered_spliceai_scores.vcf.gz -O temp.vcf
Using TensorFlow backend.
Traceback (most recent call last):
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/site-packages/spliceai/__main__.py", line 60, in <module>
    main()
  File "/usr/local/lib/python2.7/site-packages/spliceai/__main__.py", line 50, in main
    ann = annotator(args.R, args.A)
  File "/usr/local/lib/python2.7/site-packages/spliceai/utils.py", line 23, in __init__
    self.ref_fasta = pyfasta.Fasta(ref_fasta)
  File "/usr/local/lib/python2.7/site-packages/pyfasta/fasta.py", line 73, in __init__
    flatten_inplace)
  File "/usr/local/lib/python2.7/site-packages/pyfasta/records.py", line 57, in prepare
    for i, (seqid, seq) in enumerate(seqinfo_generator):
  File "/usr/local/lib/python2.7/site-packages/pyfasta/fasta.py", line 110, in gen_seqs_with_headers
    seqs.append(line)
AttributeError: 'NoneType' object has no attribute 'append'

I realized this error is because spliceai/pyfasta can't handle gzipped fasta files, so I unzipped hg38.fa.gz and ran
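For anyone who prefers to do the decompression step from Python rather than the shell, here is a minimal sketch using only the standard library. `gunzip` is a hypothetical helper name, and the file names in the usage note are the ones from this comment.

```python
import gzip
import shutil


def gunzip(src_path, dst_path):
    """Decompress a gzip file to dst_path, streaming so the whole
    genome is never held in memory at once."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```

Usage would look like gunzip("hg38.fa.gz", "hg38.fa"), after which hg38.fa can be passed to -R.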

$ python -m spliceai -R hg38.fa -I whole_genome_filtered_spliceai_scores.vcf.gz -O temp.vcf
Using TensorFlow backend.
2019-01-27 20:17:20.578701: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
/usr/local/lib/python2.7/site-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
[W::vcf_parse] Contig '10' is not defined in the header. (Quick workaround: index the file with tabix.)
Segmentation fault: 11

I then realized that all spliceai examples use an uncompressed vcf, so I uncompressed whole_genome_filtered_spliceai_scores.vcf.gz and ran python -m spliceai -R hg38.fa -I whole_genome_filtered_spliceai_scores.vcf -O temp.vcf
but still got a segfault:

$ python -m spliceai -R hg38.fa -I whole_genome_filtered_spliceai_scores.vcf -O temp.vcf
Using TensorFlow backend.
2019-01-27 20:22:48.173551: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
/usr/local/lib/python2.7/site-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
[W::vcf_parse] Contig '10' is not defined in the header. (Quick workaround: index the file with tabix.)
Segmentation fault: 11

kishorejaganathan commented on August 14, 2024

There are a couple of things that you are missing right now:

First, the VCF that you are using as input (whole_genome_filtered_spliceai_scores.vcf.gz) lacks some header lines that pysam requires. If you add the following lines to the header of the VCF, you will no longer get the segmentation fault (you can also copy these lines from the example input).

##assembly=GRCh37/hg19
##contig=<ID=1,length=249250621>
##contig=<ID=2,length=243199373>
##contig=<ID=3,length=198022430>
##contig=<ID=4,length=191154276>
##contig=<ID=5,length=180915260>
##contig=<ID=6,length=171115067>
##contig=<ID=7,length=159138663>
##contig=<ID=8,length=146364022>
##contig=<ID=9,length=141213431>
##contig=<ID=10,length=135534747>
##contig=<ID=11,length=135006516>
##contig=<ID=12,length=133851895>
##contig=<ID=13,length=115169878>
##contig=<ID=14,length=107349540>
##contig=<ID=15,length=102531392>
##contig=<ID=16,length=90354753>
##contig=<ID=17,length=81195210>
##contig=<ID=18,length=78077248>
##contig=<ID=19,length=59128983>
##contig=<ID=20,length=63025520>
##contig=<ID=21,length=48129895>
##contig=<ID=22,length=51304566>
##contig=<ID=X,length=155270560>
##contig=<ID=Y,length=59373566>

Second, the VCF that you are using (+ the default annotation file) corresponds to hg19/GRCh37. So please download and use the hg19 fasta file instead. I believe UCSC provides them separately for each chromosome, so you might have to concatenate them.
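Adding the missing header lines can be done in a text editor, but a small script avoids mistakes on a large file. The following is a rough sketch, assuming an uncompressed VCF read as a list of lines. `add_header_lines` is a hypothetical helper name, not something provided by spliceai or pysam.

```python
def add_header_lines(vcf_lines, extra_lines):
    """Insert extra_lines just before the '#CHROM' column line.

    vcf_lines are expected to end with a newline; extra_lines are not.
    Header lines that are already present are not added a second time.
    """
    existing = {line.rstrip("\n") for line in vcf_lines if line.startswith("##")}
    out = []
    inserted = False
    for line in vcf_lines:
        if line.startswith("#CHROM") and not inserted:
            out.extend(extra + "\n" for extra in extra_lines if extra not in existing)
            inserted = True
        out.append(line)
    return out
```

One would read the VCF with readlines(), pass the ##assembly and ##contig lines from this comment as extra_lines, and write the returned list to a new file.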

david-a-parry commented on August 14, 2024

When the software is run on examples/input.vcf, it assumes that the keys of the fasta file are either ['chrX', 'chrY', 'chr1', 'chr2', and so on] or ['X', 'Y', '1', '2', and so on]. Could you try using a more standard fasta file (like the one available in the UCSC genome browser)?

It might help many users if spliceai used pyfaidx (https://github.com/mdshw5/pyfaidx) rather than pyfasta, since many people use references like those recommended by Heng Li (http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use) or those provided in the GATK resource bundle. These references all contain extra fields after the contig name in the FASTA headers, which cause pyfasta to fail, while pyfaidx handles them as expected.

kishorejaganathan commented on August 14, 2024

Thanks for the suggestion. The current release (v1.2) uses pyfaidx and also handles the '1' vs. 'chr1' chromosome naming mismatch automatically.
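To illustrate what handling the naming mismatch could involve, here is a small sketch of one possible approach: try the chromosome name as given, then try it with the 'chr' prefix added or removed. This is an assumption about the general technique, not the code used in the v1.2 release, and `resolve_chrom` is a hypothetical name.

```python
def resolve_chrom(chrom, fasta_keys):
    """Return the key in fasta_keys that matches chrom, trying the name
    as given and then with the 'chr' prefix toggled.

    Returns None when neither form is present.
    """
    if chrom in fasta_keys:
        return chrom
    alt = chrom[3:] if chrom.startswith("chr") else "chr" + chrom
    return alt if alt in fasta_keys else None
```

Note that this simple toggle does not cover the mitochondrial contig, which is commonly named 'MT' in one convention and 'chrM' in the other.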
