vipints / gfftools-gx Goto Github PK

gfftools - Galaxy toolshed repository

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

gfftools-gx's Issues

ValueError: too many values to unpack

I'm trying to use gff_to_gtf.py to convert the Potato genome v4.03 annotation (http://solanaceae.plantbiology.msu.edu/pgsc_download.shtml; precisely, this file: http://solanaceae.plantbiology.msu.edu/data/PGSC_DM_V403_genes.gff.zip) to GTF, and I'm getting this:

~$ python scripts/GFFtools-GX/gff_to_gtf.py PGSC_DM_V403_genes.gff > PGSC_DM_V403_genes.gtf
Traceback (most recent call last):
  File "scripts/GFFtools-GX/gff_to_gtf.py", line 77, in <module>
    Transcriptdb = GFFParser.Parse(gff_fname)
  File "/home/stefano/scripts/GFFtools-GX/GFFParser.py", line 130, in Parse
    ftype, tags = attribute_tags(parts[-1])
  File "/home/stefano/scripts/GFFtools-GX/GFFParser.py", line 61, in attribute_tags
    key, val = item
ValueError: too many values to unpack

What does it mean?
I just did a git clone of the package; am I missing something?
Thanks
s.
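
A possible reading of the traceback above: `key, val = item` fails when an attribute is split into more than two pieces, for example when a value itself contains an `=` sign. That cause is an assumption, since the offending line of the GFF file is not shown. The sketch below uses a hypothetical helper, `parse_gff3_attributes` (not the repository's `attribute_tags`), which splits each attribute only at the first `=`:

```python
def parse_gff3_attributes(col9):
    """Parse a GFF3 attribute column into {key: [values]}.

    Hypothetical helper for illustration; it is not the function
    shipped in GFFParser.py.
    """
    tags = {}
    for item in col9.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # skip empty pieces left by trailing ';'
        # partition() splits at the first '=' only, so any later
        # '=' characters stay inside the value.
        key, sep, val = item.partition("=")
        if not sep:
            continue  # no '=' at all: not a key=value pair
        tags[key.strip()] = val.strip().split(",")
    return tags


example = "ID=PGSC0003DMG400000001;Note=ratio=0.5;"
parsed = parse_gff3_attributes(example)
```

With this approach a value such as `ratio=0.5` is kept intact instead of producing a third element to unpack.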

TypeError: cannot perform reduce with flexible type

Hi Vipin,

Thank you very much for the GFFtools scripts. However, when I run the command,

python gff_to_bed.py ref.gff3 > out2.bed

to convert GFF3 to BED12 format, I receive the error below:

Traceback (most recent call last):
  File "gff_to_bed.py", line 115, in <module>
    __main__() 
  File "gff_to_bed.py", line 112, in __main__
    writeBED(Transcriptdb)
  File "gff_to_bed.py", line 86, in writeBED
    score = ent1['score'][0] if ent1['score'].any() else score
  File "/home/.linuxbrew/Cellar/python/2.7.13/lib/python2.7/site-packages/numpy/core/_methods.py", line 38, in _any
    return umr_any(a, axis, dtype, out, keepdims)
TypeError: cannot perform reduce with flexible type

Could I please know how to fix this error?
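
The traceback shows `.any()` being called on `ent1['score']`. NumPy refuses that reduction when the array holds strings (a "flexible" dtype), which would be the case if the score column was stored as text such as `.`. A minimal sketch of a workaround, assuming that is the situation here; `first_score` is a hypothetical helper, not part of gff_to_bed.py:

```python
import numpy as np


def first_score(values, default="0"):
    """Return the first usable score as a string, or the default.

    Works for numeric and string arrays alike by converting to text
    first, so no boolean reduction is run on a string dtype.
    """
    arr = np.atleast_1d(np.asarray(values))
    for text in arr.astype(str):
        # '.' is the GFF placeholder for "no score"; 'nan' appears
        # when a missing numeric value is converted to text.
        if text not in ("", ".", "nan"):
            return str(text)
    return default


missing = first_score(np.array([".", "."]))
present = first_score(np.array(["12.5"]))
```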

GFF-to-GBK removed

Hi,

I'm wondering why you removed GFF-to-GBK from your tool list?
c992745

One of my users wants to use something like that.

Thanks in advance

Bug in negative strands of exons

There's a bug in the gff to gtf converter:

Deha2F_6        JGI     exon    544121  545199  .       -       .       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g";
Deha2F_6        JGI     CDS     544121  545199  .       -       2       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g";
Deha2F_6        JGI     stop_codon      544121  544123  .       -       0       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g";
Deha2F_6        JGI     exon    545644  545872  .       -       .       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g";
Deha2F_6        JGI     CDS     545644  545872  .       -       0       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g";
Deha2F_6        JGI     start_codon     545870  545872  .       -       0       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g";

converts to:


Deha2F_6        JGI     exon    544121  545199  .       -       .       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g"; exon_number "1"; gene_name "DEHA2F06138g";
Deha2F_6        JGI     CDS     544121  544123  .       -       0       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g"; exon_number "1"; gene_name "DEHA2F06138g";
Deha2F_6        JGI     start_codon     545870  545872  .       -       0       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g"; exon_number "1"; gene_name "DEHA2F06138g";
Deha2F_6        JGI     exon    545644  545872  .       -       .       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g"; exon_number "2"; gene_name "DEHA2F06138g";
Deha2F_6        JGI     CDS     544121  545199  .       -       2       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g"; exon_number "2"; gene_name "DEHA2F06138g";
Deha2F_6        JGI     stop_codon      544121  544123  .       -       2       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g"; exon_number "2"; gene_name "DEHA2F06138g";

This happens to negative-strand genes only. The CDS coordinates are completely wrong.
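
A minimal sketch of one way to keep CDS records attached to the right exon on the minus strand. It assumes exon numbers are meant to follow transcription order (highest coordinate first on `-`); the issue does not state which convention the converter intends. Both function names are hypothetical:

```python
def number_exons(exons, strand):
    """Return (exon_number, (start, end)) pairs in transcription order."""
    # On the minus strand the exon with the highest start comes first.
    ordered = sorted(exons, key=lambda se: se[0], reverse=(strand == "-"))
    return list(enumerate(ordered, start=1))


def cds_for_exon(exon, cds_segments):
    """Select the CDS segments that lie entirely inside one exon."""
    start, end = exon
    return [c for c in cds_segments if start <= c[0] and c[1] <= end]


exons = [(544121, 545199), (545644, 545872)]
cds = [(544121, 545199), (545644, 545872)]
numbered = number_exons(exons, "-")
first_cds = cds_for_exon(numbered[0][1], cds)
```

Matching CDS segments to exons by coordinates, rather than by list position, avoids pairing a CDS with the wrong exon when the two lists are sorted differently.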

gff-to-gtf - output contains only exons

Hi,

The output of gff-to-gtf.py contains only exon features, even when the input is a complete GFF file with all feature types.
Could you please fix this error?

Thanks.
Séverine

gff_to_bed TypeError: len() of unsized object

I encountered this issue in converting GCF_000001405.33_GRCh38.p7_genomic.gff.

Traceback (most recent call last):
  File "/.../gff_to_bed.py", line 119, in <module>
    __main__() 
  File "/.../gff_to_bed.py", line 116, in __main__
    writeBED(Transcriptdb)
  File "/.../gff_to_bed.py", line 55, in writeBED
    exon_cnt = len(ent1['exons'][idx])
TypeError: len() of unsized object

There are some cases where a feature has no records that would go into blocks and so is represented by a 0-dimensional numpy array containing nan. I feel like there could be some deeper problems here but in the end the following fix seems to have worked:

def writeBED(tinfo):
    """
    writing result files in bed format 

    @args tinfo: list of genes 
    @type tinfo: numpy object  
    """

    for ent1 in tinfo:
        child_flag = False  

        for idx, tid in enumerate(ent1['transcripts']):
            child_flag = True 
            exon_cnt = 0
            exon_len = ''
            exon_cod = '' 
            rel_start = None 
            rel_stop = None 
            if ent1['exons'][idx].ndim > 0:
                exon_cnt = len(ent1['exons'][idx])
                for idz, ex_cod in enumerate(ent1['exons'][idx]):#check for exons of corresponding transcript  
                    exon_len += '%d,' % (ex_cod[1]-ex_cod[0]+1)
                    if idz == 0: #calculate the relative start position 
                        exon_cod += '0,'
                        rel_start = int(ex_cod[0])-1 
                        rel_stop = int(ex_cod[1])
                    else:
                        exon_cod += '%d,' % (ex_cod[0]-1-rel_start) ## shifting the coordinates to zero 
                        rel_stop = int(ex_cod[1])
...
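
To illustrate why the `ndim` check above helps: a 0-dimensional NumPy array has no length, so `len()` raises the `TypeError` seen in the traceback. The helper below, `exon_rows`, is a hypothetical sketch (not code from the repository) that normalizes such a placeholder into an empty list:

```python
import numpy as np


def exon_rows(exons):
    """Return exon coordinates as a list of (start, end) tuples.

    A 0-dimensional array (for example a lone nan placeholder) is
    treated as "no exons" instead of being passed to len().
    """
    arr = np.asarray(exons)
    if arr.ndim == 0:
        return []
    return [(int(s), int(e)) for s, e in arr.reshape(-1, 2)]


placeholder = np.array(np.nan)       # 0-dimensional, holds nan
try:
    len(placeholder)
    len_failed = False
except TypeError:
    len_failed = True                # len() of unsized object

no_exons = exon_rows(placeholder)
two_exons = exon_rows(np.array([[1, 472], [527, 1274]]))
```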

gff_to_bed int and str arguments to - operator

Hi, I encountered this while converting GCF_000001405.33_GRCh38.p7_genomic.gff.

I got an error about a string and int value being passed as operands to the - operator.

In gff_to_bed.py, line 90 is missing some parentheses:

89            out_print = [ent1['chr'], 
90                        '%d' % int(ent1['start'])-1,
91                        '%d' % int(ent1['stop']),

This fixed the issue:

89            out_print = [ent1['chr'], 
90                        '%d' % (int(ent1['start'])-1),
91                        '%d' % int(ent1['stop']),

Thanks for the great tool!
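
The underlying cause is operator precedence: `%` binds more tightly than `-`, so the string is formatted first and the subtraction is then attempted on the resulting string. A small standalone demonstration (the variable names are illustrative only):

```python
start = "100"

try:
    # Parsed as ('%d' % int(start)) - 1, i.e. '100' - 1
    broken = '%d' % int(start) - 1
except TypeError as exc:
    error = str(exc)

# Parentheses force the subtraction to happen before formatting.
fixed = '%d' % (int(start) - 1)
```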

gff-to-gtf off-by-one error?

I noticed a strange error when converting a GFF to GTF with gff_to_gtf.py.

This is the gene model in the original GFF:

scaffold_00034	JGI	exon	1	472	.	-	.	name "CLAGR_004651-RA"; transcriptId 5353
scaffold_00034	JGI	CDS	1	472	.	-	1	name "CLAGR_004651-RA"; proteinId 5305; exonNumber 3
scaffold_00034	JGI	exon	527	1274	.	-	.	name "CLAGR_004651-RA"; transcriptId 5353
scaffold_00034	JGI	CDS	527	1274	.	-	2	name "CLAGR_004651-RA"; proteinId 5305; exonNumber 2
scaffold_00034	JGI	exon	1326	1593	.	-	.	name "CLAGR_004651-RA"; transcriptId 5353
scaffold_00034	JGI	CDS	1326	1593	.	-	0	name "CLAGR_004651-RA"; proteinId 5305; exonNumber 1
scaffold_00034	JGI	start_codon	1591	1593	.	-	0	name "CLAGR_004651-RA"

The exon3 annotation goes from 1…472 on the - strand. After the conversion to GTF this is the result:

scaffold_00034	JGI	exon	0	472	.	-	.	gene_id "CLAGR_004651-RA"; transcript_id "5305"; exon_number "1"; gene_name "";
scaffold_00034	JGI	CDS	1	472	.	-	1	gene_id "CLAGR_004651-RA"; transcript_id "5305"; exon_number "1"; gene_name "";
scaffold_00034	JGI	start_codon	1591	1593	.	-	1	gene_id "CLAGR_004651-RA"; transcript_id "5305"; exon_number "1"; gene_name "";
scaffold_00034	JGI	exon	527	1274	.	-	.	gene_id "CLAGR_004651-RA"; transcript_id "5305"; exon_number "2"; gene_name "";
scaffold_00034	JGI	CDS	527	1274	.	-	2	gene_id "CLAGR_004651-RA"; transcript_id "5305"; exon_number "2"; gene_name "";
scaffold_00034	JGI	exon	1326	1593	.	-	.	gene_id "CLAGR_004651-RA"; transcript_id "5305"; exon_number "3"; gene_name "";
scaffold_00034	JGI	CDS	1326	1593	.	-	0	gene_id "CLAGR_004651-RA"; transcript_id "5305"; exon_number "3"; gene_name "";
scaffold_00034	JGI	stop_codon	1	3	.	-	0	gene_id "CLAGR_004651-RA"; transcript_id "5305"; exon_number "3"; gene_name "";

Somehow the exon now runs from 0…472 instead. I assume this is an edge case that shows up when a feature borders the end of the sequence?
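
GFF and GTF coordinates are 1-based, so a start of 0 is never valid. Until the converter is fixed, a quick sanity check over the output can flag such records. This is a sketch only; `invalid_coordinates` is a hypothetical helper and assumes tab-separated input:

```python
def invalid_coordinates(lines):
    """Return (line_number, start, end) for records with bad coordinates."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a feature record
        start, end = int(fields[3]), int(fields[4])
        # Coordinates are 1-based and start must not exceed end.
        if start < 1 or end < start:
            bad.append((lineno, start, end))
    return bad


records = [
    "scaffold_00034\tJGI\texon\t0\t472\t.\t-\t.\tgene_id \"CLAGR_004651-RA\";",
    "scaffold_00034\tJGI\tCDS\t1\t472\t.\t-\t1\tgene_id \"CLAGR_004651-RA\";",
]
problems = invalid_coordinates(records)
```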

gtf -> gff conversion truncates chromosome names

Thanks very much for making these tools, which have helped me greatly working between different builds/tools/versions of data.

I've just run into a surprising bug. When I convert a GTF file to GFF, some of the long chromosome names are truncated. An example GTF:

chrUn_AAWZ02036000  anoCar2_ensGene stop_codon  3899    3901    0.000000    -   .   gene_id "ENSACAT00000000307.2"; transcript_id "ENSACAT00000000307.2"; 
chrUn_AAWZ02036000  anoCar2_ensGene CDS 3902    5131    0.000000    -   0   gene_id "ENSACAT00000000307.2"; transcript_id "ENSACAT00000000307.2"; 
chrUn_AAWZ02036000  anoCar2_ensGene exon    3899    5131    0.000000    -   .   gene_id "ENSACAT00000000307.2"; transcript_id "ENSACAT00000000307.2"; 
chrUn_AAWZ02036000  anoCar2_ensGene CDS 25336   25522   0.000000    -   1   gene_id "ENSACAT00000000307.2"; transcript_id "ENSACAT00000000307.2"; 
chrUn_AAWZ02036000  anoCar2_ensGene exon    25336   25522   0.000000    -   .   gene_id "ENSACAT00000000307.2"; transcript_id "ENSACAT00000000307.2"; 
chrUn_AAWZ02036000  anoCar2_ensGene CDS 25602   26479   0.000000    -   0   gene_id "ENSACAT00000000307.2"; transcript_id "ENSACAT00000000307.2"; 
chrUn_AAWZ02036000  anoCar2_ensGene start_codon 26477   26479   0.000000    -   .   gene_id "ENSACAT00000000307.2"; transcript_id "ENSACAT00000000307.2"; 
chrUn_AAWZ02036000  anoCar2_ensGene exon    25602   26479   0.000000    -   .   gene_id "ENSACAT00000000307.2"; transcript_id "ENSACAT00000000307.2"; 
chrUn_AAWZ02036001  anoCar2_ensGene stop_codon  1674    1676    0.000000    -   .   gene_id "ENSACAT00000001077.3"; transcript_id "ENSACAT00000001077.3"; 
chrUn_AAWZ02036001  anoCar2_ensGene CDS 1677    1805    0.000000    -   0   gene_id "ENSACAT00000001077.3"; transcript_id "ENSACAT00000001077.3";

gives rise to

./gtf_to_gff.py test.gtf

##gff-version 3
chrUn_AAWZ02036 anoCar2_ensGene gene    1674    1805    .   -   .   ID=ENSACAT00000001077.3;Name=ENSACAT00000001077.3
chrUn_AAWZ02036 anoCar2_ensGene mRNA    1677    1805    .   -   .   ID=Transcript:ENSACAT00000001077.3;Parent=ENSACAT00000001077.3
chrUn_AAWZ02036 anoCar2_ensGene CDS 1674    1805    .   -   0   Parent=Transcript:ENSACAT00000001077.3
chrUn_AAWZ02036 anoCar2_ensGene three_prime_UTR 1677    1805    .   -   .   Parent=Transcript:ENSACAT00000001077.3
chrUn_AAWZ02036 anoCar2_ensGene exon    1677    1805    .   -   .   Parent=Transcript:ENSACAT00000001077.3
chrUn_AAWZ02036 anoCar2_ensGene gene    3899    26479   .   -   .   ID=ENSACAT00000000307.2;Name=ENSACAT00000000307.2
chrUn_AAWZ02036 anoCar2_ensGene mRNA    3899    26479   .   -   .   ID=Transcript:ENSACAT00000000307.2;Parent=ENSACAT00000000307.2
chrUn_AAWZ02036 anoCar2_ensGene CDS 3899    5131    .   -   0   Parent=Transcript:ENSACAT00000000307.2
chrUn_AAWZ02036 anoCar2_ensGene CDS 25336   25522   .   -   1   Parent=Transcript:ENSACAT00000000307.2
chrUn_AAWZ02036 anoCar2_ensGene CDS 25602   26479   .   -   0   Parent=Transcript:ENSACAT00000000307.2
chrUn_AAWZ02036 anoCar2_ensGene exon    3899    5131    .   -   .   Parent=Transcript:ENSACAT00000000307.2
chrUn_AAWZ02036 anoCar2_ensGene exon    25336   25522   .   -   .   Parent=Transcript:ENSACAT00000000307.2
chrUn_AAWZ02036 anoCar2_ensGene exon    25602   26479   .   -   .   Parent=Transcript:ENSACAT00000000307.2

Note that the final three digits of each "chromosome" (actually contig) name are missing. This doesn't seem to affect sequences with shorter names.
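
The truncated name `chrUn_AAWZ02036` is exactly 15 characters long, which is consistent with the names passing through a fixed-width NumPy string field. That is a guess at the cause, not something confirmed from the script. The sketch below only demonstrates the NumPy behaviour itself:

```python
import numpy as np

names = ["chrUn_AAWZ02036000", "chr1"]

# A fixed-width dtype silently cuts off anything beyond 15 characters.
fixed = np.array(names, dtype="U15")

# An object dtype (or a wide enough width) keeps the full name.
flexible = np.array(names, dtype=object)

truncated = str(fixed[0])
intact = str(flexible[0])
```

If the parser does declare a fixed-width field for the chromosome column, widening it or switching to an object dtype would avoid the silent truncation.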
