
gfftools-gx's People

Contributors

vipints

gfftools-gx's Issues

ValueError: too many values to unpack

I'm trying to use gff_to_gtf.py to convert the Potato genome v4.03 annotation (http://solanaceae.plantbiology.msu.edu/pgsc_download.shtml; specifically, this file: http://solanaceae.plantbiology.msu.edu/data/PGSC_DM_V403_genes.gff.zip) to GTF, and I'm getting this:

~$ python scripts/GFFtools-GX/gff_to_gtf.py PGSC_DM_V403_genes.gff > PGSC_DM_V403_genes.gtf
Traceback (most recent call last):
  File "scripts/GFFtools-GX/gff_to_gtf.py", line 77, in <module>
    Transcriptdb = GFFParser.Parse(gff_fname)
  File "/home/stefano/scripts/GFFtools-GX/GFFParser.py", line 130, in Parse
    ftype, tags = attribute_tags(parts[-1])
  File "/home/stefano/scripts/GFFtools-GX/GFFParser.py", line 61, in attribute_tags
    key, val = item
ValueError: too many values to unpack

What does it mean?
I just did a git clone of the package; am I missing something?
Thanks
s.
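The last frame shows the failing statement, `key, val = item`. Tuple unpacking needs exactly two elements, so the error means an attribute item produced more than two pieces. One plausible trigger, which is an assumption not checked against PGSC_DM_V403_genes.gff or GFFParser.py, is an attribute value that itself contains the separator (for example `Note=similar to X=Y`). A minimal sketch with hypothetical helper names that reproduces the failure and then splits only on the first separator:

```python
def unpack_naive(item):
    # Hypothetical stand-in for the failing pattern: an unbounded split followed
    # by two-way unpacking raises ValueError when the value also contains '='.
    key, val = item.split("=")
    return key, val


def unpack_first(item):
    # Split only on the first '=' so any further '=' characters stay in the value.
    key, val = item.split("=", 1)
    return key.strip(), val.strip()


if __name__ == "__main__":
    print(unpack_first("Note=similar to X=Y"))  # ('Note', 'similar to X=Y')
    try:
        unpack_naive("Note=similar to X=Y")
    except ValueError as err:
        print("naive split failed:", err)
```

An item with no `=` at all would still fail in `unpack_first` (with "not enough values to unpack"), so a real fix also has to decide how such items should be handled.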

TypeError: cannot perform reduce with flexible type

Hi Vipin,

Thank you very much for the GFFtools scripts. However, when I run the command,

python gff_to_bed.py ref.gff3 > out2.bed

to convert GFF3 to BED12 format, I get the following error:

Traceback (most recent call last):
  File "gff_to_bed.py", line 115, in <module>
    __main__() 
  File "gff_to_bed.py", line 112, in __main__
    writeBED(Transcriptdb)
  File "gff_to_bed.py", line 86, in writeBED
    score = ent1['score'][0] if ent1['score'].any() else score
  File "/home/.linuxbrew/Cellar/python/2.7.13/lib/python2.7/site-packages/numpy/core/_methods.py", line 38, in _any
    return umr_any(a, axis, dtype, out, keepdims)
TypeError: cannot perform reduce with flexible type

How can I fix this error?
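The traceback points at `ent1['score'].any()`. In the NumPy 1.x releases used with Python 2.7, "cannot perform reduce with flexible type" is the error raised when a reduction such as `any()` is applied to a string array, which suggests (an assumption, not checked against the source) that scores are stored as strings such as `"."`. A minimal sketch of a string-safe alternative, using a hypothetical helper `first_score` that skips GFF's `.` placeholder instead of calling `.any()`:

```python
def first_score(scores, default="0"):
    """Return the first non-missing score, or `default` if none is present.

    Works on any iterable (a list or a NumPy string array) because it compares
    strings instead of calling .any(), which fails on string dtypes.
    """
    for value in scores:
        # NumPy byte-string arrays yield bytes under Python 3; decode them first
        if isinstance(value, bytes):
            value = value.decode()
        text = str(value).strip()
        # GFF uses "." for an empty score column
        if text and text != ".":
            return text
    return default


print(first_score([".", "."]))     # 0
print(first_score([".", "12.5"]))  # 12.5
```

Note that this picks the first real score rather than always element 0 as the original line does; which behaviour the tool should have is a design decision.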

GFF-to-GBK removed

Hi,

I'm wondering why you removed GFF-to-GBK from your tool list?
c992745

One of my users wants to use something like that.

Thanks in advance

Bug with exons on the negative strand

There's a bug in the gff to gtf converter:

Deha2F_6        JGI     exon    544121  545199  .       -       .       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g";
Deha2F_6        JGI     CDS     544121  545199  .       -       2       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g";
Deha2F_6        JGI     stop_codon      544121  544123  .       -       0       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g";
Deha2F_6        JGI     exon    545644  545872  .       -       .       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g";
Deha2F_6        JGI     CDS     545644  545872  .       -       0       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g";
Deha2F_6        JGI     start_codon     545870  545872  .       -       0       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g";

converts to:


Deha2F_6        JGI     exon    544121  545199  .       -       .       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g"; exon_number "1"; gene_name "DEHA2F06138g";
Deha2F_6        JGI     CDS     544121  544123  .       -       0       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g"; exon_number "1"; gene_name "DEHA2F06138g";
Deha2F_6        JGI     start_codon     545870  545872  .       -       0       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g"; exon_number "1"; gene_name "DEHA2F06138g";
Deha2F_6        JGI     exon    545644  545872  .       -       .       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g"; exon_number "2"; gene_name "DEHA2F06138g";
Deha2F_6        JGI     CDS     544121  545199  .       -       2       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g"; exon_number "2"; gene_name "DEHA2F06138g";
Deha2F_6        JGI     stop_codon      544121  544123  .       -       2       gene_id "DEHA2F06138g"; transcript_id "DEHA2F06138g"; exon_number "2"; gene_name "DEHA2F06138g";

This happens only for genes on the negative strand: the CDS coordinates are completely wrong, and the start and stop codons are assigned to the wrong exons.
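In the converted output the start codon (545870-545872) lies inside exon 545644-545872 but is labelled exon 1, the same number given to exon 544121-545199. A minimal sketch of a strand-independent approach, assuming exons are 1-based closed `(start, end)` tuples: number exons in transcription order (descending coordinates on the - strand, the convention used by, e.g., Ensembl GTF files) and attach each CDS or codon feature to the exon that contains it by coordinates rather than by list position. The function names are hypothetical, not part of GFFtools.

```python
def number_exons(exons, strand):
    """Map each (start, end) exon to its 1-based number in transcription order."""
    # On the - strand transcription runs from high to low coordinates.
    ordered = sorted(exons, key=lambda ex: ex[0], reverse=(strand == "-"))
    return {ex: i + 1 for i, ex in enumerate(ordered)}


def containing_exon(feature, exons):
    """Return the exon that fully contains `feature`, or None."""
    start, end = feature
    for ex_start, ex_end in exons:
        if ex_start <= start and end <= ex_end:
            return (ex_start, ex_end)
    return None


exons = [(544121, 545199), (545644, 545872)]
numbers = number_exons(exons, "-")
for name, feat in [("start_codon", (545870, 545872)), ("stop_codon", (544121, 544123))]:
    print(name, numbers[containing_exon(feat, exons)])
# start_codon 1
# stop_codon 2
```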

gff-to-gtf - output contains only exons

Hi,

The output of gff-to-gtf.py contains only exon features, even though the input is a complete GFF file with all feature types.
Could you please fix this error?

Thanks.
Séverine
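To see which feature types a converter actually wrote, it can help to tally column 3 of the input and output files and compare them. A small standard-library sketch; the function name and file names are illustrative, not part of GFFtools:

```python
from collections import Counter


def feature_type_counts(lines):
    """Count the feature type (column 3) of each GFF/GTF record."""
    counts = Counter()
    for line in lines:
        # skip blank lines and comment/directive lines such as "##gff-version 3"
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts


# Usage (file names are placeholders):
# with open("input.gff") as fh_in, open("output.gtf") as fh_out:
#     print(feature_type_counts(fh_in))
#     print(feature_type_counts(fh_out))
```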

gff_to_bed TypeError: len() of unsized object

I encountered this issue in converting GCF_000001405.33_GRCh38.p7_genomic.gff.

Traceback (most recent call last):
  File "/.../gff_to_bed.py", line 119, in <module>
    __main__() 
  File "/.../gff_to_bed.py", line 116, in __main__
    writeBED(Transcriptdb)
  File "/.../gff_to_bed.py", line 55, in writeBED
    exon_cnt = len(ent1['exons'][idx])
TypeError: len() of unsized object

In some cases a feature has no records that would become BED blocks, so it is represented by a 0-dimensional NumPy array containing NaN. There may be deeper problems here, but in the end the following fix seems to have worked:

def writeBED(tinfo):
    """
    writing result files in bed format 

    @args tinfo: list of genes 
    @type tinfo: numpy object  
    """

    for ent1 in tinfo:
        child_flag = False  

        for idx, tid in enumerate(ent1['transcripts']):
            child_flag = True 
            exon_cnt = 0
            exon_len = ''
            exon_cod = '' 
            rel_start = None 
            rel_stop = None 
            if ent1['exons'][idx].ndim > 0:  # skip 0-d NaN placeholders; len() fails on them
                exon_cnt = len(ent1['exons'][idx])
                for idz, ex_cod in enumerate(ent1['exons'][idx]):#check for exons of corresponding transcript  
                    exon_len += '%d,' % (ex_cod[1]-ex_cod[0]+1)
                    if idz == 0: #calculate the relative start position 
                        exon_cod += '0,'
                        rel_start = int(ex_cod[0])-1 
                        rel_stop = int(ex_cod[1])
                    else:
                        exon_cod += '%d,' % (ex_cod[0]-1-rel_start) ## shifting the coordinates to zero 
                        rel_stop = int(ex_cod[1])
...
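The fix works because `len()` is undefined for a 0-dimensional NumPy array: `np.array(np.nan)` has `ndim == 0`, and calling `len()` on it raises exactly `TypeError: len() of unsized object`. A minimal, self-contained illustration of the same guard, with a hypothetical helper name:

```python
import numpy as np


def exon_count(exons):
    """Number of exon blocks, treating a 0-d placeholder (e.g. NaN) as none."""
    arr = np.asarray(exons)
    if arr.ndim == 0:
        # len(arr) would raise "TypeError: len() of unsized object" here
        return 0
    return len(arr)


print(exon_count(np.array(np.nan)))                  # 0
print(exon_count(np.array([[1, 100], [200, 300]])))  # 2
```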

gff_to_bed int and str arguments to - operator

Hi, I encountered this while converting GCF_000001405.33_GRCh38.p7_genomic.gff.

I got an error about a string and an int being passed as operands to the - operator.

In gff_to_bed.py, line 90 is missing some parentheses:

89            out_print = [ent1['chr'], 
90                        '%d' % int(ent1['start'])-1,
91                        '%d' % int(ent1['stop']),

This fixed the issue:

89            out_print = [ent1['chr'], 
90                        '%d' % (int(ent1['start'])-1),
91                        '%d' % int(ent1['stop']),

Thanks for the great tool!
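The root cause is operator precedence: in Python, `%` binds as tightly as `*` and more tightly than binary `-`, so `'%d' % int(x)-1` is evaluated as `('%d' % int(x)) - 1`, a `str` minus an `int`. The `- 1` itself is needed because BED start coordinates are 0-based while GFF starts are 1-based. A minimal reproduction with hypothetical function names:

```python
def bed_start_buggy(gff_start):
    # Parsed as ('%d' % int(gff_start)) - 1, i.e. str - int -> TypeError
    return '%d' % int(gff_start)-1


def bed_start_fixed(gff_start):
    # Convert the 1-based GFF start to a 0-based BED start, then format it
    return '%d' % (int(gff_start) - 1)


print(bed_start_fixed("11874"))  # 11873
```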

gff-to-gtf off-by-one error?

I noticed a strange error when converting a GFF to GTF with gff_to_gtf.py.

This is the gene model in the original GFF:

scaffold_00034	JGI	exon	1	472	.	-	.	name "CLAGR_004651-RA"; transcriptId 5353
scaffold_00034	JGI	CDS	1	472	.	-	1	name "CLAGR_004651-RA"; proteinId 5305; exonNumber 3
scaffold_00034	JGI	exon	527	1274	.	-	.	name "CLAGR_004651-RA"; transcriptId 5353
scaffold_00034	JGI	CDS	527	1274	.	-	2	name "CLAGR_004651-RA"; proteinId 5305; exonNumber 2
scaffold_00034	JGI	exon	1326	1593	.	-	.	name "CLAGR_004651-RA"; transcriptId 5353
scaffold_00034	JGI	CDS	1326	1593	.	-	0	name "CLAGR_004651-RA"; proteinId 5305; exonNumber 1
scaffold_00034	JGI	start_codon	1591	1593	.	-	0	name "CLAGR_004651-RA"

Exon 3 spans 1…472 on the - strand. After conversion to GTF, this is the result:

scaffold_00034	JGI	exon	0	472	.	-	.	gene_id "CLAGR_004651-RA"; transcript_id "5305"; exon_number "1"; gene_name "";
scaffold_00034	JGI	CDS	1	472	.	-	1	gene_id "CLAGR_004651-RA"; transcript_id "5305"; exon_number "1"; gene_name "";
scaffold_00034	JGI	start_codon	1591	1593	.	-	1	gene_id "CLAGR_004651-RA"; transcript_id "5305"; exon_number "1"; gene_name "";
scaffold_00034	JGI	exon	527	1274	.	-	.	gene_id "CLAGR_004651-RA"; transcript_id "5305"; exon_number "2"; gene_name "";
scaffold_00034	JGI	CDS	527	1274	.	-	2	gene_id "CLAGR_004651-RA"; transcript_id "5305"; exon_number "2"; gene_name "";
scaffold_00034	JGI	exon	1326	1593	.	-	.	gene_id "CLAGR_004651-RA"; transcript_id "5305"; exon_number "3"; gene_name "";
scaffold_00034	JGI	CDS	1326	1593	.	-	0	gene_id "CLAGR_004651-RA"; transcript_id "5305"; exon_number "3"; gene_name "";
scaffold_00034	JGI	stop_codon	1	3	.	-	0	gene_id "CLAGR_004651-RA"; transcript_id "5305"; exon_number "3"; gene_name "";

After conversion, the exon spans 0…472 instead. I assume this odd behavior is triggered when a feature borders the edge of the sequence (here, position 1)?
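GFF and GTF use 1-based, closed coordinates, so a start of 0 is never valid there, while BED and many Python/NumPy-based pipelines use 0-based, half-open coordinates. One common cause of a 0 start is a 0-based value written out without converting it back; whether that is what happens in gff_to_gtf.py is an assumption not checked against the source. A small sketch of the two conversions plus a sanity check that would catch the invalid start (helper names are hypothetical):

```python
def gff_to_zero_based(start, end):
    """1-based closed [start, end] -> 0-based half-open [start - 1, end)."""
    return start - 1, end


def zero_based_to_gff(start0, end0):
    """0-based half-open [start0, end0) -> 1-based closed [start0 + 1, end0]."""
    return start0 + 1, end0


def check_gff_interval(start, end):
    """Raise if the interval cannot be a valid 1-based GFF/GTF interval."""
    if start < 1 or end < start:
        raise ValueError("invalid GFF/GTF interval: %d-%d" % (start, end))


start0, end0 = gff_to_zero_based(1, 472)               # (0, 472)
check_gff_interval(*zero_based_to_gff(start0, end0))   # round trip (1, 472) passes
```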

gtf -> gff conversion truncates chromosome names

Thanks very much for making these tools, which have helped me greatly working between different builds/tools/versions of data.

I've just run into a surprising bug. When I convert a GTF file to GFF, some of the long chromosome names are truncated. An example GTF:

chrUn_AAWZ02036000  anoCar2_ensGene stop_codon  3899    3901    0.000000    -   .   gene_id "ENSACAT00000000307.2"; transcript_id "ENSACAT00000000307.2"; 
chrUn_AAWZ02036000  anoCar2_ensGene CDS 3902    5131    0.000000    -   0   gene_id "ENSACAT00000000307.2"; transcript_id "ENSACAT00000000307.2"; 
chrUn_AAWZ02036000  anoCar2_ensGene exon    3899    5131    0.000000    -   .   gene_id "ENSACAT00000000307.2"; transcript_id "ENSACAT00000000307.2"; 
chrUn_AAWZ02036000  anoCar2_ensGene CDS 25336   25522   0.000000    -   1   gene_id "ENSACAT00000000307.2"; transcript_id "ENSACAT00000000307.2"; 
chrUn_AAWZ02036000  anoCar2_ensGene exon    25336   25522   0.000000    -   .   gene_id "ENSACAT00000000307.2"; transcript_id "ENSACAT00000000307.2"; 
chrUn_AAWZ02036000  anoCar2_ensGene CDS 25602   26479   0.000000    -   0   gene_id "ENSACAT00000000307.2"; transcript_id "ENSACAT00000000307.2"; 
chrUn_AAWZ02036000  anoCar2_ensGene start_codon 26477   26479   0.000000    -   .   gene_id "ENSACAT00000000307.2"; transcript_id "ENSACAT00000000307.2"; 
chrUn_AAWZ02036000  anoCar2_ensGene exon    25602   26479   0.000000    -   .   gene_id "ENSACAT00000000307.2"; transcript_id "ENSACAT00000000307.2"; 
chrUn_AAWZ02036001  anoCar2_ensGene stop_codon  1674    1676    0.000000    -   .   gene_id "ENSACAT00000001077.3"; transcript_id "ENSACAT00000001077.3"; 
chrUn_AAWZ02036001  anoCar2_ensGene CDS 1677    1805    0.000000    -   0   gene_id "ENSACAT00000001077.3"; transcript_id "ENSACAT00000001077.3";

gives rise to

./gtf_to_gff.py test.gtf 
##gff-version 3
chrUn_AAWZ02036 anoCar2_ensGene gene    1674    1805    .   -   .   ID=ENSACAT00000001077.3;Name=ENSACAT00000001077.3
chrUn_AAWZ02036 anoCar2_ensGene mRNA    1677    1805    .   -   .   ID=Transcript:ENSACAT00000001077.3;Parent=ENSACAT00000001077.3
chrUn_AAWZ02036 anoCar2_ensGene CDS 1674    1805    .   -   0   Parent=Transcript:ENSACAT00000001077.3
chrUn_AAWZ02036 anoCar2_ensGene three_prime_UTR 1677    1805    .   -   .   Parent=Transcript:ENSACAT00000001077.3
chrUn_AAWZ02036 anoCar2_ensGene exon    1677    1805    .   -   .   Parent=Transcript:ENSACAT00000001077.3
chrUn_AAWZ02036 anoCar2_ensGene gene    3899    26479   .   -   .   ID=ENSACAT00000000307.2;Name=ENSACAT00000000307.2
chrUn_AAWZ02036 anoCar2_ensGene mRNA    3899    26479   .   -   .   ID=Transcript:ENSACAT00000000307.2;Parent=ENSACAT00000000307.2
chrUn_AAWZ02036 anoCar2_ensGene CDS 3899    5131    .   -   0   Parent=Transcript:ENSACAT00000000307.2
chrUn_AAWZ02036 anoCar2_ensGene CDS 25336   25522   .   -   1   Parent=Transcript:ENSACAT00000000307.2
chrUn_AAWZ02036 anoCar2_ensGene CDS 25602   26479   .   -   0   Parent=Transcript:ENSACAT00000000307.2
chrUn_AAWZ02036 anoCar2_ensGene exon    3899    5131    .   -   .   Parent=Transcript:ENSACAT00000000307.2
chrUn_AAWZ02036 anoCar2_ensGene exon    25336   25522   .   -   .   Parent=Transcript:ENSACAT00000000307.2
chrUn_AAWZ02036 anoCar2_ensGene exon    25602   26479   .   -   .   Parent=Transcript:ENSACAT00000000307.2

(Note that the final three digits of each "chromosome" (actually contig) name are missing.) This doesn't seem to affect sequences with shorter names.
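Every truncated name is exactly 15 characters long (`chrUn_AAWZ02036`), and the two distinct contigs collapse onto the same name. That pattern is consistent with the names being stored in a fixed-width NumPy string field of 15 characters, which silently truncates longer values; this is a hypothesis, not something confirmed in gtf_to_gff.py. A short demonstration of the effect, and of sizing the field from the data instead:

```python
import numpy as np

names = ["chrUn_AAWZ02036000", "chrUn_AAWZ02036001", "chr1"]

# A fixed 15-character field silently cuts longer names,
# so two different contigs end up with the same name.
fixed = np.array(names, dtype="U15")
print(fixed[0], fixed[0] == fixed[1])   # chrUn_AAWZ02036 True

# Sizing the field from the longest name keeps every name intact.
width = max(len(n) for n in names)
safe = np.array(names, dtype="U%d" % width)
print(list(safe) == names)              # True
```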
