wurmlab / flo

Same species annotation lift over pipeline.

Ruby 94.43% Shell 5.57%

bioinformatics gene-prediction gff liftover

flo's Issues

cc command not found

Hi there, I'm attempting to run flo on a cluster and I kept getting the error in the title (cc: command not found).

For users on HPC clusters, I would suggest changing install.sh so that genometools is built with make CC=gcc cairo=no errorcheck=no.

This ended up working for me.

flo does not work if the '-qMask' blat option is included in the .yaml file

Hi,

I recently started lifting over my GFF with flo.
Because I am concerned that repeats in the genome may affect the liftover result, I would like to run flo on the target genome both with and without repeat masking.
For the repeat-masked target genome, I added 'qMask=mygenome.fasta.out' to the blat options, where mygenome.fasta.out is the output file of RepeatMasker.

flo seems to get stuck at the blat step because the target genome is split into chunks and mygenome.fasta.out is not.
Could you suggest a solution?

Thanks,
Takashi

Rakefile error when attempting to run flo

Hi,

I was able to fix the previous error I posted about, but reached another in the final steps of processing on a sample run. I am unsure which database is missing, but I did receive the lifted/unlifted gff files which look reasonable so far. Any help finishing the last steps of the analysis would be greatly appreciated, log file copied below:

cat error.log
nohup: ignoring input
/global/u2/a/asession/SCRIPTS/flo/Rakefile:25: warning: Insecure world writable dir /global/common/genepool/usg/languages/R in PATH, mode 040777
mkdir run
cp ./Chr1L.example/Chr1L.v91.fa run/source.fa
cp ./Chr1L.example/Chr1L.v92.fa run/target.fa
faToTwoBit run/source.fa run/source.2bit
faToTwoBit run/target.fa run/target.2bit
twoBitInfo run/source.2bit stdout | sort -k2nr > run/source.sizes
twoBitInfo run/target.2bit stdout | sort -k2nr > run/target.sizes
faSplit sequence run/target.fa 1 run/chunk_
parallel --joblog run/joblog.faSplit -j 1 -a run/joblst.faSplit
43201 pieces of 43961 written
parallel --joblog run/joblog.blat -j 1 -a run/joblst.blat
Loaded 219879705 letters in 1 sequences
Searched 216002468 bases in 43201 sequences
parallel --joblog run/joblog.liftUp -j 1 -a run/joblst.liftUp
Got 43201 lifts in run/chunk_0.fa.lft
Lifting run/chunk_0.fa.psl
parallel --joblog run/joblog.axtChain -j 1 -a run/joblst.axtChain
693297 blocks after duplicate removal
Loaded 219802468 bases of NC_030724.1 from run/target.2bit
Loaded 219879705 bases of chr1L from run/source.2bit
chainPair NC_030724.1-chr1L
Main chaining step done in 2979 milliseconds
747068 blocks after duplicate removal
chainPair NC_030724.1+chr1L
Main chaining step done in 14412 milliseconds
parallel --joblog run/joblog.chainSort -j 1 -a run/joblst.chainSort
chainMergeSort run/*.chn.sorted | chainSplit run stdin -lump=1
mv run/000.chain run/combined.chn.sorted
chainNet run/combined.chn.sorted run/source.sizes run/target.sizes run/combined.chn.sorted.net /dev/null
Got 1 chroms in run/source.sizes, 1 in run/target.sizes
Finishing nets
writing run/combined.chn.sorted.net
writing /dev/null
netChainSubset run/combined.chn.sorted.net run/combined.chn.sorted run/liftover.chn
Processing chr1L
mkdir Chr1L-liftover-Chr1L.v92
liftOver -gff ./Chr1L.example/Chr1L.gff3 run/liftover.chn Chr1L-liftover-Chr1L.v92/lifted.gff3 Chr1L-liftover-Chr1L.v92/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
rake aborted!
LoadError: cannot load such file -- bio/db/gff

Error when blat begins

Hello!

I am experiencing an error with Flo that I was hoping you might be able to help me with. Rake aborts when blat begins, I believe. I searched for this issue and found it has happened for a few others, and noticed that one piece of advice was to update parallel. I did that through a conda environment and am now using GNU parallel 20201122, which I realize is still GNU parallel. Nonetheless, I am getting the following error:

mkdir run
cp /path/genomic.fa run/source.fa
cp /path/target.fa
faToTwoBit run/source.fa run/source.2bit
faToTwoBit run/target.fa run/target.2bit
twoBitInfo run/source.2bit stdout | sort -k2nr > run/source.sizes
twoBitInfo run/target.2bit stdout | sort -k2nr > run/target.sizes
faSplit sequence run/target.fa 20 run/chunk_
parallel --joblog run/joblog.faSplit -j 20 -a run/joblst.faSplit
29164 pieces of 29164 written
26770 pieces of 26770 written
23287 pieces of 23287 written
25387 pieces of 25387 written
25525 pieces of 25525 written
25448 pieces of 25448 written
25555 pieces of 25555 written
26474 pieces of 26474 written
25992 pieces of 25992 written
27046 pieces of 27046 written
26153 pieces of 26153 written
26728 pieces of 26728 written
26526 pieces of 26526 written
27621 pieces of 27621 written
26588 pieces of 26588 written
26266 pieces of 26266 written
26387 pieces of 26387 written
25897 pieces of 25897 written
27300 pieces of 27300 written
25692 pieces of 25692 written
parallel --joblog run/joblog.blat -j 20 -a run/joblst.blat
Loaded 2510587379 letters in 14543 sequences
Searched 116386128 bases in 23287 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 127615767 bases in 25555 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 127103007 bases in 25448 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 132315417 bases in 26474 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 132646279 bases in 26588 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 129744868 bases in 25992 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 134829202 bases in 27046 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 130548735 bases in 26153 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 126148018 bases in 25387 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 133429151 bases in 26728 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 145757029 bases in 29164 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 137556497 bases in 27621 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 131717084 bases in 26387 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 130661225 bases in 26266 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 128121430 bases in 25692 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 126631904 bases in 25525 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 132127474 bases in 26770 sequences
rake aborted!
Command failed with status (3): [parallel --joblog run/joblog.blat -j 20 -a...]
/home/user/bin/flo/Rakefile:161:in `parallel'
/home/user/bin/flo/Rakefile:107:in `block in <top (required)>'
/home/user/bin/flo/Rakefile:37:in `block in <top (required)>'
Tasks: TOP => run/liftover.chn
(See full trace by running task with --trace)

My job log looks like this:
Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
18 : 1606275293.989 45.953 0 0 0 9 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_06.fa run/chunk_06.fa.psl
17 : 1606275293.975 46.491 0 0 0 9 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_10.fa run/chunk_10.fa.psl
16 : 1606275293.972 872.307 0 0 0 9 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_13.fa run/chunk_13.fa.psl
5 : 1606275293.949 19148.322 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_19.fa run/chunk_19.fa.psl
12 : 1606275293.963 20725.125 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_14.fa run/chunk_14.fa.psl
7 : 1606275293.953 22003.258 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_15.fa run/chunk_15.fa.psl
2 : 1606275293.943 23220.606 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_18.fa run/chunk_18.fa.psl
11 : 1606275293.961 23502.038 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_04.fa run/chunk_04.fa.psl
8 : 1606275293.955 23821.197 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_16.fa run/chunk_16.fa.psl
3 : 1606275293.945 24157.292 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_05.fa run/chunk_05.fa.psl
13 : 1606275293.966 24351.732 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_01.fa run/chunk_01.fa.psl
1 : 1606275293.941 24630.828 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_07.fa run/chunk_07.fa.psl
14 : 1606275293.968 24792.986 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_02.fa run/chunk_02.fa.psl
4 : 1606275293.947 25343.822 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_17.fa run/chunk_17.fa.psl
6 : 1606275293.951 25997.266 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_08.fa run/chunk_08.fa.psl
19 : 1606275293.993 26497.434 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_00.fa run/chunk_00.fa.psl
15 : 1606275293.970 26656.599 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_09.fa run/chunk_09.fa.psl
20 : 1606275294.002 27909.093 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_03.fa run/chunk_03.fa.psl
9 : 1606275293.957 28015.902 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_11.fa run/chunk_11.fa.psl
10 : 1606275293.959 28659.995 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_12.fa run/chunk_12.fa.psl

Do you have any thoughts as to what might be going on?

Thanks so much,
Zoe
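
When a parallel step aborts like this, the job log is usually the quickest way to see which chunks failed. The following is a minimal Ruby sketch, not part of flo, that lists joblog rows with a non-zero Exitval or Signal. It assumes the standard tab-separated GNU Parallel joblog layout (Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command); the names JobRecord, failed_jobs and SAMPLE_JOBLOG, and the sample rows, are illustrative only.

```ruby
# Sketch: list jobs in a GNU Parallel joblog that did not finish cleanly.
# Assumes the tab-separated joblog layout with this header line:
# Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
JobRecord = Struct.new(:seq, :runtime, :exitval, :signal, :command)

def failed_jobs(joblog_text)
  rows = joblog_text.lines.map(&:chomp).reject(&:empty?)
  rows.shift # drop the header line
  jobs = rows.map do |row|
    f = row.split("\t", 9)
    JobRecord.new(f[0].to_i, f[3].to_f, f[6].to_i, f[7].to_i, f[8])
  end
  jobs.select { |job| job.exitval != 0 || job.signal != 0 }
end

# Illustrative rows in the shape of a blat job log (values are examples).
SAMPLE_JOBLOG = [
  %w[Seq Host Starttime JobRuntime Send Receive Exitval Signal Command],
  ['18', ':', '1606275293.989', '45.953', '0', '0', '0', '9',
   'blat run/source.fa run/chunk_a.fa run/chunk_a.fa.psl'],
  ['5', ':', '1606275293.949', '19148.322', '0', '89', '0', '0',
   'blat run/source.fa run/chunk_b.fa run/chunk_b.fa.psl']
].map { |fields| fields.join("\t") }.join("\n")

failed_jobs(SAMPLE_JOBLOG).each do |job|
  puts "job #{job.seq}: exit #{job.exitval}, signal #{job.signal}, " \
       "#{job.runtime}s: #{job.command}"
end
```

To run it on a real log, pass File.read('run/joblog.blat') instead of the sample. Signal 9 is SIGKILL, meaning the process was killed from outside; a memory limit is one common cause on shared machines, although the log alone does not confirm that.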

running rake fails

My mistake: I swapped source and target, which led to serious issues. My only remark is that rake should also be listed as a dependency to install.
Sorry for the garbage tickets!

mitochondrial genes

Hi,
This is more of a question than an issue. I just realized that none of the mitochondrial genes were lifted. In the unlifted file, many of them are annotated as deleted/partially deleted in new. I'm just wondering if this is inherent due to the way the algorithm works? Thank you.

gt gff3 error

Hi,

I ran flo, and got this error:

warning: line 1 in file "-" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: Parent "maker-Contig53-exonerate_est2genome-gene-0.0" on line 1 in file "-" was not defined (via "ID=")
rake aborted!
Command failed with status (1): [/data/apps/flo/gff_recover.rb run/Medicago...]
/data/apps/flo/Rakefile:60:in `block (2 levels) in <top (required)>'
/data/apps/flo/Rakefile:40:in `each'
/data/apps/flo/Rakefile:40:in `block in <top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)

However, I can see the following output: lifted.gff3 and an unlifted.gff3 (both are non-empty). There is also an empty lifted_cleaned.gff. Can you please tell me what's going on?

Happy to send the .gff3 files if needed.

Thanks!

Not processing gff

Hi!

The lift over chaining works like a charm, however it crashes when trying to process_gff. The issue seems to come from line 62 in the rakefile: require 'bio/db/gff'
I don't know if I need to install something in order to make this work.

Error shows:
liftOver -gff dwil_flybase.gff3 run/liftover.chn dwil_flybase-liftover-dwil_JR/lifted.gff3 dwil_flybase-liftover-dwil_JR/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
rake aborted!
LoadError: cannot load such file -- bio/db/gff
/home/pgonzale/Programs/flo/Rakefile:62:in `process_gff'
/home/pgonzale/Programs/flo/Rakefile:234:in `block (2 levels) in <top (required)>'
/home/pgonzale/Programs/flo/Rakefile:223:in `each'
/home/pgonzale/Programs/flo/Rakefile:223:in `block in <top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)

Greetings!

Should check if lifted.gff is empty and quit gracefully
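
One way such a check could look is sketched below in Ruby. It is not taken from flo's Rakefile: the helper names gff_has_features? and ensure_lifted! are hypothetical, and the sketch assumes that a file containing only blank lines and '#' comment or pragma lines counts as empty.

```ruby
# Hypothetical helpers, not part of flo's Rakefile.

# True if the file exists and contains at least one line that is neither
# blank nor a '#' comment/pragma line.
def gff_has_features?(path)
  return false unless File.file?(path)

  File.foreach(path).any? do |line|
    stripped = line.strip
    !stripped.empty? && !stripped.start_with?('#')
  end
end

# Stop with a readable message (exit status 1) instead of letting later
# steps fail on an empty file.
def ensure_lifted!(path)
  return if gff_has_features?(path)

  abort "No features were lifted (#{path} is missing or empty); stopping."
end
```

Calling ensure_lifted! on the lifted GFF right after the liftOver step would end the run with a single message rather than a backtrace from a later step.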

undefined method `last' for nil:NilClass (NoMethodError)

Hi, there.
I have a merged StringTie file and I want to use the gff_longest_transcripts.rb script to obtain the longest transcript for each gene, but it does not work. The error information and the StringTie file are attached below. Could you help? Thanks so much!

Sincerely
Yizhong Huang

Present as a set of scripts

In addition to #22, provide:

Change Rakefile to a script to create chain file that reads flo_opts.yaml as before. If config file does not exist, run on tmp/source.fa and tmp/target.fa using current BLAT options and using as many cpus as available.
Instead of trying to automatically lift over the given file, provide the liftOver command that is currently run in the README file.
Instead of automatically running gff_recover.rb, suggest its use in a section titled 'Lifting over GFF files'. Also, mention gff_remove and gff_longest_transcript in the same section.
In README, provide steps for creating synteny maps for use with ALLMAPS.

running rake

Hello,
I am working on a RHEL7 machine with 24 cores.
I installed everything, and the applications under ext/ work.
I edited opts.yaml (including creating the .ooc file).
Running 'rake' does not trigger anything!
What did I miss?
Thanks for your help.

drwxr-xr-x. 2 splaisan bits 4.0K Jun 12 11:09 data
drwxr-xr-x. 5 splaisan bits 65 Jun 12 10:46 ext
drwxr-xr-x. 8 splaisan bits 4.0K Jun 12 10:30 .git
-rw-r--r--. 1 splaisan bits 1.5K Jun 12 10:30 opts_example.yaml
-rw-r--r--. 1 splaisan bits 1.6K Jun 12 11:10 opts.yaml
-rw-r--r--. 1 splaisan bits 7.9K Jun 12 10:30 Rakefile
-rw-r--r--. 1 splaisan bits 6.0K Jun 12 10:30 README.md
drwxr-xr-x. 2 splaisan bits 4.0K Jun 12 10:30 scripts

splaisan@NUC-SRV-01:/opt/biotools/flo$ ll data/
total 2.3G
drwxr-xr-x. 2 splaisan bits 4.0K Jun 12 11:09 .
drwxr-xr-x. 6 splaisan bits 4.0K Jun 12 11:14 ..
-rw-r--r--. 1 splaisan bits 1.9M Jun 12 11:09 11.ooc
-rw-r--r--. 1 splaisan bits 1.1G Jun 12 11:05 CA0.2_contigs.fasta
-rw-r--r--. 1 splaisan bits 77M Jun 12 11:02 CA0.2_gene_models_noseq.gff3
-rw-r--r--. 1 splaisan bits 1.2G Jun 12 11:05 hybrid_CA0.2.fasta

splaisan@NUC-SRV-01:/opt/biotools/flo$ cat opts.yaml

# Location of binaries expected by flo.
#
# These will be added to PATH before the pipeline is run. The paths below
# are created by `scripts/install.sh`. Comment out or edit the paths based
# on how you installed UCSC-Kent toolkit, GNU Parallel and genometools.
:add_to_path:
  - 'ext/kent/bin'
  - 'ext/parallel-20150722/src'
  - 'ext/genometools-1.5.6/bin'

# Location of source and target assemblies.
#
# If migrating annotations from assembly A to assembly B, A is the source
# and B is the target. Source and target assemblies are specified as path
# to the corresponding FASTA files (must end in .fa).
:source_fa: 'data/CA0.2_contigs.fasta'
:target_fa: 'data/hybrid_CA0.2.fasta'

# Number of processes that will be used to parallelise flo. Ideally, this
# will be the number of CPU cores you have.
:processes: '24'

# Parameters to run BLAT with.
#
# In addition to the options specified here, -noHead option is set by flo.
# -noHead simply causes the BLAT output files to not have a header.
# It doesn't impact accuracy of results.
#
# Empty string is equivalent to:
#
#   -t=dna -q=dna -tileSize=11 -stepSize=11 -oneOff=0 -minMatch=2
#   -minScore=30 -minIdentity=90 -maxGap=2 -maxIntron=75000
#
# The default string defined below is a suitable trade-off between running
# time and sensitivity.
#:blat_opts: '-fastMap -tileSize=12 -minIdentity=98'
:blat_opts: 'blat -noHead -fastMap -ooc=data/11.ooc -minScore=100 -minIdentity=98'

# Path to the GFF files containing annotations on the source assembly that
# will be lifted to the target assembly.
:lift:
  - 'data/CA0.2_gene_models_noseq.gff3'
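
The comments in the configuration above state two expectations that the values in this file do not obviously meet: assembly paths "must end in .fa", and -noHead is set by flo itself. A small Ruby sketch of checks derived only from those comments follows. The function name config_warnings is hypothetical, flo may not enforce any of these rules, the rule about the leading program name is an assumption based on the commented-out default, and whether any of them explains why rake did nothing is not established here.

```ruby
# Hypothetical sanity checks for a flo options hash, based only on the
# comments in the config file; flo itself may not enforce these.
def config_warnings(opts)
  warnings = []

  %i[source_fa target_fa].each do |key|
    path = opts[key].to_s
    warnings << "#{key} should end in .fa: #{path}" unless path.end_with?('.fa')
  end

  words = opts[:blat_opts].to_s.split
  # Assumption: the value is an option string, as in the commented default.
  warnings << 'blat_opts should not start with the program name' if words.first == 'blat'
  warnings << 'blat_opts need not repeat -noHead (flo sets it)' if words.include?('-noHead')

  warnings
end

# Values copied from the opts.yaml shown above.
OPTS = {
  source_fa: 'data/CA0.2_contigs.fasta',
  target_fa: 'data/hybrid_CA0.2.fasta',
  blat_opts: 'blat -noHead -fastMap -ooc=data/11.ooc -minScore=100 -minIdentity=98'
}.freeze

config_warnings(OPTS).each { |message| puts "warning: #{message}" }
```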

Redefine sh to exit more gracefully if command not found.
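
A sketch of what this could look like in Ruby is shown below. It does not redefine Rake's sh; it wraps Kernel#system behind a PATH lookup. The names command_available? and run_or_explain are hypothetical and do not come from flo.

```ruby
# Hypothetical wrapper, not flo's code: check that a program can be found
# before running it, and exit with a readable message if it cannot.
def command_available?(name)
  # A name containing a path separator is checked directly.
  if name.include?(File::SEPARATOR)
    return File.executable?(name) && !File.directory?(name)
  end

  ENV.fetch('PATH', '').split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.executable?(candidate) && !File.directory?(candidate)
  end
end

def run_or_explain(*cmd)
  # A single string is treated as a shell command line; otherwise the first
  # element is the program name.
  program = cmd.size == 1 ? cmd.first.to_s.split.first : cmd.first.to_s
  abort "Required command not found on PATH: #{program}" unless command_available?(program)

  system(*cmd) || abort("Command failed: #{cmd.join(' ')}")
end
```

With this in place, a missing faToTwoBit or blat would produce one line naming the missing program instead of a Rake backtrace.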

Results and discussion

Hi,
How did you calculate the below percentage?

For an ant genome (~350 Mb) we saw 90% annotations map identically to the new assembly (unpublished result).

Is it possible to get more statistics out of Flo?

Thank you in advance.

Michal

NoMethodError: undefined method `last' for nil:NilClass

Hi,
I receive the error above during annotation liftover with flo, and would appreciate some help. The output directory contains completed lifted.gff3 and unlifted.gff3 before rake aborts.

My system:
Ubuntu 12.04 x64
Ruby 2.3.1
Bioruby 1.5.1
flo and its dependencies have been installed as described.

Thanks

Log:

...
chainMergeSort run/*.chn.sorted | chainSplit run stdin -lump=1
mv run/000.chain run/combined.chn.sorted
chainNet run/combined.chn.sorted run/source.sizes run/target.sizes run/combined.chn.sorted.net /dev/null
Got 14 chroms in run/source.sizes, 14 in run/target.sizes
Finishing nets
writing run/combined.chn.sorted.net
writing /dev/null
netChainSubset run/combined.chn.sorted.net run/combined.chn.sorted run/liftover.chn
Processing Pf3D7_14_v3
Processing Pf3D7_13_v3
Processing Pf3D7_12_v3
Processing Pf3D7_11_v3
Processing Pf3D7_10_v3
Processing Pf3D7_09_v3
Processing Pf3D7_08_v3
Processing Pf3D7_07_v3
Processing Pf3D7_06_v3
Processing Pf3D7_05_v3
Processing Pf3D7_04_v3
Processing Pf3D7_03_v3
Processing Pf3D7_02_v3
Processing Pf3D7_01_v3
mkdir PlasmoDB-29_Pfalciparum3D7_GFF_CHROMOSOME-liftover-named_assembly_pacbio2
liftOver -gff plasmodium/PlasmoDB-29_Pfalciparum3D7_GFF_CHROMOSOME.gff3 run/liftover.chn PlasmoDB-29_Pfalciparum3D7_GFF_CHROMOSOME-liftover-named_assembly_pacbio2/lifted.gff3 PlasmoDB-29_Pfalciparum3D7_GFF_CHROMOSOME-liftover-named_assembly_pacbio2/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
rake aborted!
NoMethodError: undefined method `last' for nil:NilClass
/home/muol/Documents/Software/flo/Rakefile:72:in `block in process_gff'
/home/muol/Documents/Software/flo/Rakefile:69:in `each'
/home/muol/Documents/Software/flo/Rakefile:69:in `group_by'
/home/muol/Documents/Software/flo/Rakefile:69:in `process_gff'
/home/muol/Documents/Software/flo/Rakefile:223:in `block (2 levels) in <top (required)>'
/home/muol/Documents/Software/flo/Rakefile:212:in `each'
/home/muol/Documents/Software/flo/Rakefile:212:in `block in <top (required)>'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/task.rb:240:in `block in execute'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/task.rb:235:in `each'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/task.rb:235:in `execute'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/task.rb:179:in `block in invoke_with_call_chain'
/usr/local/lib/ruby/2.3.0/monitor.rb:214:in `mon_synchronize'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/task.rb:172:in `invoke_with_call_chain'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/task.rb:165:in `invoke'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:150:in `invoke_task'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:106:in `block (2 levels) in top_level'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:106:in `each'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:106:in `block in top_level'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:115:in `run_with_threads'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:100:in `top_level'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:78:in `block in run'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:176:in `standard_exception_handling'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:75:in `run'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/bin/rake:33:in `<top (required)>'
/usr/local/bin/rake:23:in `load'
/usr/local/bin/rake:23:in `<main>'
Tasks: TOP => default
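
The backtrace places the failure inside a group_by block in process_gff, where .last is called on nil. The Rakefile source is not shown here, so the following Ruby sketch only illustrates one way that error can arise, not flo's actual code: if attributes are stored as [key, value] pairs, Array#assoc returns nil for a missing key, and calling .last on that result raises this NoMethodError. The pair layout, the 'Parent' key, and the helper names are all assumptions.

```ruby
# Illustration only; the attribute layout and key name are assumptions.
# Each record is modeled as an array of [key, value] attribute pairs.

# Returns the attribute value, or nil when the key is absent, instead of
# calling .last on the nil that Array#assoc returns for a missing key.
def attribute_value(attributes, key)
  pair = attributes.assoc(key)
  pair && pair.last
end

# Groups records by their Parent attribute; records without one are
# collected under :no_parent rather than raising NoMethodError.
def group_by_parent(records)
  records.group_by { |attrs| attribute_value(attrs, 'Parent') || :no_parent }
end

RECORDS = [
  [%w[ID gene1]],                    # top-level feature, no Parent
  [%w[ID mrna1], %w[Parent gene1]],
  [%w[ID exon1], %w[Parent mrna1]]
].freeze

group_by_parent(RECORDS).each do |parent, members|
  puts "#{parent}: #{members.size} record(s)"
end
```

If the cause is similar, the input GFF would contain at least one line lacking the attribute that the grouping step expects, which is worth checking before suspecting the Ruby installation.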

Expecting number line ERROR

Hello, I want to thank you for creating this amazing tool and for helping people with their flo issues.
I am having a problem with a flo run that I am not able to solve. I think it is related to the GFF file, but I don't know what the problem is (I downloaded it directly from NCBI).
These are the files I am using to run flo: flo_files.zip

"new" is the source
"old" is the target

Here is the error:

liftOver -gff /home/jose/flo_haloferax/new.gff run/liftover.chn run/new/lifted.gff3 run/new/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
Expecting number line 12 of /home/jose/flo_haloferax/new.gff
rake aborted!
Command failed with status (255): [liftOver -gff /home/jose/flo_haloferax/new...]
/home/jose/flo/Rakefile:45:in `block (2 levels) in <top (required)>'
/home/jose/flo/Rakefile:40:in `each'
/home/jose/flo/Rakefile:40:in `block in <top (required)>'
/var/lib/gems/2.7.0/gems/rake-13.0.6/exe/rake:27:in `<top (required)>'
Tasks: TOP => default

I tried to preprocess the GFF file using gt gff3 -tidy -sort -addids -retainids, and also to delete the gene features with gff_remove_feats.rb gene, but the same error appears.

Thank you so much in advance!
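
The message "Expecting number line 12" suggests that liftOver could not read a number it expected on line 12 of the input. A quick way to look for candidates is to scan the GFF for data lines whose start and end columns are not plain integers or that have too few columns. The Ruby sketch below does that. It is an independent check, not liftOver's own parsing, and which of these conditions liftOver actually rejects is an assumption; the function name and sample data are hypothetical.

```ruby
# Sketch: report GFF data lines that look malformed. Independent of
# liftOver; the conditions checked are assumptions about what may trip it.
def suspicious_gff_lines(text)
  problems = []
  text.each_line.with_index(1) do |line, number|
    line = line.chomp
    next if line.empty? || line.start_with?('#')

    fields = line.split("\t", -1)
    if fields.size < 9
      problems << [number, "expected 9 tab-separated columns, found #{fields.size}"]
    elsif fields[3] !~ /\A\d+\z/ || fields[4] !~ /\A\d+\z/
      problems << [number, "start/end not integers: #{fields[3].inspect}, #{fields[4].inspect}"]
    end
  end
  problems
end

# Made-up sample: line 3 has a non-integer start, line 4 uses spaces.
SAMPLE_GFF = [
  '##gff-version 3',
  %w[chr1 src gene 10 200 . + . ID=g1].join("\t"),
  %w[chr1 src gene <10 200 . + . ID=g2].join("\t"),
  'chr1 src gene 10 200 . + . ID=g3'
].join("\n")

suspicious_gff_lines(SAMPLE_GFF).each do |number, reason|
  puts "line #{number}: #{reason}"
end
```

Running suspicious_gff_lines(File.read('new.gff')) and looking at what it reports near line 12 may narrow down what differs from the lines that lift cleanly.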

gff_recover error

I'm getting an error that seems to have to do with my input gff. I have tried with both gff_remove_feats.rb and gff_longest_transcripts.rb

parallel --joblog run/joblog.chainSort -j 15 -a run/joblst.chainSort
chainMergeSort run/*.chn.sorted | chainSplit run stdin -lump=1
mv run/000.chain run/combined.chn.sorted
chainNet run/combined.chn.sorted run/source.sizes run/target.sizes run/combined.chn.sorted.net /dev/null
Got 15 chroms in run/source.sizes, 15 in run/target.sizes
Finishing nets
writing run/combined.chn.sorted.net
writing /dev/null
netChainSubset run/combined.chn.sorted.net run/combined.chn.sorted run/liftover.chn
Processing 1
Processing 5
Processing 2
Processing 3
Processing 11
Processing 6
Processing 7
Processing 8
Processing 9
Processing 4
Processing 10
Processing 14
Processing 12
Processing 13
Processing MT
mkdir run/h99_longest_transcript
liftOver -gff /scratch/mblab/chasem/liftOver/flo_crypto/h99/h99_longest_transcript.gff run/liftover.chn run/h99_longest_transcript/lifted.gff3 run/h99_longest_transcript/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/scratch/mblab/chasem/liftOver/flo/gff_recover.rb run/h99_longest_transcript/lifted.gff3 2> run/h99_longest_transcript/lifted_cleanup.log | gt gff3 -tidy -sort -addids -retainids - > run/h99_longest_transcript/lifted_cleaned.gff 2>> run/h99_longest_transcript/lifted_cleanup.log
rake aborted!
Command failed with status (1): [/scratch/mblab/chasem/liftOver/flo/gff_rec...]
/scratch/mblab/chasem/liftOver/flo/Rakefile:60:in `block (2 levels) in <top (required)>'
/scratch/mblab/chasem/liftOver/flo/Rakefile:40:in `each'
/scratch/mblab/chasem/liftOver/flo/Rakefile:40:in `block in <top (required)>'
Tasks: TOP => default

I'm not quite sure where to start debugging. Looking in the Rakefile and at gff_recover didn't give me any good ideas. Any suggestions?

Workflow manager?

Any plans to wrap flo in a workflow manager, e.g. snakemake or nextflow? This could help it run on many different platforms.

The reason I ask is, I discovered flo after writing my own nextflow pipeline to do something similar, but it doesn't fully work, so I might try to wrap flo in a workflow manager instead. If you are already working on doing that maybe we can join forces?
https://github.com/photocyte/doSameSpeciesLiftOver_nextflow

rake aborted

Hi,
I ran into the following problem:

…
Processing mito11
mkdir run/test_v2
liftOver -gff /work/team/banana/assembly/bam2consensus/flo/test_v2.gff3 run/liftover.chn run/test_v2/lifted.gff3 run/test_v2/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/lustre/work-lustre/team/apps/flo/gff_recover.rb run/test_v2/lifted.gff3 2> unprocessed.gff | gt gff3 -tidy -sort -addids -retainids - > run/test_v2/lifted_cleaned.gff
warning: GFF3 file "-" is empty
warning: GFF3 file "-" is empty
warning: GFF3 file "-" is empty
ln -s /work/team/banana/assembly/bam2consensus/flo/test_v2.gff3 run/test_v2/input.gff
/lustre/work-lustre/team/apps/flo/gff_compare.rb cds run/source.fa run/target.fa run/test_v2/input.gff run/test_v2/lifted_cleaned.gff > run/test_v2/unmapped.txt
gt gff3 -sort -retainids run/test_v2/input.gff | gt extractfeat -type CDS -join -retainids -seqfile run/source.fa -matchdescstart - > run/test_v2/input.cds.fa
gt gff3: error: illegal GFF version pragma in line 46728 of file "run/test_v2/input.gff": ##gff-version 3 (merge multiple GFF3 files with `gt gff3 -sort` and do not concatenate them manually)
gt extractfeat: error: GFF3 file "-" is empty
/lustre/work-lustre/team/miniconda2/envs/flo/lib/ruby/2.2.0/rake/file_utils.rb:66:in `block in create_shell_runner': Command failed with status (1): [gt gff3 -sort -retainids run/musa_acuminat...] (RuntimeError)
	from /lustre/work-lustre/team/miniconda2/envs/flo/lib/ruby/2.2.0/rake/file_utils.rb:57:in `call'
	from /lustre/work-lustre/team/miniconda2/envs/flo/lib/ruby/2.2.0/rake/file_utils.rb:57:in `sh'
	from /lustre/work-lustre/team/miniconda2/envs/flo/lib/ruby/2.2.0/rake/file_utils_ext.rb:37:in `sh'
	from /lustre/work-lustre/team/apps/flo/gff_compare.rb:25:in `extract_cds'
	from /lustre/work-lustre/team/apps/flo/gff_compare.rb:46:in `<main>'
rake aborted!
Command failed with status (1): [/lustre/work-lustre/team/apps/f...]
/work/team/apps/flo/Rakefile:56:in `block (2 levels) in <top (required)>'
/work/team/apps/flo/Rakefile:40:in `each'
/work/team/apps/flo/Rakefile:40:in `block in <top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)

How could I fix the gff3 file?

Best wishes,

Michal
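
The gt error above points at a second "##gff-version 3" pragma on line 46728 of the input, which is what happens when GFF3 files are concatenated by hand; the message itself recommends merging the original files with gt gff3 -sort instead. If only the concatenated file is available, a minimal Ruby sketch for dropping the repeated pragma lines could look like this. The function name is hypothetical, and removing the pragma does not guarantee the result is otherwise valid (for example, IDs may still clash between the merged files).

```ruby
# Sketch: keep the first ##gff-version pragma and drop later repeats.
# Does not fix any other consequence of manual concatenation.
def drop_repeated_version_pragmas(lines)
  seen = false
  lines.reject do |line|
    next false unless line.start_with?('##gff-version')

    repeated = seen
    seen = true
    repeated
  end
end

# Made-up example of two files pasted together.
CONCATENATED = [
  "##gff-version 3\n",
  "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n",
  "##gff-version 3\n",
  "chr2\tsrc\tgene\t5\t90\t.\t-\t.\tID=g2\n"
].freeze

print drop_repeated_version_pragmas(CONCATENATED).join
```

For a real file, File.readlines('test_v2.gff3') can be passed in and the result written to a new file before rerunning flo.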

flo result input to EVM?

The GFF that flo produces cannot be fed into EVM?

has this been tested?

A demonstration that it works would be to carry over annotations from genome X to the same genome. One would expect 100% identity if this works correctly.

Syntax issue with rake

Hello!

I'm having trouble getting flo started. I was hoping you might be able to help me out. To set up flo, I installed all dependencies using apt-get or conda (conda create -n flo -c mvdbeek -c conda-forge parallel genometools ucsc_tools). I set up my data as requested, and am now getting the following error:

rake aborted!
SyntaxError: /path/to/dir/opts_example.yaml:16: syntax error, unexpected ':', expecting end-of-input

Often I encounter errors because my directory has spaces in it (can't change this), but I don't think this is the problem, here. In my opts file, the line throwing the error is as such:

:source_fa: '/path/to/genome/genome.fa'

Do you have any idea what might be going on here? Some sort of version error, perhaps?

Thanks,
Zoe

Issue a release & publish on Bioconda?

Hi there,

Big fan of flo. Has worked really well for my research. But it would be even better if flo were on bioconda and had its dependencies explicitly linked, namely:

conda install -c bioconda genometools-genometools
conda install -c conda-forge parallel
conda install -c bioconda -y ucsc-liftup ucsc-fasplit ucsc-liftover ucsc-axtchain ucsc-chainnet ucsc-blat ucsc-chainsort ucsc-fatotwobit ucsc-twobitinfo ucsc-chainsplit ucsc-chainmergesort ucsc-netchainsubset

A first step towards that, I believe would be issuing a Release here on this Github Repo. Then the Bioconda recipe could point to that release.

All the best,
-Tim

flo failed on Large genome

flo failed on a 14 Gb genome with a "corrupted double-linked list (not small)" error. It runs normally on genomes smaller than 4 Gb. The run was on an AWS m5.16xlarge EC2 instance.

rake -f /home/ubuntu/flo/Rakefile &
mkdir run
cp /home/ubuntu/s.fa run/source.fa
cp /home/ubuntu/t.fa run/target.fa
faToTwoBit run/source.fa run/source.2bit
faToTwoBit run/target.fa run/target.2bit
twoBitInfo run/source.2bit stdout | sort -k2nr > run/source.sizes
twoBitInfo run/target.2bit stdout | sort -k2nr > run/target.sizes
faSplit sequence run/target.fa 21 run/chunk_
parallel --joblog run/joblog.faSplit -j 21 -a run/joblst.faSplit
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:

O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; and it won't cost you a cent.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

To silence the citation notice: run 'parallel --bibtex'.

123322 pieces of 123923 written
133957 pieces of 134763 written
150983 pieces of 152743 written
156478 pieces of 157558 written
98419 pieces of 99073 written
99082 pieces of 99724 written
103154 pieces of 103663 written
113555 pieces of 113991 written
118767 pieces of 119728 written
123551 pieces of 124526 written
141741 pieces of 142672 written
144495 pieces of 146237 written
130388 pieces of 131310 written
147572 pieces of 148896 written
138549 pieces of 140111 written
141907 pieces of 142961 written
149246 pieces of 150844 written
149613 pieces of 150822 written
197774 pieces of 198899 written
160747 pieces of 162550 written
167525 pieces of 170389 written
parallel --joblog run/joblog.blat -j 21 -a run/joblst.blat
corrupted double-linked list (not small)
free(): invalid next size (normal)
free(): invalid next size (normal)
double free or corruption (!prev)
double free or corruption (!prev)
malloc(): smallbin double linked list corrupted
free(): invalid next size (normal)
malloc(): memory corruption
free(): invalid next size (normal)
double free or corruption (!prev)
free(): invalid next size (normal)
double free or corruption (!prev)
double free or corruption (!prev)
rake aborted!
Command failed with status (21): [parallel --joblog run/joblog.blat -j 21 -a...]
/home/ubuntu/flo/Rakefile:153:in `parallel'
/home/ubuntu/flo/Rakefile:99:in `block in <top (required)>'
/home/ubuntu/flo/Rakefile:37:in `block in <top (required)>'
Tasks: TOP => run/liftover.chn
(See full trace by running task with --trace)

[1]+ Exit 1 rake -f /home/ubuntu/flo/Rakefile

`foreach': no implicit conversion of nil into String (TypeError)

Hi,
I ran into the following problem:

/apps/flo/gff_remove_feats.rb annotation_v2.gff3 > annotation_v2_cleaned.gff3 
/apps/flo/gff_remove_feats.rb:15:in `foreach': no implicit conversion of nil into String (TypeError)
	from /apps/flo/gff_remove_feats.rb:15:in `<main>'

Did I miss anything?

Thank you in advance.

Best wishes,

Michal
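
In Ruby, this TypeError is what File.foreach raises when it is handed nil instead of a file name, which typically means a positional argument was missing from the command line. The sketch below reproduces the error and shows an argument check that fails with a usage message instead. The argument order (feature types first, GFF file last) is an assumption suggested by how the script is invoked elsewhere on this page ("gff_remove_feats.rb gene"); it is not confirmed from the script's source, and the function names are hypothetical.

```ruby
# Reproduces the error: File.foreach(nil) raises TypeError.
def missing_argument_error
  File.foreach(nil) { |_line| }
  nil
rescue TypeError => e
  e.message
end

# Hypothetical argument check. Assumes (unconfirmed) that the script takes
# one or more feature types followed by the GFF file.
def parse_args(argv)
  *feature_types, gff_path = argv
  if gff_path.nil? || feature_types.empty?
    raise ArgumentError, 'usage: SCRIPT FEATURE_TYPE [FEATURE_TYPE ...] FILE.gff3'
  end

  [feature_types, gff_path]
end

puts "File.foreach(nil) fails with: #{missing_argument_error}"
p parse_args(%w[gene annotation_v2.gff3])
```

Under that assumption, the command in the report passes only a file name and no feature type, so the script would find nothing in the position where it expects the file.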

Rakefile issue

Hey, this seems similar to another recent issue (involving the bio gem not being installed), but I am running into this issue now, and I am unsure what the error is indicating - if it is an issue with my gff or with ruby. Here is the trace, followed by the first three lines of my .gff.

rake --trace
** Invoke default (first_time)
** Execute default
** Invoke run/liftover.chn (first_time, not_needed)
mkdir lepdec_OGSv1.0_ONLY_GENES.gff-liftover-Ldec_redundans_on_alpaths_contigs_genome
liftOver -gff lepdec_OGSv1.0_ONLY_GENES.gff run/liftover.chn lepdec_OGSv1.0_ONLY_GENES.gff-liftover-Ldec_redundans_on_alpaths_contigs_genome/lifted.gff3 lepdec_OGSv1.0_ONLY_GENES.gff-liftover-Ldec_redundans_on_alpaths_contigs_genome/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/var/lib/gems/2.2.0/gems/bio-1.5.1/lib/bio/db/gff.rb:921: warning: regexp match /.../n against to UTF-8 string
rake aborted!
undefined method `last' for nil:NilClass
/home/beetle/Documents/CPB/reference_genomes/flo/Rakefile:72:in `block in process_gff'
/home/beetle/Documents/CPB/reference_genomes/flo/Rakefile:69:in `each'
/home/beetle/Documents/CPB/reference_genomes/flo/Rakefile:69:in `group_by'
/home/beetle/Documents/CPB/reference_genomes/flo/Rakefile:69:in `process_gff'
/home/beetle/Documents/CPB/reference_genomes/flo/Rakefile:223:in `block (2 levels) in <top (required)>'
/home/beetle/Documents/CPB/reference_genomes/flo/Rakefile:212:in `each'
/home/beetle/Documents/CPB/reference_genomes/flo/Rakefile:212:in `block in <top (required)>'
/usr/lib/ruby/vendor_ruby/rake/task.rb:246:in `call'
/usr/lib/ruby/vendor_ruby/rake/task.rb:246:in `block in execute'
/usr/lib/ruby/vendor_ruby/rake/task.rb:241:in `each'
/usr/lib/ruby/vendor_ruby/rake/task.rb:241:in `execute'
/usr/lib/ruby/vendor_ruby/rake/task.rb:184:in `block in invoke_with_call_chain'
/usr/lib/ruby/2.2.0/monitor.rb:211:in `mon_synchronize'
/usr/lib/ruby/vendor_ruby/rake/task.rb:177:in `invoke_with_call_chain'
/usr/lib/ruby/vendor_ruby/rake/task.rb:170:in `invoke'
/usr/lib/ruby/vendor_ruby/rake/application.rb:143:in `invoke_task'
/usr/lib/ruby/vendor_ruby/rake/application.rb:101:in `block (2 levels) in top_level'
/usr/lib/ruby/vendor_ruby/rake/application.rb:101:in `each'
/usr/lib/ruby/vendor_ruby/rake/application.rb:101:in `block in top_level'
/usr/lib/ruby/vendor_ruby/rake/application.rb:110:in `run_with_threads'
/usr/lib/ruby/vendor_ruby/rake/application.rb:95:in `top_level'
/usr/lib/ruby/vendor_ruby/rake/application.rb:73:in `block in run'
/usr/lib/ruby/vendor_ruby/rake/application.rb:160:in `standard_exception_handling'
/usr/lib/ruby/vendor_ruby/rake/application.rb:70:in `run'
/usr/bin/rake:27:in `<main>'
Tasks: TOP => default

GFF:
Scaffold1 OGSv1.0 gene 12481 16948 . - . ID=LDEC000001;Name=LDEC000001;Dbxref=I5KNAL:LDEC000001;method=Maker
Scaffold1 OGSv1.0 gene 19920 23242 . - . ID=LDEC000002;Name=LDEC000002;Dbxref=I5KNAL:LDEC000002;method=Maker
Scaffold1 OGSv1.0 gene 26074 37602 . + . ID=LDEC000003;Name=LDEC000003;Dbxref=I5KNAL:LDEC000003;method=Maker

Thanks!
Kristian

flo compilation error

Hi,

I tried to install flo, but when I launch the installation with install.sh I get a compilation error:

genometools-1.5.6/www/genometools.org/htdocs/trackselectors.html
genometools-1.5.6/www/github/
genometools-1.5.6/www/github/assets/
genometools-1.5.6/www/github/assets/overview.png
/bin/sh: 1: Syntax error: "(" unexpected
/bin/sh: 1: Syntax error: "(" unexpected
/bin/sh: 1: Syntax error: "(" unexpected
[compile sqlite3.o]
/bin/sh: 2: Syntax error: "(" unexpected
make: *** [Makefile:741: obj/src/external/sqlite-3.8.7.1/sqlite3.o] Error 2

Do you have any idea how to fix this issue?

Thanks in advance

Flo failed at the GenomeTools section

Hello,

I ran flo on my data to convert the gff coordinates from one assembly version to the other. I have the files, lifted.gff3 and unlifted.gff3. The lifted.gff3 looks fine in terms of the size comparison with the original gff3.

However, at the end, I get the following error:

liftOver -gff GCF_000698965.1_ASM69896v1_genomic.flo.gff run/liftover.chn run/GCF_000698965.1_ASM69896v1_genomic.flo/lifted.gff3 run/GCF_000698965.1_ASM69896v1_genomic.flo/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/crex/proj/uppstore2017180/private/homap/ostrich_Z_diversity/src/flo/gff_recover.rb run/GCF_000698965.1_ASM69896v1_genomic.flo/lifted.gff3 2> unprocessed.gff | gt gff3 -tidy -sort -addids -retainids - > run/GCF_000698965.1_ASM69896v1_genomic.flo/lifted_cleaned.gff
warning: line 1 in file "-" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: line 1 in file "-" does not contain 9 tab (\t) separated fields
rake aborted!
Command failed with status (1): [/crex/proj/uppstore2017180/private/homap/o...]
/crex/proj/uppstore2017180/private/homap/ostrich_Z_diversity/src/flo/Rakefile:60:in `block (2 levels) in <top (required)>'
/crex/proj/uppstore2017180/private/homap/ostrich_Z_diversity/src/flo/Rakefile:40:in `each'
/crex/proj/uppstore2017180/private/homap/ostrich_Z_diversity/src/flo/Rakefile:40:in `block in <top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)

I was wondering how I could resolve this issue?

install.sh fails with a make error for libgenometools.so

install.sh should be updated so that the newest version of genometools is used in the build. Right now, version 1.5.6 can lead to the following make error:

flo_species_name/ext/genometools-1.5.6/src/mgth/metagenomethreader.h:224: multiple definition of `gt_cstr_nofree_ulp_hashtype'; obj/src/mgth/metagenomethreader.o:/media/nils/nils_ssd_01/flo_species_name/ext/genometools-1.5.6/src/mgth/metagenomethreader.h:224: first defined here
/usr/bin/ld: obj/src/mgth/mg_compute_gene_prediction.o
collect2: error: ld returned 1 exit status
make: *** [Makefile:587: lib/libgenometools.so] Error 1

To prevent this, adjust the section of install.sh that installs genometools like this:

# Genometools
cd ext
wget -c https://github.com/genometools/genometools/archive/refs/tags/v1.6.2.tar.gz -O v1.6.2.tar.gz
tar xvf v1.6.2.tar.gz
rm v1.6.2.tar.gz
cd genometools-1.6.2
make cairo=no errorcheck=no

Encountering error running flo at early 'faSplit' step:

Hi:
I ran flo to lift over from one nematode genome assembly to another (Pristionchus pacificus) and got this error:
Command failed with status (127): [faSplit sequence run/target.fa 2 run/chunk...]

Here's the full command:

>$ rake -f Rakefile
mkdir run
cp /scratch/rtraborn/pp_liftOver/pp_hybrid1/Pristionchus_Hybrid_assembly.fa run/source.fa
cp /scratch/rtraborn/pp_liftOver/pp_hybrid2/pacificus_Hybrid2.fa run/target.fa
faToTwoBit run/source.fa run/source.2bit
/usr/local/share/gems/gems/rake-12.3.0/lib/rake/file_utils.rb:54: warning: Insecure world writable dir /home/rtraborn/genome_analysis/paml4.9d/bin in PATH, mode 040777
faToTwoBit run/target.fa run/target.2bit
twoBitInfo run/source.2bit stdout | sort -k2nr > run/source.sizes
twoBitInfo run/target.2bit stdout | sort -k2nr > run/target.sizes
faSplit sequence run/target.fa 2 run/chunk_
rake aborted!
Command failed with status (127): [faSplit sequence run/target.fa 2 run/chunk...]
/home/rtraborn/genome_analysis/flo/Rakefile:79:in `block in <top (required)>'
/home/rtraborn/genome_analysis/flo/Rakefile:37:in `block in <top (required)>'
/usr/local/share/gems/gems/rake-12.3.0/exe/rake:27:in `<top (required)>'
Tasks: TOP => run/liftover.chn
(See full trace by running task with --trace)

Concerning the opts file: I set blat_opts: as follows: '-tileSize=12 -minIdentity=98' and set processes: to '2'.

Any idea where this is going wrong? I'm certain the assemblies and gff file are correctly formatted (the latter being 'cleaned' as described).

error at end of process

Flo works but at the end dies due to some CDS extraction issue.
Any idea what I should do to fix this?

...
ln -s /data/nanopore/2741_MinION/flo_results/R64_genomic_cleaned.gff run/../R64_genomic_cleaned/input.gff
/data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/gff_compare.rb cds run/source.fa run/target.fa run/../R64_genomic_cleaned/input.gff run/../R64_genomic_cleaned/lifted_cleaned.gff > run/../R64_genomic_cleaned/unmapped.txt
gt extractfeat -type CDS -join -retainids -seqfile run/source.fa -matchdescstart run/../R64_genomic_cleaned/input.gff > run/../R64_genomic_cleaned/input.cds.fa
gt extractfeat: error: the file run/../R64_genomic_cleaned/input.gff is not sorted (example: line 5 and 6)
/usr/lib/ruby/vendor_ruby/rake/file_utils.rb:66:in `block in create_shell_runner': Command failed with status (1): [gt extractfeat -type CDS -join -retainids ...] (RuntimeError)
        from /usr/lib/ruby/vendor_ruby/rake/file_utils.rb:57:in `sh'
        from /usr/lib/ruby/vendor_ruby/rake/file_utils_ext.rb:37:in `sh'
        from /data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/gff_compare.rb:25:in `extract_cds'
        from /data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/gff_compare.rb:45:in `<main>'
rake aborted!
Command failed with status (1): [/data/nanopore/2741_MinION/flo_results/flo...]
/data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/Rakefile:56:in `block (2 levels) in <top (required)>'
/data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/Rakefile:40:in `each'
/data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/Rakefile:40:in `block in <top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)

Naming & moving data

Data:

no need to copy input data to run directory, just softlink
"tmp" is a better name for run
flo_opts could include example names. I like tmp/old.fa, tmp/new.fa, and tmp/old.gff
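
A minimal sketch of the softlink idea in Ruby, assuming the `tmp/old.fa` style names suggested above. `link_input` is a hypothetical helper and does not exist in flo's Rakefile; it only shows that a symlink to the absolute source path can stand in for the copy.

```ruby
require 'fileutils'

# Hypothetical helper (not part of flo): link an input file into the
# working directory instead of copying it.
def link_input(source, dest)
  FileUtils.mkdir_p(File.dirname(dest))
  # Link to the absolute path so the link still resolves from inside dest's
  # directory; force: true replaces a stale link from an earlier run.
  FileUtils.ln_s(File.expand_path(source), dest, force: true)
end

# Example call with assumed names:
#   link_input('assemblies/old_genome.fa', 'tmp/old.fa')
```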

liftover gtf file

Hi,

Sorry if this has been discussed before but I can't find it. Is it possible to liftover a gtf instead of a gff file?

Thank you.

collision between gene names using a merge of two genome GFF files

I work on a hybrid yeast derived from two known reference yeasts.
I constructed the artificial assembly by merging the fasta files and did the same with the two GFF files from NCBI.
When I run flo with that GFF and a de novo assembly of the hybrid genome, it dies at the GFF cleaning step, producing errors because the two strains have genes with different geneIDs and gbIDs but identical canonical gene names.

I thought I was being clever by replacing the merged GFF with the two separate GFFs in the yaml, but that also fails (though not because of the names).

Is the best solution to add both GFFs to the yaml and merge the cleaned results back after lifting?

Thanks in advance,
Stephane

PLEASE DELETE this issue:

I did not run the gff cleanup rb script
I did not sort the gff
after doing both, it ran without a glitch (I now have to review the results, but there were no error messages)
Thanks again for the great app
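
For readers who hit the same "not sorted" error: flo's own commands in this listing sort with `gt gff3 -sort`, and that is probably the tool to use. The Ruby sketch below only illustrates what "sorted" roughly means here, under the assumption that ordering by sequence ID and then by start coordinate is enough. `sort_gff_lines` is a hypothetical name, and the sketch moves every `#` line to the top, which a real tool would handle more carefully.

```ruby
# Illustrative only: order GFF3 feature lines by sequence ID, then start.
def sort_gff_lines(lines)
  lines = lines.reject { |l| l.strip.empty? }
  # Comment and directive lines (starting with '#') are kept at the top.
  headers, features = lines.partition { |l| l.start_with?('#') }
  sorted = features.each_with_index.sort_by do |line, idx|
    cols = line.split("\t")
    [cols[0], cols[3].to_i, idx] # idx keeps the original order on ties
  end
  headers + sorted.map(&:first)
end

# Example call with an assumed file name:
#   puts sort_gff_lines(File.readlines('R64_genomic_cleaned.gff'))
```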

Trying to map over gff results error

I ran flo on a whole genome and found that the lifted.gff had features like this referring to a parent PKINGS_0.1_G055355, but the parent PKINGS_0.1_G055355 was not in the file

Scaffold_87 maker   mRNA    5628861 5664273 .   -   .   ID=PKINGS_0.1_T055355-R4;Parent=PKINGS_0.1_G055355;Name=PKINGS_0.1_T055355-R4;Alias=maker-Scaffold517-augustus-gene-2.2-mRNA-1;Dbxref=InterPro:IPR000157,InterPro:IPR007632,Pfam:PF01582,Pfam:PF04547;Note=Similar to ANO4: Anoctamin-4 (Homo sapiens);Ontology_term=GO:0005515,GO:0007165;_AED=0.30;_QI=451%7C0.83%7C0.83%7C1%7C0.96%7C0.93%7C31%7C1288%7C1182;_eAED=0.30

I took Scaffold_87 from target.fa and the scaffold that PKINGS_0.1_G055355 originally came from and ran a smaller flo alignment between these two sequences. Interestingly enough, the new lifted.gff actually did contain that parent PKINGS_0.1_G055355, but it seems like gff_recover.rb run/annotations/lifted.gff3 actually removed the gene line?
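
To see which Parent references are dangling at each stage (for example in lifted.gff3 and in the output of gff_recover.rb), a small check like the following may help. It is a sketch, not part of flo: `dangling_parents` is a hypothetical helper, and it assumes tab-separated GFF3 with `key=value` attributes in column 9.

```ruby
require 'set'

# Return Parent values that never appear as an ID= in the same GFF3 text.
def dangling_parents(lines)
  ids = Set.new
  parents = Set.new
  lines.each do |line|
    next if line.start_with?('#') || line.strip.empty?
    attrs = line.chomp.split("\t")[8]
    next if attrs.nil?
    attrs.split(';').each do |pair|
      key, value = pair.strip.split('=', 2)
      next if value.nil?
      case key
      when 'ID' then ids << value
      # A feature may list several parents, separated by commas.
      when 'Parent' then value.split(',').each { |p| parents << p }
      end
    end
  end
  (parents - ids).to_a.sort
end

# Example call on one of the files named above:
#   puts dangling_parents(File.readlines('run/annotations/lifted.gff3'))
```

Comparing the result for lifted.gff3 with the result for the recovered output would show whether the gene line was already missing after liftOver or disappeared later.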

Full output

...
chainMergeSort run/*.chn.sorted | chainSplit run stdin -lump=1
mv run/000.chain run/combined.chn.sorted
chainNet run/combined.chn.sorted run/source.sizes run/target.sizes run/combined.chn.sorted.net /dev/null
Got 1 chroms in run/source.sizes, 1 in run/target.sizes
Finishing nets
writing run/combined.chn.sorted.net
writing /dev/null
netChainSubset run/combined.chn.sorted.net run/combined.chn.sorted run/liftover.chn
Processing Scaffold517
mkdir run/annotations
liftOver -gff annotations.gff run/liftover.chn run/annotations/lifted.gff3 run/annotations/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/home/me/flo/gff_recover.rb run/annotations/lifted.gff3 2> unprocessed.gff | gt gff3 -tidy -sort -addids -retainids - > run/annotations/lifted_cleaned.gff
warning: line 1 in file "-" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: Parent "PKINGS_0.1_G055355" on line 1 in file "-" was not defined (via "ID=")
rake aborted!

Here is an example

flo.tar.gz

GFF format issue with CoGe input

Hi,
I am trying to lift over a GFF genome annotation but cannot get the GFF accepted by the software.

Here is a minimal sample that triggers the crash

1       CoGe    transcript      8522    12619   .       +       .       transcript_id "C00s001g005000.mRNA1"; gene_id "C00s001g005000"; gene_name "C00s001g005000";

I run the following command using the latest docker and all inputs are present

docker run   --rm   --user "$(id -u):$(id -g)"  \
-v $PWD:/workdir   \
informationsea/transanno:latest transanno minimap2chain  \
 /workdir/${pfxq}_to_${pfxt}.paf  \
--output /workdir/${pfxq}_to_${pfxt}.chain

I get

nom error: Error(("transcript_id \"C00s001g005000.mRNA1\"; gene_id \"C00s001g005000\"; gene_name \"C00s001g005000\";\n", CrLf))
thread 'main' panicked at 'Operation Error: LiftOverError { inner: GeneParseError { inner: GeneParseError { inner: 

Parse error }

Parse error at line: 1 }

Failed to parse gene annotation }', transanno/src/main.rs:30:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I could not figure out how to add RUST_BACKTRACE=1 to a docker call

Thanks for your help

expecting number line 38 of gff3

Hi,

I am trying to lift over ref_GRCh37.p13_top_level.gff3 to a de novo assembled genome, but got the following error:
mkdir ref_GRCh37.p13_top_level-liftover-A673_combined_fastq.1
liftOver -gff /gpfs0/home/rslssnck/cxt050/hg19/ref_GRCh37.p13_top_level.gff3 run/liftover.chn ref_GRCh37.p13_top_level-liftover-A673_combined_fastq.1/lifted.gff3 ref_GRCh37.p13_top_level-liftover-A673_combined_fastq.1/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
Expecting number line 38 of /gpfs0/home/rslssnck/cxt050/hg19/ref_GRCh37.p13_top_level.gff3
rake aborted!
Command failed with status (255): [liftOver -gff /gpfs0/home/rslssnck/cxt0...]
/gpfs0/home/rslssnck/cxt050/opt/flo/Rakefile:232:in `block (2 levels) in <top (required)>'
/gpfs0/home/rslssnck/cxt050/opt/flo/Rakefile:223:in `each'
/gpfs0/home/rslssnck/cxt050/opt/flo/Rakefile:223:in `block in <top (required)>'
/gpfs0/home/rslssnck/cxt050/.rvm/gems/ruby-2.4.0@global/gems/rake-12.0.0/exe/rake:27:in `<top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)

I'm not sure what to do with this error. Your help is greatly appreciated. Thanks!

faToTwoBit: cannot execute binary file

I am trying to lift over a viral genome from one version to another and get the following error. I checked the permissions of the faToTwoBit file and they look fine. I am running this on a Mac.

Error:
mkdir run
cp /Users/divya/NC_009333.fa run/source.fa
cp /Users/divya/GQ994935.fa run/target.fa
faToTwoBit run/source.fa run/source.2bit
ext/kent/bin/faToTwoBit: ext/kent/bin/faToTwoBit: cannot execute binary file
rake aborted!
Command failed with status (126): [faToTwoBit run/source.fa run/source.2bit...]
/Users/divya/flo/Rakefile:127:in `to_2bit'
/Users/divya/flo/Rakefile:72:in `block in <top (required)>'
/Users/divya/flo/Rakefile:37:in `block in <top (required)>'
Tasks: TOP => run/liftover.chn

gt gff3: error: Parent ... was not defined (via "ID=")

Hi,
I tried to lift the below TAIR10 annotation:

> head TAIR10_GFF3_genes.gff
Chr1	TAIR10	chromosome	1	30427671	.	.	.	ID=Chr1;Name=Chr1
Chr1	TAIR10	gene	3631	5899	.	+	.	ID=AT1G01010;Note=protein_coding_gene;Name=AT1G01010
Chr1	TAIR10	mRNA	3631	5899	.	+	.	ID=AT1G01010.1;Parent=AT1G01010;Name=AT1G01010.1;Index=1
Chr1	TAIR10	protein	3760	5630	.	+	.	ID=AT1G01010.1-Protein;Name=AT1G01010.1;Derives_from=AT1G01010.1
Chr1	TAIR10	exon	3631	3913	.	+	.	Parent=AT1G01010.1
Chr1	TAIR10	five_prime_UTR	3631	3759	.	+	.	Parent=AT1G01010.1
Chr1	TAIR10	CDS	3760	3913	.	+	0	Parent=AT1G01010.1,AT1G01010.1-Protein;
Chr1	TAIR10	exon	3996	4276	.	+	.	Parent=AT1G01010.1
Chr1	TAIR10	CDS	3996	4276	.	+	2	Parent=AT1G01010.1,AT1G01010.1-Protein;
Chr1	TAIR10	exon	4486	4605	.	+	.	Parent=AT1G01010.1

Next, I did

> gff_remove_feats.rb chromosome TAIR10_GFF3_genes.gff > TAIR10_GFF3_genes-fix1.gff |head
Chr1	TAIR10	gene	3631	5899	.	+	.	ID=AT1G01010;Note=protein_coding_gene;Name=AT1G01010
Chr1	TAIR10	mRNA	3631	5899	.	+	.	ID=AT1G01010.1;Parent=AT1G01010;Name=AT1G01010.1;Index=1
Chr1	TAIR10	protein	3760	5630	.	+	.	ID=AT1G01010.1-Protein;Name=AT1G01010.1;Derives_from=AT1G01010.1
Chr1	TAIR10	exon	3631	3913	.	+	.	Parent=AT1G01010.1
Chr1	TAIR10	five_prime_UTR	3631	3759	.	+	.	Parent=AT1G01010.1
Chr1	TAIR10	CDS	3760	3913	.	+	0	Parent=AT1G01010.1,AT1G01010.1-Protein;
Chr1	TAIR10	exon	3996	4276	.	+	.	Parent=AT1G01010.1
Chr1	TAIR10	CDS	3996	4276	.	+	2	Parent=AT1G01010.1,AT1G01010.1-Protein;
Chr1	TAIR10	exon	4486	4605	.	+	.	Parent=AT1G01010.1
Chr1	TAIR10	CDS	4486	4605	.	+	0	Parent=AT1G01010.1,AT1G01010.1-Protein;

While running flo I got:

> mkdir run/TAIR10_GFF3_genes-fix1
liftOver -gff /QRISdata/Q0231/flo/tair10/TAIR10_GFF3_genes-fix1.gff run/liftover.chn run/TAIR10_GFF3_genes-fix1/lifted.gff3 run/TAIR10_GFF3_genes-fix1/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/QRISdata/Q0231/apps/flo/gff_recover.rb run/TAIR10_GFF3_genes-fix1/lifted.gff3 2> unprocessed.gff | gt gff3 -tidy -sort -addids -retainids - > run/TAIR10_GFF3_genes-fix1/lifted_cleaned.gff
warning: line 1 in file "-" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: Parent "AT1G64130.1-Protein" on line 3 in file "-" was not defined (via "ID=")
rake aborted!

What did I miss?

Thank you in advance,

Michal

file "-" does not contain 9 tab (\t) separated fields

First of all I would like to thank you for this tool. It targets a task that is extremely difficult to do for non-model organisms with other tools.

However, I am having an issue that I have not been able to solve. According to the error, my gff file does not have a header, nor does it contain 9 tab-separated fields. But it does (file attached: gff_file.zip). This is the error:

...

Processing chromosome_2
mkdir run/ref_v5.6_exons3_chromosome_2
liftOver -gff ref_v5.6_exons3_chromosome_2.gff3 run/liftover.chn run/ref_v5.6_exons3_chromosome_2/lifted.gff3 run/ref_v5.6_exons3_chromosome_2/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/home/elcortegano/tmp/lift/flo/gff_recover.rb run/ref_v5.6_exons3_chromosome_2/lifted.gff3 2> unprocessed.gff | gt gff3 -tidy -sort -addids -retainids - > run/ref_v5.6_exons3_chromosome_2/lifted_cleaned.gff
warning: line 1 in file "-" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: line 1 in file "-" does not contain 9 tab (\t) separated fields
rake aborted!
Command failed with status (1): [/home/elcortegano/tmp/lift/flo/gff_recover...]
/home/elcortegano/tmp/lift/flo/Rakefile:60:in `block (2 levels) in <top (required)>'
/home/elcortegano/tmp/lift/flo/Rakefile:40:in `each'
/home/elcortegano/tmp/lift/flo/Rakefile:40:in `block in <top (required)>'
/usr/share/rubygems-integration/all/gems/rake-13.0.1/exe/rake:27:in `<top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)

[gff_file.zip](https://github.com/wurmlab/flo/files/5493835/gff_file.zip)

This is using the (attached above) gff3 file after removing annotations with gff_remove_feats.rb so that only mRNA, exon and CDS are left, although the same error occurs with the original file.

What is wrong with the file?
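
Note that the error names file "-", which in the command above is the standard input of `gt gff3`, i.e. the output of gff_recover.rb rather than the original gff3. A quick way to inspect any intermediate file for the problem is sketched below. `malformed_lines` is a hypothetical helper, not part of flo, and it treats blank lines as malformed too, which is an assumption about what the parser rejects.

```ruby
# List lines that do not split into exactly nine tab-separated fields.
# Returns [line_number, text] pairs; '#' lines are skipped.
def malformed_lines(lines)
  lines.each_with_index.reject do |line, _idx|
    line.start_with?('#') || line.chomp.split("\t", -1).size == 9
  end.map { |line, idx| [idx + 1, line.chomp] }
end

# Example call on a file from the log above:
#   malformed_lines(File.readlines('run/ref_v5.6_exons3_chromosome_2/lifted.gff3'))
#     .each { |number, text| puts "#{number}: #{text.inspect}" }
```

Printing with `inspect` makes spaces versus tabs visible, which is a common reason a line that looks fine fails this check.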

Thank you

Problem with temporary GFF file

I tried flo yesterday, but it ended with an error. It seems like there is a problem in a temporary GFF file?
So the question is if the program or my input GFF is the problem?

It created a file called "lifted.gff3" and one called "unlifted.gff3". Both of them are filled. But there is also a third file "Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta.gff3" which is empty.

Here are the last lines flo printed:

Processing Scaffold_3140
mkdir Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta
liftOver -gff /home/muehlich/Desktop/aethionema/data/Aarabicum.v2.5.gff run/liftover.chn Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta/lifted.gff3 Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
gt gff3 -tidy -sort -addids -retainids /tmp/lifted20170614-22821-oyvvge > Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta/Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta.gff3
warning: line 1 in file "/tmp/lifted20170614-22821-oyvvge" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: Parent "AA1G00001" on line 2 in file "/tmp/lifted20170614-22821-oyvvge" was not defined (via "ID=")
rake aborted!
Command failed with status (1): [gt gff3 -tidy -sort -addids -retainids /tm...]
/home/muehlich/flo/Rakefile:113:in `process_gff'
/home/muehlich/flo/Rakefile:234:in `block (2 levels) in <top (required)>'
/home/muehlich/flo/Rakefile:223:in `each'
/home/muehlich/flo/Rakefile:223:in `block in <top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)
