wurmlab / flo
Same species annotation lift over pipeline.
Hi,
I recently started lifting over my GFF with flo.
Because I am concerned that repeats in the genome may affect the liftover result, I would like to run flo on the target genome both with and without repeat masking.
For the repeat-masked run, I added 'qMask=mygenome.fasta.out', where mygenome.fasta.out is the RepeatMasker output file.
flo seems to get stuck at the BLAT step because the target genome is split into chunks and mygenome.fasta.out is not.
Could you suggest a solution?
Thanks,
Takashi
Hi,
I was able to fix the previous error I posted about, but reached another in the final steps of processing on a sample run. I am unsure which database is missing, but I did receive the lifted/unlifted gff files which look reasonable so far. Any help finishing the last steps of the analysis would be greatly appreciated, log file copied below:
cat error.log
nohup: ignoring input
/global/u2/a/asession/SCRIPTS/flo/Rakefile:25: warning: Insecure world writable dir /global/common/genepool/usg/languages/R in PATH, mode 040777
mkdir run
cp ./Chr1L.example/Chr1L.v91.fa run/source.fa
cp ./Chr1L.example/Chr1L.v92.fa run/target.fa
faToTwoBit run/source.fa run/source.2bit
faToTwoBit run/target.fa run/target.2bit
twoBitInfo run/source.2bit stdout | sort -k2nr > run/source.sizes
twoBitInfo run/target.2bit stdout | sort -k2nr > run/target.sizes
faSplit sequence run/target.fa 1 run/chunk_
parallel --joblog run/joblog.faSplit -j 1 -a run/joblst.faSplit
43201 pieces of 43961 written
parallel --joblog run/joblog.blat -j 1 -a run/joblst.blat
Loaded 219879705 letters in 1 sequences
Searched 216002468 bases in 43201 sequences
parallel --joblog run/joblog.liftUp -j 1 -a run/joblst.liftUp
Got 43201 lifts in run/chunk_0.fa.lft
Lifting run/chunk_0.fa.psl
parallel --joblog run/joblog.axtChain -j 1 -a run/joblst.axtChain
693297 blocks after duplicate removal
Loaded 219802468 bases of NC_030724.1 from run/target.2bit
Loaded 219879705 bases of chr1L from run/source.2bit
chainPair NC_030724.1-chr1L
Main chaining step done in 2979 milliseconds
747068 blocks after duplicate removal
chainPair NC_030724.1+chr1L
Main chaining step done in 14412 milliseconds
parallel --joblog run/joblog.chainSort -j 1 -a run/joblst.chainSort
chainMergeSort run/*.chn.sorted | chainSplit run stdin -lump=1
mv run/000.chain run/combined.chn.sorted
chainNet run/combined.chn.sorted run/source.sizes run/target.sizes run/combined.chn.sorted.net /dev/null
Got 1 chroms in run/source.sizes, 1 in run/target.sizes
Finishing nets
writing run/combined.chn.sorted.net
writing /dev/null
netChainSubset run/combined.chn.sorted.net run/combined.chn.sorted run/liftover.chn
Processing chr1L
mkdir Chr1L-liftover-Chr1L.v92
liftOver -gff ./Chr1L.example/Chr1L.gff3 run/liftover.chn Chr1L-liftover-Chr1L.v92/lifted.gff3 Chr1L-liftover-Chr1L.v92/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
rake aborted!
LoadError: cannot load such file -- bio/db/gff
Hello!
I am experiencing an error with Flo that I was hoping you might be able to help me with. Rake aborts when blat begins, I believe. I searched for this issue and found it has happened for a few others, and noticed that one piece of advice was to update parallel. I did that through a conda environment and am now using GNU parallel 20201122, which I realize is still GNU parallel. Nonetheless, I am getting the following error:
mkdir run
cp /path/genomic.fa run/source.fa
cp /path/target.fa run/target.fa
faToTwoBit run/source.fa run/source.2bit
faToTwoBit run/target.fa run/target.2bit
twoBitInfo run/source.2bit stdout | sort -k2nr > run/source.sizes
twoBitInfo run/target.2bit stdout | sort -k2nr > run/target.sizes
faSplit sequence run/target.fa 20 run/chunk_
parallel --joblog run/joblog.faSplit -j 20 -a run/joblst.faSplit
29164 pieces of 29164 written
26770 pieces of 26770 written
23287 pieces of 23287 written
25387 pieces of 25387 written
25525 pieces of 25525 written
25448 pieces of 25448 written
25555 pieces of 25555 written
26474 pieces of 26474 written
25992 pieces of 25992 written
27046 pieces of 27046 written
26153 pieces of 26153 written
26728 pieces of 26728 written
26526 pieces of 26526 written
27621 pieces of 27621 written
26588 pieces of 26588 written
26266 pieces of 26266 written
26387 pieces of 26387 written
25897 pieces of 25897 written
27300 pieces of 27300 written
25692 pieces of 25692 written
parallel --joblog run/joblog.blat -j 20 -a run/joblst.blat
Loaded 2510587379 letters in 14543 sequences
Searched 116386128 bases in 23287 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 127615767 bases in 25555 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 127103007 bases in 25448 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 132315417 bases in 26474 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 132646279 bases in 26588 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 129744868 bases in 25992 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 134829202 bases in 27046 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 130548735 bases in 26153 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 126148018 bases in 25387 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 133429151 bases in 26728 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 145757029 bases in 29164 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 137556497 bases in 27621 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 131717084 bases in 26387 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 130661225 bases in 26266 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 128121430 bases in 25692 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 126631904 bases in 25525 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 132127474 bases in 26770 sequences
rake aborted!
Command failed with status (3): [parallel --joblog run/joblog.blat -j 20 -a...]
/home/user/bin/flo/Rakefile:161:in `parallel'
/home/user/bin/flo/Rakefile:107:in `block in <top (required)>'
/home/user/bin/flo/Rakefile:37:in `block in <top (required)>'
Tasks: TOP => run/liftover.chn
(See full trace by running task with --trace)
My job log looks like this:
Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
18 : 1606275293.989 45.953 0 0 0 9 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_06.fa run/chunk_06.fa.psl
17 : 1606275293.975 46.491 0 0 0 9 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_10.fa run/chunk_10.fa.psl
16 : 1606275293.972 872.307 0 0 0 9 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_13.fa run/chunk_13.fa.psl
5 : 1606275293.949 19148.322 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_19.fa run/chunk_19.fa.psl
12 : 1606275293.963 20725.125 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_14.fa run/chunk_14.fa.psl
7 : 1606275293.953 22003.258 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_15.fa run/chunk_15.fa.psl
2 : 1606275293.943 23220.606 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_18.fa run/chunk_18.fa.psl
11 : 1606275293.961 23502.038 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_04.fa run/chunk_04.fa.psl
8 : 1606275293.955 23821.197 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_16.fa run/chunk_16.fa.psl
3 : 1606275293.945 24157.292 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_05.fa run/chunk_05.fa.psl
13 : 1606275293.966 24351.732 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_01.fa run/chunk_01.fa.psl
1 : 1606275293.941 24630.828 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_07.fa run/chunk_07.fa.psl
14 : 1606275293.968 24792.986 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_02.fa run/chunk_02.fa.psl
4 : 1606275293.947 25343.822 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_17.fa run/chunk_17.fa.psl
6 : 1606275293.951 25997.266 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_08.fa run/chunk_08.fa.psl
19 : 1606275293.993 26497.434 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_00.fa run/chunk_00.fa.psl
15 : 1606275293.970 26656.599 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_09.fa run/chunk_09.fa.psl
20 : 1606275294.002 27909.093 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_03.fa run/chunk_03.fa.psl
9 : 1606275293.957 28015.902 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_11.fa run/chunk_11.fa.psl
10 : 1606275293.959 28659.995 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_12.fa run/chunk_12.fa.psl
Do you have any thoughts as to what might be going on?
Thanks so much,
Zoe
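A note for anyone reading a job log like the one above: the columns are Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal and Command, and GNU Parallel's exit status is the number of failed jobs when fewer than 101 fail. Status 3 therefore matches the three rows with Signal 9 (chunk_06, chunk_10, chunk_13). Signal 9 is SIGKILL. An out-of-memory kill is one common cause, though that is an assumption here. The following is a minimal sketch, not part of flo, that lists such rows; `failed_jobs` is a hypothetical helper name, and it assumes the Host column is never empty.

```ruby
# List jobs from a GNU Parallel --joblog that exited non-zero or were
# terminated by a signal. `failed_jobs` is a hypothetical helper, not part
# of flo or GNU Parallel.
Job = Struct.new(:seq, :runtime, :exitval, :signal, :command)

def failed_jobs(joblog_text)
  failed = []
  joblog_text.each_line do |line|
    # Nine columns; the command (last column) may itself contain spaces.
    fields = line.strip.split(/\s+/, 9)
    # Skip the header row and anything that is not a complete record.
    next if fields.length < 9 || fields[0] !~ /\A\d+\z/

    job = Job.new(fields[0].to_i, fields[3].to_f, fields[6].to_i,
                  fields[7].to_i, fields[8])
    failed << job if job.exitval != 0 || job.signal != 0
  end
  failed
end

sample = <<~LOG
  Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
  18 : 1606275293.989 45.953 0 0 0 9 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_06.fa run/chunk_06.fa.psl
  5 : 1606275293.949 19148.322 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_19.fa run/chunk_19.fa.psl
LOG

failed_jobs(sample).each do |job|
  puts "job #{job.seq}: exit=#{job.exitval} signal=#{job.signal} after #{job.runtime}s"
end
```

On a real run the input would be the contents of run/joblog.blat, for example `failed_jobs(File.read('run/joblog.blat'))`.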
My mistake: I swapped source and target, which led to serious issues. My only remark is that rake should also be listed as a dependency.
Sorry for the garbage tickets!
Hi,
This is more of a question than an issue. I just realized that none of the mitochondrial genes were lifted. In the unlifted file, many of them are annotated as deleted/partially deleted in new. I'm wondering whether this is inherent to the way the algorithm works. Thank you.
Hi,
I ran flo, and got this error:
warning: line 1 in file "-" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: Parent "maker-Contig53-exonerate_est2genome-gene-0.0" on line 1 in file "-" was not defined (via "ID=")
rake aborted!
Command failed with status (1): [/data/apps/flo/gff_recover.rb run/Medicago...]
/data/apps/flo/Rakefile:60:in `block (2 levels) in <top (required)>'
/data/apps/flo/Rakefile:40:in `each'
/data/apps/flo/Rakefile:40:in `block in <top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)
However, I can see the following output: lifted.gff3 and an unlifted.gff3 (both are non-empty). There is also an empty lifted_cleaned.gff. Can you please tell me what's going on?
Happy to send the .gff3 files if needed.
Thanks!
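The gt message above says that a feature names a Parent for which no feature with a matching ID= exists in the stream it was given. One possible explanation, stated here as an assumption, is that a child feature was lifted while its parent gene was not. The sketch below is not part of flo or genometools. It lists every Parent reference in a GFF3 text that has no matching ID anywhere in that text; `orphan_parents` and `parse_attributes` are hypothetical helper names.

```ruby
require 'set'

# Split column 9 of a GFF3 line ("key=value;key=value") into a Hash.
def parse_attributes(column9)
  column9.split(';').each_with_object({}) do |pair, attrs|
    key, value = pair.strip.split('=', 2)
    attrs[key] = value if key && value
  end
end

# Return [line_number, parent_id] pairs for Parent references that no
# feature in the text defines via ID=. Hypothetical helper, not part of flo.
def orphan_parents(gff_text)
  records = []
  gff_text.each_line.with_index(1) do |line, lineno|
    next if line.start_with?('#') || line.strip.empty?
    cols = line.chomp.split("\t")
    next if cols.length < 9
    records << [lineno, parse_attributes(cols[8])]
  end

  defined_ids = records.map { |_, attrs| attrs['ID'] }.compact.to_set
  orphans = []
  records.each do |lineno, attrs|
    # A Parent attribute may list several comma-separated IDs.
    (attrs['Parent'] || '').split(',').each do |parent|
      orphans << [lineno, parent] unless defined_ids.include?(parent)
    end
  end
  orphans
end

sample = [
  "Contig53\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=mRNA1;Parent=gene1",
  "Contig53\tmaker\texon\t100\t300\t.\t+\t.\tID=exon1;Parent=mRNA1"
].join("\n")

orphan_parents(sample).each do |lineno, parent|
  puts "line #{lineno}: Parent #{parent} has no matching ID"
end
```

Running it on the contents of lifted.gff3 would show which parents are missing before the file reaches gt.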
Hi!
The liftover chaining works like a charm, but the run crashes in process_gff. The issue seems to come from line 62 of the Rakefile: require 'bio/db/gff'
I don't know if I need to install something in order to make this work.
Error shows:
liftOver -gff dwil_flybase.gff3 run/liftover.chn dwil_flybase-liftover-dwil_JR/lifted.gff3 dwil_flybase-liftover-dwil_JR/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
rake aborted!
LoadError: cannot load such file -- bio/db/gff
/home/pgonzale/Programs/flo/Rakefile:62:in `process_gff'
/home/pgonzale/Programs/flo/Rakefile:234:in `block (2 levels) in <top (required)>'
/home/pgonzale/Programs/flo/Rakefile:223:in `each'
/home/pgonzale/Programs/flo/Rakefile:223:in `block in <top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)
Greetings!
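A LoadError like this means the Ruby interpreter running rake cannot find bio/db/gff on its load path. Another trace on this page shows that file under gems/bio-1.5.1/lib/bio/db/gff.rb, which suggests it ships with the BioRuby gem (`bio`), so installing that gem for the same Ruby that runs rake is the likely remedy. The sketch below is only a quick check, not part of flo; `loadable?` is a hypothetical helper name.

```ruby
# Report whether a library can be required by the current Ruby interpreter.
# `loadable?` is a hypothetical helper, not part of flo.
def loadable?(feature)
  require feature
  true
rescue LoadError
  false
end

# 'yaml' ships with Ruby; 'bio/db/gff' is the file named in the LoadError.
%w[yaml bio/db/gff].each do |feature|
  status = loadable?(feature) ? 'ok' : 'MISSING'
  puts "#{feature}: #{status}"
end
```

It is worth running the check with the same `ruby` that `rake` uses, since a gem installed for a different interpreter (for example one inside a conda environment) will not be visible.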
Hi, there.
I have a merged StringTie file and I want to use the gff_longest_transcripts.rb script to obtain the longest transcript for each gene, but it does not work. Information about the error and the StringTie file are attached below. Could you help? Thanks so much!
Sincerely
Yizhong Huang
In addition to #22, provide:
tmp/source.fa
and tmp/target.fa
using current BLAT options and as many CPUs as available.
Dear,
I am working on a RHEL7 machine with 24 cores.
I installed everything, and the applications under ext/ work.
I edited opts.yaml (including creating the .ooc file).
Running 'rake' does not trigger anything!
What did I miss?
Thanks for your help
drwxr-xr-x. 2 splaisan bits 4.0K Jun 12 11:09 data
drwxr-xr-x. 5 splaisan bits 65 Jun 12 10:46 ext
drwxr-xr-x. 8 splaisan bits 4.0K Jun 12 10:30 .git
-rw-r--r--. 1 splaisan bits 1.5K Jun 12 10:30 opts_example.yaml
-rw-r--r--. 1 splaisan bits 1.6K Jun 12 11:10 opts.yaml
-rw-r--r--. 1 splaisan bits 7.9K Jun 12 10:30 Rakefile
-rw-r--r--. 1 splaisan bits 6.0K Jun 12 10:30 README.md
drwxr-xr-x. 2 splaisan bits 4.0K Jun 12 10:30 scripts
splaisan@NUC-SRV-01:/opt/biotools/flo$ ll data/
total 2.3G
drwxr-xr-x. 2 splaisan bits 4.0K Jun 12 11:09 .
drwxr-xr-x. 6 splaisan bits 4.0K Jun 12 11:14 ..
-rw-r--r--. 1 splaisan bits 1.9M Jun 12 11:09 11.ooc
-rw-r--r--. 1 splaisan bits 1.1G Jun 12 11:05 CA0.2_contigs.fasta
-rw-r--r--. 1 splaisan bits 77M Jun 12 11:02 CA0.2_gene_models_noseq.gff3
-rw-r--r--. 1 splaisan bits 1.2G Jun 12 11:05 hybrid_CA0.2.fasta
splaisan@NUC-SRV-01:/opt/biotools/flo$ cat opts.yaml
# Location of binaries expected by flo.
#
# These will be added to PATH before the pipeline is run. The paths below
# are created by `scripts/install.sh`. Comment out or edit the paths based
# on how you installed UCSC-Kent toolkit, GNU Parallel and genometools.
:add_to_path:
- 'ext/kent/bin'
- 'ext/parallel-20150722/src'
- 'ext/genometools-1.5.6/bin'
# Location of source and target assemblies.
#
# If migrating annotations from assembly A to assembly B, A is the source
# and B is the target. Source and target assemblies are specified as path
# to the corresponding FASTA files (must end in .fa).
:source_fa: 'data/CA0.2_contigs.fasta'
:target_fa: 'data/hybrid_CA0.2.fasta'
# Number of processes that will be used to parallelise flo. Ideally, this
# will be the number of CPU cores you have.
:processes: '24'
# Parameters to run BLAT with.
#
# In addition to the options specified here, -noHead option is set by flo.
# -noHead simply causes the output BLAT output files to not have a header.
# It doesn't impact accuracy of results.
#
# Empty string is equivalent to:
#
# -t=dna -q=dna -tileSize=11 -stepSize=11 -oneOff=0 -minMatch=2
# -minScore=30 -minIdentity=90 -maxGap=2 -maxIntron=75000
#
# The default string defined below is a suitable trade-off between running
# time and sensitivity.
#:blat_opts: '-fastMap -tileSize=12 -minIdentity=98'
:blat_opts: 'blat -noHead -fastMap -ooc=data/11.ooc -minScore=100 -minIdentity=98'
# Path to the GFF files containing annotations on the source assembly that
# will be lifted to the target assembly.
:lift:
- 'data/CA0.2_gene_models_noseq.gff3'
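Two settings in the opts.yaml above differ from what its own comments describe: the comments say the assembly paths must end in .fa, while both paths end in .fasta, and :blat_opts: begins with 'blat -noHead' although the comments say -noHead is set by flo and the commented-out default lists options only. Whether either mismatch explains why rake does nothing is not established here. The sketch below only flags them; `check_opts` is a hypothetical helper, and flo itself may not enforce these rules.

```ruby
# Compare a few opts.yaml settings with the rules stated in its comments.
# `check_opts` is a hypothetical helper; flo itself may not enforce these.
def check_opts(opts)
  problems = []
  %i[source_fa target_fa].each do |key|
    path = opts[key].to_s
    problems << "#{key} does not end in .fa: #{path}" unless path.end_with?('.fa')
  end

  blat_opts = opts[:blat_opts].to_s
  if blat_opts =~ /\Ablat\b/ || blat_opts.include?('-noHead')
    problems << 'blat_opts should hold options only (no "blat", no -noHead)'
  end
  problems
end

# Values copied from the opts.yaml shown above.
opts = {
  source_fa: 'data/CA0.2_contigs.fasta',
  target_fa: 'data/hybrid_CA0.2.fasta',
  blat_opts: 'blat -noHead -fastMap -ooc=data/11.ooc -minScore=100 -minIdentity=98'
}

check_opts(opts).each { |problem| puts problem }
```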
Hi,
How did you calculate the below percentage?
For an ant genome (~350 Mb) we saw 90% annotations map identically to the new assembly (unpublished result).
Is it possible to get more statistics out of Flo?
Thank you in advance.
Michal
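For basic statistics, one option is to tally features per type in lifted.gff3 and unlifted.gff3 and report the share that was lifted. This is a sketch under assumptions and not necessarily how the 90% figure quoted above was obtained: it assumes tab-separated nine-column GFF lines and treats lines starting with '#' as comments. `count_by_type` and `lift_summary` are hypothetical helper names.

```ruby
# Count features per type (column 3) in GFF text, ignoring comment lines.
def count_by_type(gff_text)
  counts = Hash.new(0)
  gff_text.each_line do |line|
    next if line.start_with?('#')
    cols = line.chomp.split("\t")
    counts[cols[2]] += 1 if cols.length >= 9
  end
  counts
end

# Return rows of [type, lifted, total, percent_lifted], sorted by type.
# Hypothetical helper, not flo's own method.
def lift_summary(lifted_text, unlifted_text)
  lifted = count_by_type(lifted_text)
  unlifted = count_by_type(unlifted_text)
  (lifted.keys | unlifted.keys).sort.map do |type|
    total = lifted[type] + unlifted[type]
    [type, lifted[type], total, (100.0 * lifted[type] / total).round(1)]
  end
end

lifted_sample = [
  "chr1\tsrc\tgene\t1\t90\t.\t+\t.\tID=g1",
  "chr1\tsrc\tgene\t100\t190\t.\t+\t.\tID=g2",
  "chr1\tsrc\tgene\t200\t290\t.\t+\t.\tID=g3",
  "chr1\tsrc\texon\t1\t90\t.\t+\t.\tID=e1;Parent=g1"
].join("\n")
unlifted_sample = "#Deleted in new\nchr1\tsrc\tgene\t300\t390\t.\t+\t.\tID=g4\n"

lift_summary(lifted_sample, unlifted_sample).each do |type, n_lifted, total, pct|
  puts "#{type}: #{n_lifted}/#{total} lifted (#{pct}%)"
end
```

For a real run, pass the contents of the two output files, for example `lift_summary(File.read('lifted.gff3'), File.read('unlifted.gff3'))`.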
Hi,
I receive the error above during annotation liftover with flo, and would appreciate some help. The output directory contains completed lifted.gff3 and unlifted.gff3 before rake aborts.
My system:
Ubuntu 12.04 x64
Ruby 2.3.1
Bioruby 1.5.1
flo and its dependencies have been installed as described.
Thanks
Log:
...
chainMergeSort run/*.chn.sorted | chainSplit run stdin -lump=1
mv run/000.chain run/combined.chn.sorted
chainNet run/combined.chn.sorted run/source.sizes run/target.sizes run/combined.chn.sorted.net /dev/null
Got 14 chroms in run/source.sizes, 14 in run/target.sizes
Finishing nets
writing run/combined.chn.sorted.net
writing /dev/null
netChainSubset run/combined.chn.sorted.net run/combined.chn.sorted run/liftover.chn
Processing Pf3D7_14_v3
Processing Pf3D7_13_v3
Processing Pf3D7_12_v3
Processing Pf3D7_11_v3
Processing Pf3D7_10_v3
Processing Pf3D7_09_v3
Processing Pf3D7_08_v3
Processing Pf3D7_07_v3
Processing Pf3D7_06_v3
Processing Pf3D7_05_v3
Processing Pf3D7_04_v3
Processing Pf3D7_03_v3
Processing Pf3D7_02_v3
Processing Pf3D7_01_v3
mkdir PlasmoDB-29_Pfalciparum3D7_GFF_CHROMOSOME-liftover-named_assembly_pacbio2
liftOver -gff plasmodium/PlasmoDB-29_Pfalciparum3D7_GFF_CHROMOSOME.gff3 run/liftover.chn PlasmoDB-29_Pfalciparum3D7_GFF_CHROMOSOME-liftover-named_assembly_pacbio2/lifted.gff3 PlasmoDB-29_Pfalciparum3D7_GFF_CHROMOSOME-liftover-named_assembly_pacbio2/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
rake aborted!
NoMethodError: undefined method `last' for nil:NilClass
/home/muol/Documents/Software/flo/Rakefile:72:in `block in process_gff'
/home/muol/Documents/Software/flo/Rakefile:69:in `each'
/home/muol/Documents/Software/flo/Rakefile:69:in `group_by'
/home/muol/Documents/Software/flo/Rakefile:69:in `process_gff'
/home/muol/Documents/Software/flo/Rakefile:223:in `block (2 levels) in <top (required)>'
/home/muol/Documents/Software/flo/Rakefile:212:in `each'
/home/muol/Documents/Software/flo/Rakefile:212:in `block in <top (required)>'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/task.rb:240:in `block in execute'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/task.rb:235:in `each'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/task.rb:235:in `execute'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/task.rb:179:in `block in invoke_with_call_chain'
/usr/local/lib/ruby/2.3.0/monitor.rb:214:in `mon_synchronize'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/task.rb:172:in `invoke_with_call_chain'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/task.rb:165:in `invoke'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:150:in `invoke_task'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:106:in `block (2 levels) in top_level'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:106:in `each'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:106:in `block in top_level'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:115:in `run_with_threads'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:100:in `top_level'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:78:in `block in run'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:176:in `standard_exception_handling'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/lib/rake/application.rb:75:in `run'
/usr/local/lib/ruby/gems/2.3.0/gems/rake-10.4.2/bin/rake:33:in `<top (required)>'
/usr/local/bin/rake:23:in `load'
/usr/local/bin/rake:23:in `
Tasks: TOP => default
Hello, I want to thank you for creating this amazing tool and for helping people with their flo issues.
I am having a problem with a flo run that I am not able to solve. I think it is related to the GFF file, but I don't know what the problem is (I downloaded it directly from NCBI).
These are the files I am using to run flo: flo_files.zip
Here is the error:
liftOver -gff /home/jose/flo_haloferax/new.gff run/liftover.chn run/new/lifted.gff3 run/new/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
Expecting number line 12 of /home/jose/flo_haloferax/new.gff
rake aborted!
Command failed with status (255): [liftOver -gff /home/jose/flo_haloferax/new...]
/home/jose/flo/Rakefile:45:in `block (2 levels) in <top (required)>'
/home/jose/flo/Rakefile:40:in `each'
/home/jose/flo/Rakefile:40:in `block in <top (required)>'
/var/lib/gems/2.7.0/gems/rake-13.0.6/exe/rake:27:in `<top (required)>'
Tasks: TOP => default
I tried to preprocess the GFF file using gt gff3 -tidy -sort -addids -retainids
and also to delete the gene features with gff_remove_feats.rb gene,
but the same error appears.
Thank you so much in advance!
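"Expecting number line 12" indicates that liftOver found something other than a number where it expected one on line 12 of the input. A reasonable first check, offered as an assumption about the cause rather than a diagnosis, is whether every feature line has nine tab-separated columns with integer start and end values. The sketch below is not part of flo or liftOver; `non_numeric_coords` is a hypothetical helper name.

```ruby
# Flag GFF feature lines that lack nine tab-separated columns or whose
# start/end columns (4 and 5) are not plain integers.
# `non_numeric_coords` is a hypothetical helper, not part of flo or liftOver.
def non_numeric_coords(gff_text)
  bad = []
  gff_text.each_line.with_index(1) do |line, lineno|
    next if line.start_with?('#') || line.strip.empty?
    cols = line.chomp.split("\t")
    if cols.length < 9
      bad << [lineno, "#{cols.length} tab-separated columns instead of 9"]
    elsif cols[3] !~ /\A\d+\z/ || cols[4] !~ /\A\d+\z/
      bad << [lineno, "start=#{cols[3].inspect} end=#{cols[4].inspect}"]
    end
  end
  bad
end

sample = [
  "##gff-version 3",
  "chr1\tsrc\tgene\t10\t90\t.\t+\t.\tID=g1",
  "chr1\tsrc\tgene\t.\t190\t.\t+\t.\tID=g2",
  "chr1 src gene 200 290 . + . ID=g3"
].join("\n")

non_numeric_coords(sample).each do |lineno, reason|
  puts "line #{lineno}: #{reason}"
end
```

Looking at line 12 of new.gff directly, with tabs made visible, is the quickest way to see which of these cases applies, if any.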
I'm getting an error that seems to have to do with my input GFF. I have tried with both gff_remove_feats.rb and gff_longest_transcripts.rb:
parallel --joblog run/joblog.chainSort -j 15 -a run/joblst.chainSort
chainMergeSort run/*.chn.sorted | chainSplit run stdin -lump=1
mv run/000.chain run/combined.chn.sorted
chainNet run/combined.chn.sorted run/source.sizes run/target.sizes run/combined.chn.sorted.net /dev/null
Got 15 chroms in run/source.sizes, 15 in run/target.sizes
Finishing nets
writing run/combined.chn.sorted.net
writing /dev/null
netChainSubset run/combined.chn.sorted.net run/combined.chn.sorted run/liftover.chn
Processing 1
Processing 5
Processing 2
Processing 3
Processing 11
Processing 6
Processing 7
Processing 8
Processing 9
Processing 4
Processing 10
Processing 14
Processing 12
Processing 13
Processing MT
mkdir run/h99_longest_transcript
liftOver -gff /scratch/mblab/chasem/liftOver/flo_crypto/h99/h99_longest_transcript.gff run/liftover.chn run/h99_longest_transcript/lifted.gff3 run/h99_longest_transcript/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/scratch/mblab/chasem/liftOver/flo/gff_recover.rb run/h99_longest_transcript/lifted.gff3 2> run/h99_longest_transcript/lifted_cleanup.log | gt gff3 -tidy -sort -addids -retainids - > run/h99_longest_transcript/lifted_cleaned.gff 2>> run/h99_longest_transcript/lifted_cleanup.log
rake aborted!
Command failed with status (1): [/scratch/mblab/chasem/liftOver/flo/gff_rec...]
/scratch/mblab/chasem/liftOver/flo/Rakefile:60:in `block (2 levels) in <top (required)>'
/scratch/mblab/chasem/liftOver/flo/Rakefile:40:in `each'
/scratch/mblab/chasem/liftOver/flo/Rakefile:40:in `block in <top (required)>'
Tasks: TOP => default
I'm not quite sure where to start debugging. Looking in the Rakefile and at gff_recover didn't give me any good ideas. Any suggestions?
Any plans to wrap flo in a workflow manager, e.g. snakemake or nextflow? This could help it run on many different platforms.
The reason I ask is, I discovered flo after writing my own nextflow pipeline to do something similar, but it doesn't fully work, so I might try to wrap flo in a workflow manager instead. If you are already working on doing that maybe we can join forces?
https://github.com/photocyte/doSameSpeciesLiftOver_nextflow
Hi,
I ran into the following problem:
...
Processing mito11
mkdir run/test_v2
liftOver -gff /work/team/banana/assembly/bam2consensus/flo/test_v2.gff3 run/liftover.chn run/test_v2/lifted.gff3 run/test_v2/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/lustre/work-lustre/team/apps/flo/gff_recover.rb run/test_v2/lifted.gff3 2> unprocessed.gff | gt gff3 -tidy -sort -addids -retainids - > run/test_v2/lifted_cleaned.gff
warning: GFF3 file "-" is empty
warning: GFF3 file "-" is empty
warning: GFF3 file "-" is empty
ln -s /work/team/banana/assembly/bam2consensus/flo/test_v2.gff3 run/test_v2/input.gff
/lustre/work-lustre/team/apps/flo/gff_compare.rb cds run/source.fa run/target.fa run/test_v2/input.gff run/test_v2/lifted_cleaned.gff > run/test_v2/unmapped.txt
gt gff3 -sort -retainids run/test_v2/input.gff | gt extractfeat -type CDS -join -retainids -seqfile run/source.fa -matchdescstart - > run/test_v2/input.cds.fa
gt gff3: error: illegal GFF version pragma in line 46728 of file "run/test_v2/input.gff": ##gff-version 3 (merge multiple GFF3 files with `gt gff3 -sort` and do not concatenate them manually)
gt extractfeat: error: GFF3 file "-" is empty
/lustre/work-lustre/team/miniconda2/envs/flo/lib/ruby/2.2.0/rake/file_utils.rb:66:in `block in create_shell_runner': Command failed with status (1): [gt gff3 -sort -retainids run/musa_acuminat...] (RuntimeError)
from /lustre/work-lustre/team/miniconda2/envs/flo/lib/ruby/2.2.0/rake/file_utils.rb:57:in `call'
from /lustre/work-lustre/team/miniconda2/envs/flo/lib/ruby/2.2.0/rake/file_utils.rb:57:in `sh'
from /lustre/work-lustre/team/miniconda2/envs/flo/lib/ruby/2.2.0/rake/file_utils_ext.rb:37:in `sh'
from /lustre/work-lustre/team/apps/flo/gff_compare.rb:25:in `extract_cds'
from /lustre/work-lustre/team/apps/flo/gff_compare.rb:46:in `<main>'
rake aborted!
Command failed with status (1): [/lustre/work-lustre/team/apps/f...]
/work/team/apps/flo/Rakefile:56:in `block (2 levels) in <top (required)>'
/work/team/apps/flo/Rakefile:40:in `each'
/work/team/apps/flo/Rakefile:40:in `block in <top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)
How could I fix the gff3 file?
Best wishes,
Michal
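The gt error above points at a second "##gff-version" pragma on line 46728 and recommends merging the original files with `gt gff3 -sort` instead of concatenating them by hand. To see where the extra pragmas sit in an already concatenated file, something like the sketch below could help. It is not part of flo, and `extra_version_pragmas` is a hypothetical helper name.

```ruby
# Return the line numbers of every "##gff-version" pragma after the first,
# i.e. the places where one GFF3 file was pasted onto another.
# `extra_version_pragmas` is a hypothetical helper, not part of flo.
def extra_version_pragmas(gff_text)
  seen = false
  extras = []
  gff_text.each_line.with_index(1) do |line, lineno|
    next unless line.start_with?('##gff-version')
    extras << lineno if seen
    seen = true
  end
  extras
end

sample = [
  "##gff-version 3",
  "chr1\tsrc\tgene\t1\t90\t.\t+\t.\tID=g1",
  "##gff-version 3",
  "mito11\tsrc\tgene\t1\t90\t.\t+\t.\tID=g2"
].join("\n")

puts "extra pragmas on lines: #{extra_version_pragmas(sample).join(', ')}"
```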
Can the GFF produced by flo not be used as input to EVM?
Would a demonstration that it works be to carry over annotations from genome X to the same genome? One would expect 100% identity if this works correctly...
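A self-liftover test of this kind could be scored by comparing coordinates before and after. The sketch below is not part of flo: it matches features by their ID attribute and reports the fraction whose sequence name, start, end and strand are unchanged. `coords_by_id` and `identical_fraction` are hypothetical helper names, and features that share one ID across several lines (multi-line CDS, for instance) are represented by their last line only.

```ruby
# Map each feature ID to [seqid, start, end, strand].
# Later lines with the same ID overwrite earlier ones.
def coords_by_id(gff_text)
  gff_text.each_line.each_with_object({}) do |line, coords|
    next if line.start_with?('#')
    cols = line.chomp.split("\t")
    next if cols.length < 9
    id = cols[8][/(?:\A|;)ID=([^;]+)/, 1]
    coords[id] = [cols[0], cols[3], cols[4], cols[6]] if id
  end
end

# Fraction of features in `before_text` found with identical coordinates in
# `after_text`. Hypothetical helper, not flo's own validation.
def identical_fraction(before_text, after_text)
  before = coords_by_id(before_text)
  after = coords_by_id(after_text)
  return 0.0 if before.empty?
  same = before.count { |id, coord| after[id] == coord }
  same.to_f / before.size
end

before_sample = [
  "chr1\tsrc\tgene\t10\t90\t.\t+\t.\tID=g1",
  "chr1\tsrc\tgene\t100\t190\t.\t-\t.\tID=g2"
].join("\n")
after_sample = [
  "chr1\tsrc\tgene\t10\t90\t.\t+\t.\tID=g1",
  "chr1\tsrc\tgene\t105\t195\t.\t-\t.\tID=g2"
].join("\n")

puts "identical: #{(100 * identical_fraction(before_sample, after_sample)).round(1)}%"
```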
Hello!
I'm having trouble getting flo started. I was hoping you might be able to help me out. To set up flo, I installed all dependencies using apt-get or conda (conda create -n flo -c mvdbeek -c conda-forge parallel genometools ucsc_tools). I set up my data as requested, and am now getting the following error:
rake aborted!
SyntaxError: /path/to/dir/opts_example.yaml:16: syntax error, unexpected ':', expecting end-of-input
Often I encounter errors because my directory has spaces in it (can't change this), but I don't think this is the problem, here. In my opts file, the line throwing the error is as such:
:source_fa: '/path/to/genome/genome.fa'
Do you have any idea what might be going on here? Some sort of version error, perhaps?
Thanks,
Zoe
Hi there,
Big fan of flo. Has worked really well for my research. But it would be even better if flo were on bioconda and had its dependencies explicitly linked, namely:
conda install -c bioconda genometools-genometools
conda install -c conda-forge parallel
conda install -c bioconda -y ucsc-liftup ucsc-fasplit ucsc-liftover ucsc-axtchain ucsc-chainnet ucsc-blat ucsc-chainsort ucsc-fatotwobit ucsc-twobitinfo ucsc-chainsplit ucsc-chainmergesort ucsc-netchainsubset
A first step towards that, I believe would be issuing a Release here on this Github Repo. Then the Bioconda recipe could point to that release.
All the best,
-Tim
flo failed on a 14 Gb genome with a "corrupted double-linked list (not small)" error. It runs normally with genomes smaller than 4 Gb. The run was on an AWS m5.16xlarge EC2 instance.
rake -f /home/ubuntu/flo/Rakefile &
mkdir run
cp /home/ubuntu/s.fa run/source.fa
cp /home/ubuntu/t.fa run/target.fa
faToTwoBit run/source.fa run/source.2bit
faToTwoBit run/target.fa run/target.2bit
twoBitInfo run/source.2bit stdout | sort -k2nr > run/source.sizes
twoBitInfo run/target.2bit stdout | sort -k2nr > run/target.sizes
faSplit sequence run/target.fa 21 run/chunk_
parallel --joblog run/joblog.faSplit -j 21 -a run/joblst.faSplit
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development; and it won't cost you a cent.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence the citation notice: run 'parallel --bibtex'.
123322 pieces of 123923 written
133957 pieces of 134763 written
150983 pieces of 152743 written
156478 pieces of 157558 written
98419 pieces of 99073 written
99082 pieces of 99724 written
103154 pieces of 103663 written
113555 pieces of 113991 written
118767 pieces of 119728 written
123551 pieces of 124526 written
141741 pieces of 142672 written
144495 pieces of 146237 written
130388 pieces of 131310 written
147572 pieces of 148896 written
138549 pieces of 140111 written
141907 pieces of 142961 written
149246 pieces of 150844 written
149613 pieces of 150822 written
197774 pieces of 198899 written
160747 pieces of 162550 written
167525 pieces of 170389 written
parallel --joblog run/joblog.blat -j 21 -a run/joblst.blat
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development; and it won't cost you a cent.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence the citation notice: run 'parallel --bibtex'.
corrupted double-linked list (not small)
free(): invalid next size (normal)
free(): invalid next size (normal)
double free or corruption (!prev)
double free or corruption (!prev)
malloc(): smallbin double linked list corrupted
free(): invalid next size (normal)
malloc(): memory corruption
free(): invalid next size (normal)
double free or corruption (!prev)
free(): invalid next size (normal)
double free or corruption (!prev)
double free or corruption (!prev)
rake aborted!
Command failed with status (21): [parallel --joblog run/joblog.blat -j 21 -a...]
/home/ubuntu/flo/Rakefile:153:in `parallel'
/home/ubuntu/flo/Rakefile:99:in `block in <top (required)>'
/home/ubuntu/flo/Rakefile:37:in `block in <top (required)>'
Tasks: TOP => run/liftover.chn
(See full trace by running task with --trace)
[1]+ Exit 1 rake -f /home/ubuntu/flo/Rakefile
Hi,
I ran into the following problem:
/apps/flo/gff_remove_feats.rb annotation_v2.gff3 > annotation_v2_cleaned.gff3
/apps/flo/gff_remove_feats.rb:15:in `foreach': no implicit conversion of nil into String (TypeError)
from /apps/flo/gff_remove_feats.rb:15:in `<main>'
Did I miss anything?
Thank you in advance.
Best wishes,
Michal
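One reading of this trace, offered as an assumption because the script's usage is not shown here: line 15 passes a command-line argument to File.foreach, and that argument is missing, so it receives nil. Elsewhere on this page the script is invoked as `gff_remove_feats.rb gene`, that is, with a feature type, so the command above may be one argument short. The sketch below is not taken from flo. It only reproduces the same TypeError to show where the message comes from; `read_lines` is a hypothetical helper name.

```ruby
# File.foreach raises TypeError when handed nil instead of a path, which is
# what ARGV[n] evaluates to when an argument is missing.
# `read_lines` is a hypothetical helper, not code from gff_remove_feats.rb.
def read_lines(path)
  File.foreach(path).to_a
rescue TypeError => e
  "TypeError: #{e.message}"
end

missing_argument = [][1] # indexing past the end of an array gives nil, like ARGV
puts read_lines(missing_argument)
```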
Hey, this seems similar to another recent issue (involving the bio gem not being installed), but I am running into this issue now, and I am unsure what the error is indicating - if it is an issue with my gff or with ruby. Here is the trace, followed by the first three lines of my .gff.
rake --trace
** Invoke default (first_time)
** Execute default
** Invoke run/liftover.chn (first_time, not_needed)
mkdir lepdec_OGSv1.0_ONLY_GENES.gff-liftover-Ldec_redundans_on_alpaths_contigs_genome
liftOver -gff lepdec_OGSv1.0_ONLY_GENES.gff run/liftover.chn lepdec_OGSv1.0_ONLY_GENES.gff-liftover-Ldec_redundans_on_alpaths_contigs_genome/lifted.gff3 lepdec_OGSv1.0_ONLY_GENES.gff-liftover-Ldec_redundans_on_alpaths_contigs_genome/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/var/lib/gems/2.2.0/gems/bio-1.5.1/lib/bio/db/gff.rb:921: warning: regexp match /.../n against to UTF-8 string
rake aborted!
undefined method `last' for nil:NilClass
/home/beetle/Documents/CPB/reference_genomes/flo/Rakefile:72:in `block in process_gff'
/home/beetle/Documents/CPB/reference_genomes/flo/Rakefile:69:in `each'
/home/beetle/Documents/CPB/reference_genomes/flo/Rakefile:69:in `group_by'
/home/beetle/Documents/CPB/reference_genomes/flo/Rakefile:69:in `process_gff'
/home/beetle/Documents/CPB/reference_genomes/flo/Rakefile:223:in `block (2 levels) in <top (required)>'
/home/beetle/Documents/CPB/reference_genomes/flo/Rakefile:212:in `each'
/home/beetle/Documents/CPB/reference_genomes/flo/Rakefile:212:in `block in <top (required)>'
/usr/lib/ruby/vendor_ruby/rake/task.rb:246:in `call'
/usr/lib/ruby/vendor_ruby/rake/task.rb:246:in `block in execute'
/usr/lib/ruby/vendor_ruby/rake/task.rb:241:in `each'
/usr/lib/ruby/vendor_ruby/rake/task.rb:241:in `execute'
/usr/lib/ruby/vendor_ruby/rake/task.rb:184:in `block in invoke_with_call_chain'
/usr/lib/ruby/2.2.0/monitor.rb:211:in `mon_synchronize'
/usr/lib/ruby/vendor_ruby/rake/task.rb:177:in `invoke_with_call_chain'
/usr/lib/ruby/vendor_ruby/rake/task.rb:170:in `invoke'
/usr/lib/ruby/vendor_ruby/rake/application.rb:143:in `invoke_task'
/usr/lib/ruby/vendor_ruby/rake/application.rb:101:in `block (2 levels) in top_level'
/usr/lib/ruby/vendor_ruby/rake/application.rb:101:in `each'
/usr/lib/ruby/vendor_ruby/rake/application.rb:101:in `block in top_level'
/usr/lib/ruby/vendor_ruby/rake/application.rb:110:in `run_with_threads'
/usr/lib/ruby/vendor_ruby/rake/application.rb:95:in `top_level'
/usr/lib/ruby/vendor_ruby/rake/application.rb:73:in `block in run'
/usr/lib/ruby/vendor_ruby/rake/application.rb:160:in `standard_exception_handling'
/usr/lib/ruby/vendor_ruby/rake/application.rb:70:in `run'
/usr/bin/rake:27:in
GFF:
Scaffold1 OGSv1.0 gene 12481 16948 . - . ID=LDEC000001;Name=LDEC000001;Dbxref=I5KNAL:LDEC000001;method=Maker
Scaffold1 OGSv1.0 gene 19920 23242 . - . ID=LDEC000002;Name=LDEC000002;Dbxref=I5KNAL:LDEC000002;method=Maker
Scaffold1 OGSv1.0 gene 26074 37602 . + . ID=LDEC000003;Name=LDEC000003;Dbxref=I5KNAL:LDEC000003;method=Maker
Thanks!
Kristian
Hi,
I tried to install flo, but when I run install.sh I get a compilation error:
genometools-1.5.6/www/genometools.org/htdocs/trackselectors.html
genometools-1.5.6/www/github/
genometools-1.5.6/www/github/assets/
genometools-1.5.6/www/github/assets/overview.png
/bin/sh: 1: Syntax error: "(" unexpected
/bin/sh: 1: Syntax error: "(" unexpected
/bin/sh: 1: Syntax error: "(" unexpected
[compile sqlite3.o]
/bin/sh: 2: Syntax error: "(" unexpected
make: *** [Makefile:741: obj/src/external/sqlite-3.8.7.1/sqlite3.o] Error 2
Do you have any idea how to fix this issue?
Thanks in advance
Hello,
I ran flo on my data to convert the gff coordinates from one assembly version to the other. I have the files, lifted.gff3 and unlifted.gff3. The lifted.gff3 looks fine in terms of the size comparison with the original gff3.
However, at the end, I get the following error:
liftOver -gff GCF_000698965.1_ASM69896v1_genomic.flo.gff run/liftover.chn run/GCF_000698965.1_ASM69896v1_genomic.flo/lifted.gff3 run/GCF_000698965.1_ASM69896v1_genomic.flo/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/crex/proj/uppstore2017180/private/homap/ostrich_Z_diversity/src/flo/gff_recover.rb run/GCF_000698965.1_ASM69896v1_genomic.flo/lifted.gff3 2> unprocessed.gff | gt gff3 -tidy -sort -addids -retainids - > run/GCF_000698965.1_ASM69896v1_genomic.flo/lifted_cleaned.gff
warning: line 1 in file "-" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: line 1 in file "-" does not contain 9 tab (\t) separated fields
rake aborted!
Command failed with status (1): [/crex/proj/uppstore2017180/private/homap/o...]
/crex/proj/uppstore2017180/private/homap/ostrich_Z_diversity/src/flo/Rakefile:60:in `block (2 levels) in <top (required)>'
/crex/proj/uppstore2017180/private/homap/ostrich_Z_diversity/src/flo/Rakefile:40:in `each'
/crex/proj/uppstore2017180/private/homap/ostrich_Z_diversity/src/flo/Rakefile:40:in `block in <top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)
I was wondering how I could resolve this issue?
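One way to narrow this down: in the command above, gt reads "-" (standard input), so "line 1" refers to the first line that gff_recover.rb writes, not necessarily to line 1 of the input GFF. A minimal Python sketch for finding feature lines that do not have nine tab-separated fields follows. The helper name `malformed_lines` is hypothetical and not part of flo; it could be pointed at the input GFF or at lifted.gff3.

```python
import io

def malformed_lines(handle):
    """Yield (line_number, field_count) for feature lines without 9 tab-separated fields."""
    for number, line in enumerate(handle, start=1):
        text = line.rstrip("\r\n")
        # Blank lines and '#' comments or directives are not feature lines.
        if not text or text.startswith("#"):
            continue
        count = len(text.split("\t"))
        if count != 9:
            yield number, count

# Made-up sample: the third line uses spaces instead of tabs.
sample = io.StringIO(
    "##gff-version 3\n"
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n"
    "chr1 src mRNA 1 100 . + . ID=t1;Parent=g1\n"
)
bad = list(malformed_lines(sample))
print(bad)
```

For a real file, `with open("lifted.gff3") as handle:` can replace the in-memory sample. A field count of 1 usually points at spaces where tabs were expected.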
The makefile should be updated so that the newest version of genometools is used during the build. At present, version 1.5.6 can lead to the following make error:
flo_species_name/ext/genometools-1.5.6/src/mgth/metagenomethreader.h:224: multiple definition of `gt_cstr_nofree_ulp_hashtype'; obj/src/mgth/metagenomethreader.o:/media/nils/nils_ssd_01/flo_species_name/ext/genometools-1.5.6/src/mgth/metagenomethreader.h:224: first defined here
/usr/bin/ld: obj/src/mgth/mg_compute_gene_prediction.o
collect2: error: ld returned 1 exit status
make: *** [Makefile:587: lib/libgenometools.so] Error 1
To prevent this, adjust the section of the makefile that installs genometools like this:
# Genometools
cd ext
wget -c https://github.com/genometools/genometools/archive/refs/tags/v1.6.2.tar.gz -O v1.6.2.tar.gz
tar xvf v1.6.2.tar.gz
rm v1.6.2.tar.gz
cd genometools-1.6.2
make cairo=no errorcheck=no
Hi:
I ran flo to liftOver from one nematode genome assembly to another (Pristionchus pacificus) and got this error:
Command failed with status (127): [faSplit sequence run/target.fa 2 run/chunk...]
Here's the full command:
>$ rake -f Rakefile
mkdir run
cp /scratch/rtraborn/pp_liftOver/pp_hybrid1/Pristionchus_Hybrid_assembly.fa run/source.fa
cp /scratch/rtraborn/pp_liftOver/pp_hybrid2/pacificus_Hybrid2.fa run/target.fa
faToTwoBit run/source.fa run/source.2bit
/usr/local/share/gems/gems/rake-12.3.0/lib/rake/file_utils.rb:54: warning: Insecure world writable dir /home/rtraborn/genome_analysis/paml4.9d/bin in PATH, mode 040777
faToTwoBit run/target.fa run/target.2bit
twoBitInfo run/source.2bit stdout | sort -k2nr > run/source.sizes
twoBitInfo run/target.2bit stdout | sort -k2nr > run/target.sizes
faSplit sequence run/target.fa 2 run/chunk_
rake aborted!
Command failed with status (127): [faSplit sequence run/target.fa 2 run/chunk...]
/home/rtraborn/genome_analysis/flo/Rakefile:79:in `block in <top (required)>'
/home/rtraborn/genome_analysis/flo/Rakefile:37:in `block in <top (required)>'
/usr/local/share/gems/gems/rake-12.3.0/exe/rake:27:in `<top (required)>'
Tasks: TOP => run/liftover.chn
(See full trace by running task with --trace)
Concerning the opts file: I set blat_opts: to '-tileSize=12 -minIdentity=98' and processes: to '2'.
Any idea where this is going wrong? I'm certain the assemblies and gff file are correctly formatted (the latter being 'cleaned' as described).
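A shell exit status of 127 conventionally means the command was not found. In the log above faToTwoBit and twoBitInfo ran but faSplit did not, so one possibility (an assumption, not confirmed here) is that faSplit is missing from PATH. A small Python sketch to check follows. The tool list is taken from the commands flo prints in its logs, and `missing_tools` is a hypothetical helper, not part of flo.

```python
import shutil

# Tool names as they appear in flo's printed commands; extend or trim as needed.
TOOLS = ["faToTwoBit", "twoBitInfo", "faSplit", "parallel", "blat", "liftUp",
         "axtChain", "chainMergeSort", "chainSplit", "chainNet",
         "netChainSubset", "liftOver", "gt"]

def missing_tools(names, path=None):
    """Return the names that shutil.which cannot resolve on the given PATH."""
    return [name for name in names if shutil.which(name, path=path) is None]

for name in missing_tools(TOOLS):
    print(f"not found on PATH: {name}")
```

Running this in the same shell session that launches rake matters, because PATH may differ between sessions.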
Flo works but at the end dies due to some CDS extraction issue.
Any idea what I should do to fix this?
...
ln -s /data/nanopore/2741_MinION/flo_results/R64_genomic_cleaned.gff run/../R64_genomic_cleaned/input.gff
/data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/gff_compare.rb cds run/source.fa run/target.fa run/../R64_genomic_cleaned/input.gff run/../R64_genomic_cleaned/lifted_cleaned.gff > run/../R64_genomic_cleaned/unmapped.txt
gt extractfeat -type CDS -join -retainids -seqfile run/source.fa -matchdescstart run/../R64_genomic_cleaned/input.gff > run/../R64_genomic_cleaned/input.cds.fa
gt extractfeat: error: the file run/../R64_genomic_cleaned/input.gff is not sorted (example: line 5 and 6)
/usr/lib/ruby/vendor_ruby/rake/file_utils.rb:66:in `block in create_shell_runner': Command failed with status (1): [gt extractfeat -type CDS -join -retainids ...] (RuntimeError)
from /usr/lib/ruby/vendor_ruby/rake/file_utils.rb:57:in `sh'
from /usr/lib/ruby/vendor_ruby/rake/file_utils_ext.rb:37:in `sh'
from /data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/gff_compare.rb:25:in `extract_cds'
from /data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/gff_compare.rb:45:in `<main>'
rake aborted!
Command failed with status (1): [/data/nanopore/2741_MinION/flo_results/flo...]
/data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/Rakefile:56:in `block (2 levels) in <top (required)>'
/data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/Rakefile:40:in `each'
/data/nanopore/2741_MinION/flo_results/flo_canu_contigs_R64/Rakefile:40:in `block in <top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)
Data:
Hi,
Sorry if this has been discussed before but I can't find it. Is it possible to liftover a gtf instead of a gff file?
Thank you.
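The logs in these issues show flo calling `liftOver -gff` and then `gt gff3`, which expects GFF3 `key=value` attributes. Whether flo accepts GTF directly is not stated here. As a rough illustration of the syntactic difference only, this Python sketch rewrites a GTF attribute column into GFF3 style. It is an assumption-laden sketch: it does not create the `ID`/`Parent` hierarchy that GFF3 uses, and it does not escape reserved characters in values.

```python
import re

# GTF attributes look like: key "value"; key "value";
GTF_ATTRIBUTE = re.compile(r'\s*(\S+)\s+"([^"]*)"\s*;')

def gtf_attributes_to_gff3(column):
    """Rewrite a GTF attribute column as GFF3-style key=value pairs."""
    pairs = GTF_ATTRIBUTE.findall(column)
    return ";".join(f"{key}={value}" for key, value in pairs)

# Made-up identifiers for illustration.
example = 'transcript_id "T1.mRNA1"; gene_id "G1"; gene_name "G1";'
converted = gtf_attributes_to_gff3(example)
print(converted)
```

A dedicated converter is likely a better choice for real annotations, since parent/child relationships have to be rebuilt from `gene_id` and `transcript_id`.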
I work on a hybrid yeast made of two known referenced yeasts.
I constructed the artificial assembly by merging the fasta files and did the same with the two GFF files from NCBI.
When I run flo with that gff and a de novo assembly of the hybrid genome, it dies at the GFF cleaning step, producing errors because the two strains have genes with different geneIDs and gbIDs but identical canonical gene names.
I thought I was being clever by replacing the merged GFF with the two separate GFFs in the yaml, but that also fails (though not because of the names).
Is the best solution to add both GFFs to the yaml and merge the cleaned results back together after lifting?
Thanks in advance,
Stephane
PLEASE DELETE this issue:
I ran flo on a whole genome and found that the lifted.gff had features like this referring to a parent PKINGS_0.1_G055355, but the parent PKINGS_0.1_G055355 was not in the file
Scaffold_87 maker mRNA 5628861 5664273 . - . ID=PKINGS_0.1_T055355-R4;Parent=PKINGS_0.1_G055355;Name=PKINGS_0.1_T055355-R4;Alias=maker-Scaffold517-augustus-gene-2.2-mRNA-1;Dbxref=InterPro:IPR000157,InterPro:IPR007632,Pfam:PF01582,Pfam:PF04547;Note=Similar to ANO4: Anoctamin-4 (Homo sapiens);Ontology_term=GO:0005515,GO:0007165;_AED=0.30;_QI=451%7C0.83%7C0.83%7C1%7C0.96%7C0.93%7C31%7C1288%7C1182;_eAED=0.30
I took Scaffold_87 from target.fa and the scaffold that PKINGS_0.1_G055355 originally came from and ran a smaller flo alignment between these two sequences. Interestingly enough, the new lifted.gff actually did contain that parent PKINGS_0.1_G055355, but it seems like gff_recover.rb run/annotations/lifted.gff3 actually removed the gene line?
Full output
...
chainMergeSort run/*.chn.sorted | chainSplit run stdin -lump=1
mv run/000.chain run/combined.chn.sorted
chainNet run/combined.chn.sorted run/source.sizes run/target.sizes run/combined.chn.sorted.net /dev/null
Got 1 chroms in run/source.sizes, 1 in run/target.sizes
Finishing nets
writing run/combined.chn.sorted.net
writing /dev/null
netChainSubset run/combined.chn.sorted.net run/combined.chn.sorted run/liftover.chn
Processing Scaffold517
mkdir run/annotations
liftOver -gff annotations.gff run/liftover.chn run/annotations/lifted.gff3 run/annotations/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/home/me/flo/gff_recover.rb run/annotations/lifted.gff3 2> unprocessed.gff | gt gff3 -tidy -sort -addids -retainids - > run/annotations/lifted_cleaned.gff
warning: line 1 in file "-" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: Parent "PKINGS_0.1_G055355" on line 1 in file "-" was not defined (via "ID=")
rake aborted!
Here is an example
Hi,
I am trying to liftover a GFF genome annotation but fail to get the GFF accepted by the software.
Here is a minimal sample that triggers the crash:
1 CoGe transcript 8522 12619 . + . transcript_id "C00s001g005000.mRNA1"; gene_id "C00s001g005000"; gene_name "C00s001g005000";
I run the following command using the latest docker and all inputs are present
docker run --rm --user "$(id -u):$(id -g)" \
-v $PWD:/workdir \
informationsea/transanno:latest transanno minimap2chain \
/workdir/${pfxq}_to_${pfxt}.paf \
--output /workdir/${pfxq}_to_${pfxt}.chain
I get
nom error: Error(("transcript_id \"C00s001g005000.mRNA1\"; gene_id \"C00s001g005000\"; gene_name \"C00s001g005000\";\n", CrLf))
thread 'main' panicked at 'Operation Error: LiftOverError { inner: GeneParseError { inner: GeneParseError { inner:
Parse error }
Parse error at line: 1 }
Failed to parse gene annotation }', transanno/src/main.rs:30:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
I could not figure out how to add RUST_BACKTRACE=1 to a docker call
Thanks for your help
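On the RUST_BACKTRACE question: `docker run` accepts `-e NAME=value` (long form `--env`) to set an environment variable inside the container. The Python sketch below builds such a command as an argument list. The helper `docker_command`, the mount path and the `in.paf`/`out.chain` file names are placeholders for illustration, not taken from the original command.

```python
def docker_command(image, args, env=None, mounts=None):
    """Build a `docker run` argument list with -e and -v options."""
    cmd = ["docker", "run", "--rm"]
    for key, value in (env or {}).items():
        cmd += ["-e", f"{key}={value}"]       # variable set inside the container
    for host, container in (mounts or {}).items():
        cmd += ["-v", f"{host}:{container}"]  # bind mount host:container
    return cmd + [image] + list(args)

cmd = docker_command(
    "informationsea/transanno:latest",
    ["transanno", "minimap2chain", "/workdir/in.paf", "--output", "/workdir/out.chain"],
    env={"RUST_BACKTRACE": "1"},
    mounts={"/path/to/data": "/workdir"},
)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)`. On the command line the equivalent is adding `-e RUST_BACKTRACE=1` before the image name.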
Hi,
I am trying to liftover the ref_GRCh37.p13_top_level.gff3 to a de novo-assembled genome, but got the following error:
mkdir ref_GRCh37.p13_top_level-liftover-A673_combined_fastq.1
liftOver -gff /gpfs0/home/rslssnck/cxt050/hg19/ref_GRCh37.p13_top_level.gff3 run/liftover.chn ref_GRCh37.p13_top_level-liftover-A673_combined_fastq.1/lifted.gff3 ref_GRCh37.p13_top_level-liftover-A673_combined_fastq.1/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
Expecting number line 38 of /gpfs0/home/rslssnck/cxt050/hg19/ref_GRCh37.p13_top_level.gff3
rake aborted!
Command failed with status (255): [liftOver -gff /gpfs0/home/rslssnck/cxt0...]
/gpfs0/home/rslssnck/cxt050/opt/flo/Rakefile:232:in `block (2 levels) in <top (required)>'
/gpfs0/home/rslssnck/cxt050/opt/flo/Rakefile:223:in `each'
/gpfs0/home/rslssnck/cxt050/opt/flo/Rakefile:223:in `block in <top (required)>'
/gpfs0/home/rslssnck/cxt050/.rvm/gems/ruby-2.4.0@global/gems/rake-12.0.0/exe/rake:27:in `<top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)
I'm not sure what to do with this error. Your help is greatly appreciated. Thanks!
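A first step is to look at line 38 of the GFF3 file directly. One reading of "Expecting number" (an assumption, since liftOver's parsing rules are not shown here) is that a field it tries to read as a coordinate on that line is not numeric. It may also be a directive or other non-feature line. The Python sketch below flags feature lines whose start or end column is not a plain integer. `non_numeric_coordinates` is a hypothetical helper.

```python
import io

def non_numeric_coordinates(handle):
    """Yield (line_number, start, end) where column 4 or 5 is not a plain integer."""
    for number, line in enumerate(handle, start=1):
        if line.startswith("#") or not line.strip():
            continue  # comments, directives and blank lines are skipped
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5:
            continue  # too few columns is a separate problem
        start, end = fields[3], fields[4]
        if not (start.isdigit() and end.isdigit()):
            yield number, start, end

# Made-up sample: the third line has a start written as 1e3.
sample = io.StringIO(
    "##gff-version 3\n"
    "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=g1\n"
    "chr1\tsrc\tgene\t1e3\t2000\t.\t+\t.\tID=g2\n"
)
flagged = list(non_numeric_coordinates(sample))
print(flagged)
```

If this reports nothing for the real file, the offending line is probably one of the skipped kinds, which would suggest stripping directives that appear in the middle of the file before running flo.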
I am trying to lift over a viral genome from one version to another, and I get the following error. I checked the permissions of the faToTwoBit file and they look fine. I am running this on a Mac.
Error:
mkdir run
cp /Users/divya/NC_009333.fa run/source.fa
cp /Users/divya/GQ994935.fa run/target.fa
faToTwoBit run/source.fa run/source.2bit
ext/kent/bin/faToTwoBit: ext/kent/bin/faToTwoBit: cannot execute binary file
rake aborted!
Command failed with status (126): [faToTwoBit run/source.fa run/source.2bit...]
/Users/divya/flo/Rakefile:127:in `to_2bit'
/Users/divya/flo/Rakefile:72:in `block in <top (required)>'
/Users/divya/flo/Rakefile:37:in `block in <top (required)>'
Tasks: TOP => run/liftover.chn
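"cannot execute binary file" together with status 126 typically means the file was found but the system could not run it. On a Mac, one explanation (an assumption, not verified for this installation) is that ext/kent/bin holds binaries built for Linux. The Python sketch below classifies an executable by its first four bytes. `binary_format` and `binary_format_of` are hypothetical helpers.

```python
import struct

# Mach-O magic numbers (32/64-bit, both byte orders, plus universal "fat" files).
# Note: 0xCAFEBABE is also the magic number of Java class files.
MACHO_MAGICS = {0xFEEDFACE, 0xFEEDFACF, 0xCEFAEDFE, 0xCFFAEDFE,
                0xCAFEBABE, 0xBEBAFECA}

def binary_format(header):
    """Classify an executable from its first four bytes."""
    if header[:4] == b"\x7fELF":
        return "ELF (Linux and most other Unix systems)"
    if len(header) >= 4 and struct.unpack(">I", header[:4])[0] in MACHO_MAGICS:
        return "Mach-O (macOS)"
    if header[:2] == b"#!":
        return "script"
    return "unknown"

def binary_format_of(path):
    """Read the first four bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return binary_format(handle.read(4))
```

Calling `binary_format_of("ext/kent/bin/faToTwoBit")` on the affected machine would show whether the binary matches the platform. If it reports ELF on macOS, a macOS build of the same tools would be needed.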
Hi,
I tried to lift the below TAIR10 annotation:
> head TAIR10_GFF3_genes.gff
Chr1 TAIR10 chromosome 1 30427671 . . . ID=Chr1;Name=Chr1
Chr1 TAIR10 gene 3631 5899 . + . ID=AT1G01010;Note=protein_coding_gene;Name=AT1G01010
Chr1 TAIR10 mRNA 3631 5899 . + . ID=AT1G01010.1;Parent=AT1G01010;Name=AT1G01010.1;Index=1
Chr1 TAIR10 protein 3760 5630 . + . ID=AT1G01010.1-Protein;Name=AT1G01010.1;Derives_from=AT1G01010.1
Chr1 TAIR10 exon 3631 3913 . + . Parent=AT1G01010.1
Chr1 TAIR10 five_prime_UTR 3631 3759 . + . Parent=AT1G01010.1
Chr1 TAIR10 CDS 3760 3913 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
Chr1 TAIR10 exon 3996 4276 . + . Parent=AT1G01010.1
Chr1 TAIR10 CDS 3996 4276 . + 2 Parent=AT1G01010.1,AT1G01010.1-Protein;
Chr1 TAIR10 exon 4486 4605 . + . Parent=AT1G01010.1
Next, I did
> gff_remove_feats.rb chromosome TAIR10_GFF3_genes.gff > TAIR10_GFF3_genes-fix1.gff |head
Chr1 TAIR10 gene 3631 5899 . + . ID=AT1G01010;Note=protein_coding_gene;Name=AT1G01010
Chr1 TAIR10 mRNA 3631 5899 . + . ID=AT1G01010.1;Parent=AT1G01010;Name=AT1G01010.1;Index=1
Chr1 TAIR10 protein 3760 5630 . + . ID=AT1G01010.1-Protein;Name=AT1G01010.1;Derives_from=AT1G01010.1
Chr1 TAIR10 exon 3631 3913 . + . Parent=AT1G01010.1
Chr1 TAIR10 five_prime_UTR 3631 3759 . + . Parent=AT1G01010.1
Chr1 TAIR10 CDS 3760 3913 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
Chr1 TAIR10 exon 3996 4276 . + . Parent=AT1G01010.1
Chr1 TAIR10 CDS 3996 4276 . + 2 Parent=AT1G01010.1,AT1G01010.1-Protein;
Chr1 TAIR10 exon 4486 4605 . + . Parent=AT1G01010.1
Chr1 TAIR10 CDS 4486 4605 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
While running flo I got:
> mkdir run/TAIR10_GFF3_genes-fix1
liftOver -gff /QRISdata/Q0231/flo/tair10/TAIR10_GFF3_genes-fix1.gff run/liftover.chn run/TAIR10_GFF3_genes-fix1/lifted.gff3 run/TAIR10_GFF3_genes-fix1/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/QRISdata/Q0231/apps/flo/gff_recover.rb run/TAIR10_GFF3_genes-fix1/lifted.gff3 2> unprocessed.gff | gt gff3 -tidy -sort -addids -retainids - > run/TAIR10_GFF3_genes-fix1/lifted_cleaned.gff
warning: line 1 in file "-" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: Parent "AT1G64130.1-Protein" on line 3 in file "-" was not defined (via "ID=")
rake aborted!
What did I miss?
Thank you in advance,
Michal
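The gt error says a `Parent` value has no feature with a matching `ID=` in the stream it was given. A plausible reading (an assumption) is that child features were lifted while the feature carrying that ID was not, or was dropped earlier. The Python sketch below lists every undefined parent in a GFF3 file, including comma-separated multi-parent values like those on the TAIR10 CDS lines. `undefined_parents` is a hypothetical helper, not part of flo.

```python
import io

def attributes(column):
    """Parse a GFF3 attribute column into a dict of key -> list of values."""
    parsed = {}
    for item in column.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            parsed[key.strip()] = value.split(",")
    return parsed

def undefined_parents(handle):
    """Return {parent_id: [line numbers]} for Parent values with no matching ID=."""
    defined, referenced = set(), {}
    for number, line in enumerate(handle, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if line.startswith("#") or len(fields) != 9:
            continue
        attrs = attributes(fields[8])
        defined.update(attrs.get("ID", []))
        for parent in attrs.get("Parent", []):
            referenced.setdefault(parent, []).append(number)
    return {p: lines for p, lines in referenced.items() if p not in defined}

# Made-up two-line sample: G1 and T1-Protein are referenced but never defined.
sample = io.StringIO(
    "Chr1\tTAIR10\tmRNA\t3631\t5899\t.\t+\t.\tID=T1;Parent=G1\n"
    "Chr1\tTAIR10\tCDS\t3760\t3913\t.\t+\t0\tParent=T1,T1-Protein;\n"
)
orphans = undefined_parents(sample)
print(orphans)
```

Comparing the result for the input GFF with the result for lifted.gff3 would show whether the parent was already missing before the liftover or was lost during it.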
First of all I would like to thank you for this tool. It targets a task that is extremely difficult to do for non-model organisms with other tools.
However, I am having an issue that I have not been able to solve. According to the error, my gff file does not have a header, nor does it contain 9 tab-separated fields. But it does (file attached: gff_file.zip). This is the error:
...
Processing chromosome_2
mkdir run/ref_v5.6_exons3_chromosome_2
liftOver -gff ref_v5.6_exons3_chromosome_2.gff3 run/liftover.chn run/ref_v5.6_exons3_chromosome_2/lifted.gff3 run/ref_v5.6_exons3_chromosome_2/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
/home/elcortegano/tmp/lift/flo/gff_recover.rb run/ref_v5.6_exons3_chromosome_2/lifted.gff3 2> unprocessed.gff | gt gff3 -tidy -sort -addids -retainids - > run/ref_v5.6_exons3_chromosome_2/lifted_cleaned.gff
warning: line 1 in file "-" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: line 1 in file "-" does not contain 9 tab (\t) separated fields
rake aborted!
Command failed with status (1): [/home/elcortegano/tmp/lift/flo/gff_recover...]
/home/elcortegano/tmp/lift/flo/Rakefile:60:in `block (2 levels) in <top (required)>'
/home/elcortegano/tmp/lift/flo/Rakefile:40:in `each'
/home/elcortegano/tmp/lift/flo/Rakefile:40:in `block in <top (required)>'
/usr/share/rubygems-integration/all/gems/rake-13.0.1/exe/rake:27:in `<top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)
[gff_file.zip](https://github.com/wurmlab/flo/files/5493835/gff_file.zip)
This is using the (attached above) gff3 file after removing annotations with gff_remove_feats.rb so that only mRNA, exon and CDS are left, although the original file produces the same error.
What is wrong with the file?
Thank you
I tried flo yesterday, but it ended in an error. It seems like there is a problem in a temporary GFF file?
So the question is if the program or my input GFF is the problem?
It created a file called "lifted.gff3" and one called "unlifted.gff3". Both of them are filled. But there is also a third file "Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta.gff3" which is empty.
Here are the last lines flo printed:
Processing Scaffold_3140
mkdir Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta
liftOver -gff /home/muehlich/Desktop/aethionema/data/Aarabicum.v2.5.gff run/liftover.chn Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta/lifted.gff3 Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=<file.gp>' and then 'liftOver -genePred <file.gp>'
gt gff3 -tidy -sort -addids -retainids /tmp/lifted20170614-22821-oyvvge > Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta/Aarabicum.v2.5.gff-liftover-aethionema-arabicum_v3.0.fasta.gff3
warning: line 1 in file "/tmp/lifted20170614-22821-oyvvge" does not begin with "##gff-version" or "##gvf-version", create "##gff-version 3" line automatically
gt gff3: error: Parent "AA1G00001" on line 2 in file "/tmp/lifted20170614-22821-oyvvge" was not defined (via "ID=")
rake aborted!
Command failed with status (1): [gt gff3 -tidy -sort -addids -retainids /tm...]
/home/muehlich/flo/Rakefile:113:in `process_gff'
/home/muehlich/flo/Rakefile:234:in `block (2 levels) in <top (required)>'
/home/muehlich/flo/Rakefile:223:in `each'
/home/muehlich/flo/Rakefile:223:in `block in <top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)