Comments (38)
Unfortunately, I was short-sighted when designing this pipeline in allowing such flexibility. The same is true of providing external hints. I want to re-engineer the whole thing eventually. If I find time soon, I think I can make this possible without too much work, and without the hack below.
How to hack it in: pretend your PPX output is either CGP or PB. Place the GTF you produced in the --work-dir/augustus_cgp or --work-dir/augustus_pb directory with the correct names ($genome.augCGP.gtf). Construct a matching genePred with gtfToGenePred -genePredExt $genome.augCGP.gtf $genome.augCGP.gp. Then restart the pipeline with --augustus-cgp or --augustus-pb set, and it should proceed with parent gene assignment, homGeneMapping and consensus gene set building.
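The hack described above can be sketched as a small helper that only computes where the files would go and which gtfToGenePred command would be run. It executes nothing. The function name plan_external_prediction and the example work directory are hypothetical; the directory names and the $genome.augCGP.gtf pattern come from the comment above, and the augPB file suffix is an assumption made by analogy.

```python
import os

# Sub-directory and file suffix per mode. The CGP values are taken from the
# comment above; the "augPB" suffix is an assumption made by analogy.
MODES = {
    "cgp": ("augustus_cgp", "augCGP"),
    "pb": ("augustus_pb", "augPB"),
}


def plan_external_prediction(work_dir, genome, mode):
    """Return (gtf_path, gp_path, command) for slotting in an external GTF.

    Nothing is executed here; the command list can be handed to
    subprocess.check_call once the GTF has been copied to gtf_path.
    """
    subdir, suffix = MODES[mode]
    base = os.path.join(work_dir, subdir, "{}.{}".format(genome, suffix))
    gtf_path = base + ".gtf"
    gp_path = base + ".gp"
    command = ["gtfToGenePred", "-genePredExt", gtf_path, gp_path]
    return gtf_path, gp_path, command


if __name__ == "__main__":
    # "cat_work" and "myGenome" are placeholder values.
    gtf, gp, cmd = plan_external_prediction("cat_work", "myGenome", "cgp")
    print(gtf)
    print(" ".join(cmd))
```

Keeping the planning step separate from execution makes it easy to inspect the paths before overwriting anything in the work directory.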
from comparative-annotation-toolkit.
One other question -- did you ever train your CGP model? CGP relies heavily on being trained for the current alignment. I am still working on automating the process, but there is a guide in more recent versions of the augustus repo. There is a new graduate student in Mario's lab called Lizzie who is working on this. I can hook you two up via email. I haven't heard from them in a few weeks, but they promised me a much more straightforward training approach so that I can integrate it into the pipeline.
from comparative-annotation-toolkit.
No, I ended up not retraining Augustus. Partly because I don't have a reference gene set, and partly because the human param seems to work pretty well in general. If there is a better solution, I am happy to try it out.
About the ppx thing, thanks, I will probably try your hack sooner than later (i.e. feeding the output as augustus_pb output).
from comparative-annotation-toolkit.
After pulling the latest version of master (071117), hoping to use the new self-training module with Augustus-CGP, I finally tried feeding in the external annotation following your advice. Basically, I converted the gff3 into .gtf and .gp, added those to the --work-dir/augustus_pb directory, and restarted CAT with the --augustus-pb option.
The pipeline failed, complaining that there were no data to run augustus-pb. So, I aligned the cDNA corresponding to the external annotation using gmap and added the resulting bam under [ISO_SEQ_BAM] in the config file.
Now I get the error message pasted below. Apparently, still no PB hints are found...
---TOIL WORKER OUTPUT LOG---
INFO:toil:Running Toil version 3.5.2-378fffa320ded1ed1ebade5ec7d01138699db3f6.
WARNING:toil.resource:Can't find resource for leader path '/n/home01/lassance/Comparative-Annotation-Toolkit/cat'
WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/n/home01/lassance/Comparative-Annotation-Toolkit', name='cat.augustus_pb', fromVirtualEnv=False)
WARNING:toil.resource:Can't find resource for leader path '/n/home01/lassance/Comparative-Annotation-Toolkit/cat'
WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/n/home01/lassance/Comparative-Annotation-Toolkit', name='cat.augustus_pb', fromVirtualEnv=False)
Traceback (most recent call last):
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/worker.py", line 340, in main
job._runner(jobGraph=jobGraph, jobStore=jobStore, fileStore=fileStore)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/job.py", line 1270, in _runner
returnValues = self._run(jobGraph, fileStore)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/job.py", line 1217, in _run
return self.run(fileStore)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/job.py", line 1383, in run
rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
File "/n/home01/lassance/Comparative-Annotation-Toolkit/cat/augustus_pb.py", line 64, in setup
raise RuntimeError('No PB hints found.')
RuntimeError: No PB hints found.
ERROR:toil.worker:Exiting the worker because of a failed job on host holy2a08207.rc.fas.harvard.edu
WARNING:toil.jobGraph:Due to failure we are reducing the remaining retry count of job 'setup' n/U/jobwaP8zf with ID n/U/jobwaP8zf to 0
Any idea of what is happening here?
from comparative-annotation-toolkit.
I didn't think through this hack entirely -- as you encountered, the pipeline checks for an ISO_SEQ_BAM before allowing the PB module to run.
But what you did should have worked -- did you delete the --work-dir/hints_database directory? If you go into that folder and grep src=PB, do you get hints?
The pipeline is supposed to re-run key steps like that if the config file has a different hash than in the previous run; maybe that process broke somehow. I will look into it. In the meantime, if there are no src=PB hints in your hints GFF files, then you need to have it redo the hints-building step by removing that folder. That grep is effectively what the pipeline does at this step: it finds nothing, and so raises an exception.
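The grep check described above can also be written as a short Python sketch that counts hint lines carrying a given src= tag in the attribute column. The function name count_hints_by_source and the two sample lines are made up for illustration; this is not the check CAT itself runs, only an equivalent way to inspect a hints GFF by hand.

```python
def count_hints_by_source(gff_lines, source="PB"):
    """Count GFF lines whose ninth column contains the attribute src=<source>."""
    tag = "src={}".format(source)
    count = 0
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # not a complete GFF record
        attributes = [a.strip() for a in columns[8].split(";")]
        if tag in attributes:
            count += 1
    return count


# Two invented hint lines, used only to exercise the function.
SAMPLE_HINTS = [
    "chr1\tb2h\tintron\t1000\t2000\t0\t+\t.\tmult=3;pri=4;src=E\n",
    "chr1\tb2h\tintron\t3000\t4000\t0\t+\t.\tmult=1;pri=4;src=PB\n",
]

if __name__ == "__main__":
    print(count_hints_by_source(SAMPLE_HINTS, "PB"))
```

To check a real file, open it and pass the file handle as gff_lines. A count of zero corresponds to the "No PB hints found" situation in the traceback.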
from comparative-annotation-toolkit.
Mmmh, it makes sense. I have deleted the hints_database and restarted the pipeline. Will post an update as soon as I can. Thanks again!
from comparative-annotation-toolkit.
Hi,
Just wanted to let you know that with the latest commit (#a69c959) I have introduced the ability to provide a protein FASTA and have hints be automatically generated by performing BLAT alignments of this file against the genomes listed in the config.
I will next provide a way to directly feed in your own extra GFF, if desired. I have to think a bit about how to perform this, because augustus config files need to be tuned for the types of hints being provided, and allowing open-ended hints could lead to problems.
from comparative-annotation-toolkit.
Thanks for the info, it sounds extremely useful.
About the original issue, the --augustus-pb run completed successfully. I then fed CAT my own manually curated .gp and .gtf and relaunched so that these files were used for homGeneMapping.
Also, as you suggested, I am running augCGP using --cgp-train-num-exons 10000 to train Augustus (I have also upgraded to version 3.3.0 of Augustus). This seems to have been running for a while; how long do you expect this step to take?
from comparative-annotation-toolkit.
Ah, that is it, I am using the previous commit. Good to know that you already fixed this (you're the best!). Will update accordingly then. Thanks!
BTW, is --cgp-train-num-exons 10000 a good setting?
from comparative-annotation-toolkit.
Hmm, I could not figure out which revision version 3.3.0 corresponds to, but I was encouraged by the description of that release, as compatibility with CAT is explicitly mentioned:
List of changes from version 3.2.3 to 3.3 (until July 11th, 2017)
- new program ESPOCA to estimate selective pressure on codon alignments
- gene finding on ancestral genomes is enabled
- new default parameters for comparative gene prediction (CGP)
- clade parameters training for CGP
- compatibility to Ian Fiddes' Comparative Annotation Toolkit (CAT)
- new scripts eval_dualdecomp.pl,
- more tolerant tree parsing
- bugfixes in augustus, joingenes, load2sqlitedb, transMap2hints.pl, splitMfasta.pl, intron2exex.pl, aln2wig
- new functionality in homGeneMapping, joingenes
from comparative-annotation-toolkit.
Did you ever get CGP to work? I have (I think) finally gotten the protein based evidence portion to work, which should help CGP prediction in species without RNA-seq or highly divergent species.
from comparative-annotation-toolkit.
Thanks for the info. I will give it a try.
from comparative-annotation-toolkit.
Hi Ian,
I have a job running now with protein data.
Things seem to not be working properly:
---TOIL WORKER OUTPUT LOG---
INFO:toil:Running Toil version 3.5.2-378fffa320ded1ed1ebade5ec7d01138699db3f6.
WARNING:toil.resource:Can't find resource for leader path '/n/home01/lassance/Comparative-Annotation-Toolkit/cat'
WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/n/home01/lassance/Comparative-Annotation-Toolkit', name='cat.hints_db', fromVirtualEnv=False)
WARNING:toil.resource:Can't find resource for leader path '/n/home01/lassance/Comparative-Annotation-Toolkit/cat'
WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/n/home01/lassance/Comparative-Annotation-Toolkit', name='cat.hints_db', fromVirtualEnv=False)
Traceback (most recent call last):
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/worker.py", line 340, in main
job._runner(jobGraph=jobGraph, jobStore=jobStore, fileStore=fileStore)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/job.py", line 1270, in _runner
returnValues = self._run(jobGraph, fileStore)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/job.py", line 1217, in _run
return self.run(fileStore)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/job.py", line 1383, in run
rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
File "/n/home01/lassance/Comparative-Annotation-Toolkit/cat/hints_db.py", line 309, in run_protein_blat
return job.fileStore.writeGlobalFile(tmp_psl)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/fileStore.py", line 1646, in writeGlobalFile
fileStoreID = self.jobStore.writeFile(absLocalFileName, cleanupID)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/jobStores/fileJobStore.py", line 212, in writeFile
shutil.copyfile(localFilePath, absPath)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/shutil.py", line 82, in copyfile
with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: '/scratch/tmp/toil-2dda024a-60b3-4486-9070-bb8aeead8cca/tmpSNEzeR/6db70d1e-d16d-4e04-83a2-6c48fd00c09f/holy2a08107.rc.fas.harvard.edu.20716.9229939338.tmp'
ERROR:toil.worker:Exiting the worker because of a failed job on host holy2a08107.rc.fas.harvard.edu
WARNING:toil.jobGraph:Due to failure we are reducing the remaining retry count of job 'run_protein_blat' A/6/jobbLlQWK with ID A/6/jobbLlQWK to 0
WARNING:toil.jobGraph:We have increased the default memory of the failed job 'run_protein_blat' A/6/jobbLlQWK to 13958643712 bytes
Thanks for your help troubleshooting this!
from comparative-annotation-toolkit.
Added a commit on this. The issue here is that BLAT sometimes produces invalid protein-genome alignments, which we filter with pslCheck. This is a hack, but it's way faster than trying to use exonerate.
In your case, it seems that every single alignment failed for that specific input chunk, and so the output file never got created. This should bypass that now, but I don't have a test set for this case.
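The failure mode described above (every record is filtered out, so the output file is never created) can be illustrated with a minimal sketch. This is not the actual CAT patch: the function write_filtered is hypothetical, and the keep predicate merely stands in for whatever validity check (pslCheck in the pipeline) decides which alignments survive.

```python
import os
import tempfile


def write_filtered(records, keep, out_path):
    """Write the records accepted by keep() to out_path, one per line.

    The file is opened before any filtering happens, so it exists
    (possibly empty) even when every record is rejected.
    """
    kept = 0
    with open(out_path, "w") as out:
        for record in records:
            if keep(record):
                out.write(record + "\n")
                kept += 1
    return kept


if __name__ == "__main__":
    out_path = os.path.join(tempfile.mkdtemp(), "filtered.psl")
    # Reject everything, mimicking a chunk where all alignments are invalid.
    n = write_filtered(["aln1", "aln2"], lambda rec: False, out_path)
    print(n, os.path.exists(out_path))
```

Because downstream steps only need a path that exists, an empty file is a reasonable way to represent "no valid alignments in this chunk".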
from comparative-annotation-toolkit.
Good. Your patch seems to work for that error.
Now I start seeing another set of error messages (related to a bam sorting step):
---TOIL WORKER OUTPUT LOG---
INFO:toil:Running Toil version 3.5.2-378fffa320ded1ed1ebade5ec7d01138699db3f6.
WARNING:toil.resource:Can't find resource for leader path '/n/home01/lassance/Comparative-Annotation-Toolkit/cat'
WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/n/home01/lassance/Comparative-Annotation-Toolkit', name='cat.hints_db', fromVirtualEnv=False)
WARNING:toil.fileStore:Starting job i/c/jobFOCViW/g/tmpVg8kib.tmp with less than 10% of disk space remaining.
WARNING:toil.resource:Can't find resource for leader path '/n/home01/lassance/Comparative-Annotation-Toolkit/cat'
WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/n/home01/lassance/Comparative-Annotation-Toolkit', name='cat.hints_db', fromVirtualEnv=False)
Traceback (most recent call last):
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/worker.py", line 340, in main
job._runner(jobGraph=jobGraph, jobStore=jobStore, fileStore=fileStore)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/job.py", line 1270, in _runner
returnValues = self._run(jobGraph, fileStore)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/job.py", line 1217, in _run
return self.run(fileStore)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/job.py", line 1383, in run
rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
File "/n/home01/lassance/Comparative-Annotation-Toolkit/cat/hints_db.py", line 155, in namesort_bam
bam_path = job.fileStore.readGlobalFile(bam_file_id)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/fileStore.py", line 1658, in readGlobalFile
self.jobStore.readFile(fileStoreID, localFilePath)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/jobStores/fileJobStore.py", line 251, in readFile
shutil.copyfile(jobStoreFilePath, localFilePath)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/shutil.py", line 84, in copyfile
copyfileobj(fsrc, fdst)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/shutil.py", line 52, in copyfileobj
fdst.write(buf)
IOError: [Errno 28] No space left on device
ERROR:toil.worker:Exiting the worker because of a failed job on host holy2a02206.rc.fas.harvard.edu
WARNING:toil.jobGraph:Due to failure we are reducing the remaining retry count of job 'namesort_bam' N/u/jobNEbiKh with ID N/u/jobNEbiKh to 0
---TOIL WORKER OUTPUT LOG---
INFO:toil:Running Toil version 3.5.2-378fffa320ded1ed1ebade5ec7d01138699db3f6.
WARNING:toil.resource:Can't find resource for leader path '/n/home01/lassance/Comparative-Annotation-Toolkit/cat'
WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/n/home01/lassance/Comparative-Annotation-Toolkit', name='cat.hints_db', fromVirtualEnv=False)
WARNING:toil.fileStore:Starting job t/5/jobQu96hO/g/tmp3xDeNW.tmp with less than 10% of disk space remaining.
WARNING:toil.resource:Can't find resource for leader path '/n/home01/lassance/Comparative-Annotation-Toolkit/cat'
WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/n/home01/lassance/Comparative-Annotation-Toolkit', name='cat.hints_db', fromVirtualEnv=False)
[E::bgzf_flush] hwrite error (wrong size)
Traceback (most recent call last):
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/worker.py", line 340, in main
job._runner(jobGraph=jobGraph, jobStore=jobStore, fileStore=fileStore)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/job.py", line 1270, in _runner
returnValues = self._run(jobGraph, fileStore)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/job.py", line 1217, in _run
return self.run(fileStore)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/job.py", line 1383, in run
rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
File "/n/home01/lassance/Comparative-Annotation-Toolkit/cat/hints_db.py", line 181, in namesort_bam
file_id = write_bam(r, ns_handle)
File "/n/home01/lassance/Comparative-Annotation-Toolkit/cat/hints_db.py", line 151, in write_bam
outf_h.write(rec)
File "pysam/libcalignmentfile.pyx", line 1334, in pysam.libcalignmentfile.AlignmentFile.write (pysam/libcalignmentfile.c:15439)
File "pysam/libcalignmentfile.pyx", line 1363, in pysam.libcalignmentfile.AlignmentFile.write (pysam/libcalignmentfile.c:15367)
IOError: sam_write1 failed with error code -1
ERROR:toil.worker:Exiting the worker because of a failed job on host holy2a04106.rc.fas.harvard.edu
WARNING:toil.jobGraph:Due to failure we are reducing the remaining retry count of job 'namesort_bam' r/c/jobNvseen with ID r/c/jobNvseen to 0
---TOIL WORKER OUTPUT LOG---
INFO:toil:Running Toil version 3.5.2-378fffa320ded1ed1ebade5ec7d01138699db3f6.
WARNING:toil.resource:Can't find resource for leader path '/n/home01/lassance/Comparative-Annotation-Toolkit/cat'
WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/n/home01/lassance/Comparative-Annotation-Toolkit', name='cat.hints_db', fromVirtualEnv=False)
WARNING:toil.fileStore:Starting job t/5/jobQu96hO/g/tmpNYLQ55.tmp with less than 10% of disk space remaining.
WARNING:toil.resource:Can't find resource for leader path '/n/home01/lassance/Comparative-Annotation-Toolkit/cat'
WARNING:toil.resource:Can't localize module ModuleDescriptor(dirPath='/n/home01/lassance/Comparative-Annotation-Toolkit', name='cat.hints_db', fromVirtualEnv=False)
sambamba-sort: Unable to write to stream
Traceback (most recent call last):
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/worker.py", line 340, in main
job._runner(jobGraph=jobGraph, jobStore=jobStore, fileStore=fileStore)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/job.py", line 1270, in _runner
returnValues = self._run(jobGraph, fileStore)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/job.py", line 1217, in _run
return self.run(fileStore)
File "/n/home01/lassance/.conda/envs/ENV_PROGRESSIVECACTUS/lib/python2.7/site-packages/toil/job.py", line 1383, in run
rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
File "/n/home01/lassance/Comparative-Annotation-Toolkit/cat/hints_db.py", line 161, in namesort_bam
tools.procOps.run_proc(cmd, stdout=name_sorted)
File "/n/home01/lassance/Comparative-Annotation-Toolkit/tools/procOps.py", line 36, in run_proc
pl.wait()
File "/n/home01/lassance/Comparative-Annotation-Toolkit/tools/pipeline.py", line 1127, in wait
self.raiseIfExcept()
File "/n/home01/lassance/Comparative-Annotation-Toolkit/tools/pipeline.py", line 1085, in raiseIfExcept
p.raiseIfExcept()
File "/n/home01/lassance/Comparative-Annotation-Toolkit/tools/pipeline.py", line 749, in raiseIfExcept
raise self.exceptInfo[0], self.exceptInfo[1], self.exceptInfo[2]
ProcException: process exited 1: sambamba sort -t 4 -m 15G -o /dev/stdout -n /dev/stdin
ERROR:toil.worker:Exiting the worker because of a failed job on host holy2a02206.rc.fas.harvard.edu
WARNING:toil.jobGraph:Due to failure we are reducing the remaining retry count of job 'namesort_bam' 8/b/job37GqEO with ID 8/b/job37GqEO to 0
I think I captured the different types of error messages I see. My intuition is that it has to do with some parametrization. What do you think?
Thanks!
JM
from comparative-annotation-toolkit.
Hmm. The first error is easy -- the location of your $TMPDIR is out of space. Toil automatically places all of its work in that location unless you specify the --workDir flag. If you do specify that flag, that location needs to be accessible by all nodes on the cluster (and preferably something fast, i.e. not NFS). The other errors are more vague, but my guess is that they are both symptoms of the same problem -- no space to write.
If clearing enough temp space is not an option, does your file system setup allow for using the --workDir flag? I personally always use it, because I also have issues with other people filling the tempdir on cluster nodes. If you set --workDir to a shared filesystem, you should probably also set --disableCaching to avoid needless file copying.
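Before launching, the free space in a candidate temporary or --workDir location can be checked with a few lines of Python. This is a sketch under assumptions: the helper names free_gib and has_room are hypothetical, any threshold you pass is your own choice, and shutil.disk_usage requires Python 3 (the environment in the logs above is Python 2.7, where os.statvfs would be the equivalent on Unix).

```python
import shutil
import tempfile


def free_gib(path):
    """Return the free space on the filesystem holding path, in GiB."""
    return shutil.disk_usage(path).free / float(1024 ** 3)


def has_room(path, required_gib):
    """Return True if path's filesystem has at least required_gib GiB free."""
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    tmp = tempfile.gettempdir()  # honours $TMPDIR when it is set
    print("{}: {:.1f} GiB free".format(tmp, free_gib(tmp)))
```

Note that this only reports the state at the moment of the check; on a shared node another user can still fill the directory while the job is running.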
from comparative-annotation-toolkit.
OK, it seems I was a bit quick to cry for help here, sorry about that. I restarted the pipeline and those failed jobs got repaired successfully. I may try --workDir if this recurs. By default I specify that toil jobs should land on nodes that have at least 20G of temporary disk space available, which of course doesn't prevent someone else from causing trouble.
from comparative-annotation-toolkit.
Reviving this thread, although I am not sure whether the misbehavior I observe has to do with augustus-pb per se.
CAT finishes fine, but I see abnormally long gene prediction(s).
chr1 CAT gene 483503 193059942 . - . ID=BEAST_G0000001;Name=None;gene_biotype=unknown_likely_coding;source_gene=None;source_gene_common_name=None;transcript_modes=augPB,augCGP
This 'thing' contains 2073 transcripts...
Is it something you have seen before?
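One way to screen a consensus GFF3 for records like the one above is to compute the span of every gene feature and flag the outliers. The sketch below assumes a tab-separated GFF3; the function names and the 5 Mb cutoff are arbitrary choices for illustration, not CAT settings. The sample line is the record quoted above with its columns separated by tabs.

```python
def gene_spans(gff3_lines):
    """Yield (gene_id, chrom, span_bp) for every 'gene' feature."""
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        # GFF3 coordinates are 1-based and inclusive.
        span = int(cols[4]) - int(cols[3]) + 1
        yield attrs.get("ID", "."), cols[0], span


def suspicious_genes(gff3_lines, max_span_bp=5000000):
    """Return the genes whose span exceeds max_span_bp (an arbitrary cutoff)."""
    return [gene for gene in gene_spans(gff3_lines) if gene[2] > max_span_bp]


SAMPLE_GENE = (
    "chr1\tCAT\tgene\t483503\t193059942\t.\t-\t.\t"
    "ID=BEAST_G0000001;Name=None;gene_biotype=unknown_likely_coding;"
    "source_gene=None;source_gene_common_name=None;"
    "transcript_modes=augPB,augCGP\n"
)

if __name__ == "__main__":
    for gene_id, chrom, span in suspicious_genes([SAMPLE_GENE]):
        print(gene_id, chrom, span)
```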
from comparative-annotation-toolkit.
False fusions are often a problem with these kinds of ab-initio predictions, but that is crazy crazy long. Is it possible to share the assembly hub? If I remember correctly, you provided it with a pre-formed dataset derived from augustus-ppx, right? In that case, this should exist in the input set (CAT will do nothing with it past that point but classify it and decide whether to include or exclude it). Or are you actually running AugustusPB now? If that is the case, what hints are in the hints database? These false fusions occur most often when the model has only sequence information to go off of.
from comparative-annotation-toolkit.
Also just saw that both CGP and PB predicted the same thing. Also very interesting.
from comparative-annotation-toolkit.
I mapped the transcripts from my curated annotation using gmap and used that as IsoSeq hints to run PB (previously, CAT complained that I was not providing data). I guess I was too optimistic in thinking that AugustusPB would generate predictions corresponding perfectly to my curated annotation, as it sounds like there may not be enough information to generate reliable predictions with PB. I may roll back to the initial plan, replace the PB prediction with my own gtf, and regenerate the consensus, if you think that this is what is causing the problem.
It is a bit confusing that the gene has transcript_modes=augPB,augCGP because individual transcripts have either transcript_modes=augCGP or transcript_modes=augPB, but never the two together. Is there such a thing as a 'proximity' rule to define when transcripts belong to the same gene in the consensus (i.e. if two things are less than x bp from each other, then they belong to the same thing, a bit like what Cufflinks does, for example)?
I ran CAT without the --assembly-hub flag. I guess re-running CAT with that option would produce the assembly hub.
from comparative-annotation-toolkit.
Here is the gp_info associated with that gene.
Will restart CAT momentarily to generate the hub.
from comparative-annotation-toolkit.
What do the first few lines of the PB .gp file look like? I think something must be wrong with the name2 field, as I rely on that field to know what gene we are looking at in novel predictions.
from comparative-annotation-toolkit.
here are a few lines:
augPB-67.t1 chr1 - 483502 517463 483502 517463 6 483502,490014,493354,496742,497905,517254, 484422,490138,493582,497546,498197,517463, 0 augPB-67 cmpl cmpl 1,0,0,0,2,0,
augPB-68.t1 chr1 + 615108 631032 615108 631032 5 615108,615944,620283,623370,630970, 615240,616172,620407,623382,631032, 0 augPB-68 cmpl cmpl 0,0,0,1,1,
augPB-69.t1 chr1 - 653159 689365 653159 689365 6 653159,675347,679440,682991,684071,689158, 653188,676154,679564,683219,684872,689365, 0 augPB-69 cmpl cmpl 1,1,0,0,0,0,
augPB-70.t1 chr1 + 1276604 1430174 1276604 1430174 2 1276604,1430142, 1277922,1430174, 0 augPB-70 cmpl cmpl 0,1,
augPB-71.t1 chr1 - 1540364 1546767 1540364 1546767 4 1540364,1543911,1545556,1546629, 1541293,1544035,1545775,1546767, 0 augPB-71 cmpl cmpl 1,0,0,0,
augPB-72.t1 chr1 - 2029082 2029328 2029082 2029328 1 2029082, 2029328, 0 augPB-72 cmpl cmpl 0,
augPB-73.t1 chr1 + 2133805 2298726 2133805 2298726 9 2133805,2148173,2148821,2150309,2151995,2155825,2210621,2264193,2298716, 2134056,2148465,2149628,2150525,2152119,2155899,2210664,2264293,2298726, 0 augPB-73 cmpl cmpl 0,2,0,0,0,1,0,1,2,
augPB-73.t2 chr1 + 2149607 2317495 2149607 2317495 11 2149607,2150309,2151995,2155825,2210621,2298684,2302178,2302828,2304322,2313142,2316584, 2149628,2150525,2152119,2155899,2210664,2298760,2302470,2303635,2304541,2313266,2317495, 0 augPB-73 cmpl cmpl 0,0,0,1,0,1,2,0,0,0,1,
augPB-73.t3 chr1 + 2149607 2322145 2149607 2322145 13 2149607,2150309,2151995,2155825,2210621,2264193,2276261,2302178,2302828,2304322,2313142,2316584,2322099, 2149628,2150525,2152119,2155899,2210664,2264293,2276516,2302470,2303635,2304541,2313266,2317491,2322145, 0 augPB-73 cmpl cmpl 0,0,0,1,0,1,2,2,0,0,0,1,2,
augPB-74.t1 chr1 + 2392595 2459244 2392595 2459244 4 2392595,2392933,2394632,2459218, 2392616,2393152,2394756,2459244, 0 augPB-74 cmpl cmpl 0,0,0,1,
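For reference, the lines above are in the 15-column genePredExt layout, where the twelfth column is name2. A minimal parser sketch makes it easy to inspect that field; the namedtuple and function names here are hypothetical and are not part of CAT's tools module. The sample is one of the records quoted above.

```python
from collections import namedtuple

GenePredExt = namedtuple(
    "GenePredExt",
    "name chrom strand tx_start tx_end cds_start cds_end exon_count "
    "exon_starts exon_ends score name2 cds_start_stat cds_end_stat exon_frames",
)


def _int_list(text):
    """Parse a comma-terminated list such as '10,20,' into [10, 20]."""
    return [int(x) for x in text.split(",") if x]


def parse_gene_pred_ext(line):
    """Parse one whitespace- or tab-separated genePredExt record."""
    f = line.split()
    if len(f) != 15:
        raise ValueError("expected 15 genePredExt columns, got {}".format(len(f)))
    return GenePredExt(
        f[0], f[1], f[2], int(f[3]), int(f[4]), int(f[5]), int(f[6]), int(f[7]),
        _int_list(f[8]), _int_list(f[9]), int(f[10]), f[11], f[12], f[13],
        _int_list(f[14]),
    )


SAMPLE_RECORD = (
    "augPB-72.t1 chr1 - 2029082 2029328 2029082 2029328 1 "
    "2029082, 2029328, 0 augPB-72 cmpl cmpl 0,"
)

if __name__ == "__main__":
    rec = parse_gene_pred_ext(SAMPLE_RECORD)
    print(rec.name, rec.name2, rec.exon_count)
```

In the records shown, name2 is the transcript name without its .tN suffix, which is what one would expect if the field is populated correctly.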
from comparative-annotation-toolkit.
That is weird, I don't understand why it is broken then. I hate to ask, but is there any way you could share your database files? I may end up needing the full input to consensus finding to track this down, but for now I think the $genome.db file will help.
from comparative-annotation-toolkit.
Something crossed my mind: I noticed that you fixed a bug in tools/transcripts.py, which I had not noticed.
Do you think this may have anything to do with the erroneous consensus generation?
from comparative-annotation-toolkit.
I don't think so.
I think the bug is here:
I keep track of gene IDs to assign unique identifiers, handling the case where sorted order may not be gene order. source_gene should always be None for a CGP/PB transcript that was not assigned a parental gene from transMap, and so then I assign it to the name2 field. Somehow I think this is not incrementing properly, but it does for my test cases. For that reason, I was going to look at your database and see what your AugPbAlternativeGenes table contained.
from comparative-annotation-toolkit.
Finally getting back to this after doing some testing.
First, I don't think there is a bug, but more likely some inconsistencies were introduced when I was troubleshooting the issues resulting from the update of the toil module. I ended up deleting the database and the hgm folder, and restarted the pipeline. Now, the consensus does not contain this very long gene anymore.
Second, and this is somewhat secondary, I tried to replace the automatically produced augPB.gtf with my own. However, after CAT finished, I could see that the original got restored and used. So the hack did not work.
from comparative-annotation-toolkit.
It should work. However, you will need to replace the genePred, not the GTF. The GTF is the direct output of AugustusPB, but CAT works in genePred space.
So you will want to use the Kent program gtfToGenePred to replace that file. I realized one other hack that would need to be done -- the pipeline relies on a consistent naming scheme, where each augustusPB transcript ID is of the form augPB-X.tY and the gene ID is of the form augPB-X. I am defending next week, so after the Thanksgiving break I should have time to add the ability to directly incorporate external gene predictions in the process.
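The naming scheme described above can be checked before dropping a replacement genePred into the work directory. The sketch below assumes X and Y are positive integers, which matches the records shown earlier in the thread but is not stated explicitly; the function name check_names is hypothetical.

```python
import re

# Assumption: X and Y in augPB-X.tY are plain integers.
TRANSCRIPT_ID = re.compile(r"^augPB-(\d+)\.t(\d+)$")
GENE_ID = re.compile(r"^augPB-(\d+)$")


def check_names(transcript_id, gene_id):
    """Return True if both IDs follow the scheme and share the same X."""
    tx = TRANSCRIPT_ID.match(transcript_id)
    gene = GENE_ID.match(gene_id)
    return bool(tx and gene and tx.group(1) == gene.group(1))


if __name__ == "__main__":
    print(check_names("augPB-73.t2", "augPB-73"))
    print(check_names("myGene.t1", "myGene"))
```

Running this over the name and name2 columns of an external genePred shows which records would need renaming before the pipeline can be expected to accept them.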
from comparative-annotation-toolkit.
I think I followed those steps, but will double-check.
"I am defending next week"
I should let you focus on that then and refrain from bugging you until after Thanksgiving. Good luck with your defense!
from comparative-annotation-toolkit.
Can I close this, or are there still outstanding issues?
from comparative-annotation-toolkit.
I guess it can be closed; as it is, I never got this to work the way I wanted. It seems easier to merge the CAT output with an external gff3 afterwards.
from comparative-annotation-toolkit.
Does CAT actually output the protein evidence tracks separately, besides in the AssemblyHub?
from comparative-annotation-toolkit.
If you didn't get it to work, I will fix it.
I am going to start a new issue and add a method to directly provide additional transcripts to CAT. That should be easy.
from comparative-annotation-toolkit.