solgenomics / bio-genomeupdate
Tools for updating a genome assembly
Columns for Trim curation file:
Create new classes
-r Fasta file of reference (required)
-q Fasta file of query (assembled and singleton BACs, required)
-c Contig or component AGP file for reference (includes scaffold gaps)
-s Chromosome AGP file for reference (with only scaffolds and gaps)
-t <TPF_file> Original TPF file (mandatory)
-s scaffold AGP file (mandatory)
-c chromosome AGP file (mandatory)
Should include switch-over cases, for testing the GFF -> TPF pipeline for build 3.0.
The problem is that MUMmer does not report alignments to Ns, so BAC regions that extend beyond the WGS contig are not considered. Need to compute "overhangs" from BAC length and seq_in_clusters.
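The overhang idea can be sketched roughly as below. This is written in Python for brevity (the project itself is Perl), and the function name `compute_overhangs` and its parameters are hypothetical. It assumes 1-based inclusive query coordinates, as in nucmer coords output, and only looks at the outermost aligned query positions; it does not model seq_in_clusters.

```python
def compute_overhangs(query_length, query_start, query_end):
    """Return (left, right) unaligned bases at the two ends of the query.

    Coordinates are assumed 1-based and inclusive. For a reverse-strand
    hit query_start > query_end, so the pair is normalised first.
    """
    lo, hi = sorted((query_start, query_end))
    if lo < 1 or hi > query_length:
        raise ValueError("alignment lies outside the query sequence")
    # Bases before the first aligned position and after the last one.
    return lo - 1, query_length - hi
```

The returned values are in query orientation; mapping them onto the reference would additionally need the strand of the alignment.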
Add methods to AlignmentCoordsGroup.pm for removing BAC alignments that align in non-collinear order to the ref chr.
Needed for BAC alignments to all chrs.
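One possible way to drop out-of-order alignments is to keep the longest chain in which query position increases together with reference position. The sketch below uses a longest-increasing-subsequence search, which is a stand-in technique and not necessarily what AlignmentCoordsGroup.pm does. It is in Python rather than the project's Perl, handles forward-strand alignments only, and the function name and the `(ref_start, query_start)` tuple layout are assumptions.

```python
from bisect import bisect_left

def collinear_subset(alignments):
    """Keep the largest set of (ref_start, query_start) pairs in which
    query_start strictly increases as ref_start increases."""
    ordered = sorted(alignments)          # sort by reference position
    tails = []        # tails[k]: index of the last element of the best chain of length k+1
    tail_vals = []    # query_start of each element referenced in tails
    prev = [None] * len(ordered)          # back-pointers for chain recovery
    for i, (_, q) in enumerate(ordered):
        k = bisect_left(tail_vals, q)
        if k == len(tails):
            tails.append(i)
            tail_vals.append(q)
        else:
            tails[k] = i
            tail_vals[k] = q
        prev[i] = tails[k - 1] if k > 0 else None
    # Walk the back-pointers from the end of the longest chain.
    chain = []
    i = tails[-1] if tails else None
    while i is not None:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]
```

Alignments not returned by `collinear_subset` would be the candidates for the "out of order" set.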
Fix get_gap_overlap()
Then remove err msg from align_BACends_group_coords.pl and test
AEKE02023669 ? SL2.50sc05925 MINUS
AC244870 ? SL2.50sc05925 PLUS
AC244937 ? SL2.50sc05925 MINUS contig469
AC244803 ? SL2.50sc05925 PLUS contig469
AC244944 ? SL2.50sc05925 PLUS contig469
AC254768 ? SL2.50sc05925 MINUS contig469
AEKE02023661 ? SL2.50sc05925 MINUS
Contig469_right_1000 aligns to -ive AEKE02023661.1
Contig469_right_1000 aligns to -ive end of AC244937
Contig469_left_1000 aligns to -ive end of AC254768
Contig469_left_1000 aligns to middle of AC244870
Correct order
AEKE02023669 ? SL2.50sc05925 MINUS
AC244870 ? SL2.50sc05925 PLUS
AC254768 ? SL2.50sc05925 MINUS contig469
AC244944 ? SL2.50sc05925 PLUS contig469
AC244803 ? SL2.50sc05925 PLUS contig469
AC244937 ? SL2.50sc05925 MINUS contig469
AEKE02023661 ? SL2.50sc05925 MINUS
68kb 99% identical alignment between AC244870 and AC254768
9.2kb 99% identical alignment between AC244937 and AEKE02023661
get_tpf_with_bacs_inserted
Print filtered delta output file without the BACs that are aligned out of order.
Is it required???
Check using GRC tpf_solo pipeline.
uniq in mixed, outoforder
common in both_errors
Will be substituted in for the "ContigX" placeholder from the group_coords stdout.
Use 500 1 convention of nucmer
Should be able to read in set of AGP/TPF files and produce tabular report
Maybe move .PL /scripts into scripts dir. Only if history is preserved.
Print fasta of query seqs that did not align
The location is relative to the original TPF line but multiple insertions are done with respect to the original line number, not the new location. Need to maintain a separate offset counter for 'before' and 'after' insertions for each original TPF line.
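A minimal sketch of one way around the shifting-index problem, in Python rather than the project's Perl: instead of tracking running offsets, group the insertions by original line and by 'before'/'after', then rebuild the list in a single pass. The function name and the `(orig_index, where, new_line)` tuple layout are hypothetical, and indices are 0-based here.

```python
from collections import defaultdict

def apply_insertions(lines, insertions):
    """Insert new lines relative to ORIGINAL line indices.

    insertions: iterable of (orig_index, where, new_line) with
    where in {'before', 'after'}. Insertion order is preserved
    within each (orig_index, where) group.
    """
    before = defaultdict(list)
    after = defaultdict(list)
    for idx, where, new_line in insertions:
        if not 0 <= idx < len(lines):
            raise IndexError(f"no original line {idx}")
        if where == "before":
            before[idx].append(new_line)
        elif where == "after":
            after[idx].append(new_line)
        else:
            raise ValueError(f"unknown position {where!r}")
    result = []
    for idx, line in enumerate(lines):
        result.extend(before[idx])   # everything queued before this line
        result.append(line)          # the original line itself
        result.extend(after[idx])    # everything queued after this line
    return result
```

Because every insertion is keyed on the original index, earlier insertions cannot shift the target of later ones. An in-place implementation would need the equivalent per-line 'before' and 'after' counters described above.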
TPF spec v1.8 added biological feature in a non-sequence line (centromere or heterochromatin).
Attribute (accession_prefix_last_base) does not pass the type constraint because: The string, -1, was not a positive coordinate at /usr/local/lib/x86_64-linux-gnu/perl/5.20.2/Moose/Object.pm line 24
Moose::Object::new('Bio::GenomeUpdate::SP::SPLine', 'chromosome', 10, 'accession_prefix', 'AEKE02007654', 'accession_suffix', 'AC239654', 'accession_prefix_orientation', '-', 'accession_suffix_orientation', '+', 'accession_prefix_last_base', -1, 'accession_suffix_first_base', 1, 'comment', 'BAC AC239654 is contained within WGS contig AEKE02007654 from previous version. Designates switch point from WGS contig to BAC.') called at /home/surya/work/Eclipse/Bio-GenomeUpdate/lib/Bio/GenomeUpdate/TPF.pm line 1628
File: query_bacends.fasta
Modify copy_updated_coordinates_to_vcf to sort -V its output file so that features end up in the correct order and are ready for compression and display in JBrowse.
See Jeremy's controller code
Number of components
Number of gaps
avg, std dev
% covered by components and gaps
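The report items above could be computed along these lines. This is a rough Python sketch (the project itself is Perl) with a hypothetical function name. It assumes standard tab-delimited AGP lines where columns 2 and 3 are the object start and end, column 5 is the component type, and types `N` and `U` mark gaps. It also assumes the average and standard deviation refer to lengths, which the notes do not state.

```python
from statistics import mean, pstdev

def summarize_agp(agp_text):
    """Summarise component and gap lengths from AGP text."""
    components, gaps = [], []
    for line in agp_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank and comment lines
        fields = line.split("\t")
        length = int(fields[2]) - int(fields[1]) + 1
        if fields[4] in ("N", "U"):       # gap of known or unknown size
            gaps.append(length)
        else:
            components.append(length)
    total = sum(components) + sum(gaps)

    def stats(lengths):
        return {
            "count": len(lengths),
            "mean": mean(lengths) if lengths else 0.0,
            "stdev": pstdev(lengths) if lengths else 0.0,  # population std dev
            "percent": 100.0 * sum(lengths) / total if total else 0.0,
        }

    return {"components": stats(components), "gaps": stats(gaps)}
```

Calling this once per AGP file and printing the two dictionaries side by side would give one row of a tabular report.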
Find a way to include the paths of lib modules in the scripts that call them without hardcoding them.
Check AlignCoordGroup.pm
sample nucmer output
18361542 18377740 1 16206 16199 16206 99.94 70787664 203766 0.02 7.95 1 1 SL2.50ch03 Contig90
53614018 53614614 17944 17348 597 597 99.66 70787664 203766 0.00 0.29 1 -1 SL2.50ch03 Contig90
500bp.mixedoutoforder.agp.group_coords.stdout
Contig90 SL2.50ch03 18361542 18564072 202530 1 203766 203766 154944 1 1 0 Contains 0 0 48822 0 596 SL2.50ch03:597:53614018:53614614
Query BAC aligns to both the + and - strands of the ref chr.
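A case like the Contig90 sample could be flagged with a small check such as the one below, sketched in Python rather than the project's Perl. The column layout is inferred from the two sample lines (15 whitespace-separated fields, query start and end in fields 3 and 4, reference and query names in the last two fields) and is an assumption, as are the function names. Strand is taken from whether the query start is smaller or larger than the query end.

```python
from collections import defaultdict

def strands_by_query(coords_lines):
    """Map (query, reference) to the set of strands seen in coords lines."""
    strands = defaultdict(set)
    for line in coords_lines:
        fields = line.split()
        if len(fields) != 15:
            continue                      # not a data line in the assumed layout
        try:
            q_start, q_end = int(fields[2]), int(fields[3])
        except ValueError:
            continue                      # header or other non-numeric line
        ref, query = fields[13], fields[14]
        strands[(query, ref)].add("+" if q_start <= q_end else "-")
    return strands

def mixed_strand_queries(coords_lines):
    """Return (query, reference) pairs that align on both strands."""
    return sorted(
        key for key, seen in strands_by_query(coords_lines).items()
        if len(seen) == 2
    )
```

Run on the two sample lines above, `mixed_strand_queries` reports the pair `("Contig90", "SL2.50ch03")`.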
Columns for switch point curation file:
Align BACs to all chrs and validate if they align to only the chr they belong to. If not, add them to the no_chr set.
Add the number and length of gaps covered for components/contigs and scaffolds to the group_coords report.