nlapalu / sddetector
Segmental duplication detection tool
License: GNU General Public License v3.0
This is a feature request (briefly discussed in #2).
Being able to use TBLASTX output with SDDetector would be very helpful. First, it would ideally remove the need for repeat masking: precise repeat masking is hard for non-model species, and masking errors can increase false positives. Second, TBLASTX is likely to have a higher signal-to-noise ratio in terms of the number of alignments generated, which could make alignment chaining more efficient for larger genomes (#2).
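To illustrate why the number of alignments matters for chaining, here is a minimal greedy collinear chaining sketch. It is for illustration only and is not SDDetector's implementation; `HSP`, `chain_hsps`, `max_gap` and `min_length` are hypothetical names loosely modeled on the `maxGap` and `chainLength` parameters of SDDetector's `chainAlignments` step.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class HSP(NamedTuple):
    # Hypothetical minimal HSP record: 1-based inclusive coordinates,
    # assumed forward strand on both sequences (start < end).
    query: str
    subject: str
    qstart: int
    qend: int
    sstart: int
    send: int


def chain_hsps(hsps: List[HSP], max_gap: int, min_length: int) -> List[List[HSP]]:
    """Greedily join consecutive HSPs of the same query/subject pair when
    both gaps are <= max_gap, then keep chains whose query span is at
    least min_length."""
    groups: Dict[Tuple[str, str], List[HSP]] = defaultdict(list)
    for h in hsps:
        groups[(h.query, h.subject)].append(h)

    chains: List[List[HSP]] = []
    for group in groups.values():
        group.sort(key=lambda h: (h.qstart, h.sstart))
        current = [group[0]]
        for h in group[1:]:
            prev = current[-1]
            qgap = h.qstart - prev.qend
            sgap = h.sstart - prev.send
            # Extend only if the next HSP does not start before the previous
            # one ends on either sequence and neither gap exceeds max_gap.
            if 0 <= qgap <= max_gap and 0 <= sgap <= max_gap:
                current.append(h)
            else:
                chains.append(current)
                current = [h]
        chains.append(current)

    # Keep chains spanning at least min_length bases on the query.
    return [c for c in chains if c[-1].qend - c[0].qstart + 1 >= min_length]
```

Sorting each query/subject group is the main cost, so fewer input HSPs means less work; a real chainer would also need to handle reverse-strand and overlapping HSPs.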
Hi!
Thank you for developing the tool! I'm working on a vertebrate genome (2.5 Gb) and tried SDDetector because it looked the easiest to run. From other issues, I understand you have not tested it on genomes this large.
I have both soft-masked and hard-masked versions of the genome, masked with Red (Hani Girgis).
We are interested in long SDs, i.e. longer than 10 kb. So, I had a few questions:
Traceback (most recent call last):
  File "/home/harish/miniconda3/envs/py2/bin/segmental_duplication_detector.py", line 408, in <module>
    detector.runSDDetection()
  File "/home/harish/miniconda3/envs/py2/bin/segmental_duplication_detector.py", line 110, in runSDDetection
    self.chainAlignments(maxGap=self.maxGap, chainLength=self.chainLength)
  File "/home/harish/miniconda3/envs/py2/bin/segmental_duplication_detector.py", line 237, in chainAlignments
    cur.executescript(sql)
sqlite3.OperationalError: database is locked
Any idea why this error occurs? My command was:
segmental_duplication_detector.py P2.xml xml P2_0.9_5000_10000.gff3 :memory: -i 0.9 -g 5000 -l 10000 -a --procs 50
Thank you!
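One thing that may be worth checking (an assumption; the thread does not confirm the cause): with Python's `sqlite3` module, every connection to `:memory:` opens its own private database, so an in-memory database cannot be shared between worker processes, and a connection to a file database raises "database is locked" when another writer holds the lock longer than its busy timeout (5 seconds by default). The sketch below shows both behaviours; the file name `sdd_demo.db` and the 120-second timeout are hypothetical.

```python
import os
import sqlite3
import tempfile

# Two connections to ":memory:" are two separate, private databases.
a = sqlite3.connect(":memory:")
b = sqlite3.connect(":memory:")
a.execute("CREATE TABLE hsp (id INTEGER)")
tables_a = a.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
tables_b = b.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
print(tables_a)  # [('hsp',)]
print(tables_b)  # []

# A file database is shared between connections. A longer timeout makes a
# connection wait for a competing writer instead of failing after 5 seconds.
path = os.path.join(tempfile.mkdtemp(), "sdd_demo.db")
writer = sqlite3.connect(path, timeout=120.0)
writer.executescript(
    "CREATE TABLE IF NOT EXISTS hsp (id INTEGER); INSERT INTO hsp VALUES (1);"
)
writer.commit()
reader = sqlite3.connect(path, timeout=120.0)
count = reader.execute("SELECT COUNT(*) FROM hsp").fetchone()[0]
print(count)  # 1
```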
Hi,
I ran SDDetector on a 13 GB BLAST XML input. It has been running for 15 days and has not finished. Is this normal? Could you advise how I might speed it up?
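The workload for an input like this probably depends on the number of HSPs more than on the file size alone (an assumption; the thread gives no timings). A quick way to gauge it before a long run is to count HSPs by streaming the XML. This sketch uses only the standard library; `count_blast_xml_hsps` is a hypothetical helper, and the `Hsp` element name follows the standard NCBI BLAST XML layout.

```python
import xml.etree.ElementTree as ET


def count_blast_xml_hsps(source):
    """Stream a BLAST XML file (path or binary file object) and count <Hsp>
    elements without building the whole tree in memory."""
    n = 0
    stack = []  # currently open elements, root first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()
        if elem.tag == "Hsp":
            n += 1
        # Detach each finished element from its parent so memory use is
        # bounded by nesting depth rather than by file size.
        if stack:
            stack[-1].remove(elem)
    return n
```

For example, `count_blast_xml_hsps("P2.xml")` would report the HSP total for an XML file like the one in the previous issue.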
Hi Nicolas,
I have been trying to run SDDetector but get the following error:
$ /home/softwares/SDDetector/bin/segmental_duplication_detector.py blast_masked1.tab tab sdd_0.9_3000_5000.gff3 :memory: -g 3000 -l 5000 -a
Traceback (most recent call last):
  File "/home/softwares/SDDetector/bin/segmental_duplication_detector.py", line 257, in <module>
    logging.getLogger().setLevel(logLevel)
NameError: name 'logLevel' is not defined
Please help me fix it.
Thanks and regards,
Soumya,
CDFD,
Hyderabad,
India
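The NameError suggests that `logLevel` is assigned only on some code paths, for example only for particular verbosity values. The command in the segmentation-fault report below passes `-v 3` and gets past logging setup, while this command passes no `-v`; that reading is an assumption about the script, not something confirmed in the thread. A defensive pattern is to derive the level from a lookup with clamping, as in this sketch; the `-v/--verbosity` option and the level mapping are hypothetical.

```python
import argparse
import logging

# Hypothetical mapping from a -v value to a logging level.
LEVELS = {0: logging.ERROR, 1: logging.WARNING, 2: logging.INFO, 3: logging.DEBUG}


def level_from_verbosity(verbosity):
    """Return a logging level for any integer, clamping out-of-range values
    so the level is always defined."""
    return LEVELS[min(max(verbosity, 0), 3)]


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbosity", type=int, default=1)
    args = parser.parse_args(argv)
    logLevel = level_from_verbosity(args.verbosity)  # assigned on every path
    logging.basicConfig()
    logging.getLogger().setLevel(logLevel)
    return logLevel
```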
Hi,
I am encountering a segmentation fault when running SDDetector on my data (see error below).
My genome is 2.4 Gbp and the blast.tab file is 7.3 GB. Could the size of my dataset be causing this error?
Thanks,
Tim.
INFO:root:SQLite db stored in memory
INFO:root:Parsing alignments
INFO:root:Loading alignments into database
DEBUG:root:91215553 HSP parsed
INFO:root:Exporting matches after loading in database in gff3 format, file: sdd_0.9_3000_5000.gff3.loading
INFO:root:Removing self-self matches
INFO:root:Exporting matches after removing self-matches in gff3 format, file: sdd_0.9_3000_5000.gff3.selfmatch
INFO:root:Removing alignments below the identity threshold: 0.9
/var/spool/pbs/mom_priv/jobs/66335.flm1.SC: line 18: 4180 Segmentation fault python ~/PROGRAMS/SDDetector/bin/segmental_duplication_detector.py blast.tab tab sdd_0.9_3000_5000.gff3 :memory: -g 3000 -l 5000 -a -v 3
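Given the 91215553 HSPs in the log, and that the crash happens during the identity-threshold step, one workaround to try (untested with SDDetector) is to drop low-identity HSPs before loading, so the in-memory database holds fewer rows. This sketch assumes the tab input is standard 12-column BLAST tabular output (`-outfmt 6`), where column 3 is percent identity on a 0-100 scale, and that SDDetector's fractional `-i` threshold corresponds to that value divided by 100; `prefilter_blast_tab` is a hypothetical helper.

```python
def prefilter_blast_tab(lines, min_identity=0.9):
    """Yield BLAST tabular lines whose percent identity (column 3, 0-100)
    is at least min_identity * 100, dropping exact self-self hits."""
    threshold = min_identity * 100.0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers (-outfmt 7)
        cols = line.rstrip("\n").split("\t")
        qseqid, sseqid, pident = cols[0], cols[1], float(cols[2])
        qstart, qend, sstart, send = cols[6], cols[7], cols[8], cols[9]
        if pident < threshold:
            continue
        if qseqid == sseqid and qstart == sstart and qend == send:
            continue  # an HSP aligning a region to itself
        yield line
```

For example, with `src` opened on blast.tab and `dst` on a new file, `dst.writelines(prefilter_blast_tab(src, 0.9))` writes the reduced input, which can then be passed to SDDetector in place of blast.tab.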