nlapalu / sddetector

Segmental duplication detection tool

License: GNU General Public License v3.0

Python 100.00%

sddetector's Issues

Using TBLASTX output

This is a feature request (was briefly discussed in #2).

Being able to use TBLASTX output with SDDetector would be very helpful. First, it would ideally not require repeat masking; precise repeat masking is hard for non-model species, and this can increase false positives. Second, TBLASTX is likely to give a higher signal-to-noise ratio in the alignments it generates, which could make alignment chaining more efficient for larger genomes (#2).

Identification of Long SDs

Hi!

Thank you for developing the tool! I'm working on a vertebrate genome (2.5 Gb) and am trying to run SDDetector because it looked the easiest to use. Based on other issues, I understand that it has not been tested on genomes this large.

I have both soft-masked and hard-masked versions of the genome, masked with REd (Hani Girgis).

We are interested in long SDs, i.e., longer than 10 kb. So, I had a few questions:

1. After finishing the BLAST run and getting an XML file, I'm running the "detector" portion of the pipeline on the soft-masked genome. Should I just increase the chain length to 10000 and the maximum gap to 5000? Would that make sense? What does your intuition suggest? Should I change the minimum match length (-t) or overlap (-p) to something else?
2. If I'm using the hard-masked genome, should I use -db_hard_mask? If so, what should its value be?
3. There seems to be an issue with how the database is being used or called, as I consistently get this error:

Traceback (most recent call last):
  File "/home/harish/miniconda3/envs/py2/bin/segmental_duplication_detector.py", line 408, in <module>
    detector.runSDDetection()
  File "/home/harish/miniconda3/envs/py2/bin/segmental_duplication_detector.py", line 110, in runSDDetection
    self.chainAlignments(maxGap=self.maxGap, chainLength=self.chainLength)
  File "/home/harish/miniconda3/envs/py2/bin/segmental_duplication_detector.py", line 237, in chainAlignments
    cur.executescript(sql)
sqlite3.OperationalError: database is locked

Any ideas why the above error pops up? My command is this:

segmental_duplication_detector.py P2.xml xml P2_0.9_5000_10000.gff3 :memory: -i 0.9 -g 5000 -l 10000 -a --procs 50

Thank you!
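
For context on the "database is locked" message: SQLite raises it when one connection holds a write lock and a second connection cannot obtain the lock before its busy timeout expires. The minimal sketch below reproduces the message with Python's standard sqlite3 module and a temporary file database. It is not SDDetector code. The table, the helper name reproduce_lock, and the idea that several workers (for example those started with --procs 50) compete for one database are assumptions made for illustration only.

```python
import os
import sqlite3
import tempfile


def reproduce_lock():
    """Return the error text raised when a second writer hits a held lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")  # hypothetical scratch database
        # isolation_level=None: transactions are controlled explicitly below.
        writer = sqlite3.connect(path, isolation_level=None)
        # timeout=0: fail at once instead of waiting for the lock.
        other = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            writer.execute("CREATE TABLE t (x INTEGER)")
            writer.execute("BEGIN IMMEDIATE")  # take the write lock
            writer.execute("INSERT INTO t VALUES (1)")
            try:
                other.execute("BEGIN IMMEDIATE")  # second writer is refused
            except sqlite3.OperationalError as exc:
                return str(exc)
            return None
        finally:
            writer.close()
            other.close()


if __name__ == "__main__":
    print(reproduce_lock())
```

A larger timeout value in sqlite3.connect makes the second connection wait rather than fail immediately. Whether that applies to how SDDetector opens its database is not something this sketch can confirm.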

Problem at installation

Hello,

I would like to use SDDetector to detect duplicated genes in a reference genome FASTA file. When I try to install the program, I get this error:

Could you help me resolve it? Is a Docker version available to avoid this kind of error?

Best and thanks in advance

Pablo

Run time

Hi,

I ran SDDetector on a 13 GB BLAST XML input. It has been running for 15 days and has not finished. Is this normal? Could you please advise how I might speed it up?
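
One way to gauge the workload before a long run is to count the HSPs in the BLAST XML without loading the whole file into memory. The sketch below streams the file with xml.etree.ElementTree.iterparse and reports how many HSPs exist and how many reach a given identity. It assumes the standard BLAST XML element names (Hsp, Hsp_identity, Hsp_align-len). The function name count_hsps and the 0.9 default are illustrative, and identity is computed here as identical positions divided by alignment length, which may differ from how SDDetector computes it.

```python
import xml.etree.ElementTree as ET


def count_hsps(xml_path, min_identity=0.9):
    """Stream a BLAST XML file; return (total HSPs, HSPs >= min_identity)."""
    total = 0
    kept = 0
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == "Hsp":
            total += 1
            identical = int(elem.findtext("Hsp_identity"))
            align_len = int(elem.findtext("Hsp_align-len"))
            # float() keeps the division correct under Python 2 as well.
            if align_len and float(identical) / align_len >= min_identity:
                kept += 1
        if elem.tag in ("Hsp", "Hit", "Iteration"):
            # Drop the finished subtree so memory use stays low.
            elem.clear()
    return total, kept


if __name__ == "__main__":
    import sys

    print(count_hsps(sys.argv[1]))
```

If most HSPs fall below the identity threshold, the alignments that actually need chaining are a small fraction of the input, which gives a rough sense of how much of the run time goes into parsing and loading.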

Error in running segmental_duplication_detector.py

Hi Nicolas,

I have been trying to run SDDetector but end up with the following error.

$ /home/softwares/SDDetector/bin/segmental_duplication_detector.py blast_masked1.tab tab sdd_0.9_3000_5000.gff3 :memory: -g 3000 -l 5000 -a

Traceback (most recent call last):
  File "/home/softwares/SDDetector/bin/segmental_duplication_detector.py", line 257, in <module>
    logging.getLogger().setLevel(logLevel)
NameError: name 'logLevel' is not defined

Please help me fix it.

Thanks and regards,
Soumya,
CDFD,
Hyderabad,
India
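
The traceback shows that logLevel is read at line 257 before any value has been assigned to it. The command above omits the verbosity option, whereas the command in the segmentation fault report further down passes -v 3 and its log shows INFO and DEBUG messages, so one plausible cause is that the name is only set when a verbosity is given. The sketch below shows a defensive pattern in which the level always has a default. The mapping of numbers to levels and the function name resolve_log_level are assumptions, not SDDetector's actual code.

```python
import logging

# Hypothetical mapping from a -v style verbosity number to a logging level.
VERBOSITY_TO_LEVEL = {
    1: logging.ERROR,
    2: logging.INFO,
    3: logging.DEBUG,
}


def resolve_log_level(verbosity=None):
    """Return a logging level, falling back to ERROR for missing or unknown values."""
    return VERBOSITY_TO_LEVEL.get(verbosity, logging.ERROR)


# The level is defined on every path, so setLevel never sees an unset name.
logging.getLogger().setLevel(resolve_log_level(None))
```

Until the script itself is fixed, passing -v explicitly, as in the other report, may avoid the error, though that is untested here.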

Segmentation fault

Hi,

I am encountering a segmentation fault when running SDDetector on my data (see error below).
My genome is 2.4 Gbp and the blast.tab file is 7.3 GB. Is it possible that the size of my dataset is causing this error?

Thanks,
Tim.

INFO:root:SQLite db stored in memory
INFO:root:Parsing alignments
INFO:root:Loading alignments into database
DEBUG:root:91215553 HSP parsed
INFO:root:Exporting matches after loading in database in gff3 format, file: sdd_0.9_3000_5000.gff3.loading
INFO:root:Removing self-self matches
INFO:root:Exporting matches after removing self-matches in gff3 format, file: sdd_0.9_3000_5000.gff3.selfmatch
INFO:root:Removing alignments below the identity threshold: 0.9
/var/spool/pbs/mom_priv/jobs/66335.flm1.SC: line 18: 4180 Segmentation fault python ~/PROGRAMS/SDDetector/bin/segmental_duplication_detector.py blast.tab tab sdd_0.9_3000_5000.gff3 :memory: -g 3000 -l 5000 -a -v 3
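
The log above shows about 91 million HSPs being loaded into an in-memory database before self-matches and low-identity alignments are removed. One way to shrink the input is to drop those rows before SDDetector sees them. The sketch below does this for tabular BLAST output. It assumes the 12 standard -outfmt 6 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), that the "tab" input uses this layout, and that pident is a percentage (so 90.0 here corresponds to -i 0.9). The function name filter_blast_tab is illustrative, and whether prefiltering avoids the segmentation fault has not been verified.

```python
def filter_blast_tab(lines, min_identity=90.0, min_length=0):
    """Yield tabular BLAST lines that pass identity/length and are not self-hits."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident = float(fields[2])
        length = int(fields[3])
        qstart, qend, sstart, send = (int(v) for v in fields[6:10])
        # A self-hit aligns a region to exactly the same coordinates.
        is_self = qseqid == sseqid and qstart == sstart and qend == send
        if is_self or pident < min_identity or length < min_length:
            continue
        yield line


if __name__ == "__main__":
    import sys

    # Usage: python filter_tab.py < blast.tab > blast.filtered.tab
    for kept in filter_blast_tab(sys.stdin):
        sys.stdout.write(kept)
```

Because the log shows SDDetector applying the same two filters itself, the later chaining steps should in principle see the same alignments, although the intermediate .loading and .selfmatch GFF3 exports would no longer contain the removed rows.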
