sddetector's People

Contributors

nlapalu, yeban

sddetector's Issues

Using TBLASTX output

This is a feature request (was briefly discussed in #2).

Being able to use TBLASTX output with SDDetector would be very helpful. First, this would ideally not require repeat masking; precise repeat masking is hard for non-model species, and imprecise masking can increase false positives. Second, TBLASTX is likely to have a higher signal-to-noise ratio with regard to the number of alignments generated. This could make alignment chaining more efficient for larger genomes (#2).

Identification of Long SDs

Hi!

Thank you for developing the tool! I'm working on a vertebrate genome (2.5 Gb) and I am trying to run SDDetector because it looked like the easiest option. Based on other issues, I'm aware that you have not tested it on genomes this large.

I have both soft-masked and hard-masked versions of the genome, masked with REd (Hani Girgis).

We are interested in long SDs, i.e., longer than 10 kb. So I have a few questions:

  1. After finishing the BLAST run and getting an XML file, I'm running the "detector" portion of the pipeline on the soft-masked genome. Should I just increase the chain length to 10000 and the maximum gap to 5000? Would that make sense? What does your intuition suggest? Should I change the minimum match length (-t) or overlap (-p) to something else?
  2. If I'm using the hard-filtered genome, should I use -db_hard_mask? If so, what should its value be?
  3. There is some issue as to how the database is being used/called, as I keep getting this error consistently:
    Traceback (most recent call last):
      File "/home/harish/miniconda3/envs/py2/bin/segmental_duplication_detector.py", line 408, in <module>
        detector.runSDDetection()
      File "/home/harish/miniconda3/envs/py2/bin/segmental_duplication_detector.py", line 110, in runSDDetection
        self.chainAlignments(maxGap=self.maxGap, chainLength=self.chainLength)
      File "/home/harish/miniconda3/envs/py2/bin/segmental_duplication_detector.py", line 237, in chainAlignments
        cur.executescript(sql)
    sqlite3.OperationalError: database is locked

Any ideas why the above error pops up? My command is this: segmental_duplication_detector.py P2.xml xml P2_0.9_5000_10000.gff3 :memory: -i 0.9 -g 5000 -l 10000 -a --procs 50

Thank you!
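
The cause of this error inside SDDetector is not established in this thread. As general background, SQLite raises "database is locked" when one connection tries to take a write lock while another connection already holds one and the busy timeout expires. The sketch below reproduces that with Python's standard sqlite3 module on a throwaway file database. The function name reproduce_lock and the table t are made up for illustration and are not taken from SDDetector.

```python
import os
import sqlite3
import tempfile


def reproduce_lock():
    """Return the error message raised when a second writer hits a held lock."""
    # Hypothetical scratch database; unrelated to any SDDetector file.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are controlled explicitly with BEGIN/ROLLBACK.
    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute("CREATE TABLE t (x INTEGER)")
    writer.execute("BEGIN IMMEDIATE")  # take the write lock and keep it
    writer.execute("INSERT INTO t VALUES (1)")

    # A second connection with a short busy timeout (in seconds).
    other = sqlite3.connect(path, timeout=0.1, isolation_level=None)
    try:
        other.execute("BEGIN IMMEDIATE")  # needs the same write lock
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        writer.execute("ROLLBACK")
        other.close()
        writer.close()


if __name__ == "__main__":
    print(reproduce_lock())
```

Whether this mechanism applies to the command above, which uses :memory: together with --procs 50, has not been verified here.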

Problem at installation

Hello,

I would like to use SDDetector to detect duplicated genes in a reference genome fasta file. When I try to install the program I obtain this error:

[Image: SDDetector]

Could you help me resolve it? Is a Docker version available, to avoid this kind of error?

Best and thanks in advance

Pablo

Run time

Hi,

I ran SDDetector on a 13 GB BLAST XML input. It has been running for 15 days and has not finished. Is this normal? Could you please advise how I might speed it up?

Error in running segmental_duplication_detector.py

Hi Nicolas,

I have been trying to run SDDetector but end up with the following error.

$ /home/softwares/SDDetector/bin/segmental_duplication_detector.py blast_masked1.tab tab sdd_0.9_3000_5000.gff3 :memory: -g 3000 -l 5000 -a

Traceback (most recent call last):
  File "/home/softwares/SDDetector/bin/segmental_duplication_detector.py", line 257, in <module>
    logging.getLogger().setLevel(logLevel)
NameError: name 'logLevel' is not defined

Please help me fix it.

Thanks and regards,
Soumya,
CDFD,
Hyderabad,
India
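
A NameError of this kind usually means a variable is assigned only in some branches before it is used. Below is a minimal sketch of a pattern that avoids it, assuming the script maps a numeric verbosity option to a logging level. The function name resolve_log_level, the mapping, and the WARNING default are all hypothetical and are not SDDetector's actual code.

```python
import logging


def resolve_log_level(verbosity):
    """Return a logging level for a numeric verbosity, with a fallback."""
    # Hypothetical mapping; the real script's values may differ.
    levels = {1: logging.ERROR, 2: logging.INFO, 3: logging.DEBUG}
    # dict.get supplies a default, so a missing or unexpected
    # verbosity can never leave the level undefined.
    return levels.get(verbosity, logging.WARNING)


# Simulates a run in which no verbosity option was passed.
logLevel = resolve_log_level(None)
logging.getLogger().setLevel(logLevel)
```

If that assumption about the script holds, passing an explicit verbosity on the command line (another report in this list uses -v 3) might sidestep the error, but that is untested here.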

Segmentation fault

Hi,

I am encountering a segmentation fault when running SDDetector on my data (see error below).
My genome is 2.4 Gbp and the blast.tab file is 7.3 GB. Is it possible that the size of my dataset is causing this error?

Thanks,
Tim.

INFO:root:SQLite db stored in memory
INFO:root:Parsing alignments
INFO:root:Loading alignments into database
DEBUG:root:91215553 HSP parsed
INFO:root:Exporting matches after loading in database in gff3 format, file: sdd_0.9_3000_5000.gff3.loading
INFO:root:Removing self-self matches
INFO:root:Exporting matches after removing self-matches in gff3 format, file: sdd_0.9_3000_5000.gff3.selfmatch
INFO:root:Removing alignments below the identity threshold: 0.9
/var/spool/pbs/mom_priv/jobs/66335.flm1.SC: line 18: 4180 Segmentation fault python ~/PROGRAMS/SDDetector/bin/segmental_duplication_detector.py blast.tab tab sdd_0.9_3000_5000.gff3 :memory: -g 3000 -l 5000 -a -v 3
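
The log shows about 91 million HSPs loaded into an in-memory database before the identity filter runs. One possible workaround, sketched below, is to drop low-identity rows from the tabular file before giving it to SDDetector. This assumes the file is standard BLAST tabular output with percent identity in the third column, and that an identity threshold of 0.9 corresponds to 90 in that column. Neither assumption is confirmed in this thread, and the helper filter_blast_tab is hypothetical.

```python
import io


def filter_blast_tab(src, dst, min_pident=90.0):
    """Copy rows with percent identity >= min_pident from src to dst.

    src and dst are text file objects. Returns the number of rows kept.
    """
    kept = 0
    for line in src:  # streams line by line, so memory use stays small
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: third column is percent identity.
        if float(fields[2]) >= min_pident:
            dst.write(line)
            kept += 1
    return kept


if __name__ == "__main__":
    demo = io.StringIO("q1\ts1\t95.0\t100\nq1\ts2\t80.0\t100\n")
    out = io.StringIO()
    print(filter_blast_tab(demo, out))
```

On real data the same function can be called with two files opened via open("blast.tab") and open("blast.filtered.tab", "w"). The intermediate GFF3 exports written with -a would then no longer contain the discarded rows.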
