haploclique's Issues

make fails on ubuntu

Hi,
[30-05-14 16:37:20 l: 0.50] $> make
Scanning dependencies of target BamTools
[ 1%] Building CXX object 3rd-party/bamtools/src/bamtools/api/CMakeFiles/BamTools.dir/BamAlignment.cpp.o
cc1plus: error: unrecognised command line option ‘-std=c++11’
make[2]: *** [3rd-party/bamtools/src/bamtools/api/CMakeFiles/BamTools.dir/BamAlignment.cpp.o] Error 1
make[1]: *** [3rd-party/bamtools/src/bamtools/api/CMakeFiles/BamTools.dir/all] Error 2
make: *** [all] Error 2

When I download bamtools and build it separately, there is no problem:

[ 97%] Building CXX object src/toolkit/CMakeFiles/bamtools_cmd.dir/bamtools_sort.cpp.o
[ 98%] Building CXX object src/toolkit/CMakeFiles/bamtools_cmd.dir/bamtools_split.cpp.o
[ 99%] Building CXX object src/toolkit/CMakeFiles/bamtools_cmd.dir/bamtools_stats.cpp.o
[100%] Building CXX object src/toolkit/CMakeFiles/bamtools_cmd.dir/bamtools.cpp.o
Linking CXX executable ../../../bin/bamtools
[100%] Built target bamtools_cmd

I'm not much of a C++ programmer, but is the correct compiler being specified?
Best regards,
Jake
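
The `unrecognised command line option '-std=c++11'` message is what GCC releases older than 4.7 print, since they only know the flag as `-std=c++0x`. Below is a minimal shell sketch for checking a version string against that threshold. The function name `gcc_accepts_cxx11` is made up for illustration, and selecting a newer compiler through `CMAKE_CXX_COMPILER` is a general CMake option, not something confirmed for this project.

```shell
# Succeeds (exit status 0) if the given GCC version string is 4.7 or newer,
# i.e. new enough to recognise the -std=c++11 spelling of the flag.
gcc_accepts_cxx11() {
    cxx_version="$1"                  # e.g. "4.6.3"
    cxx_major="${cxx_version%%.*}"    # "4"
    cxx_rest="${cxx_version#*.}"      # "6.3"
    cxx_minor="${cxx_rest%%.*}"       # "6"
    [ "$cxx_major" -gt 4 ] || { [ "$cxx_major" -eq 4 ] && [ "$cxx_minor" -ge 7 ]; }
}

if gcc_accepts_cxx11 "4.6.3"; then
    echo "4.6.3 accepts -std=c++11"
else
    echo "4.6.3 is too old for -std=c++11"
fi

# Possible usage against the installed compiler (assumes a real GCC is on PATH):
#   gcc_accepts_cxx11 "$(g++ -dumpversion)" || echo "install a newer g++"
#   cmake -DCMAKE_CXX_COMPILER=/path/to/newer/g++ ..
```

This would also be consistent with the standalone bamtools build succeeding, if that build does not request C++11.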

Doubt about an implementation detail in Haploclique

Hi,
I am working on my master's thesis, in which I am modifying the edge criteria in Haploclique to identify nucleosome patterns. I will not explain the complete edge criteria here, since the question is not directly related to them.

However, I have a question about the part where the reads from a clique are merged to form superreads. Does the algorithm use a consensus for each position while merging?

I am referring to the file AlignmentRecord.cpp and the constructor
AlignmentRecord::AlignmentRecord(...){....}

There is a for loop, and it looks like the reads are merged sequentially in pairs rather than as a consensus of all the reads present in the clique.

I would appreciate it if you could point out where the consensus merging takes place.

Thanks and Regards
Shounak

install mac

Hi,
I am very interested in using your software on MiSeq HIV data.
I've just installed haploclique on my Mac with the following commands:
git clone https://github.com/cbg-ethz/haploclique && cd haploclique
git submodule update --init --remote
mkdir build && cd build
cmake .. && make
No errors.
Haploclique should now be installed in the following dir:
/Users/myname/Documents/haploclique/

When I tried to test it on a BAM file with the following command, I got this error:

[Screenshot of the command and error, taken 2017-03-15 at 10:07 AM; image not preserved]

It is probably something obvious... Any chance you can help me?
Let me know if you need any other info.
thanks!
a

Error: Unable to access jarfile /ConsensusFixer.jar

haploclique-assembly cannot find the ConsensusFixer jar file, and there is no documentation on where to put it. Can you please let me know how to resolve this? I have already tried placing the .jar file in the bin, build, and external folders.

$ haploclique-assembly -r hbv_refseq.fasta -i C7_mapped.bam
Error: Unable to access jarfile /ConsensusFixer.jar
[E::hts_open_format] fail to open file 'single_sort.bam'
samtools view: failed to open "single_sort.bam" for reading: No such file or directory
rm: cannot remove ‘single_sort.bam’: No such file or directory
STATUS: 169
Cliques/Uniques/CPU time: 62/41/1
STATUS: 200
Cliques/Uniques/CPU time: 100/52/1
Error correction of singletons
Max read length: 378 bp
STATUS: 47
Cliques/Uniques/CPU time: 0/0/0
cat: x*/data_cliques_paired_R1.fastq: No such file or directory
cat: x*/data_cliques_paired_R2.fastq: No such file or directory
cat: x*/data_cliques_single.fastq: No such file or directory
cat: x*/data_clique_to_reads.tsv: No such file or directory
Max read length: 378 bp
[bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files
Usage: samtools sort [options...] [in.bam]
Options:
  -l INT     Set compression level, from 0 (uncompressed) to 9 (best)
  -m INT     Set maximum memory per thread; suffix K/M/G recognized [768M]
  -n         Sort by read name
  -o FILE    Write final output to FILE rather than standard output
  -T PREFIX  Write temporary files to PREFIX.nnnn.bam
  -@, --threads INT
             Set number of sorting and compression threads [1]
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]
mv: cannot stat ‘single_2.bam’: No such file or directory
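
The path in the error, `/ConsensusFixer.jar`, starts at the filesystem root. One possible reading, not confirmed by anything in this report, is that the script builds the path from a directory variable that ended up empty. The sketch below only illustrates that mechanism; the helper `jar_path` and the example directory are hypothetical.

```shell
# Hypothetical helper: join a directory and the jar name the way a wrapper script might.
jar_path() {
    printf '%s/ConsensusFixer.jar\n' "$1"
}

jar_path ""                        # prints /ConsensusFixer.jar (empty directory)
jar_path "/opt/haploclique/bin"    # prints /opt/haploclique/bin/ConsensusFixer.jar

# To see how the installed script really locates the jar, one could inspect it:
#   grep -n ConsensusFixer "$(command -v haploclique-assembly)"
```

If that reading is right, copying the jar into other folders would not help until the directory variable itself is set.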

Wrong output naming scheme

The output files are called:

quasispecies.fasta.bam
quasispecies.fasta.fasta

This should be an easy fix.

Update README.md

The README is not up to date after the latest rounds of refactoring. Describe how to build it locally and which dependencies are actually necessary.

git:// versus https:// in install-additional-software.sh

install-additional-software.sh hangs on cloning via git:// URLs but using https:// URLs instead works. Maybe because I'm behind a firewall proxy or maybe because of a change at github.com?

This is a fix:

sed -i -e "s/git:/https:/g" install-additional-software.sh
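
A slightly narrower variant of the same fix is sketched here. It matches only the `git://` scheme, so other occurrences of `git:` are left alone, and it avoids `sed -i`, whose argument handling differs between GNU and BSD sed. The function name `rewrite_git_urls` is made up for illustration.

```shell
# Rewrite git:// URLs to https:// in the given file, keeping its permissions.
rewrite_git_urls() {
    target="$1"
    scratch="${target}.tmp"
    # Write the edited copy, then copy it back so the executable bit is preserved.
    sed 's#git://#https://#g' "$target" > "$scratch" &&
        cat "$scratch" > "$target" &&
        rm -f "$scratch"
}

# Usage:
#   rewrite_git_urls install-additional-software.sh
#
# Alternative that leaves the script untouched and lets git do the rewriting:
#   git config --global url."https://".insteadOf git://
```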

Refactor cmake

Simplify the cmake code. I personally do it like this nowadays, but feel free to use your own best practices.

Use ninja

Use ninja as a build tool: cmake -GNinja .. && ninja

Quasispecies assembly of long-range haplotypes

Hey, I'm currently trying to assemble long quasispecies,
but I don't know how I should do this.
Can you provide some information?
The most important questions are:

  1. In the workflow picture, does HaploClique mean that I have to start haploclique-assembly or haploclique?
  2. How many iterations should I do (30)?
  3. If I use haploclique-assembly and implement the workflow pictured in the description in a script with, say, 3 iterations, what should I do in the last step in order to get the haplotypes?

post processing ctk-version

ctk-version is required by the post-processing step, but the README gives no details on where to get the program. No VCF file is produced.

Only provide BAM output

Drop all output files except for BAM. Annotate each record with metrics using custom BAM tags, such as super-read coverage or the names of the reads that were used.

Two short questions about haploclique capabilities

Dear Haploclique Creators,

1- Can we use Haploclique on a DNA virus dataset?
2- Can Haploclique handle large insertion elements (>1000 bp)? This could be problematic, since the read length is smaller than the deletion length and this information is not contained in the BAM file (I see on the help page that structural variants are not fully supported).

Cheers,

JB

How to set the number of threads?

I'm using the latest release of haploclique and I have seen that it uses pthreads. How can I set more than one thread for the analysis?
Thanks
Pedro

What's new in the latest 1.0 release?

I have been a Haploclique user for a long time, and I see that you have released a major version. I didn't find a changelog, but I would like to know the new features of this version. Could you please give a brief description?
Thank you in advance

SAMTools commands changes cause [bam_sort] fails?

haplo-clique-assemble seems to complete, generating:

consensus.fasta
statistics.txt
deletions.txt
mean-sd
data_cliques_paired_R1.fastq
data_cliques_paired_R2.fastq
data_cliques_single.fastq
data_clique_to_reads.tsv
singles.prior
alignment.prior
quasispecies.fasta

...but I get the error below during the process so maybe I'm missing some results output? I know SAMTools command usage changed recently ("The obsolete samtools sort in.bam out.prefix usage has been removed. If you are still using -f, -o, or out.prefix, convert to use -T PREFIX and/or -o FILE instead. (#295, #349, #356, #418, PR #441; see also discussions in #171, #213.)").

[E::hts_open_format] fail to open file 'single_sort.bam'
samtools view: failed to open "single_sort.bam" for reading: No such file or directory
rm: cannot remove 'single_sort.bam': No such file or directory
parallel: Warning: $HOME not set. Using /tmp
When using programs that use GNU Parallel to process data for publication please cite:

  O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
  ;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; and it won't cost you a cent.
Or you can get GNU Parallel without this requirement by paying 10000 EUR.

To silence this citation notice run 'parallel --bibtex' once or use '--no-notice'.

STATUS: 200
Cliques/Uniques/CPU time:       3/1/0
...
...
Cliques/Uniques/CPU time:       0/0/0
STATUS: 200
Cliques/Uniques/CPU time:       0/0/1
Max read length:                287 bp
[bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files
Usage: samtools sort [options...] [in.bam]
Options:
  -l INT     Set compression level, from 0 (uncompressed) to 9 (best)
  -m INT     Set maximum memory per thread; suffix K/M/G recognized [768M]
  -n         Sort by read name
  -o FILE    Write final output to FILE rather than standard output
  -T PREFIX  Write temporary files to PREFIX.nnnn.bam
  -@, --threads INT
             Set number of sorting and compression threads [1]
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]
rm: cannot remove 'reads.bam': No such file or directory
[bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files
Usage: samtools sort [options...] [in.bam]
Options: 
  -l INT     Set compression level, from 0 (uncompressed) to 9 (best)
  -m INT     Set maximum memory per thread; suffix K/M/G recognized [768M]
  -n         Sort by read name
  -o FILE    Write final output to FILE rather than standard output
  -T PREFIX  Write temporary files to PREFIX.nnnn.bam
  -@, --threads INT
             Set number of sorting and compression threads [1]
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]
mv: cannot stat 'single_2.bam': No such file or directory
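
The `[bam_sort] Use -T PREFIX / -o FILE` message matches the usage change quoted above: the old form `samtools sort in.bam out.prefix` wrote `out.prefix.bam`, while current samtools expects `-o FILE` and optionally `-T PREFIX`. The sketch below prints the new-style command for a given input and output prefix. The helper `sort_cmd` is made up for illustration, and the file names are taken from the log rather than from the script itself, so the exact call in haploclique-assembly is an assumption.

```shell
# Print the current-style equivalent of the old "samtools sort IN.bam PREFIX" call.
sort_cmd() {
    in_bam="$1"       # input BAM, e.g. single.bam
    out_prefix="$2"   # old-style output prefix, e.g. single_sort
    printf 'samtools sort -T %s.tmp -o %s.bam %s\n' "$out_prefix" "$out_prefix" "$in_bam"
}

sort_cmd single.bam single_sort
# prints: samtools sort -T single_sort.tmp -o single_sort.bam single.bam
```

A failed sort would explain why `single_sort.bam` is missing in the later `samtools view` and `rm` steps.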

no header for indel.out

No header details are produced for the file indel.out when haploclique-assembly is called in -l mode

Installation problem

Hello,
I am trying to install Haploclique (on Ubuntu) and I always get the same error message:

CMake Warning at external/googletest/googletest/CMakeLists.txt:51 (project):
VERSION keyword not followed by a value or was followed by a value that
expanded to nothing.

Found Python: /usr/bin/python3.8 (found version "3.8.0") found components: Interpreter
CMake Error at external/googletest/googletest/CMakeLists.txt:127 (set_target_properties):
set_target_properties called with incorrect number of arguments.

CMake Error at external/googletest/googletest/CMakeLists.txt:129 (set_target_properties):
set_target_properties called with incorrect number of arguments.

Configuring incomplete, errors occurred!
See also "/home/natasa/Downloads/Programi/haploclique/build/CMakeFiles/CMakeOutput.log".
See also "/home/natasa/Downloads/Programi/haploclique/build/CMakeFiles/CMakeError.log".

I would be really grateful if you could give me some advice, because I found this software perfect for my research and I would really like to find a way to use it.

Thank you so much in advance!
Natasa

Empty output files

Hello. When I run haploclique-assembly, I get the following output files (the last 7 of which are 0 bytes):

consensus.fasta
statistics.txt
deletions.txt
alignment.prior
data_clique_to_reads.tsv
data_cliques_paired_R1.fastq
data_cliques_paired_R2.fastq
data_cliques_single.fastq
quasispecies.fasta
singles.prior

and the following error:

[E::hts_open_format] fail to open file 'single_sort.bam'
samtools view: failed to open "single_sort.bam" for reading: No such file or directory
rm: single_sort.bam: No such file or directory
mv: rename x*-dir to x*-dir/alignment.prior: Invalid argument
cat: alignment.prior: No such file or directory
STATUS
Cliques/Uniques/CPU time: 0/0/0
Error correction of singletons
cat: x*/data_cliques_paired_R1.fastq: No such file or directory
cat: x*/data_cliques_paired_R2.fastq: No such file or directory
cat: x*/data_cliques_single.fastq: No such file or directory
cat: x*/data_clique_to_reads.tsv: No such file or directory
Max read length: 0 bp
mv: rename x*-dir to x*-dir/alignment.prior: Invalid argument
cat: alignment.prior: No such file or directory
STATUS
Cliques/Uniques/CPU time: 0/0/0
cat: x*/data_cliques_paired_R1.fastq: No such file or directory
cat: x*/data_cliques_paired_R2.fastq: No such file or directory
cat: x*/data_cliques_single.fastq: No such file or directory
cat: x*/data_clique_to_reads.tsv: No such file or directory
Max read length: 0 bp

I have all of the dependencies installed (and updated). I would greatly appreciate any advice as to what this error message means and how to rectify it. Thank you!

Errors when reading Bam files [Conda container of haploclique]

Dear Haploclique creators,

I was able to install the conda package https://anaconda.org/bioconda/haploclique without any problem.
When I ran haploclique -h, I got the manual without any problem.

However, when I run haploclique on a BAM file, I get the error "Unexpected error while reading BamFile." from line 296 of the cpp program...

I also rebuilt the conda env to be sure that no mistake was made, and I get the same error. Because I am working on a cluster, I have no way to compile haploclique from source.

I also checked whether the BAM file was corrupted: I ran gatk ValidateSamFile in verbose mode and only got warnings.
I am out of ideas...
JB

Runtime for 10000x coverage

Hi, I am trying to run Haploclique on my dataset with coverage 10,000x. The time taken for this is more than a day. Is there a parameter to set for such high coverages?
Thanks in advance.
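
Whether haploclique has its own option for very deep data is not stated here. A generic workaround, independent of haploclique, is to downsample the BAM before running it. The sketch below computes the fraction to keep for a target coverage; the helper `subsample_fraction` and the target of 1000x are illustrative assumptions, and `samtools view -s` is samtools' subsampling option.

```shell
# Print the fraction of reads to keep to go from current to target coverage (capped at 1).
subsample_fraction() {
    awk -v cur="$1" -v tgt="$2" 'BEGIN {
        f = tgt / cur
        if (f > 1) f = 1
        printf "%.4f\n", f
    }'
}

subsample_fraction 10000 1000    # prints 0.1000

# Possible usage (assumes samtools is installed; in.bam and sub.bam are placeholder names):
#   samtools view -b -s "$(subsample_fraction 10000 1000)" in.bam > sub.bam
```

Downsampling can drop low-frequency haplotypes, so the target coverage is a trade-off rather than a fixed recommendation.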

Haploclique script getting stalled

Hi
I was trying to run Haploclique on a subset of the HCV RNA data referred to in your paper (Illumina MiSeq NGS, 1000X coverage, 6926 paired-end reads, reference ~9720 bp). The script execution seems to stall with only this message:
mv: cannot move 'x*-dir' to a subdirectory of itself, 'x*-dir/alignment.prior'

Here is a snippet of the debug log for the script haploclique-assembly

$ bash -x haploclique-assembly -r HCV1.fasta -i HCV1_reads_sorted.bam
...
...

+ mv paired.priors alignment.prior
+ '[' 0 == 0 ']'
+ cat single.priors
+ rm paired.sam single_sort.sam paired.bam single_sort.bam single.bam single.priors
+ [[ 0 == 0 ]]
++ seq 1 1
+ for i in '$(seq 1 ${ITERATIONS})'
+ [[ 0 == 0 ]]
+ cat singles.prior
+ rm -rf singles.prior 'data_.fastq' 'trash_' '-e' 'x'
+ [[ 0 == 1 ]]
+ sort -k6,6 -g alignment.prior
+ split -a 10 -l 200
+ for i in 'x*'
+ mkdir 'x*-dir'
+ mv 'x*-dir' 'x*-dir/alignment.prior'
mv: cannot move 'x*-dir' to a subdirectory of itself, 'x*-dir/alignment.prior'
+ parallel computeParallelSingle '{}' ::: 'x*-dir'

The code seems to take a very long time to execute haploclique under computeParallelSingle(). Am I missing some parameter setting? Your help in debugging this is much appreciated.
Additionally, could you provide a pointer to the simulated HIV data from the paper?
Thanks very much!
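
The `mv` message names a directory that literally contains `x*`, which is what a shell loop produces when a glob matches no files: the pattern is passed through unexpanded. That suggests `split` wrote no chunk files, for example because `alignment.prior` was empty, although the trace alone does not prove it. Below is a minimal sketch of a loop that skips the unmatched pattern. The function `list_chunks` is made up for illustration and is not taken from the haploclique-assembly script.

```shell
# List the chunk files named x* in a directory, printing nothing if there are none.
list_chunks() {
    chunk_dir="$1"
    for chunk in "$chunk_dir"/x*; do
        # When nothing matches, the loop sees the literal pattern; skip it.
        [ -e "$chunk" ] || continue
        printf '%s\n' "$chunk"
    done
}

# Usage:
#   list_chunks . | while read -r chunk; do mkdir "${chunk}-dir"; done
```

In bash, `shopt -s nullglob` gives the same effect by making unmatched patterns expand to nothing.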

Enable drop-in replacements of modules

Separate the individual parts of the workflow into their own "modules" by generalizing the code. Minimize APIs. Acceptance criterion: I can use another MCE algorithm or a different distance function.
