minerva_barcode_deconvolution's People

Contributors

dcdanko

minerva_barcode_deconvolution's Issues

Only a fraction of reads are deconvolved

Hello,

I am trying to run Minerva on a toy dataset, but the output file of Minerva only contains about 1% of all the reads present in my original file. I have lowered the thresholds to -a 1 and -d 1 to exclude as few reads as possible. The command line is

cat ./reads_cov50_redundance4.fastq | minerva_deconvolve -k 20 -w 40 -d 1 -a 1 --remove-stopwords --eps 0.51 > results_minerva/deconvolved_minerva_E_coli.tsv

The problem may come from the way this toy data is generated, which is not identical to the output of longranger basic (for example, the tags are numbers, not sequences). Here are a few lines of the input fastq:

@read0_TBX:0 BX:Z:4104
AAAGCGAGTCGAACCACTTCCGAAGGAGCCGTTCGCTAATTGTGCACGAGTCTAAGTATGTATCTAGGACCTCTCCCTAAACCTCGATCTCGTGCCTTCGTCTGTCGTCCGATAGGCCTATGGCTACTCAGTTCTATTCTAGACGTCCTG
+
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
@read0_TBX:0 BX:Z:4104
ACTCAGTTAGACAAGAGGTACTTCAGAACCTAAGTGACAACCTTGTCTCTCGAGTGGGAGTACCCCGCCAAGTAAGCCTAGGATGATATGCCTACCAAAGCTACCAACGGGCACGTCATCCTTCTCGGCGCGAGGCCCAACGGGATTATG
+
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
@read1_TBX:0 BX:Z:4104
CGTGGATATGATGAGATCAACCTGAATGTCGGCTGCCCGTCTGACCGGGTGCAGAACGGCATGTTTGGTGCGTGTCTGATGGGTAATGCGCAGCTGGTTGCCGACTGCGTGAAAGCGATGCGCGATGTGGTGTCGATTCCGGTGACGGTG
+
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
@read1_TBX:0 BX:Z:4104
TTTCCGGCAAAGGCGAGTGTGAGATGTTCATCATCCACGCACGTAAAGCCTGGCTTTCGGGGTTAAGCCCGAAAGAAAACCGTGAAATCCCGCCGCTCGATTATCCGCGTGTGTATCAACTGAAGCGTGACTTTCCGCATCTGACGATGT
+
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Do you have any idea what I should change to get a better result?

Thanks in advance,
Roland
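
A quick sanity check that may help narrow this down: count how many reads carry each BX:Z: tag in the input, to see how the reads are distributed across barcodes. This is only a sketch and not part of Minerva; the function name `reads_per_barcode` is made up for illustration, and it assumes plain 4-line FASTQ records with the tag in the header line, as in the excerpt above.

```python
import io
from collections import Counter


def reads_per_barcode(fastq_handle):
    """Count FASTQ records per BX:Z: tag; untagged records are counted under None."""
    counts = Counter()
    for i, line in enumerate(fastq_handle):
        if i % 4 != 0:  # only header lines (every 4th line) carry the tag
            continue
        barcode = None
        for field in line.rstrip("\n").split():
            if field.startswith("BX:Z:"):
                barcode = field[len("BX:Z:"):]
        counts[barcode] += 1
    return counts


# Tiny in-memory example in the same header style as the toy data.
example = io.StringIO(
    "@read0_TBX:0 BX:Z:4104\nACGT\n+\nAAAA\n"
    "@read0_TBX:0 BX:Z:4104\nTTGA\n+\nAAAA\n"
    "@read1_TBX:1 BX:Z:77\nCCGT\n+\nAAAA\n"
)
counts = reads_per_barcode(example)
print(counts.most_common())
```

On a real file, pass `open("reads_cov50_redundance4.fastq")` instead of the in-memory example.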

Problem with --help

Hi

After installing with
pip install minerva_deconvolve --user
I ran
minerva_deconvolve --help

and got this error:

Traceback (most recent call last):
  File "/home/majid001/.local/bin/minerva_deconvolve", line 6, in <module>
    from minerva.deconvolution.deconvolve_barcodes import main
  File "/home/majid001/.local/lib/python2.7/site-packages/minerva/deconvolution/deconvolve_barcodes.py", line 27
    print(msg.format(len(barcodeTables)), file=sys.stderr)
                                              ^
SyntaxError: invalid syntax

Could you please help me with this issue?
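
The paths in the traceback point to python2.7. In Python 2, print is a statement unless `from __future__ import print_function` is in effect, so `print(..., file=sys.stderr)` fails with exactly this SyntaxError. The package therefore appears to need Python 3. A minimal sketch of an interpreter check that runs on both versions is below; the function names are hypothetical and not part of Minerva.

```python
import sys


def is_supported(version_info=sys.version_info):
    """True for Python 3 and later, where print(..., file=...) is valid syntax."""
    return version_info[0] >= 3


def check_interpreter(version_info=sys.version_info, stream=sys.stderr):
    """Write a readable message instead of failing later with a SyntaxError."""
    if not is_supported(version_info):
        # stream.write works on both Python 2 and 3, unlike print(..., file=...)
        stream.write("Python %d.%d detected; Python 3 appears to be required.\n"
                     % (version_info[0], version_info[1]))
        return False
    return True


check_interpreter()
```

If the check fails, installing with a Python 3 pip (for example `python3 -m pip install --user minerva_deconvolve`) is the first thing to try.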

Out of memory error

Hello David,

I just discovered Minerva recently and I want to use it on my dataset. I was able to install it very easily and use it on a subset of my data. I am now trying to run Minerva on my whole 10X dataset (~300M PE reads, ~2.8M barcodes) on a server with 500 GB memory. As mentioned in the readme, I used the following command:

> cat barcoded.fastq | minerva_deconvolve -k 20 -w 40 -d 8 -a 20 --remove-stopwords --eps 0.51 > ebc_assignments.tsv

After running for a few days, the output file was still empty, and the last output on stdout was "parsed 1,327,100" when the process ran out of memory.

My question is: did you ever use Minerva on a dataset this size? Is there a workaround to limit memory usage on my data? Also, in the paper, you mention that the method is easy to multithread. Is that something I can do on my end (for instance by splitting the fastq file), or something that might be included in Minerva in the future?

Thank you,
Cédric
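
On the splitting idea, here is a rough sketch of how a FASTQ could be sharded so that all reads of one barcode land in the same shard. It is an illustration only: the function names are hypothetical, it assumes 4-line records with a BX:Z: tag in the header, and nothing in this thread confirms that Minerva gives comparable results on shards. If the method relies on k-mers shared across barcodes, sharding could change the clusters.

```python
import zlib


def barcode_of(header):
    """Return the BX:Z: tag value from a FASTQ header line, or None."""
    for field in header.split():
        if field.startswith("BX:Z:"):
            return field[len("BX:Z:"):]
    return None


def shard_index(barcode, n_shards):
    """Deterministically map a barcode to a shard so its reads stay together."""
    return zlib.crc32(barcode.encode("ascii")) % n_shards


def split_fastq_by_barcode(fastq_lines, n_shards):
    """Group 4-line FASTQ records into n_shards lists, one shard per barcode."""
    shards = [[] for _ in range(n_shards)]
    record = []
    for line in fastq_lines:
        record.append(line)
        if len(record) == 4:
            barcode = barcode_of(record[0])
            if barcode is not None:  # untagged reads are skipped in this sketch
                shards[shard_index(barcode, n_shards)].extend(record)
            record = []
    return shards
```

For a dataset of this size the in-memory lists would need to be replaced by one open output file per shard, written to as records stream past.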

Error to install

The following is the error I got during installation:

running build_ext
building 'cseqs' extension
gcc -pthread -B /research/cxs/anaconda2/compiler_compat -Wl,--sysroot=/ -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/research/cxs/anaconda2/include/python2.7 -c cext_minerva/_cseqs.c -o build/temp.linux-x86_64-2.7/cext_minerva/_cseqs.o
cext_minerva/_cseqs.c:36:15: error: variable 'cseqsmodule' has initializer but incomplete type
 static struct PyModuleDef cseqsmodule = {
               ^
cext_minerva/_cseqs.c:37:3: error: 'PyModuleDef_HEAD_INIT' undeclared here (not in a function)
   PyModuleDef_HEAD_INIT,
   ^
cext_minerva/_cseqs.c:37:3: warning: excess elements in struct initializer
cext_minerva/_cseqs.c:37:3: note: (near initialization for 'cseqsmodule')
cext_minerva/_cseqs.c:38:3: warning: excess elements in struct initializer
   "cseqs",
   ^
cext_minerva/_cseqs.c:38:3: note: (near initialization for 'cseqsmodule')
cext_minerva/_cseqs.c:39:3: warning: excess elements in struct initializer
   module_docstring,
   ^
cext_minerva/_cseqs.c:39:3: note: (near initialization for 'cseqsmodule')
cext_minerva/_cseqs.c:40:3: warning: excess elements in struct initializer
   -1,
   ^
cext_minerva/_cseqs.c:40:3: note: (near initialization for 'cseqsmodule')
cext_minerva/_cseqs.c:41:3: warning: excess elements in struct initializer
   cseqs_methods
   ^
cext_minerva/_cseqs.c:41:3: note: (near initialization for 'cseqsmodule')
cext_minerva/_cseqs.c: In function 'PyInit_cseqs':
cext_minerva/_cseqs.c:45:10: warning: implicit declaration of function 'PyModule_Create' [-Wimplicit-function-declaration]
   return PyModule_Create(&cseqsmodule);
          ^
cext_minerva/_cseqs.c:45:10: warning: 'return' with a value, in function returning void
error: command 'gcc' failed with exit status 1

after minerva_deconvolve

Hi @dcdanko !

I ran the command you suggest in the README for minerva_deconvolve and got my "ebc_assignments.tsv" file with 8 different clusters. My question is what analyses I can perform with this information. One idea is to assemble the reads from each cluster separately. Do you have any additional suggestions?

Thanks.
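
For the per-cluster assembly idea, the reads first have to be separated by cluster. A minimal sketch of that step follows. The column layout of ebc_assignments.tsv is not shown in this thread, so the sketch takes a ready-made mapping from read ID to cluster label instead of parsing the file; the function name and that mapping are assumptions for illustration.

```python
from collections import defaultdict


def group_reads_by_cluster(assignments, fastq_lines):
    """Bucket 4-line FASTQ records by cluster label.

    assignments: dict mapping a read ID (first token of the header, without
    the leading '@') to a cluster label. Reads missing from the dict are
    collected under the key None.
    """
    clusters = defaultdict(list)
    record = []
    for line in fastq_lines:
        record.append(line)
        if len(record) == 4:
            read_id = record[0].split()[0][1:]  # drop the leading '@'
            clusters[assignments.get(read_id)].extend(record)
            record = []
    return dict(clusters)
```

Each resulting group can then be written to its own FASTQ file and passed to an assembler of your choice.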

SyntaxError to install

I followed the installation procedure in the README and was able to install the requirements using pip (pip version: 18.1; Python version: 3.6). But when I install Minerva from source, a SyntaxError comes up. The error looks like this:
(screenshot of the error; image not available)

After installation, "minerva_annotate" and "minerva_eval" run successfully, but "minerva_deconvolve" and "minerva_enhance_kraken" were not installed.

Is my installation procedure wrong? I would greatly appreciate it if you could help me fix this.

division by zero error

I keep getting this division by zero error, with different datasets:

parsed 966,783 barcodes
Removing 2,518,439 stop and singleton kmers
Removed stop and singleton kmers
686,968 barcodes were at or above dropout threshold

Traceback (most recent call last):
  File "/research/c/anaconda3/bin/minerva_deconvolve", line 11, in <module>
    load_entry_point('minerva-barcoded-read-deconvolution', 'console_scripts', 'minerva_deconvolve')()
  File "/research/c/src/minerva_barcode_deconvolution/minerva/deconvolution/deconvolve_barcodes.py", line 35, in main
    progressBar.write()
  File "/research/c/src/minerva_barcode_deconvolution/minerva/deconvolution/progress_bar.py", line 23, in write
    p = self.events / self.total
ZeroDivisionError: float division by zero
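
The traceback shows the failure at `p = self.events / self.total`, so `self.total` must be zero when `write()` is called. A guard of the following shape would avoid the crash. This is a sketch with a hypothetical stand-in class, not the actual code in progress_bar.py, and it does not explain why the total is zero in the first place.

```python
class ProgressBar:
    """Hypothetical stand-in for a progress tracker with events and total."""

    def __init__(self, total):
        self.total = float(total)
        self.events = 0

    def increment(self):
        self.events += 1

    def fraction(self):
        """Fraction of work done; treat an empty workload as already complete."""
        if self.total == 0:
            return 1.0  # nothing to process, so avoid dividing by zero
        return self.events / self.total


empty = ProgressBar(0)
print(empty.fraction())
```

The underlying question remains why no work items were counted after the dropout filter, which the log above does not show.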

running on provided sample data

Dear David,

When running the latest GitHub version of Minerva on the sample data from the README (Dataset 1) with the parameters recommended in the README (-k 20 -w 40 -d 8 -a 50 --remove-stopwords), the program prints no results to stdout and only writes debug info to stderr. This doesn't happen on other test datasets, where proper results are printed to stdout. I also tried -k 20 -a 50, as in the bioRxiv paper, with no luck. What parameters were used on that dataset?

To reproduce the issue

wget https://s3.us-east-2.amazonaws.com/minerva-datasets/10M.data1_atgctgaaq.fq.gz
zcat 10M.data1_atgctgaaq.fq.gz |  minerva_deconvolve -k 20 -w 40 -d 8 -a 50 --remove-stopwords > results.tsv
wc -l results.tsv # returns 0

Thanks in advance,
Rayan
