
Comments (5)

sbooeshaghi commented on June 6, 2024

A few things:

  1. I would use -m 100G for the memory; this is the format in which the memory string is read.
  2. This looks like the inDropsv3 data format, is it? If so, the newest release of kallisto supports it, and we will soon release a version of kb-python that supports this structure.

For clarity this is the structure I am referring to:
R1: Biological Read
R2: Cell BC1 (1-8bp)
R3: Index (1-8bp) UMI (9-14bp)
R4: Cell BC2 (1-8bp)

from kb_python.

mariusmessemaker commented on June 6, 2024

Thank you for your quick response! I will try to rerun with -m 100G and let you know whether that solves the issue. Does the newest version of kallisto have an inDropsv3 structure that also accepts an R3 library index file? If so, I will try to run kallisto. But it should be possible, right, to specify a new technology string with the inDrops library index in R3 as a BC, because in essence a library index is just a BC (that together with BC1 and BC2 specifies unique cells)?

In addition, should the UMI in your inDrops structure not be in R4? I use the following inDrops version 3 structure, which is the same as used in: https://github.com/indrops/indrops (I checked the reads manually):
R1: Biological read
R2: Cell BC1 (1-8 bp)
R3: Index (1-8 bp)
R4: Cell BC2 (1-8 bp) and UMI (9-14 bp).

Again, thank you so much for creating kallisto bus and kb-python.


sbooeshaghi commented on June 6, 2024

The index file is only necessary if you wish to demultiplex samples that were pooled on the same lane, using the samplesheet.csv file that you create (if you used Illumina short-read sequencing); see the Illumina documentation. kallisto does not use the sample index.

A small side note: I made an error in the comment above. The UMI is in R4, and the structure is:

R1: Biological read
R2: Cell BC1 (1-8 bp)
R3: Index (1-8 bp)
R4: Cell BC2 (1-8 bp) and UMI (9-14 bp).

To process your reads, let's look at main.cpp in the kallisto repo. We see the following lines:

      } else if (opt.technology == "INDROPSV3") {
        busopt.nfiles = 3;
        busopt.seq.push_back(BUSOptionSubstr(2,0,0));
        busopt.umi = BUSOptionSubstr(1,8,14);
        busopt.bc.push_back(BUSOptionSubstr(0,0,8));
        busopt.bc.push_back(BUSOptionSubstr(1,0,8));

In plain English: kallisto expects 3 files. Given how you have defined R1, R2, R3, and R4, we note that the first half of the cell barcode comes from R2, the second half of the cell barcode and the UMI come from R4, and the biological read is in R1. So the command would be:

kallisto bus -i index.idx -o ./output -x inDropsv3 R2.fastq.gz R4.fastq.gz R1.fastq.gz

Where R2 is the 0th file, R4 is the 1st file and R1 is the 2nd file (0-indexed).
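
To make the mapping concrete, here is a minimal Python sketch of how the (file, start, stop) triples in the INDROPSV3 branch quoted above pick the barcode, UMI, and biological read out of the three files. It is an illustration only, not kallisto's implementation: the `Substr` class, the `extract` helper, and the example reads are hypothetical, and a stop of 0 is assumed to mean "to the end of the read".

```python
from typing import NamedTuple


class Substr(NamedTuple):
    """Hypothetical stand-in for the (file, start, stop) triples quoted above."""
    file: int   # 0-based position of the FASTQ file on the command line
    start: int  # 0-based start offset within the read
    stop: int   # exclusive end offset; 0 is assumed to mean "end of read"


# Layout copied from the INDROPSV3 branch quoted above.
INDROPSV3 = {
    "bc": [Substr(0, 0, 8), Substr(1, 0, 8)],
    "umi": [Substr(1, 8, 14)],
    "seq": [Substr(2, 0, 0)],
}


def extract(reads, parts):
    """Concatenate the slices described by `parts` from one read per file."""
    pieces = []
    for part in parts:
        read = reads[part.file]
        end = part.stop if part.stop > 0 else len(read)
        pieces.append(read[part.start:end])
    return "".join(pieces)


# Made-up reads, in command-line order: R2, R4, R1.
reads = ["AAAACCCC", "GGGGTTTTACGTAC", "TTGACCATGCA"]

barcode = extract(reads, INDROPSV3["bc"])   # R2[0:8] + R4[0:8]
umi = extract(reads, INDROPSV3["umi"])      # R4[8:14]
cdna = extract(reads, INDROPSV3["seq"])     # all of R1
print(barcode, umi, cdna)
```

The order of the files on the command line is what gives the file indices their meaning, which is why R2, R4, and R1 are passed in that order.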

This works with the current release of kallisto and will be added to kb-python soon.


mariusmessemaker commented on June 6, 2024

Thank you for your reply. Yes, I understood that I could not use technology = "INDROPSV3", because this kb technology specification expects 3 files that together contain BC1, BC2, the UMI, and the biological read. Therefore, I used kb's option to specify a new technology myself, one with 3 BCs, of which one is the library index, because in essence a library index is just a BC that together with BC1 and BC2 specifies a unique cell. The issue I raised here is that specifying a new technology with 3 BCs, a UMI, and a biological read does not work, although this should be possible according to the kallisto bus documentation. I also tried to run with -m 100G and got a different error, <Signals.SIGKILL: 9> instead of <Signals.SIGSEGV: 11>, but still 0 reads pseudoaligned and an empty bus file (which I think causes the error in the bustools sort command).
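
For reference, a sketch of what such a four-file layout could look like as a custom technology string. It assumes the `bc:umi:seq` form from the kallisto bus documentation, with each part given as comma-separated file,start,stop triples and several barcode segments joined by commas. The `tech_string` helper and the R3 file name are hypothetical, and this only illustrates the layout described in this thread; it is not a verified fix for the error reported here.

```python
def tech_string(bc, umi, seq):
    """Join (file, start, stop) triples into an assumed 'bc:umi:seq' string."""
    def fmt(parts):
        return ",".join(f"{f},{a},{b}" for f, a, b in parts)
    return ":".join([fmt(bc), fmt(umi), fmt(seq)])


# File order on the command line decides the file indices (0-based).
files = ["R2.fastq.gz", "R3.fastq.gz", "R4.fastq.gz", "R1.fastq.gz"]

custom = tech_string(
    bc=[(0, 0, 8), (1, 0, 8), (2, 0, 8)],  # BC1 (R2), library index (R3), BC2 (R4)
    umi=[(2, 8, 14)],                      # UMI at positions 9-14 of R4
    seq=[(3, 0, 0)],                       # whole biological read (R1)
)

command = " ".join(
    ["kallisto", "bus", "-i", "index.idx", "-o", "./output", "-x", custom, *files]
)
print(custom)
print(command)
```

Treating the library index as a third barcode segment makes the combined barcode 24 bp long (8 + 8 + 8), so any downstream whitelist would need to match that length.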


github-actions commented on June 6, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

