
kb_python's Introduction

kb-python


kb-python is a Python package for processing single-cell RNA-sequencing data. It wraps the kallisto | bustools single-cell RNA-seq command-line tools to unify multiple processing workflows.

kb-python was developed by Kyung Hoi (Joseph) Min and A. Sina Booeshaghi while in Lior Pachter's lab at Caltech. If you use kb-python in a publication please cite*:

Melsted, P., Booeshaghi, A.S., et al. 
Modular, efficient and constant-memory single-cell RNA-seq preprocessing. 
Nat Biotechnol  39, 813–818 (2021). 
https://doi.org/10.1038/s41587-021-00870-2

Installation

The latest release can be installed with

pip install kb-python

The development version can be installed with

pip install git+https://github.com/pachterlab/kb_python

There are no prerequisite packages to install. The kallisto and bustools binaries are included with the package.

Usage

kb consists of four subcommands:

$ kb
usage: kb [-h] [--list] <CMD> ...
positional arguments:
  <CMD>
    info      Display package and citation information
    compile   Compile `kallisto` and `bustools` binaries from source
    ref       Build a kallisto index and transcript-to-gene mapping
    count     Generate count matrices from a set of single-cell FASTQ files

kb ref: generate a pseudoalignment index

The kb ref command takes in a species annotation file (GTF) and associated genome (FASTA) and builds a species-specific index for pseudoalignment of reads. This must be run before kb count. Internally, kb ref extracts the coding regions from the GTF and builds a transcriptome FASTA that is then indexed with kallisto index.

kb ref -i index.idx -g t2g.txt -f1 transcriptome.fa <GENOME> <GENOME_ANNOTATION>
  • <GENOME> refers to a genome file (FASTA).
    • For example, the zebrafish genome is hosted by ensembl and can be downloaded here
  • <GENOME_ANNOTATION> refers to a genome annotation file (GTF)
    • For example, the zebrafish genome annotation file is hosted by ensembl and can be downloaded here
  • Note: The latest genome annotation and genome file for every species on ensembl can be found with the gget command-line tool.

Prebuilt indices are available at https://github.com/pachterlab/kallisto-transcriptome-indices

Examples

# Index the transcriptome from genome FASTA (genome.fa.gz) and GTF (annotation.gtf.gz)
$ kb ref -i index.idx -g t2g.txt -f1 transcriptome.fa genome.fa.gz annotation.gtf.gz
# An example for downloading a prebuilt reference for mouse
$ kb ref -d mouse -i index.idx -g t2g.txt
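When kb ref is driven from a script, the two invocations above can be assembled as argument lists. The sketch below is an illustration only: the helper name `build_kb_ref_command` is hypothetical and not part of kb-python, and it relies solely on the flags that appear in the examples (-i, -g, -f1, -d).

```python
from typing import List, Optional


def build_kb_ref_command(
    index: str,
    t2g: str,
    cdna_fasta: Optional[str] = None,
    genome: Optional[str] = None,
    gtf: Optional[str] = None,
    download: Optional[str] = None,
) -> List[str]:
    """Return the argv list for one of the two `kb ref` examples above.

    Hypothetical helper for illustration; it is not part of kb-python.
    """
    if download is not None:
        # Prebuilt reference: kb ref -d <species> -i index.idx -g t2g.txt
        return ["kb", "ref", "-d", download, "-i", index, "-g", t2g]
    if not (cdna_fasta and genome and gtf):
        raise ValueError("building an index needs cdna_fasta, genome and gtf")
    # From scratch: kb ref -i ... -g ... -f1 ... <GENOME> <GENOME_ANNOTATION>
    return ["kb", "ref", "-i", index, "-g", t2g, "-f1", cdna_fasta, genome, gtf]


build_cmd = build_kb_ref_command(
    "index.idx",
    "t2g.txt",
    cdna_fasta="transcriptome.fa",
    genome="genome.fa.gz",
    gtf="annotation.gtf.gz",
)
download_cmd = build_kb_ref_command("index.idx", "t2g.txt", download="mouse")
```

Either list can then be passed to `subprocess.run(cmd, check=True)` on a machine where kb is installed.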

kb count: pseudoalign and count reads

The kb count command takes in the pseudoalignment index (built with kb ref) and sequencing reads generated by a sequencing machine to generate a count matrix. Internally, kb count runs numerous kallisto and bustools commands comprising a single-cell workflow for the specified technology that generated the sequencing reads.

kb count -i index.idx -g t2g.txt -o out/ -x <TECHNOLOGY> <FASTQ FILE[s]>
  • <TECHNOLOGY> refers to the assay that generated the sequencing reads.
    • For a list of supported assays run kb --list
  • <FASTQ FILE[s]> refers to a list of FASTQ files generated by the sequencer
    • Different assays will have a different number of FASTQ files
    • Different assays will place the different features in different FASTQ files
      • For example, sequencing a 10xv3 library on a NextSeq Illumina sequencer usually results in two FASTQ files.
      • The R1.fastq.gz file (colloquially called "read 1") contains a 16 basepair cell barcode and a 12 basepair unique molecular identifier (UMI).
      • The R2.fastq.gz file (colloquially called "read 2") contains the cDNA associated with the cell barcode-UMI pair in read 1.
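To make the 10xv3 read 1 layout concrete, the sketch below slices a read 1 sequence into its 16 bp cell barcode followed by its 12 bp UMI. It only illustrates the layout and is not how kallisto or bustools parse reads; the function name and the example sequence are invented.

```python
from typing import Tuple

BARCODE_LENGTH = 16  # 10xv3 cell barcode length stated above
UMI_LENGTH = 12      # 10xv3 UMI length stated above


def split_read1(sequence: str) -> Tuple[str, str]:
    """Split a 10xv3 read 1 sequence into (cell barcode, UMI)."""
    if len(sequence) < BARCODE_LENGTH + UMI_LENGTH:
        raise ValueError("read 1 is shorter than barcode + UMI (28 bp)")
    barcode = sequence[:BARCODE_LENGTH]
    umi = sequence[BARCODE_LENGTH:BARCODE_LENGTH + UMI_LENGTH]
    return barcode, umi


# Made-up 28 bp read, used only to show the slicing.
example_read1 = "AAACCCAAGAAACACT" + "GGTTCCAATTGG"
barcode, umi = split_read1(example_read1)
```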

Examples

# Quantify 10xv3 reads read1.fastq.gz and read2.fastq.gz
$ kb count -i index.idx -g t2g.txt -o out/ -x 10xv3 read1.fastq.gz read2.fastq.gz
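When a sample spans several lanes, the FASTQ files in the examples on this page are listed as read 1 followed by its read 2 mate, lane by lane. The sketch below orders an arbitrary list of files that way before they are appended to a kb count command. It assumes Illumina-style names containing `_R1_` or `_R2_`; the helper name is hypothetical and not part of kb-python.

```python
import re
from typing import Dict, List


def order_fastq_pairs(paths: List[str]) -> List[str]:
    """Order FASTQs so each R1 file is immediately followed by its R2 mate."""
    pairs: Dict[str, Dict[str, str]] = {}
    for path in paths:
        match = re.search(r"_(R[12])_", path)
        if match is None:
            raise ValueError(f"cannot tell R1 from R2 in {path!r}")
        read = match.group(1)
        # Mask the read tag so both mates of a lane share one key.
        key = path[:match.start(1)] + "R?" + path[match.end(1):]
        pairs.setdefault(key, {})[read] = path
    ordered: List[str] = []
    for key in sorted(pairs):
        mates = pairs[key]
        if set(mates) != {"R1", "R2"}:
            raise ValueError(f"incomplete pair for {key!r}")
        ordered += [mates["R1"], mates["R2"]]
    return ordered


unordered = [
    "sample_S1_L002_R2_001.fastq.gz",
    "sample_S1_L001_R1_001.fastq.gz",
    "sample_S1_L002_R1_001.fastq.gz",
    "sample_S1_L001_R2_001.fastq.gz",
]
ordered = order_fastq_pairs(unordered)
count_cmd = ["kb", "count", "-i", "index.idx", "-g", "t2g.txt",
             "-o", "out/", "-x", "10xv3"] + ordered
```

Raising on an incomplete pair is a deliberate choice here: a missing mate is easier to diagnose before the run starts than after it fails.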

kb info: display package and citation information

The kb info command prints out package information including the version of kb-python, kallisto, and bustools along with their installation location.

$ kb info
kb_python 0.28.0 ...
kallisto: 0.50.1 ...
bustools: 0.43.1 ...
...

kb compile: compile kallisto and bustools binaries from source

The kb compile command grabs the latest kallisto and bustools source and compiles the binaries. Note: this is not required to run kb-python.

Use cases

kb-python facilitates fast and uniform pre-processing of single-cell sequencing data to answer relevant research questions.

$ pip install kb-python gget ffq

# Goal: quantify publicly available scRNAseq data
$ kb ref -i index.idx -g t2g.txt -f1 transcriptome.fa $(gget ref --ftp -w dna,gtf homo_sapiens)
$ kb count -i index.idx -g t2g.txt -x 10xv3 -o out $(ffq --ftp SRR10668798 | jq -r '.[] | .url' | tr '\n' ' ')
# -> count matrix in out/ folder

# Goal: quantify 10xv2 feature barcode data, feature_barcodes.txt is a tab-delimited file
# containing barcode_sequence<tab>barcode_name
$ kb ref -i index.idx -g f2g.txt -f1 features.fa --workflow kite feature_barcodes.txt
$ kb count -i index.idx -g f2g.txt -x 10xv2 -o out/ --workflow kite --h5ad R1.fastq.gz R2.fastq.gz
# -> count matrix in out/ folder

Submitted by @sbooeshaghi.
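The feature barcode use case above expects feature_barcodes.txt to be tab-delimited, with one barcode_sequence<tab>barcode_name pair per line. A minimal sketch of producing that text from a mapping follows; the function name, sequences, and feature names are placeholders invented for the example.

```python
import csv
import io
from typing import Dict


def feature_barcodes_tsv(features: Dict[str, str]) -> str:
    """Render {barcode_sequence: barcode_name} as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for sequence, name in features.items():
        writer.writerow([sequence, name])
    return buffer.getvalue()


# Placeholder sequences and names, invented for the example.
example_features = {
    "ACGTACGTACGTACG": "feature_A",
    "TTGCATTGCATTGCA": "feature_B",
}
tsv_text = feature_barcodes_tsv(example_features)
```

Writing `tsv_text` to feature_barcodes.txt gives a file in the shape described in the comment of the use case.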

Do you have a cool use case for kb-python? Submit a PR (including the goal, code snippet, and your username) so that we can feature it here.

Tutorials

For a list of tutorials that use kb-python please see https://www.kallistobus.tools/.

Documentation

Developer documentation is hosted on Read the Docs.

Contributing

Thank you for wanting to improve kb-python! If you believe you've found a bug, please submit an issue.

If you have a new feature you'd like to add to kb-python please create a pull request. Pull requests should contain a message detailing the exact changes made, the reasons for the change, and tests that check for the correctness of those changes.

Cite

If you use kb-python in a publication, please cite the following papers:

kb-python & kallisto and/or bustools

@article{sullivan2023kallisto,
  title={kallisto, bustools, and kb-python for quantifying bulk, single-cell, and single-nucleus RNA-seq},
  author={Sullivan, Delaney K and Min, Kyung Hoi and Hj{\"o}rleifsson, Kristj{\'a}n Eldj{\'a}rn and Luebbert, Laura and Holley, Guillaume and Moses, Lambda and Gustafsson, Johan and Bray, Nicolas L and Pimentel, Harold and Booeshaghi, A Sina and others},
  journal={bioRxiv},
  pages={2023--11},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}

bustools

@article{melsted2021modular,
  title={\href{https://doi.org/10.1038/s41587-021-00870-2}{Modular, efficient and constant-memory single-cell RNA-seq preprocessing}},
  author={Melsted, P{\'a}ll and Booeshaghi, A. Sina and Liu, Lauren and Gao, Fan and Lu, Lambda and Min, Kyung Hoi Joseph and da Veiga Beltrame, Eduardo and Hj{\"o}rleifsson, Kristj{\'a}n Eldj{\'a}rn and Gehring, Jase and Pachter, Lior},
  author+an={1=first;2=first,highlight},
  journal={Nature biotechnology},
  year={2021},
  month={4},
  day={1},
  doi={https://doi.org/10.1038/s41587-021-00870-2}
}

kallisto

@article{bray2016near,
  title={Near-optimal probabilistic RNA-seq quantification},
  author={Bray, Nicolas L and Pimentel, Harold and Melsted, P{\'a}ll and Pachter, Lior},
  journal={Nature biotechnology},
  volume={34},
  number={5},
  pages={525--527},
  year={2016},
  publisher={Nature Publishing Group}
}

kITE

@article{booeshaghi2024quantifying,
  title={Quantifying orthogonal barcodes for sequence census assays},
  author={Booeshaghi, A Sina and Min, Kyung Hoi and Gehring, Jase and Pachter, Lior},
  journal={Bioinformatics Advances},
  volume={4},
  number={1},
  pages={vbad181},
  year={2024},
  publisher={Oxford University Press}
}

BUS format

@article{melsted2019barcode,
  title={The barcode, UMI, set format and BUStools},
  author={Melsted, P{\'a}ll and Ntranos, Vasilis and Pachter, Lior},
  journal={Bioinformatics},
  volume={35},
  number={21},
  pages={4472--4473},
  year={2019},
  publisher={Oxford University Press}
}

kb-python was inspired by Sten Linnarsson’s loompy fromfq command (http://linnarssonlab.org/loompy/kallisto/index.html).

kb_python's People

Contributors

biobenkj, dependabot[bot], lakigigar, lauraluebbert, lioscro, sbooeshaghi, trellixvulnteam, yenaled, yfarjoun


kb_python's Issues

How to use multiple indices for RNA velocity?

Apologies as this isn't so much an issue as I'm a bit confused. I followed this vignette for generating custom indices for RNA velocity (https://colab.research.google.com/github/pachterlab/kallistobustools/blob/master/notebooks/kb_velocity_index.ipynb#scrollTo=eghgsyNlxZGj). I am using the devel branch of kb_python 0.24.4. I generated 8 indices (as opposed to 4 in the vignette) just for memory reasons - the first is labeled .idx_cdna, and the remaining seven are .idx_intron.x, where x is a number from 0 to 6.

I'm confused about how to proceed with multiple indices for kb count, however. Passing multiple indices to the -i flag didn't seem to work (-i mouse_velocity.idx_*), and using multiple -i flags only used the last index. I'm not fully sure how the indexing is done under the hood, but should I just run kb count with each index individually and then concatenate? I wasn't sure if this would affect UMI collapsing, for example.
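Another report on this page passes several indices to a single -i as one comma-separated value with no spaces, and its log shows all eight indices being used. Assuming that form applies to the version in use (not verified here), the value could be assembled as sketched below. The helper name is hypothetical; the file names follow the naming described in this question.

```python
from typing import List


def index_argument(index_paths: List[str]) -> str:
    """Join index paths into one comma-separated value for `-i`."""
    cleaned = [path.strip() for path in index_paths if path.strip()]
    if not cleaned:
        raise ValueError("no index paths given")
    if any("," in path for path in cleaned):
        raise ValueError("index paths must not contain commas")
    return ",".join(cleaned)


# One cDNA index plus seven intron indices, named as in the question.
example_indices = ["mouse_velocity.idx_cdna"] + [
    f"mouse_velocity.idx_intron.{i}" for i in range(7)
]
i_value = index_argument(example_indices)
```

A shell glob such as `mouse_velocity.idx_*` expands to space-separated words, so only the first would be read as the value of -i; joining the paths first avoids that.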

Loom file incompatibility with loomR

Occurs when following "getting started" tutorial. Attempting to read the generated loom file with loomR results in the following error.

Error in validateLoom(object = self) : 
  There can only be 5 groups in the loom file: 'row_attrs', 'col_attrs', 'layers', 'row_graphs', 'col_graphs'

ValueError: could not convert integer scalar

Hi developers, fantastic tool!

I am unable to run kb count because of an error during the combining of matrices. I'm going through the exact piece of code in count.py so I can debug, but the issue is still not completely clear to me. Any help would be much appreciated.

kb-python=0.24.4

Not sure if it's my data, because the test data from the Google Colab kb tutorials (using kb_single_nucleus.ipynb) worked well.

What is the exact command that was run?

kb ref -i geneset.dir/index.idx -g geneset.dir/t2g.txt -f1 geneset.dir/cdna.fa     -f2 geneset.dir/intron.fa -c1 geneset.dir/cdna_t2c.txt -c2 geneset.dir/intron_t2c.txt     --workflow nucleus -n 8

 kb count -i geneset.dir/index.idx.7,geneset.dir/index.idx.5,geneset.dir/index.idx.0,geneset.dir/index.idx.2,geneset.dir/index.idx.6,geneset.dir/index.idx.1,geneset.dir/index.idx.3,geneset.dir/index.idx.4 -g geneset.dir/t2g.txt      -c1 geneset.dir/cdna_t2c.txt -c2 geneset.dir/intron_t2c.txt -x DropSeq     -o kallisto.dir/D145E_S1/bus --workflow nucleus --loom  ./D145E_S1.fastq.1.gz ./D145E_S1.fastq.2.gz     2> kallisto.dir/D145E_S1/bus_kblog.log

Command output (with --verbose flag)

[2020-02-19 10:16:30,772]    INFO Generating BUS file using 8 indices
[2020-02-19 10:16:30,774]    INFO Using index geneset.dir/index.idx.7 to generate BUS file to kallisto.dir/./D145E_S1/bus/tmp/bus_part0 from
[2020-02-19 10:16:30,774]    INFO         ./D145E_S1.fastq.1.gz
[2020-02-19 10:16:30,774]    INFO         ./D145E_S1.fastq.2.gz
[2020-02-19 10:26:51,439]    INFO Using index geneset.dir/index.idx.5 to generate BUS file to kallisto.dir/./D145E_S1/bus/tmp/bus_part1 from
[2020-02-19 10:26:51,440]    INFO         ./D145E_S1.fastq.1.gz
[2020-02-19 10:26:51,440]    INFO         ./D145E_S1.fastq.2.gz
[2020-02-19 10:38:26,428]    INFO Using index geneset.dir/index.idx.0 to generate BUS file to kallisto.dir/./D145E_S1/bus/tmp/bus_part2 from
[2020-02-19 10:38:26,428]    INFO         ./D145E_S1.fastq.1.gz
[2020-02-19 10:38:26,428]    INFO         ./D145E_S1.fastq.2.gz
[2020-02-19 10:48:32,959]    INFO Using index geneset.dir/index.idx.2 to generate BUS file to kallisto.dir/./D145E_S1/bus/tmp/bus_part3 from
[2020-02-19 10:48:32,959]    INFO         ./D145E_S1.fastq.1.gz
[2020-02-19 10:48:32,959]    INFO         ./D145E_S1.fastq.2.gz
[2020-02-19 10:59:08,434]    INFO Using index geneset.dir/index.idx.6 to generate BUS file to kallisto.dir/./D145E_S1/bus/tmp/bus_part4 from
[2020-02-19 10:59:08,434]    INFO         ./D145E_S1.fastq.1.gz
[2020-02-19 10:59:08,434]    INFO         ./D145E_S1.fastq.2.gz
[2020-02-19 11:09:18,015]    INFO Using index geneset.dir/index.idx.1 to generate BUS file to kallisto.dir/./D145E_S1/bus/tmp/bus_part5 from
[2020-02-19 11:09:18,016]    INFO         ./D145E_S1.fastq.1.gz
[2020-02-19 11:09:18,016]    INFO         ./D145E_S1.fastq.2.gz
[2020-02-19 11:19:25,219]    INFO Using index geneset.dir/index.idx.3 to generate BUS file to kallisto.dir/./D145E_S1/bus/tmp/bus_part6 from
[2020-02-19 11:19:25,220]    INFO         ./D145E_S1.fastq.1.gz
[2020-02-19 11:19:25,220]    INFO         ./D145E_S1.fastq.2.gz
[2020-02-19 11:30:02,210]    INFO Using index geneset.dir/index.idx.4 to generate BUS file to kallisto.dir/./D145E_S1/bus/tmp/bus_part7 from
[2020-02-19 11:30:02,210]    INFO         ./D145E_S1.fastq.1.gz
[2020-02-19 11:30:02,210]    INFO         ./D145E_S1.fastq.2.gz
[2020-02-19 11:40:19,338]    INFO Merging BUS records to kallisto.dir/./D145E_S1/bus from
[2020-02-19 11:40:19,338]    INFO         kallisto.dir/./D145E_S1/bus/tmp/bus_part0
[2020-02-19 11:40:19,338]    INFO         kallisto.dir/./D145E_S1/bus/tmp/bus_part1
[2020-02-19 11:40:19,338]    INFO         kallisto.dir/./D145E_S1/bus/tmp/bus_part2
[2020-02-19 11:40:19,338]    INFO         kallisto.dir/./D145E_S1/bus/tmp/bus_part3
[2020-02-19 11:40:19,338]    INFO         kallisto.dir/./D145E_S1/bus/tmp/bus_part4
[2020-02-19 11:40:19,338]    INFO         kallisto.dir/./D145E_S1/bus/tmp/bus_part5
[2020-02-19 11:40:19,338]    INFO         kallisto.dir/./D145E_S1/bus/tmp/bus_part6
[2020-02-19 11:40:19,339]    INFO         kallisto.dir/./D145E_S1/bus/tmp/bus_part7
[2020-02-19 11:58:06,904]    INFO Sorting BUS file kallisto.dir/./D145E_S1/bus/output.bus to kallisto.dir/./D145E_S1/bus/tmp/output.s.bus
[2020-02-19 12:00:53,974]    INFO Whitelist not provided
[2020-02-19 12:00:53,975]    INFO Generating whitelist kallisto.dir/./D145E_S1/bus/whitelist.txt from BUS file kallisto.dir/./D145E_S1/bus/tmp/output.s.bus
[2020-02-19 12:00:55,186]    INFO Inspecting BUS file kallisto.dir/./D145E_S1/bus/tmp/output.s.bus
[2020-02-19 12:07:44,273]    INFO Correcting BUS records in kallisto.dir/./D145E_S1/bus/tmp/output.s.bus to kallisto.dir/./D145E_S1/bus/tmp/output.s.c.bus with whitelist kallisto.dir/./D145E_S1/bus/whitelist.txt
[2020-02-19 12:08:12,843]    INFO Sorting BUS file kallisto.dir/./D145E_S1/bus/tmp/output.s.c.bus to kallisto.dir/./D145E_S1/bus/output.unfiltered.bus
[2020-02-19 12:09:15,982]    INFO Capturing records from BUS file kallisto.dir/./D145E_S1/bus/output.unfiltered.bus to kallisto.dir/./D145E_S1/bus/tmp/spliced.bus with capture list geneset.dir/intron_t2c.txt
[2020-02-19 12:16:48,559]    INFO Sorting BUS file kallisto.dir/./D145E_S1/bus/tmp/spliced.bus to kallisto.dir/./D145E_S1/bus/spliced.unfiltered.bus
[2020-02-19 12:16:53,730]    INFO Inspecting BUS file kallisto.dir/./D145E_S1/bus/spliced.unfiltered.bus
[2020-02-19 12:22:22,623]    INFO Generating count matrix kallisto.dir/./D145E_S1/bus/counts_unfiltered/spliced from BUS file kallisto.dir/./D145E_S1/bus/spliced.unfiltered.bus
[2020-02-19 12:31:42,780]    INFO Capturing records from BUS file kallisto.dir/./D145E_S1/bus/output.unfiltered.bus to kallisto.dir/./D145E_S1/bus/tmp/unspliced.bus with capture list geneset.dir/cdna_t2c.txt
[2020-02-19 12:43:53,306]    INFO Sorting BUS file kallisto.dir/./D145E_S1/bus/tmp/unspliced.bus to kallisto.dir/./D145E_S1/bus/unspliced.unfiltered.bus
[2020-02-19 12:44:31,714]    INFO Inspecting BUS file kallisto.dir/./D145E_S1/bus/unspliced.unfiltered.bus
[2020-02-19 12:50:46,299]    INFO Generating count matrix kallisto.dir/./D145E_S1/bus/counts_unfiltered/unspliced from BUS file kallisto.dir/./D145E_S1/bus/unspliced.unfiltered.bus
[2020-02-19 13:01:39,061]    INFO Reading matrix kallisto.dir/./D145E_S1/bus/counts_unfiltered/spliced.mtx
[2020-02-19 13:01:43,009]    INFO Reading matrix kallisto.dir/./D145E_S1/bus/counts_unfiltered/unspliced.mtx
[2020-02-19 13:01:47,348]    INFO Combining matrices
[2020-02-19 13:13:45,216]   ERROR An exception occurred
Traceback (most recent call last):
  File "/ifs/devel/adamc/cgat-developers/conda-install/envs/single-cell/lib/python3.7/site-packages/kb_python/main.py", line 697, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/ifs/devel/adamc/cgat-developers/conda-install/envs/single-cell/lib/python3.7/site-packages/kb_python/main.py", line 194, in parse_count
    temp_dir=temp_dir
  File "/ifs/devel/adamc/cgat-developers/conda-install/envs/single-cell/lib/python3.7/site-packages/kb_python/count.py", line 1297, in count_velocity
    nucleus=nucleus
  File "/ifs/devel/adamc/cgat-developers/conda-install/envs/single-cell/lib/python3.7/site-packages/kb_python/count.py", line 584, in convert_matrices
    adata = sum_anndatas(*adatas) if nucleus else overlay_anndatas(*adatas)
  File "/ifs/devel/adamc/cgat-developers/conda-install/envs/single-cell/lib/python3.7/site-packages/kb_python/utils.py", line 681, in sum_anndatas
    spliced_unspliced = spliced_intersection.copy()
  File "/ifs/devel/adamc/cgat-developers/conda-install/envs/single-cell/lib/python3.7/site-packages/anndata/_core/anndata.py", line 1425, in copy
    X = _subset(self._adata_ref.X, (self._oidx, self._vidx)).copy()
  File "/ifs/devel/adamc/cgat-developers/conda-install/envs/single-cell/lib/python3.7/functools.py", line 827, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/ifs/devel/adamc/cgat-developers/conda-install/envs/single-cell/lib/python3.7/site-packages/anndata/_core/index.py", line 126, in _subset
    return a[subset_idx]
  File "/ifs/devel/adamc/cgat-developers/conda-install/envs/single-cell/lib/python3.7/site-packages/scipy/sparse/_index.py", line 75, in __getitem__
    return self._get_arrayXarray(row, col)
  File "/ifs/devel/adamc/cgat-developers/conda-install/envs/single-cell/lib/python3.7/site-packages/scipy/sparse/compressed.py", line 665, in _get_arrayXarray
    major.size, major.ravel(), minor.ravel(), val)
ValueError: could not convert integer scalar
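One possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: SciPy sparse matrices commonly use 32-bit index arrays, and the indexing path shown (`_get_arrayXarray`) samples one value per selected (row, column) pair. If the number of selected barcodes times the number of selected genes exceeds the largest signed 32-bit integer, that count may no longer fit. The sketch below only performs that arithmetic; the helper name and the example shapes are invented and are not taken from the report.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def exceeds_int32(n_barcodes: int, n_genes: int) -> bool:
    """Return True if n_barcodes * n_genes does not fit in a signed int32."""
    return n_barcodes * n_genes > INT32_MAX


# Invented shapes, for illustration only.
small_selection = exceeds_int32(5_000, 30_000)       # 150,000,000 pairs
large_selection = exceeds_int32(1_000_000, 30_000)   # 30,000,000,000 pairs
```

If the check is true for the unfiltered matrices, reducing the number of barcodes (for example by supplying a whitelist) would be one thing to try before combining.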

barcode swapping?

Dear all,

Is there an existing solution based on the kb output for identification of swapped barcodes across samples
(e.g. https://www.nature.com/articles/s41467-018-05083-x/)?
DropletUtils::swappedDrops provides one based on CellRanger's output, but it seems to me it cannot be readily applied to kb output (see Aaron Lun's reply at https://support.bioconductor.org/p/p133254/).

Kind regards,
Mike

time kb count \
-i ${KALLISTO_RESOURCE}/transcriptome.idx \
-g ${KALLISTO_RESOURCE}/transcripts_to_genes.txt \
-x 10xv3 --verbose -o ${WORK_DIR}/kallisto_output/ \
${FASTQ_DIR}/*.fastq.gz

Non-zero Exit Status 3221225477.

kb-python fails when it gets to the sorting step. I ran it with --verbose, as shown below, and with --lamanno for velocity. In addition, regardless of whether I input my own barcodes or use the default whitelist, the issue persists. I see that it's stating DEBUG Read in 0 BUS records, so I'm curious if it's unable to read in the BUS file produced in the step before.

[2020-02-06 12:11:15,758]   DEBUG Printing verbose output
[2020-02-06 12:11:15,758]   DEBUG Creating tmp directory
[2020-02-06 12:11:15,759]   DEBUG Namespace(c1='spliced_t2c.txt', c2='unspliced_t2c.txt', command='count', fastqs=['nuclei_900_S1_L001_R1_001.fastq.gz', 'nuclei_900_S1_L001_R2_001.fastq.gz'], filter=None, g='t2g.txt', h5ad=False, i='index.idx', keep_tmp=True, lamanno=True, list=False, loom=False, m='4G', nucleus=False, o='out', overwrite=True, t=2, verbose=True, w='barcodes.txt', x='10XV2')
[2020-02-06 12:11:15,759]    INFO Generating BUS file from
[2020-02-06 12:11:15,759]    INFO         nuclei_900_S1_L001_R1_001.fastq.gz
[2020-02-06 12:11:15,759]    INFO         nuclei_900_S1_L001_R2_001.fastq.gz
[2020-02-06 12:11:15,759]   DEBUG c:\users\user1\appdata\local\r-mini~1\envs\r-reti~1\lib\site-packages\kb_python\bins\windows\kallisto\kallisto.exe bus -i index.idx -o out -x 10XV2 -t 2 nuclei_900_S1_L001_R1_001.fastq.gz nuclei_900_S1_L001_R2_001.fastq.gz
[2020-02-06 12:16:46,896]   DEBUG 
[2020-02-06 12:16:46,896]   DEBUG [index] k-mer length: 31
[2020-02-06 12:16:46,896]   DEBUG [index] number of targets: 845,338
[2020-02-06 12:16:46,896]   DEBUG [index] number of k-mers: 271,648,279
[2020-02-06 12:16:46,896]   DEBUG [index] number of equivalence classes: 4,776,424
[2020-02-06 12:16:46,896]   DEBUG [quant] will process sample 1: nuclei_900_S1_L001_R1_001.fastq.gz
[2020-02-06 12:16:46,896]   DEBUG nuclei_900_S1_L001_R2_001.fastq.gz
[2020-02-06 12:16:46,896]   DEBUG [quant] finding pseudoalignments for the reads ... done
[2020-02-06 12:16:46,896]   DEBUG [quant] processed 24,393,189 reads, 3,476,240 reads pseudoaligned
[2020-02-06 12:16:46,943]    INFO Sorting BUS file out\output.bus to tmp\output.s.bus
[2020-02-06 12:16:46,943]   DEBUG c:\users\user1\appdata\local\r-mini~1\envs\r-reti~1\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe sort -o tmp\output.s.bus -T tmp -t 2 -m 4G out\output.bus
[2020-02-06 12:16:49,176]   DEBUG Read in 0 BUS records
[2020-02-06 12:16:49,176]   ERROR An exception occurred
Traceback (most recent call last):
  File "c:\users\user1\appdata\local\r-mini~1\envs\r-reti~1\lib\site-packages\kb_python\main.py", line 483, in main
    COMMAND_TO_FUNCTION[args.command](args)
  File "c:\users\user1\appdata\local\r-mini~1\envs\r-reti~1\lib\site-packages\kb_python\main.py", line 120, in parse_count
    count_velocity(
  File "c:\users\user1\appdata\local\r-mini~1\envs\r-reti~1\lib\site-packages\kb_python\count.py", line 684, in count_velocity
    sort_result = bustools_sort(
  File "c:\users\user1\appdata\local\r-mini~1\envs\r-reti~1\lib\site-packages\kb_python\count.py", line 97, in bustools_sort
    run_executable(command)
  File "c:\users\user1\appdata\local\r-mini~1\envs\r-reti~1\lib\site-packages\kb_python\utils.py", line 147, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command 'c:\users\user1\appdata\local\r-mini~1\envs\r-reti~1\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe sort -o tmp\output.s.bus -T tmp -t 2 -m 4G out\output.bus' returned non-zero exit status 3221225477.
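The exit status is easier to interpret in hexadecimal. The sketch below converts it; 3221225477 is 0xC0000005, which Windows defines as STATUS_ACCESS_VIOLATION. That suggests bustools.exe crashed rather than reporting a handled error, but it does not by itself identify the cause. The helper name is hypothetical.

```python
def to_ntstatus_hex(exit_status: int) -> str:
    """Format a process exit status as an unsigned 32-bit hex string."""
    # Masking also handles statuses reported as negative signed integers.
    return f"0x{exit_status & 0xFFFFFFFF:08X}"


status_hex = to_ntstatus_hex(3221225477)
# 0xC0000005 is the Windows NTSTATUS code STATUS_ACCESS_VIOLATION.
```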

Error in merging bus records in kb count

Hi!

I am currently learning how to use kb ref and kb count to be able to perform scRNA velocity analysis with scvelo. The code works otherwise perfectly, but an error occurs every time in the step of merging bus records. This is the code I have been using for the counting:

!kb count --h5ad -i index.idx.cdna, index.idx_intron.0, index.idx_intron.1, index.idx_intron.2, index.idx_intron.3, index.idx_intron.4, index.idx_intron.5, index.idx_intron.6
-g t2g.txt -x 10xv2 -o H22
-c1 spliced_t2c.txt -c2 unspliced_t2c.txt --workflow lamanno --filter bustools -t 2
H22_HFD_5d_S21_L001_R1_001.fastq.gz
H22_HFD_5d_S21_L001_R2_001.fastq.gz

And the output looks like this:

[2020-12-08 16:00:30,178] WARNING Multiple indices were provided. Aligning to split indices is currently EXPERIMENTAL and results in loss of reads. It is recommended to use a single index until this feature is fully supported. Use at your own risk!
[2020-12-08 16:00:30,178] INFO Generating BUS file using 8 indices
[2020-12-08 16:00:30,178] INFO Using index index.idx.cdna to generate BUS file to H22/tmp/bus_part0 from
[2020-12-08 16:00:30,178] INFO H22_HFD_5d_S21_L001_R1_001.fastq.gz
[2020-12-08 16:00:30,178] INFO H22_HFD_5d_S21_L001_R2_001.fastq.gz
[2020-12-08 16:02:08,187] INFO Sorting BUS file H22/tmp/bus_part0/output.bus to /Users/lackmama/Desktop/scvelo5/H22/tmp/tmplxak7mqk
[2020-12-08 16:02:12,964] INFO Using index index.idx_intron.0 to generate BUS file to H22/tmp/bus_part1 from
[2020-12-08 16:02:12,965] INFO H22_HFD_5d_S21_L001_R1_001.fastq.gz
[2020-12-08 16:02:12,965] INFO H22_HFD_5d_S21_L001_R2_001.fastq.gz
[2020-12-08 16:04:55,613] INFO Sorting BUS file H22/tmp/bus_part1/output.bus to /Users/lackmama/Desktop/scvelo5/H22/tmp/tmp624o_rum
[2020-12-08 16:04:57,691] INFO Using index index.idx_intron.1 to generate BUS file to H22/tmp/bus_part2 from
[2020-12-08 16:04:57,691] INFO H22_HFD_5d_S21_L001_R1_001.fastq.gz
[2020-12-08 16:04:57,691] INFO H22_HFD_5d_S21_L001_R2_001.fastq.gz
[2020-12-08 16:07:38,282] INFO Sorting BUS file H22/tmp/bus_part2/output.bus to /Users/lackmama/Desktop/scvelo5/H22/tmp/tmp8dldfqyk
[2020-12-08 16:07:40,419] INFO Using index index.idx_intron.2 to generate BUS file to H22/tmp/bus_part3 from
[2020-12-08 16:07:40,419] INFO H22_HFD_5d_S21_L001_R1_001.fastq.gz
[2020-12-08 16:07:40,419] INFO H22_HFD_5d_S21_L001_R2_001.fastq.gz
[2020-12-08 16:10:17,469] INFO Sorting BUS file H22/tmp/bus_part3/output.bus to /Users/lackmama/Desktop/scvelo5/H22/tmp/tmpj2_npgbv
[2020-12-08 16:10:19,550] INFO Using index index.idx_intron.3 to generate BUS file to H22/tmp/bus_part4 from
[2020-12-08 16:10:19,550] INFO H22_HFD_5d_S21_L001_R1_001.fastq.gz
[2020-12-08 16:10:19,550] INFO H22_HFD_5d_S21_L001_R2_001.fastq.gz
[2020-12-08 16:12:58,836] INFO Sorting BUS file H22/tmp/bus_part4/output.bus to /Users/lackmama/Desktop/scvelo5/H22/tmp/tmptxxr9sx8
[2020-12-08 16:13:01,016] INFO Using index index.idx_intron.4 to generate BUS file to H22/tmp/bus_part5 from
[2020-12-08 16:13:01,016] INFO H22_HFD_5d_S21_L001_R1_001.fastq.gz
[2020-12-08 16:13:01,016] INFO H22_HFD_5d_S21_L001_R2_001.fastq.gz
[2020-12-08 16:15:43,316] INFO Sorting BUS file H22/tmp/bus_part5/output.bus to /Users/lackmama/Desktop/scvelo5/H22/tmp/tmp72sn8l46
[2020-12-08 16:15:45,361] INFO Using index index.idx_intron.5 to generate BUS file to H22/tmp/bus_part6 from
[2020-12-08 16:15:45,361] INFO H22_HFD_5d_S21_L001_R1_001.fastq.gz
[2020-12-08 16:15:45,361] INFO H22_HFD_5d_S21_L001_R2_001.fastq.gz
[2020-12-08 16:18:23,235] INFO Sorting BUS file H22/tmp/bus_part6/output.bus to /Users/lackmama/Desktop/scvelo5/H22/tmp/tmp5ar_yaoi
[2020-12-08 16:18:25,443] INFO Using index index.idx_intron.6 to generate BUS file to H22/tmp/bus_part7 from
[2020-12-08 16:18:25,443] INFO H22_HFD_5d_S21_L001_R1_001.fastq.gz
[2020-12-08 16:18:25,443] INFO H22_HFD_5d_S21_L001_R2_001.fastq.gz
[2020-12-08 16:21:11,127] INFO Sorting BUS file H22/tmp/bus_part7/output.bus to /Users/lackmama/Desktop/scvelo5/H22/tmp/tmpy29yy5h9
[2020-12-08 16:21:13,165] INFO Mashing BUS records to H22 from
[2020-12-08 16:21:13,165] INFO H22/tmp/bus_part0
[2020-12-08 16:21:13,165] INFO H22/tmp/bus_part1
[2020-12-08 16:21:13,165] INFO H22/tmp/bus_part2
[2020-12-08 16:21:13,165] INFO H22/tmp/bus_part3
[2020-12-08 16:21:13,165] INFO H22/tmp/bus_part4
[2020-12-08 16:21:13,165] INFO H22/tmp/bus_part5
[2020-12-08 16:21:13,165] INFO H22/tmp/bus_part6
[2020-12-08 16:21:13,165] INFO H22/tmp/bus_part7
[2020-12-08 16:22:15,564] INFO Sorting BUS file H22/mashed.bus to /Users/lackmama/Desktop/scvelo5/H22/tmp/tmp3uisw2bq
[2020-12-08 16:22:25,219] INFO Merging BUS records in H22/mashed.bus to H22
[2020-12-08 16:23:20,468] ERROR An exception occurred
Traceback (most recent call last):
File "/Users/lackmama/miniconda3/lib/python3.8/site-packages/kb_python/main.py", line 785, in main
COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
File "/Users/lackmama/miniconda3/lib/python3.8/site-packages/kb_python/main.py", line 195, in parse_count
count_velocity(
File "/Users/lackmama/miniconda3/lib/python3.8/site-packages/kb_python/count.py", line 1450, in count_velocity
bus_result = kallisto_bus_split(
File "/Users/lackmama/miniconda3/lib/python3.8/site-packages/kb_python/count.py", line 228, in kallisto_bus_split
merge_result = bustools_merge(
File "/Users/lackmama/miniconda3/lib/python3.8/site-packages/kb_python/validate.py", line 124, in inner
validate(arg)
File "/Users/lackmama/miniconda3/lib/python3.8/site-packages/kb_python/validate.py", line 85, in validate
VALIDATORS[ext](path)
File "/Users/lackmama/miniconda3/lib/python3.8/site-packages/kb_python/validate.py", line 40, in validate_bus
raise FileVerificationFailed('{} has no BUS records'.format(path))
kb_python.validate.FileVerificationFailed: H22/merged.bus has no BUS records

What would be the best solution for tackling this issue?

Thanks in advance! :)

Br,
Madeleine

Error while sorting

I am trying to run the R notebook Analysis of single-cell RNA-seq data: building and annotating an atlas tutorial locally on a PC running Windows 10. I'm having issues with the following code:

system("kb count -i index.idx -g t2g.txt -x 10xv3 -o output --filter bustools -t 12 pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz",intern=TRUE)

The count function appears to run normally for a few minutes and generates several output files, but then gives the following error when it comes time to sort:

[1] "[2020-03-24 14:03:32,044] INFO Generating BUS file from"
[2] "[2020-03-24 14:03:32,044] INFO pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz"
[3] "[2020-03-24 14:03:32,044] INFO pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz"
[4] "[2020-03-24 14:03:32,044] INFO pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz"
[5] "[2020-03-24 14:03:32,044] INFO pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz"
[6] "[2020-03-24 14:06:23,556] INFO Sorting BUS file output\output.bus to tmp\output.s.bus"
[7] "[2020-03-24 14:06:27,088] ERROR An exception occurred"
[8] "Traceback (most recent call last):"
[9] " File "c:\users\brand\anaconda3\lib\site-packages\kb_python\main.py", line 476, in main"
[10] " COMMAND_TO_FUNCTION[args.command](args)"
[11] " File "c:\users\brand\anaconda3\lib\site-packages\kb_python\main.py", line 148, in parse_count"
[12] " h5ad=args.h5ad,"
[13] " File "c:\users\brand\anaconda3\lib\site-packages\kb_python\count.py", line 418, in count"
[14] " memory=memory"
[15] " File "c:\users\brand\anaconda3\lib\site-packages\kb_python\count.py", line 94, in bustools_sort"
[16] " run_executable(command)"
[17] " File "c:\users\brand\anaconda3\lib\site-packages\kb_python\utils.py", line 114, in run_executable"
[18] " raise sp.CalledProcessError(p.returncode, ' '.join(command))"
[19] "subprocess.CalledProcessError: Command 'c:\users\brand\anaconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe sort -o tmp\output.s.bus -T tmp -t 12 -m 4G output\output.bus' returned non-zero exit status 3221225477."

An output folder is created and contains a 1.2 Gb output.bus file, but when I rerun:

C:\Users\brand\Documents\R\Kallisto.Bustools_test>c:\users\brand\anaconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe sort -o tmp\output.s.bus -T tmp -t 12 -m 4G output\output.bus

I get the following output:

Read in 0 BUS records

kb count pseudoalign to multiple genes

Hi,

I read the documentation, and it seems that kb count should support multi-mapping reads, as mentioned here: https://www.kallistobus.tools/introduction
--mm Include reads that pseudoalign to multiple genes.
But when I tried to use this flag, I got the error kb: error: unrecognized arguments: --mm. I am using kb_python 0.24.4.

Please let me know if there is anything I missed. Thank you very much for your help in advance.

Best,
Alice

filtered gene count matrix

Hi,

I was wondering what kind of filtering actually happens when the --filter bustools flag is applied. I couldn't find out what's going on, and your example notebooks only use the unfiltered output. Could you elaborate on that?
Sorry if this is not the correct platform for such questions...

Cheers,
Manuel.

Issues with count-kite?

I tried reinstalling count-kite (want to analyze nuclei data) using the command in the kb tutorial:

pip install git+https://github.com/pachterlab/kb_python@count_kite

But got the error:

Cloning https://github.com/pachterlab/kb_python (to revision count-kite) to /tmp/18804074.1.interactive/pip-req-build-5agd4eia
  Running command git clone -q https://github.com/pachterlab/kb_python /tmp/18804074.1.interactive/pip-req-build-5agd4eia
  WARNING: Did not find branch or tag 'count-kite', assuming revision or ref.
  Running command git checkout -q count-kite
  error: pathspec 'count-kite' did not match any file(s) known to git.

Looking at the kb_python GitHub repository, I can't seem to find the count_kite branch. I'm trying to figure out what is going on: is it an issue on my end or something else?

kb count on single-end reads

I need to run kb count on single cells from an experiment with single-end reads generated with smartseq2 technology.

Going through the documentation I couldn't find how to deal with single reads in kb. Does anyone know how to deal with this?

(I know one option is use kallisto + bustools (--single) but I don't think it's possible in this case because they don't support smartseq2)

Optional parameter to specify the temporary directory

Hi,

Could you add the --tempdir option for the command kb count?

Although the temporary directory seems to be hardcoded as "tmp" in the Python code,

temp_dir='tmp',

it does not seem to be exposed as an argument of the kb count command.

In my environment (Grid Engine), this may have caused an error:
I ran several kb count jobs in a common working directory, and some of them were terminated.

I'm not sure, but this may happen because multiple jobs try to access the same temporary directory (tmp).

After I cd into each output directory and run kb count there, this type of error no longer occurs.
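The workaround described above (running each kb count from its own directory so that the relative tmp folders do not collide) can be scripted. A minimal sketch, assuming hypothetical sample names and FASTQ paths; it uses only flags that already appear in these threads (`-i`, `-g`, `-x`, `-o`, `-t`), and `plan_jobs` and `run_jobs` are illustrative names, not kb-python functions:

```python
import subprocess
from pathlib import Path

def plan_jobs(samples, index, t2g, technology, root, threads=2):
    """Return one (working_directory, command) pair per sample.

    `samples` maps a (hypothetical) sample name to its list of FASTQ paths.
    Each job gets its own working directory, so the relative `tmp` folder
    ends up in a different place for every job.
    """
    jobs = []
    for name, fastqs in samples.items():
        workdir = Path(root) / name
        command = [
            "kb", "count",
            "-i", str(Path(index).resolve()),
            "-g", str(Path(t2g).resolve()),
            "-x", technology,
            "-o", "output",
            "-t", str(threads),
            # Absolute paths, because the job runs from another directory.
            *[str(Path(f).resolve()) for f in fastqs],
        ]
        jobs.append((workdir, command))
    return jobs

def run_jobs(jobs):
    """Run each job from inside its own directory (the `cd` workaround)."""
    for workdir, command in jobs:
        workdir.mkdir(parents=True, exist_ok=True)
        subprocess.run(command, cwd=workdir, check=True)
```

On a grid engine, each `(workdir, command)` pair would be submitted as a separate job instead of being run in a loop; the point is only that no two jobs share a working directory.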

returned non-zero exit status 1 error

Hi, I would like to run a kallisto and bustools analysis on my data on a university server; however, I always get this error.

[ts15433@bp1-login01 ts15433]$ kb count --overwrite --h5ad -i index.idx -g t2g.txt -x 10XV1 -o output --filter bustools -t 2 /work/ts15433/SRR1853178_1.fastq.gz /work/ts15433/SRR1853178_2.fastq.gz

[2020-06-24 19:39:10,280] INFO Generating BUS file from
[2020-06-24 19:39:10,280] INFO /work/ts15433/SRR1853178_1.fastq.gz
[2020-06-24 19:39:10,280] INFO /work/ts15433/SRR1853178_2.fastq.gz
[2020-06-24 19:39:10,291] ERROR An exception occurred
Traceback (most recent call last):
File "/home/ts15433/.local/lib/python3.7/site-packages/kb_python/main.py", line 483, in main
COMMAND_TO_FUNCTION[args.command](args)
File "/home/ts15433/.local/lib/python3.7/site-packages/kb_python/main.py", line 150, in parse_count
h5ad=args.h5ad,
File "/home/ts15433/.local/lib/python3.7/site-packages/kb_python/count.py", line 508, in count
fastqs, index_path, technology, out_dir, threads=threads
File "/home/ts15433/.local/lib/python3.7/site-packages/kb_python/count.py", line 65, in kallisto_bus
run_executable(command)
File "/home/ts15433/.local/lib/python3.7/site-packages/kb_python/utils.py", line 147, in run_executable
raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/home/ts15433/.local/lib/python3.7/site-packages/kb_python/bins/linux/kallisto/kallisto bus -i index.idx -o output -x 10XV1 -t 2 /work/ts15433/SRR1853178_1.fastq.gz /work/ts15433/SRR1853178_2.fastq.gz' returned non-zero exit status 1.

Key error "KeyError: 'DN100000_c0_g1_i1'"

Describe the issue
I am working on a non-model organism, and trying to use our home-brew GTF file to create the kallisto index so I can map.
I get a key error and I can't figure out what part of my GTF is not formatted the way kb expects. I looked at #48 too, but couldn't quite make it work :(

What is the exact command that was run?

kb ref --keep-tmp --verbose -i /path/to/pdum_kallisto_index.idx -g /path/to/pdum_kallisto_t2g.txt -f1 /path/to/pdum_kallisto_cdna.fa /path/to/trinity_out_dir.Trinity_shortheader.fasta /path/to/190929_platy_kevin1million_stranded.gtf

Command output (with --verbose flag)

[2020-07-22 09:37:02,473]   DEBUG Printing verbose output
[2020-07-22 09:37:02,473]   DEBUG Creating tmp directory
[2020-07-22 09:37:02,473]   DEBUG Namespace(c1=None, c2=None, command='ref', d=None, f1='/path/to/pdum_kallisto_cdna.fa', f2=None, fasta='/path/to/trinity_out_dir.Trinity_shortheader.fasta', g='/path/to/pdum_kallisto_t2g.txt', gtf='/path/to/190929_platy_kevin1million_stranded.gtf', i='/path/to/pdum_kallisto_index.idx', keep_tmp=True, lamanno=False, list=False, overwrite=False, verbose=True)
[2020-07-22 09:37:02,473]    INFO Decompressing /path/to/190929_platy_kevin1million_stranded.gtf to tmp
[2020-07-22 09:37:02,473]    INFO Creating transcript-to-gene mapping at /path/to/pdum_kallisto_t2g.txt
[2020-07-22 09:37:07,693]    INFO Decompressing /path/to/trinity_out_dir.Trinity_shortheader.fasta to tmp
[2020-07-22 09:37:07,693]    INFO Sorting /path/to/trinity_out_dir.Trinity_shortheader.fasta
[2020-07-22 09:37:56,909]   DEBUG Sorting 1085375 FASTA entries
[2020-07-22 09:37:57,391]   DEBUG Writing sorted FASTA tmp/sorted.fa
[2020-07-22 09:38:45,022]    INFO Sorting /path/to/190929_platy_kevin1million_stranded.gtf
[2020-07-22 09:38:54,723]   DEBUG Sorting 1085375 GTF entries
[2020-07-22 09:38:54,991]   DEBUG Writing sorted GTF tmp/sorted.gtf
[2020-07-22 09:39:00,938]    INFO Splitting genome into cDNA at /path/to/pdum_kallisto_cdna.fa
[2020-07-22 09:39:01,020]   DEBUG Generating cDNA from chromosome DN100000_c0_g1_i1
[2020-07-22 09:39:01,037]   DEBUG Writing 1 cDNA transcripts
[2020-07-22 09:39:01,037]   ERROR An exception occurred
Traceback (most recent call last):
  File "/conda/path/lib/python3.7/site-packages/kb_python/main.py", line 483, in main
    COMMAND_TO_FUNCTION[args.command](args)
  File "/conda/path/lib/python3.7/site-packages/kb_python/main.py", line 109, in parse_ref
    overwrite=args.overwrite
  File "/conda/path/lib/python3.7/site-packages/kb_python/ref.py", line 284, in ref
    sorted_fasta_path, sorted_gtf_path, cdna_path
  File "/conda/path/lib/python3.7/site-packages/kb_python/fasta.py", line 237, in generate_cdna_fasta
    attributes = transcript_infos[transcript]
KeyError: 'DN100000_c0_g1_i1'

Sample from the GTF file

DN249562_c0_g1_i1	transcriptome	exon	1	276	.	-	.	gene_id "DN249562_c0_g1"; transcript_id "DN249562_c0_g1_i1"; gene_name "DN249562_c0_g1";
DN249518_c0_g1_i1	transcriptome	exon	1	226	.	-	.	gene_id "DN249518_c0_g1"; transcript_id "DN249518_c0_g1_i1"; gene_name "DN249518_c0_g1";
DN249553_c0_g1_i1	transcriptome	exon	1	256	.	-	.	gene_id "DN249553_c0_g1"; transcript_id "DN249553_c0_g1_i1"; gene_name "DN249553_c0_g1";
DN249519_c0_g1_i1	transcriptome	exon	1	208	.	-	.	gene_id "DN249519_c0_g1"; transcript_id "DN249519_c0_g1_i1"; gene_name "DN249519_c0_g1";
DN249548_c0_g1_i1	transcriptome	exon	1	229	.	-	.	gene_id "DN249548_c0_g1"; transcript_id "DN249548_c0_g1_i1"; gene_name "DN249548_c0_g1";
DN249501_c0_g1_i1	transcriptome	exon	1	213	.	-	.	gene_id "DN249501_c0_g1"; transcript_id "DN249501_c0_g1_i1"; gene_name "DN249501_c0_g1";
DN249580_c0_g1_i1	transcriptome	exon	1	389	.	-	.	gene_id "DN249580_c0_g1"; transcript_id "DN249580_c0_g1_i1"; gene_name "DN249580_c0_g1";
DN249557_c0_g1_i1	transcriptome	exon	1	217	.	-	.	gene_id "DN249557_c0_g1"; transcript_id "DN249557_c0_g1_i1"; gene_name "DN249557_c0_g1";
DN249507_c0_g1_i1	transcriptome	exon	1	333	.	-	.	gene_id "DN249507_c0_g1"; transcript_id "DN249507_c0_g1_i1"; gene_name "DN249507_c0_g1";
DN249505_c0_g1_i1	transcriptome	exon	1	234	.	-	.	gene_id "DN249505_c0_g1"; transcript_id "DN249505_c0_g1_i1"; gene_name "DN249505_c0_g1";

Browsing the code gave me the impression that kallisto is looking for the "transcript" keyword. Should I replace "exon" with "transcript"?
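One way to test that impression before editing the file is to list which feature types each transcript has in the GTF. A minimal sketch, assuming a standard nine-column, tab-separated GTF; whether kb ref really requires `transcript` rows is the hypothesis from the post, not something confirmed here, and the function names are illustrative:

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def features_by_transcript(gtf_lines):
    """Map each transcript_id to the set of feature types (column 3) seen."""
    seen = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            seen[match.group(1)].add(fields[2])
    return seen

def transcripts_missing(feature, gtf_lines):
    """Return transcript_ids that have no row of the given feature type."""
    return sorted(
        tid for tid, kinds in features_by_transcript(gtf_lines).items()
        if feature not in kinds
    )

# First record from the sample above.
sample = [
    'DN249562_c0_g1_i1\ttranscriptome\texon\t1\t276\t.\t-\t.\t'
    'gene_id "DN249562_c0_g1"; transcript_id "DN249562_c0_g1_i1"; gene_name "DN249562_c0_g1";',
]
print(transcripts_missing("transcript", sample))
```

For a real file, pass an open file handle instead of the `sample` list, for example `transcripts_missing("transcript", open(path))` with `path` pointing at the GTF.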

Errors in merged ref index: no BUS records

Hi,

I'm learning how to use kallisto to generate loom files and then run scVelo.
I'm working with the mouse genome, so I followed a kb tutorial to build the index. I split the intron index into 8 parts and then passed all of the parts to kb count to preprocess my samples.

Here's my code:
!kb count --loom -i index.idx,index.idx_intron.6,index.idx_intron.5,index.idx_intron.4,index.idx_intron.3,index.idx_intron.2,index.idx_intron.1,index.idx_intron.0,index.idx_cdna -g t2g.txt -x 10xv3 -o AIRE_GW_Het_fastqs
-c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno --filter bustools
AIRE_GW_Het_S27_L004_R1_001.fastq.gz
AIRE_GW_Het_S27_L004_R2_001.fastq.gz

The error:
[2021-03-01 08:36:41,313] WARNING Multiple indices were provided. Aligning to split indices is currently EXPERIMENTAL and results in loss of reads. It is recommended to use a single index until this feature is fully supported. Use at your own risk!
[2021-03-01 08:36:41,313] INFO Skipping kallisto bus because output files already exist. Use the --overwrite flag to overwrite.
[2021-03-01 08:36:41,313] INFO Sorting BUS file AIRE_GW_Het_fastqs/output.bus to AIRE_GW_Het_fastqs/tmp/output.s.bus
[2021-03-01 08:38:33,678] INFO Whitelist not provided
[2021-03-01 08:38:33,678] INFO Copying pre-packaged 10XV3 whitelist to AIRE_GW_Het_fastqs
[2021-03-01 08:38:34,318] INFO Inspecting BUS file AIRE_GW_Het_fastqs/tmp/output.s.bus
[2021-03-01 08:39:07,192] INFO Correcting BUS records in AIRE_GW_Het_fastqs/tmp/output.s.bus to AIRE_GW_Het_fastqs/tmp/output.s.c.bus with whitelist AIRE_GW_Het_fastqs/10xv3_whitelist.txt
[2021-03-01 08:39:46,125] INFO Sorting BUS file AIRE_GW_Het_fastqs/tmp/output.s.c.bus to AIRE_GW_Het_fastqs/output.unfiltered.bus
[2021-03-01 08:40:33,482] INFO Capturing records from BUS file AIRE_GW_Het_fastqs/output.unfiltered.bus to AIRE_GW_Het_fastqs/tmp/spliced.bus with capture list intron_t2c.txt
[2021-03-01 08:41:06,922] INFO Sorting BUS file AIRE_GW_Het_fastqs/tmp/spliced.bus to AIRE_GW_Het_fastqs/spliced.unfiltered.bus
[2021-03-01 08:41:46,465] INFO Inspecting BUS file AIRE_GW_Het_fastqs/spliced.unfiltered.bus
[2021-03-01 08:42:18,782] INFO Generating count matrix AIRE_GW_Het_fastqs/counts_unfiltered/spliced from BUS file AIRE_GW_Het_fastqs/spliced.unfiltered.bus
[2021-03-01 08:42:58,084] INFO Capturing records from BUS file AIRE_GW_Het_fastqs/output.unfiltered.bus to AIRE_GW_Het_fastqs/tmp/unspliced.bus with capture list cdna_t2c.txt
[2021-03-01 08:43:04,782] ERROR An exception occurred
Traceback (most recent call last):
File "/Users/Yi/opt/anaconda3/lib/python3.8/site-packages/kb_python/main.py", line 837, in main
COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
File "/Users/Yi/opt/anaconda3/lib/python3.8/site-packages/kb_python/main.py", line 197, in parse_count
count_velocity(
File "/Users/Yi/opt/anaconda3/lib/python3.8/site-packages/kb_python/count.py", line 1567, in count_velocity
capture_result = bustools_capture(
File "/Users/Yi/opt/anaconda3/lib/python3.8/site-packages/kb_python/validate.py", line 124, in inner
validate(arg)
File "/Users/Yi/opt/anaconda3/lib/python3.8/site-packages/kb_python/validate.py", line 85, in validate
VALIDATORS[ext](path)
File "/Users/Yi/opt/anaconda3/lib/python3.8/site-packages/kb_python/validate.py", line 40, in validate_bus
raise FileVerificationFailed('{} has no BUS records'.format(path))
kb_python.validate.FileVerificationFailed: AIRE_GW_Het_fastqs/tmp/unspliced.bus has no BUS records

I also tried to generate a single index for the introns; however, it ran for over 72 h, so I had to kill it.

Could you please give me some suggestions?

Thank you so much in advance!

Best,
Yi

Issues with combining matrices for single cell nuclei data

Hi
We have some single-cell nuclei data. I was following the Google Colab notebook: https://colab.research.google.com/github/pachterlab/kallistobustools/blob/master/notebooks/kb_single_nucleus.ipynb

I was able to generate the index successfully.

We had run Cell Ranger before, so we wanted to do a comparison. Hence I provided the --cellranger flag. It did generate the spliced and unspliced matrices (in the cellranger folder), but it did not add them up to give me the final summed matrix.

Command run

kb count -i $INDEX_PATH/grch38.idx -g $INDEX_PATH/t2g.txt -c1 $INDEX_PATH/cdna_t2c.txt -c2 $INDEX_PATH/intron_t2c.txt -x 10xv3 -o kaliisto_Bustools_out -t 32 --workflow nucleus --cellranger \
 Sample-100-ATCTTTAG_S72_L003_R1_001.fastq.gz Sample-100-ATCTTTAG_S72_L003_R2_001.fastq.gz \
 Sample-100-ATCTTTAG_S72_L004_R1_001.fastq.gz Sample-100-ATCTTTAG_S72_L004_R2_001.fastq.gz \
 Sample-100-CAGAGGCC_S70_L003_R1_001.fastq.gz Sample-100-CAGAGGCC_S70_L003_R2_001.fastq.gz \
 Sample-100-CAGAGGCC_S70_L004_R1_001.fastq.gz Sample-100-CAGAGGCC_S70_L004_R2_001.fastq.gz \
 Sample-100-GGTCAATA_S71_L003_R1_001.fastq.gz Sample-100-GGTCAATA_S71_L003_R2_001.fastq.gz \
 Sample-100-GGTCAATA_S71_L004_R1_001.fastq.gz Sample-100-GGTCAATA_S71_L004_R2_001.fastq.gz \
 Sample-100-TCAGCCGT_S69_L003_R1_001.fastq.gz Sample-100-TCAGCCGT_S69_L003_R2_001.fastq.gz \
 Sample-100-TCAGCCGT_S69_L004_R1_001.fastq.gz Sample-100-TCAGCCGT_S69_L004_R2_001.fastq.gz  

log

[2020-02-27 20:04:21,204]    INFO Using index /pub/hgshukla/Swarup_Lab/scRNA_seq/reference_index/Kallisto_bustools/grch38.idx to generate BUS file to kaliisto_Bustools_out from
[2020-02-27 20:04:21,204]    INFO         Sample-100-ATCTTTAG_S72_L003_R1_001.fastq.gz
[2020-02-27 20:04:21,204]    INFO         Sample-100-ATCTTTAG_S72_L003_R2_001.fastq.gz
[2020-02-27 20:04:21,204]    INFO         Sample-100-ATCTTTAG_S72_L004_R1_001.fastq.gz
[2020-02-27 20:04:21,204]    INFO         Sample-100-ATCTTTAG_S72_L004_R2_001.fastq.gz
[2020-02-27 20:04:21,204]    INFO         Sample-100-CAGAGGCC_S70_L003_R1_001.fastq.gz
[2020-02-27 20:04:21,204]    INFO         Sample-100-CAGAGGCC_S70_L003_R2_001.fastq.gz
[2020-02-27 20:04:21,204]    INFO         Sample-100-CAGAGGCC_S70_L004_R1_001.fastq.gz
[2020-02-27 20:04:21,204]    INFO         Sample-100-CAGAGGCC_S70_L004_R2_001.fastq.gz
[2020-02-27 20:04:21,204]    INFO         Sample-100-GGTCAATA_S71_L003_R1_001.fastq.gz
[2020-02-27 20:04:21,205]    INFO         Sample-100-GGTCAATA_S71_L003_R2_001.fastq.gz
[2020-02-27 20:04:21,205]    INFO         Sample-100-GGTCAATA_S71_L004_R1_001.fastq.gz
[2020-02-27 20:04:21,205]    INFO         Sample-100-GGTCAATA_S71_L004_R2_001.fastq.gz
[2020-02-27 20:04:21,205]    INFO         Sample-100-TCAGCCGT_S69_L003_R1_001.fastq.gz
[2020-02-27 20:04:21,205]    INFO         Sample-100-TCAGCCGT_S69_L003_R2_001.fastq.gz
[2020-02-27 20:04:21,205]    INFO         Sample-100-TCAGCCGT_S69_L004_R1_001.fastq.gz
[2020-02-27 20:04:21,205]    INFO         Sample-100-TCAGCCGT_S69_L004_R2_001.fastq.gz
[2020-02-27 21:53:27,535]    INFO Sorting BUS file kaliisto_Bustools_out/output.bus to kaliisto_Bustools_out/tmp/output.s.bus
[2020-02-27 21:56:01,947]    INFO Whitelist not provided
[2020-02-27 21:56:01,947]    INFO Copying pre-packaged 10XV3 whitelist to kaliisto_Bustools_out
[2020-02-27 21:56:03,236]    INFO Inspecting BUS file kaliisto_Bustools_out/tmp/output.s.bus
[2020-02-27 21:58:58,426]    INFO Correcting BUS records in kaliisto_Bustools_out/tmp/output.s.bus to kaliisto_Bustools_out/tmp/output.s.c.bus with whitelist kaliisto_Bustools_out/10xv3_whitelist.txt
[2020-02-27 22:00:13,737]    INFO Sorting BUS file kaliisto_Bustools_out/tmp/output.s.c.bus to kaliisto_Bustools_out/output.unfiltered.bus
[2020-02-27 22:01:16,924]    INFO Capturing records from BUS file kaliisto_Bustools_out/output.unfiltered.bus to kaliisto_Bustools_out/tmp/spliced.bus with capture list /pub/hgshukla/Swarup_Lab/scRNA_seq/reference_index/Kallisto_bustools/intron_t2c.txt
[2020-02-27 22:04:10,304]    INFO Sorting BUS file kaliisto_Bustools_out/tmp/spliced.bus to kaliisto_Bustools_out/spliced.unfiltered.bus
[2020-02-27 22:04:24,324]    INFO Inspecting BUS file kaliisto_Bustools_out/spliced.unfiltered.bus
[2020-02-27 22:06:27,978]    INFO Generating count matrix kaliisto_Bustools_out/counts_unfiltered/spliced from BUS file kaliisto_Bustools_out/spliced.unfiltered.bus
[2020-02-27 22:09:49,903]    INFO Writing matrix in cellranger format to kaliisto_Bustools_out/counts_unfiltered/cellranger_spliced
[2020-02-27 22:10:49,319]    INFO Capturing records from BUS file kaliisto_Bustools_out/output.unfiltered.bus to kaliisto_Bustools_out/tmp/unspliced.bus with capture list /pub/hgshukla/Swarup_Lab/scRNA_seq/reference_index/Kallisto_bustools/cdna_t2c.txt
[2020-02-27 22:14:55,311]    INFO Sorting BUS file kaliisto_Bustools_out/tmp/unspliced.bus to kaliisto_Bustools_out/unspliced.unfiltered.bus
[2020-02-27 22:15:37,445]    INFO Inspecting BUS file kaliisto_Bustools_out/unspliced.unfiltered.bus
[2020-02-27 22:18:03,936]    INFO Generating count matrix kaliisto_Bustools_out/counts_unfiltered/unspliced from BUS file kaliisto_Bustools_out/unspliced.unfiltered.bus
[2020-02-27 22:22:13,910]    INFO Writing matrix in cellranger format to kaliisto_Bustools_out/counts_unfiltered/cellranger_unspliced

So I decided to run with parameters similar to the ones given in the notebook (adding the --h5ad flag and dropping the --cellranger flag). This time it does start combining the matrices, but it throws an error while doing so.

Command run

kb count -i $INDEX_PATH/grch38.idx -g $INDEX_PATH/t2g.txt -c1 $INDEX_PATH/cdna_t2c.txt -c2 $INDEX_PATH/intron_t2c.txt -x 10xv3 -o kaliisto_Bustools_out -t 32 --workflow nucleus --h5ad \
 Sample-100-ATCTTTAG_S72_L003_R1_001.fastq.gz Sample-100-ATCTTTAG_S72_L003_R2_001.fastq.gz \
 Sample-100-ATCTTTAG_S72_L004_R1_001.fastq.gz Sample-100-ATCTTTAG_S72_L004_R2_001.fastq.gz \
 Sample-100-CAGAGGCC_S70_L003_R1_001.fastq.gz Sample-100-CAGAGGCC_S70_L003_R2_001.fastq.gz \
 Sample-100-CAGAGGCC_S70_L004_R1_001.fastq.gz Sample-100-CAGAGGCC_S70_L004_R2_001.fastq.gz \
 Sample-100-GGTCAATA_S71_L003_R1_001.fastq.gz Sample-100-GGTCAATA_S71_L003_R2_001.fastq.gz \
 Sample-100-GGTCAATA_S71_L004_R1_001.fastq.gz Sample-100-GGTCAATA_S71_L004_R2_001.fastq.gz \
 Sample-100-TCAGCCGT_S69_L003_R1_001.fastq.gz Sample-100-TCAGCCGT_S69_L003_R2_001.fastq.gz \
 Sample-100-TCAGCCGT_S69_L004_R1_001.fastq.gz Sample-100-TCAGCCGT_S69_L004_R2_001.fastq.gz

Log

/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/anndata/_core/anndata.py:21: FutureWarning:

pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.

[2020-02-27 23:12:18,792]    INFO Using index /pub/hgshukla/Swarup_Lab/scRNA_seq/reference_index/Kallisto_bustools/grch38.idx to generate BUS file to kaliisto_Bustools_out from
[2020-02-27 23:12:18,792]    INFO         Sample-100-ATCTTTAG_S72_L003_R1_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-ATCTTTAG_S72_L003_R2_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-ATCTTTAG_S72_L004_R1_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-ATCTTTAG_S72_L004_R2_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-CAGAGGCC_S70_L003_R1_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-CAGAGGCC_S70_L003_R2_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-CAGAGGCC_S70_L004_R1_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-CAGAGGCC_S70_L004_R2_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-GGTCAATA_S71_L003_R1_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-GGTCAATA_S71_L003_R2_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-GGTCAATA_S71_L004_R1_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-GGTCAATA_S71_L004_R2_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-TCAGCCGT_S69_L003_R1_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-TCAGCCGT_S69_L003_R2_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-TCAGCCGT_S69_L004_R1_001.fastq.gz
[2020-02-27 23:12:18,792]    INFO         Sample-100-TCAGCCGT_S69_L004_R2_001.fastq.gz
[2020-02-28 00:31:28,302]    INFO Sorting BUS file kaliisto_Bustools_out/output.bus to kaliisto_Bustools_out/tmp/output.s.bus
[2020-02-28 00:33:53,887]    INFO Whitelist not provided
[2020-02-28 00:33:53,887]    INFO Copying pre-packaged 10XV3 whitelist to kaliisto_Bustools_out
[2020-02-28 00:33:55,089]    INFO Inspecting BUS file kaliisto_Bustools_out/tmp/output.s.bus
[2020-02-28 00:36:38,660]    INFO Correcting BUS records in kaliisto_Bustools_out/tmp/output.s.bus to kaliisto_Bustools_out/tmp/output.s.c.bus with whitelist kaliisto_Bustools_out/10xv3_whitelist.txt
[2020-02-28 00:37:54,364]    INFO Sorting BUS file kaliisto_Bustools_out/tmp/output.s.c.bus to kaliisto_Bustools_out/output.unfiltered.bus
[2020-02-28 00:38:56,587]    INFO Capturing records from BUS file kaliisto_Bustools_out/output.unfiltered.bus to kaliisto_Bustools_out/tmp/spliced.bus with capture list /pub/hgshukla/Swarup_Lab/scRNA_seq/reference_index/Kallisto_bustools/intron_t2c.txt
[2020-02-28 00:41:44,099]    INFO Sorting BUS file kaliisto_Bustools_out/tmp/spliced.bus to kaliisto_Bustools_out/spliced.unfiltered.bus
[2020-02-28 00:41:55,983]    INFO Inspecting BUS file kaliisto_Bustools_out/spliced.unfiltered.bus
[2020-02-28 00:43:53,040]    INFO Generating count matrix kaliisto_Bustools_out/counts_unfiltered/spliced from BUS file kaliisto_Bustools_out/spliced.unfiltered.bus
[2020-02-28 00:47:00,535]    INFO Capturing records from BUS file kaliisto_Bustools_out/output.unfiltered.bus to kaliisto_Bustools_out/tmp/unspliced.bus with capture list /pub/hgshukla/Swarup_Lab/scRNA_seq/reference_index/Kallisto_bustools/cdna_t2c.txt
[2020-02-28 00:50:56,512]    INFO Sorting BUS file kaliisto_Bustools_out/tmp/unspliced.bus to kaliisto_Bustools_out/unspliced.unfiltered.bus
[2020-02-28 00:51:42,641]    INFO Inspecting BUS file kaliisto_Bustools_out/unspliced.unfiltered.bus
[2020-02-28 00:54:05,254]    INFO Generating count matrix kaliisto_Bustools_out/counts_unfiltered/unspliced from BUS file kaliisto_Bustools_out/unspliced.unfiltered.bus
[2020-02-28 00:58:12,188]    INFO Reading matrix kaliisto_Bustools_out/counts_unfiltered/spliced.mtx
[2020-02-28 00:58:54,650]    INFO Reading matrix kaliisto_Bustools_out/counts_unfiltered/unspliced.mtx
[2020-02-28 01:00:01,170]    INFO Combining matrices
[2020-02-28 01:02:59,683]   ERROR An exception occurred
Traceback (most recent call last):
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/main.py", line 700, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/main.py", line 174, in parse_count
    count_velocity(
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/count.py", line 1285, in count_velocity
    convert_matrices(
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/count.py", line 588, in convert_matrices
    adata = sum_anndatas(*adatas) if nucleus else overlay_anndatas(*adatas)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/utils.py", line 692, in sum_anndatas
    X=spliced_intersection.X + unspliced_intersection.X,
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/anndata/_core/anndata.py", line 579, in X
    _subset(self._adata_ref.X, (self._oidx, self._vidx)),
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/functools.py", line 874, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/anndata/_core/index.py", line 126, in _subset
    return a[subset_idx]
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/scipy/sparse/_index.py", line 75, in __getitem__
    return self._get_arrayXarray(row, col)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/scipy/sparse/compressed.py", line 664, in _get_arrayXarray
    csr_sample_values(M, N, self.indptr, self.indices, self.data,
ValueError: could not convert integer scalar

P.S. Also, during the first run (with the --cellranger option), if I add the --report option to generate a report, it throws an error while generating the Jupyter notebook. I am running the pipeline in a new conda environment, and I think it has something to do with that.

The end part of the log

.
[2020-02-27 19:25:36,193]    INFO Writing matrix in cellranger format to kaliisto_Bustools_out/counts_unfiltered/cellranger_unspliced
[2020-02-27 19:27:39,358]    INFO Writing report Jupyter notebook at kaliisto_Bustools_out/report.ipynb and rendering it to kaliisto_Bustools_out/report.html
[2020-02-27 19:27:40,198]   ERROR An exception occurred
Traceback (most recent call last):
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/main.py", line 700, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/main.py", line 174, in parse_count
    count_velocity(
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/count.py", line 1408, in count_velocity
    report_result = render_report(
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/utils.py", line 103, in inner
    return func(*args, **kwargs)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/report.py", line 296, in render_report
    execute_report(temp_path, nb_path, html_path)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/kb_python/report.py", line 240, in execute_report
    ep.preprocess(nb)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/nbconvert/preprocessors/execute.py", line 403, in preprocess
    with self.setup_preprocessor(nb, resources, km=km):
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/nbconvert/preprocessors/execute.py", line 345, in setup_preprocessor
    self.km, self.kc = self.start_new_kernel(**kwargs)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/nbconvert/preprocessors/execute.py", line 291, in start_new_kernel
    km.start_kernel(extra_arguments=self.extra_arguments, **kwargs)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/jupyter_client/manager.py", line 253, in start_kernel
    kernel_cmd = self.format_kernel_cmd(extra_arguments=extra_arguments)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/jupyter_client/manager.py", line 177, in format_kernel_cmd
    cmd = self.kernel_spec.argv + extra_arguments
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/jupyter_client/manager.py", line 83, in kernel_spec
    self._kernel_spec = self.kernel_spec_manager.get_kernel_spec(self.kernel_name)
  File "/data/users/hgshukla/Softwares/miniconda_INS_DIR/miniconda3/envs/kallisto_bustools/lib/python3.8/site-packages/jupyter_client/kernelspec.py", line 235, in get_kernel_spec
    raise NoSuchKernel(kernel_name)
jupyter_client.kernelspec.NoSuchKernel: No such kernel named python3

Is there any patch for this?

I also referred to this issue (#60). This sample has around 330 million reads, but as far as I understand the sample is OK (or so I have been told).

Best,
Harsh

kb unable to build index died with <Signals.SIGABRT: 6>

Hi,

I am trying to build a velocity index using kb as shown here: https://www.kallistobus.tools/kb_velocity_index.html

This is my command and corresponding log:
kb ref -i index.idx -g transcripts_to_genes.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_transcripts_to_capture.txt -c2 intron_transcripts_to_capture.txt --lamanno GRCm38.primary_assembly.genome.fa gencode.vM23.annotation.gtf
[2019-11-12 15:19:24,787] INFO Decompressing GRCm38.primary_assembly.genome.fa to tmp
[2019-11-12 15:19:24,787] INFO Sorting GRCm38.primary_assembly.genome.fa
[2019-11-12 15:26:02,214] INFO Decompressing gencode.vM23.annotation.gtf to tmp
[2019-11-12 15:26:02,215] INFO Sorting gencode.vM23.annotation.gtf
[2019-11-12 15:26:50,436] INFO Splitting genome into cDNA at cdna.fa
[2019-11-12 15:28:23,864] INFO Creating cDNA transcripts-to-capture at cdna_transcripts_to_capture.txt
[2019-11-12 15:28:24,880] INFO Splitting genome into introns at intron.fa
[2019-11-12 15:32:42,591] INFO Creating intron transcripts-to-capture at cdna_transcripts_to_capture.txt
[2019-11-12 15:32:48,821] INFO Concatenating cDNA and intron FASTAs
[2019-11-12 15:32:54,748] INFO Creating transcript-to-gene mapping at transcripts_to_genes.txt
[2019-11-12 15:33:02,317] INFO Indexing to index.idx
[2019-11-12 15:52:50,171] ERROR An exception occurred
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/kb_python/main.py", line 483, in main
COMMAND_TO_FUNCTION[args.command](args)
File "/usr/local/lib/python3.7/site-packages/kb_python/main.py", line 100, in parse_ref
overwrite=args.overwrite
File "/usr/local/lib/python3.7/site-packages/kb_python/ref.py", line 373, in ref_lamanno
index_result = kallisto_index(combined_path, index_path)
File "/usr/local/lib/python3.7/site-packages/kb_python/ref.py", line 166, in kallisto_index
run_executable(command)
File "/usr/local/lib/python3.7/site-packages/kb_python/utils.py", line 147, in run_executable
raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/usr/local/lib/python3.7/site-packages/kb_python/bins/linux/kallisto/kallisto index -i index.idx -k 31 tmp/combined.fa' died with <Signals.SIGABRT: 6>.

I am not sure why this error occurs. I checked https://unix.stackexchange.com/questions/529229/why-is-command-line-program-killed-by-sigabrt but that didn't help much.

Any suggestion on how to resolve it would be helpful.

Thanks,
Gaurav
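The "died with <Signals.SIGABRT: 6>" wording in the traceback above comes from Python's subprocess module, which reports a child process killed by a signal as a negative return code. A minimal sketch for POSIX systems; `describe_returncode` is an illustrative helper, and it does not say why kallisto aborted, only how to read the code:

```python
import signal

def describe_returncode(returncode: int) -> str:
    """Describe a subprocess return code the way POSIX systems encode it."""
    if returncode == 0:
        return "exited normally"
    if returncode > 0:
        return f"exited with status {returncode}"
    # A negative value -N means the child was terminated by signal N.
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        name = "unknown signal"
    return f"terminated by signal {-returncode} ({name})"

print(describe_returncode(-signal.SIGABRT))
```

SIGABRT is raised by the program itself (for example through abort() after an uncaught C++ exception), so the kallisto output printed just before the abort is usually the most informative place to look.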

KITE demultiplexing fails on google colab

Thanks for the great tool, really enjoying working with kallisto!

Describe the issue

The kb KITE mismatch index will not build on Google Colab or on my local machine.

What is the exact command that was run?

kb ref -i mismatch.idx -f1 mismatch.fa -g t2g.txt --workflow kite features.tsv
An exception occurred
Traceback (most recent call last):
  File "/tools/anaconda/envs/lhv464/kallisto/lib/python3.7/site-packages/kb_python/main.py", line 670, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/tools/anaconda/envs/lhv464/kallisto/lib/python3.7/site-packages/kb_python/main.py", line 101, in parse_ref
    if len(args.fasta) != len(args.gtf):
TypeError: object of type 'NoneType' has no len()
[2020-01-24 17:29:39,842]   ERROR An exception occurred

kb ref error KeyError: 'gene1'

I want to build a reference using kb ref and it gave me KeyError: 'gene1'

wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/004/665/GCF_000004665.1_Callithrix_jacchus-3.2/GCF_000004665.1_Callithrix_jacchus-3.2_genomic.fna.gz

wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/004/665/GCF_000004665.1_Callithrix_jacchus-3.2/GCF_000004665.1_Callithrix_jacchus-3.2_genomic.gff.gz

kb ref -i marmosets_NCBI.idx -g marmoset_NCBI.tsv -f1 marmoset_NCBI_cdna.fa GCF_000004665.1_Callithrix_jacchus-3.2_genomic_annotated.fa GCF_000004665.1_Callithrix_jacchus-3.2_genomic.gtf

Thanks for looking into it.

Question about choice of flank parameter for intron fasta

I am curious about how the flank parameter (the number of flanking exon nucleotides on each side of the intron) is chosen for the intron FASTA file. In kb_python/fasta.py, generate_intron_fasta() uses 30 nt (related to the k-mer length: 31 - 1).

By contrast, this other source uses the sequencing read length minus 1: https://rdrr.io/github/lambdamoses/BUStoolsR/man/get_velocity_files.html (e.g. for 80 bp reads, flank = 79).

The second approach is obviously less convenient, because you would have to rebuild the index for libraries with different read lengths instead of reusing the same index across libraries. Because the maximum k-mer length (-k) in kallisto index is 31, does this mean that there is no benefit to pseudoalignment if you set flank > 30? In theory there would be reads that overlap the intron only at the end of a long read, but would kallisto detect them better if you set the flank parameter to be longer?
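The arithmetic behind the two flank choices can be sketched in a few lines of Python. This is only an illustration of k-mer counting on a flanked intron target, not a reproduction of what kb_python/fasta.py does; the function name and the 100 nt intron length are hypothetical, chosen for the example. It counts how many k-mers of the target contain no intronic base at all, i.e. k-mers that would also occur in the spliced transcript.

```python
def purely_exonic_kmers(flank: int, intron_len: int, k: int = 31) -> int:
    """Count k-mers in a flanked intron target that contain no intronic base.

    The target is modelled as: [flank exon nt][intron][flank exon nt].
    Hypothetical helper for illustration; not part of kb-python.
    """
    total_len = flank + intron_len + flank
    intron_start = flank                 # first intronic position (0-based)
    intron_end = flank + intron_len      # one past the last intronic position
    count = 0
    for start in range(total_len - k + 1):
        end = start + k                  # half-open k-mer interval [start, end)
        overlaps_intron = start < intron_end and end > intron_start
        if not overlaps_intron:
            count += 1
    return count


# flank = k - 1: every k-mer of the target touches the intron.
print(purely_exonic_kmers(flank=30, intron_len=100))
# flank = read length - 1 (80 bp reads): some k-mers are exon-only.
print(purely_exonic_kmers(flank=79, intron_len=100))
```

With flank = k - 1, no k-mer of the intron target is purely exonic, while a longer flank adds 2 * (flank - k + 1) exon-only k-mers per intron. Whether those extra k-mers help or hurt pseudoalignment is the open question above; the sketch only shows where the number 30 comes from.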

0 reads pseudoaligned[~warn] no reads pseudoaligned

Hi,
I am trying to follow the RNA velocity tutorial using kb-python. After generating the index file, the following command throws an error.
I noticed that during the index generation I got a warning that more than 2M 0 non-ACGUT characters were found and replaced.

kb count -i /home/salmon/Documents/Github/BUS_notebooks_R-master/analysis/output/hs_cDNA_introns_97.idx -g tr2g.tsv -x 10xv3 -o kb -c1 cDNA_tx_to_capture.txt -c2 introns_tx_to_capture.txt --lamanno /mnt/sda3/data/public_data/fastq/sample1_S1_L001_R1_001.fastq.gz /mnt/sda3/data/public_data/fastq/sample1_S1_L001_R2_001.fastq.gz -t 16 -m 70G
[2021-01-29 13:12:08,965] WARNING The `--lamanno` and `--nucleus` flags are deprecated. These options will be removed in a future release. Please use `--workflow lamanno` or `--workflow nucleus` instead.
[2021-01-29 13:12:08,965]    INFO Using index /home/salmon/Documents/Github/BUS_notebooks_R-master/analysis/output/hs_cDNA_introns_97.idx to generate BUS file to kb from
[2021-01-29 13:12:08,965]    INFO         /mnt/sda3/data/public_data/fastq/sample1_S1_L001_R1_001.fastq.gz
[2021-01-29 13:12:08,965]    INFO         /mnt/sda3/data/public_data/fastq/sample1_S1_L001_R2_001.fastq.gz
[2021-01-29 13:36:32,560]   ERROR 
[index] k-mer length: 31
[index] number of targets: 1,378,373
[index] number of k-mers: 1,560,141,285
[index] number of equivalence classes: 12,887,284
[quant] will process sample 1:  /mnt/sda3/data/public_data/fastq/sample1_S1_L001_R1_001.fastq.gz
/mnt/sda3/data/public_data/fastq/sample1_S1_L001_R2_001.fastq.gz
[quant] finding pseudoalignments for the reads ... done
[quant] processed 819,658,242 reads, 0 reads pseudoaligned[~warn] no reads pseudoaligned.

[2021-01-29 13:36:32,561]   ERROR An exception occurred
Traceback (most recent call last):
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/main.py", line 837, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/main.py", line 218, in parse_count
    temp_dir=temp_dir
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/count.py", line 1510, in count_velocity
    fastqs, index_paths[0], technology, out_dir, threads=threads
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/validate.py", line 112, in inner
    results = func(*args, **kwargs)
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/count.py", line 149, in kallisto_bus
    run_executable(command)
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/dry/__init__.py", line 24, in inner
    return func(*args, **kwargs)
  File "/home/salmon/.local/lib/python3.6/site-packages/kb_python/utils.py", line 233, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/home/salmon/.local/lib/python3.6/site-packages/kb_python/bins/linux/kallisto/kallisto bus -i /home/salmon/Documents/Github/BUS_notebooks_R-master/analysis/output/hs_cDNA_introns_97.idx -o kb -x 10xv3 -t 16 /mnt/sda3/data/public_data/fastq/sample1_S1_L001_R1_001.fastq.gz /mnt/sda3/data/public_data/fastq/sample1_S1_L001_R2_001.fastq.gz' returned non-zero exit status 1.

Does anyone know how to solve this issue and what is causing it?

kb count returned non-zero exit status 3221225477.

When I ran the following command in Miniconda3 on Windows 10:

kb count -i "M:\My Drive\tools\kallisto\intron\index.idx" -m 100000000 --overwrite -g t2g.txt -c1 cdna_t2c.txt -c2 intron_t2c.txt -o bus_output -x 0,0,12:0,12,24:1,0,0 -t 24 --lamanno --h5ad --verbose --filter bustools BCUMI.gz fixed_R1_50.gz

It showed the error below:

c:\programdata\miniconda3\lib\site-packages\anndata_core\anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
from pandas.core.index import RangeIndex
[2020-04-15 12:18:46,979] DEBUG Printing verbose output
[2020-04-15 12:18:46,979] DEBUG Creating tmp directory
[2020-04-15 12:18:46,979] DEBUG Namespace(c1='cdna_t2c.txt', c2='intron_t2c.txt', command='count', fastqs=['BCUMI.gz', 'fixed_R1_50.gz'], filter='bustools', g='t2g.txt', h5ad=True, i='M:\My Drive\tools\kallisto\intron\index.idx', keep_tmp=False, lamanno=True, list=False, loom=False, m='100000000', nucleus=False, o='bus_output', overwrite=True, t=24, verbose=True, w=None, x='0,0,12:0,12,24:1,0,0')
[2020-04-15 12:18:46,979] INFO Generating BUS file from
[2020-04-15 12:18:46,979] INFO BCUMI.gz
[2020-04-15 12:18:46,979] INFO fixed_R1_50.gz
[2020-04-15 12:18:46,979] DEBUG c:\programdata\miniconda3\lib\site-packages\kb_python\bins\windows\kallisto\kallisto.exe bus -i M:\My Drive\tools\kallisto\intron\index.idx -o bus_output -x 0,0,12:0,12,24:1,0,0 -t 24 BCUMI.gz fixed_R1_50.gz
[2020-04-15 22:28:00,962] DEBUG
[2020-04-15 22:28:00,963] DEBUG [index] k-mer length: 31
[2020-04-15 22:28:00,963] DEBUG [index] number of targets: 2,080,292
[2020-04-15 22:28:00,964] DEBUG [index] number of k-mers: 2,670,597,649
[2020-04-15 22:28:00,964] DEBUG [index] number of equivalence classes: 18,250,913
[2020-04-15 22:28:00,965] DEBUG [quant] will process sample 1: BCUMI.gz
[2020-04-15 22:28:00,966] DEBUG fixed_R1_50.gz
[2020-04-15 22:28:00,967] DEBUG [quant] finding pseudoalignments for the reads ... done
[2020-04-15 22:28:00,968] DEBUG [quant] processed 63,142,586 reads, 51,961,068 reads pseudoaligned
[2020-04-15 22:28:02,139] INFO Sorting BUS file bus_output\output.bus to tmp\output.s.bus
[2020-04-15 22:28:02,140] DEBUG c:\programdata\miniconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe sort -o tmp\output.s.bus -T tmp -t 24 -m 100000000 bus_output\output.bus
[2020-04-15 23:18:16,381] DEBUG Read in 51961068 BUS records
[2020-04-15 23:18:16,382] INFO Whitelist not provided
[2020-04-15 23:18:16,383] INFO Generating whitelist bus_output\whitelist.txt from BUS file tmp\output.s.bus
[2020-04-15 23:18:16,384] DEBUG c:\programdata\miniconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe whitelist -o bus_output\whitelist.txt tmp\output.s.bus
[2020-04-15 23:18:22,288] DEBUG Read in 48485010 BUS records, wrote 60882 barcodes to whitelist with threshold 122
[2020-04-15 23:18:22,289] INFO Inspecting BUS file tmp\output.s.bus
[2020-04-15 23:18:22,292] DEBUG c:\programdata\miniconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe inspect -o bus_output\inspect.json -w bus_output\whitelist.txt -e bus_output\matrix.ec tmp\output.s.bus
[2020-04-16 00:38:01,889] INFO Correcting BUS records in tmp\output.s.bus to tmp\output.s.c.bus with whitelist bus_output\whitelist.txt
[2020-04-16 00:38:01,891] DEBUG c:\programdata\miniconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe correct -o tmp\output.s.c.bus -w bus_output\whitelist.txt tmp\output.s.bus
[2020-04-16 00:50:02,128] DEBUG Found 60881 barcodes in the whitelist
[2020-04-16 00:50:02,129] DEBUG Number of hamming dist 1 barcodes = 793266
[2020-04-16 00:50:02,130] DEBUG Processed 48485010 bus records
[2020-04-16 00:50:02,131] DEBUG In whitelist = 26843542
[2020-04-16 00:50:02,132] DEBUG Corrected = 5521574
[2020-04-16 00:50:02,132] DEBUG Uncorrected = 16119894
[2020-04-16 00:50:02,133] INFO Sorting BUS file tmp\output.s.c.bus to bus_output\output.unfiltered.bus
[2020-04-16 00:50:02,134] DEBUG c:\programdata\miniconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe sort -o bus_output\output.unfiltered.bus -T tmp -t 24 -m 100000000 tmp\output.s.c.bus
[2020-04-16 01:26:02,989] DEBUG Read in 32365116 BUS records
[2020-04-16 01:26:02,992] INFO Capturing records from BUS file bus_output\output.unfiltered.bus to tmp\spliced.bus with capture list cdna_t2c.txt
[2020-04-16 01:26:02,993] DEBUG c:\programdata\miniconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe capture -o tmp\spliced.bus -c cdna_t2c.txt -e bus_output\matrix.ec -t bus_output\transcripts.txt --transcripts bus_output\output.unfiltered.bus
[2020-04-16 03:03:44,511] DEBUG Parsing transcripts .. done
[2020-04-16 03:03:44,511] DEBUG Parsing ECs .. done
[2020-04-16 03:03:44,513] DEBUG Parsing capture list .. done
[2020-04-16 03:03:44,514] DEBUG Read in 32226847 BUS records, wrote 19893640 BUS records
[2020-04-16 03:03:45,004] INFO Sorting BUS file tmp\spliced.bus to bus_output\spliced.unfiltered.bus
[2020-04-16 03:03:45,005] DEBUG c:\programdata\miniconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe sort -o bus_output\spliced.unfiltered.bus -T tmp -t 24 -m 100000000 tmp\spliced.bus
[2020-04-16 03:21:37,083] DEBUG Read in 19907597 BUS records
[2020-04-16 03:21:37,084] INFO Generating count matrix bus_output\counts_unfiltered\spliced from BUS file bus_output\spliced.unfiltered.bus
[2020-04-16 03:21:37,086] DEBUG c:\programdata\miniconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe count -o bus_output\counts_unfiltered\spliced -g t2g.txt -e bus_output\matrix.ec -t bus_output\transcripts.txt --genecounts bus_output\spliced.unfiltered.bus
[2020-04-16 04:43:14,841] ERROR An exception occurred
Traceback (most recent call last):
File "c:\programdata\miniconda3\lib\site-packages\kb_python\main.py", line 483, in main
COMMAND_TO_FUNCTION[args.command](args)
File "c:\programdata\miniconda3\lib\site-packages\kb_python\main.py", line 135, in parse_count
nucleus=args.nucleus,
File "c:\programdata\miniconda3\lib\site-packages\kb_python\count.py", line 746, in count_velocity
bus_result['txnames'],
File "c:\programdata\miniconda3\lib\site-packages\kb_python\count.py", line 181, in bustools_count
run_executable(command)
File "c:\programdata\miniconda3\lib\site-packages\kb_python\utils.py", line 147, in run_executable
raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command 'c:\programdata\miniconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe count -o bus_output\counts_unfiltered\spliced -g t2g.txt -e bus_output\matrix.ec -t bus_output\transcripts.txt --genecounts bus_output\spliced.unfiltered.bus' returned non-zero exit status 3221225477.
[2020-04-16 04:43:14,884] DEBUG Removing tmp directory

However, the last step ran fine when I ran it on its own:

c:\programdata\miniconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe count -o bus_output\counts_unfiltered\spliced -g t2g.txt -e bus_output\matrix.ec -t bus_output\transcripts.txt --genecounts bus_output\spliced.unfiltered.bus

What could be the potential cause? If the problem cannot be resolved, how shall I continue the analysis with the output already there? Thanks.
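As a reading aid for the exit status above: large Windows exit codes are usually NTSTATUS values and are easier to recognise in hexadecimal. The sketch below converts the number from the traceback; the helper name and the one-entry lookup table are hypothetical, and it says nothing about why bustools crashed inside kb but not when run alone.

```python
def describe_windows_exit_status(status: int) -> str:
    """Format a Windows process exit status as hex with a known NTSTATUS name.

    Hypothetical helper for illustration; only one well-known code is listed.
    """
    code = status & 0xFFFFFFFF  # normalise to an unsigned 32-bit value
    known = {
        0xC0000005: "STATUS_ACCESS_VIOLATION",
    }
    name = known.get(code, "unknown NTSTATUS")
    return f"0x{code:08X} ({name})"


print(describe_windows_exit_status(3221225477))
```

3221225477 is 0xC0000005, the Windows access-violation status, which is roughly the counterpart of a segmentation fault on Linux.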

Instructions for use with single-end SMART-seq data

Describe the issue
I would like to run kb count on a list of FASTQs from a SMART-seq (v1, bulk RNA-seq) experiment. The reads are single-end. The kb count instructions specify that counting only works with paired FASTQs. Is this strictly true, or is there a way to also run the command using single-end data? Also, why does the SMARTSEQ technology not work with the lamanno workflow?

What is the exact command that was run?

kb count --h5ad            -i index.idx            -g t2g.txt            -x SMARTSEQ            -o  $(dirname results/lamanno/human/kb_count.h5)            -c1 results/lamanno/human/cdna_t2c.txt            -c2 results/lamanno/human/intron_t2c.txt            --workflow lamanno            --filter bustools             results/data/SRR2144273/SRR2144273.fa.gz

Command output (with --verbose flag)

[2021-02-24 13:26:49,643]   DEBUG Printing verbose output
[2021-02-24 13:26:49,643]   DEBUG kallisto binary located at /home/warkre/miniconda3/envs/kb-python/lib/python3.8/site-packages/kb_python/bins/linux/kallisto/kallisto
[2021-02-24 13:26:49,643]   DEBUG bustools binary located at /home/warkre/miniconda3/envs/kb-python/lib/python3.8/site-packages/kb_python/bins/linux/bustools/bustools
[2021-02-24 13:26:49,643]   DEBUG Creating results/lamanno/human/tmp directory
[2021-02-24 13:26:49,644]   DEBUG Namespace(c1='results/lamanno/human/cdna_t2c.txt', c2='results/lamanno/human/intron_t2c.txt', cellranger=False, command='count', dry_run=False, fastqs=['results/data/SRR2144273/SRR2144273.fa.gz'], filter='bustools', g='t2g.txt', h5ad=True, i='index.idx', keep_tmp=False, lamanno=False, list=False, loom=False, m='4G', mm=False, no_inspect=False, no_validate=False, nucleus=False, o='results/lamanno/human', overwrite=False, report=False, t=8, tcc=False, tmp=None, verbose=True, w=None, workflow='lamanno', x='SMARTSEQ')
usage: kb [-h] [--list] <CMD> ...
kb: error: Technology `SMARTSEQ` can not be used with workflow lamanno.
[2021-02-24 13:26:49,644]   DEBUG Removing results/lamanno/human/tmp directory

Issue with specifying a new technology: no reads pseudoaligned and empty bus file

Thank you so much for your great tool, I really like the possibility of specifying new technologies, RNA-velocity, and the upcoming feature barcoding option!

Describe the issue
I tried to specify a new technology with a library index BC in R3 from 1-8 bp, a cell BC in R2 from 1-8 bp, a cell BC in R4 from 1-8 bp, a UMI in R3 from 9-14 bp, and the biological read in R1. However, I get 0 pseudo alignments (I can map the reads successfully with another mapper), and an empty bus file (see below for file sizes in kb count output directory).

What is the exact command that was run?

refdir=/n/data1/mgh/csb/pittet/references/mouse_mm10_99_kallisto_updateFeb2020

kb count --verbose --h5ad -i ${refdir}/index.idx -g ${refdir}/t2g.txt -x 2,0,8,1,0,8,3,0,8:3,8,14:0,0,0 -o ./pool1_7 \
-c1 ${refdir}/cdna_t2c.txt -c2 ${refdir}/intron_t2c.txt --lamanno --filter bustools -t 12 -m 100000 \
./SUB07818pool1_S1_R1_001.fastq.gz \
./SUB07818pool1_S1_R2_001.fastq.gz \
./SUB07818pool1_S1_R3_001.fastq.gz \
./SUB07818pool1_S1_R4_001.fastq.gz
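To make the custom -x string in this command easier to check against the read layout described above, here is a minimal Python sketch that splits it into its barcode, UMI, and sequence parts. It assumes the kallisto bus custom-technology convention of colon-separated groups of file,start,stop triplets, with file numbers counted from 0 in the order the FASTQs are given. The function name is hypothetical and is not part of kb-python or kallisto.

```python
from typing import Dict, List, Tuple

Triplet = Tuple[int, int, int]  # (file index, start, stop)


def parse_technology(spec: str) -> Dict[str, List[Triplet]]:
    """Split a custom technology string into barcode/UMI/sequence triplets.

    Hypothetical helper; assumes the 'bc:umi:seq' layout where each part is
    a comma-separated list of file,start,stop triplets.
    """
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError("expected three colon-separated parts: bc:umi:seq")
    result: Dict[str, List[Triplet]] = {}
    for name, part in zip(("barcode", "umi", "sequence"), parts):
        numbers = [int(x) for x in part.split(",")]
        if len(numbers) % 3 != 0:
            raise ValueError(f"{name} part is not a list of triplets: {part}")
        result[name] = [
            (numbers[i], numbers[i + 1], numbers[i + 2])
            for i in range(0, len(numbers), 3)
        ]
    return result


layout = parse_technology("2,0,8,1,0,8,3,0,8:3,8,14:0,0,0")
for name, triplets in layout.items():
    print(name, triplets)
```

Decoded this way, the barcode is taken from file indices 2, 1, and 3 (positions 0 to 8 each), the UMI from file index 3 (positions 8 to 14), and the biological read from file index 0. Under 0-based numbering, file index 3 is the fourth FASTQ listed, which is worth comparing with the intended layout.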

Command output of kb count (with --verbose flag)

[2020-02-17 20:58:51,502]   DEBUG Printing verbose output
[2020-02-17 20:58:51,502]   DEBUG Creating tmp directory
[2020-02-17 20:58:51,505]   DEBUG Namespace(c1='/n/data1/mgh/csb/pittet/references/mouse_mm10_99_kallisto_updateFeb2020/cdna_t2c.txt', c2='/n/data1/mgh/csb/pittet/references/mouse_mm10_99_kallisto_updateFeb2020/intron_t2c.txt', command='count', fastqs=['./SUB07818pool1_S1_R1_001.fastq.gz', './SUB07818pool1_S1_R2_001.fastq.gz', './SUB07818pool1_S1_R3_001.fastq.gz', './SUB07818pool1_S1_R4_001.fastq.gz'], filter='bustools', g='/n/data1/mgh/csb/pittet/references/mouse_mm10_99_kallisto_updateFeb2020/t2g.txt', h5ad=True, i='/n/data1/mgh/csb/pittet/references/mouse_mm10_99_kallisto_updateFeb2020/index.idx', keep_tmp=False, lamanno=True, list=False, loom=False, m='100000', nucleus=False, o='./pool1_7', overwrite=False, t=12, verbose=True, w=None, x='2,0,8,1,0,8,3,0,8:3,8,14:0,0,0')
[2020-02-17 20:58:51,505]    INFO Generating BUS file from
[2020-02-17 20:58:51,505]    INFO         ./SUB07818pool1_S1_R1_001.fastq.gz
[2020-02-17 20:58:51,505]    INFO         ./SUB07818pool1_S1_R2_001.fastq.gz
[2020-02-17 20:58:51,505]    INFO         ./SUB07818pool1_S1_R3_001.fastq.gz
[2020-02-17 20:58:51,505]    INFO         ./SUB07818pool1_S1_R4_001.fastq.gz
[2020-02-17 20:58:51,509]   DEBUG /home/mm884/py37-kb/lib/python3.7/site-packages/kb_python/bins/linux/kallisto/kallisto bus -i /n/data1/mgh/csb/pittet/references/mouse_mm10_99_kallisto_updateFeb2020/index.idx -o ./pool1_7 -x 2,0,8,1,0,8,3,0,8:3,8,14:0,0,0 -t 12 ./SUB07818pool1_S1_R1_001.fastq.gz ./SUB07818pool1_S1_R2_001.fastq.gz ./SUB07818pool1_S1_R3_001.fastq.gz ./SUB07818pool1_S1_R4_001.fastq.gz
[2020-02-17 21:28:20,686]   DEBUG
[2020-02-17 21:28:20,686]   DEBUG [index] k-mer length: 31
[2020-02-17 21:28:20,686]   DEBUG [index] number of targets: 791,598
[2020-02-17 21:28:20,686]   DEBUG [index] number of k-mers: 1,112,499,218
[2020-02-17 21:28:20,686]   DEBUG [index] number of equivalence classes: 5,490,023
[2020-02-17 21:28:20,686]   DEBUG [quant] will process sample 1: ./SUB07818pool1_S1_R1_001.fastq.gz
[2020-02-17 21:28:20,687]   DEBUG ./SUB07818pool1_S1_R2_001.fastq.gz
[2020-02-17 21:28:20,687]   DEBUG ./SUB07818pool1_S1_R3_001.fastq.gz
[2020-02-17 21:28:20,687]   DEBUG ./SUB07818pool1_S1_R4_001.fastq.gz
[2020-02-17 21:28:20,687]   DEBUG [quant] finding pseudoalignments for the reads ... done
[2020-02-17 21:28:20,687]   DEBUG [quant] processed 867,926,292 reads, 0 reads pseudoaligned
[2020-02-17 21:28:20,687]   DEBUG [~warn] no reads pseudoaligned.
[2020-02-17 21:28:20,687]    INFO Sorting BUS file ./pool1_7/output.bus to tmp/output.s.bus
[2020-02-17 21:28:20,693]   DEBUG /home/mm884/py37-kb/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools sort -o tmp/output.s.bus -T tmp -t 12 -m 100000 ./pool1_7/output.bus
[2020-02-17 21:28:20,753]   DEBUG Warning: low number supplied for maximum memory, defaulting to 64Mb
[2020-02-17 21:28:20,753]   DEBUG Read in 0 BUS records
[2020-02-17 21:28:20,753]   ERROR An exception occurred
Traceback (most recent call last):
  File "/home/mm884/py37-kb/lib/python3.7/site-packages/kb_python/main.py", line 483, in main
    COMMAND_TO_FUNCTION[args.command](args)
  File "/home/mm884/py37-kb/lib/python3.7/site-packages/kb_python/main.py", line 135, in parse_count
    nucleus=args.nucleus,
  File "/home/mm884/py37-kb/lib/python3.7/site-packages/kb_python/count.py", line 689, in count_velocity
    memory=memory
  File "/home/mm884/py37-kb/lib/python3.7/site-packages/kb_python/count.py", line 97, in bustools_sort
    run_executable(command)
  File "/home/mm884/py37-kb/lib/python3.7/site-packages/kb_python/utils.py", line 147, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/home/mm884/py37-kb/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools sort -o tmp/output.s.bus -T tmp -t 12 -m 100000 ./pool1_7/output.bus' died with <Signals.SIGSEGV: 11>.
[2020-02-17 21:28:20,781]   DEBUG Removing tmp directory

The command output of kb ref:

[2020-02-17 01:48:55,434]    INFO Decompressing /n/data1/mgh/csb/pittet/references/mouse_mm10_99_star_updateFeb2020/Mus_musculus.GRCm38.dna_sm.primary_assembly.fa to tmp
[2020-02-17 01:48:55,435]    INFO Sorting /n/data1/mgh/csb/pittet/references/mouse_mm10_99_star_updateFeb2020/Mus_musculus.GRCm38.dna_sm.primary_assembly.fa
[2020-02-17 01:59:04,404]    INFO Decompressing /n/data1/mgh/csb/pittet/references/mouse_mm10_99_star_updateFeb2020/Mus_musculus.GRCm38.99.gtf to tmp
[2020-02-17 01:59:04,404]    INFO Sorting /n/data1/mgh/csb/pittet/references/mouse_mm10_99_star_updateFeb2020/Mus_musculus.GRCm38.99.gtf
[2020-02-17 02:00:11,059]    INFO Splitting genome into cDNA at /n/data1/mgh/csb/pittet/references/mouse_mm10_99_star_updateFeb2020/cdna.fa
[2020-02-17 02:01:53,486]    INFO Creating cDNA transcripts-to-capture at /n/data1/mgh/csb/pittet/references/mouse_mm10_99_star_updateFeb2020/cdna_t2c.txt
[2020-02-17 02:01:55,118]    INFO Splitting genome into introns at /n/data1/mgh/csb/pittet/references/mouse_mm10_99_star_updateFeb2020/intron.fa
[2020-02-17 02:07:16,038]    INFO Creating intron transcripts-to-capture at /n/data1/mgh/csb/pittet/references/mouse_mm10_99_star_updateFeb2020/cdna_t2c.txt
[2020-02-17 02:07:27,094]    INFO Concatenating cDNA and intron FASTAs
[2020-02-17 02:08:02,802]    INFO Creating transcript-to-gene mapping at /n/data1/mgh/csb/pittet/references/mouse_mm10_99_star_updateFeb2020/t2g.txt
[2020-02-17 02:08:16,075]    INFO Indexing to /n/data1/mgh/csb/pittet/references/mouse_mm10_99_star_updateFeb2020/index.idx

The file sizes in the kb count output directory:

-rw-rw-r-- 1 mm884 mm884 1.5G Feb 17 21:28 matrix.ec
-rw-rw-r-- 1 mm884 mm884 1.5K Feb 17 21:28 run_info.json
-rw-rw-r-- 1 mm884 mm884  19M Feb 17 21:28 transcripts.txt
-rw-rw-r-- 1 mm884 mm884   49 Feb 17 21:27 output.bus

And the file sizes in the kb ref output directory:

-rw-rw-r-- 1 mm884 pittet  26G Feb 17 03:54 index.idx
-rw-rw-r-- 1 mm884 pittet  41M Feb 17 02:08 t2g.txt
-rw-rw-r-- 1 mm884 pittet  16M Feb 17 02:07 intron_t2c.txt
-rw-rw-r-- 1 mm884 pittet 3.7G Feb 17 02:07 intron.fa
-rw-rw-r-- 1 mm884 pittet 2.9M Feb 17 02:01 cdna_t2c.txt
-rw-rw-r-- 1 mm884 pittet 247M Feb 17 02:01 cdna.fa

The versions that I used:
python 3.7.4, kb-python 0.24.4, kallisto 0.46.1, and bustools 0.39.3

I ran the command on an HPC cluster and allocated 12 cores and 100G of memory.

Let me know if you need more information such as the fasta files. Thank you so much in advance for your help!

kb count ERROR with inDrops3 data

Hello!
Thanks for developing and supporting kb_python!

Describe the issue
I'm running this workflow with inDrops3 data. This thread helped me to clarify how to use inDrops3 input. I'm running a test set of 87,347,261 reads.

  1. Installed R 4.0
  2. Generated reference files (using 61bp for inDrops3 transcript read length)
library(BUSpaRse)
library(BSgenome.Mmusculus.UCSC.mm10)
library(AnnotationHub)

ah <- AnnotationHub()
query(ah, pattern = c("Ensembl", "97", "Mus musculus", "EnsDb"))
# Get mouse Ensembl 97 annotation
edb <- ah[["AH73905"]]
# note L = read lengths for transcript read, L=61 for indrops3/61
get_velocity_files(edb, L=61, Genome=BSgenome.Mmusculus.UCSC.mm10, out_path = "./veloindex", isoform_action = "separate")
  3. Indexed the reference with kallisto:
    kallisto index -i mm_cDNA_introns_97.idx cDNA_introns.fa

The next step, kb count, fails.

What is the exact command that was run?

kreference_prefix=/path/veloindex_indrops3

# 100G RAM max
kb count \
-i ${kreference_prefix}/mm_cDNA_introns_97.idx \
-g ${kreference_prefix}/neuron10k_velocity/tr2g.tsv \
-x INDROPSV3 \
-o kallisto_bus_output \
-c1 ${kreference_prefix}/cDNA_tx_to_capture.txt \
-c2 ${kreference_prefix}/introns_tx_to_capture.txt \
--lamanno \
--verbose \
-t 8 \
${1}_2.fq.gz ${1}_4.fq.gz ${1}_1.fq.gz

fastq input:

  • 2.fq.gz - 8bp - first part of CELL id
  • 4.fq.gz - 14bp = 8 bp - second part of CELL id + 6 bp of transcript UMI
  • 1.fq.gz - 61bp transcript read

Command output (with --verbose flag)

/home/sn240/.conda/envs/r/lib/python3.7/site-packages/anndata/_core/anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  from pandas.core.index import RangeIndex
[2020-03-26 21:41:36,087]   DEBUG Printing verbose output
[2020-03-26 21:41:36,087]   DEBUG Creating tmp directory
[2020-03-26 21:41:36,103]   DEBUG Namespace(c1='/n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/cDNA_tx_to_capture.txt', c2='/n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/introns_tx_to_capture.txt', command='count', fastqs=['test_2.fq.gz', 'test_4.fq.gz', 'test_1.fq.gz'], filter=None, g='/n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/neuron10k_velocity/tr2g.tsv', h5ad=False, i='/n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/mm_cDNA_introns_97.idx', keep_tmp=False, lamanno=True, list=False, loom=False, m='4G', nucleus=False, o='kallisto_bus_output', overwrite=False, t=8, verbose=True, w=None, x='INDROPSV3')
[2020-03-26 21:41:36,104]    INFO Generating BUS file from
[2020-03-26 21:41:36,104]    INFO         test_2.fq.gz
[2020-03-26 21:41:36,104]    INFO         test_4.fq.gz
[2020-03-26 21:41:36,104]    INFO         test_1.fq.gz
[2020-03-26 21:41:36,106]   DEBUG /home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/kallisto/kallisto bus -i /n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/mm_cDNA_introns_97.idx -o kallisto_bus_output -x INDROPSV3 -t 8 test_2.fq.gz test_4.fq.gz test_1.fq.gz
[2020-03-26 21:51:04,483]   DEBUG 
[2020-03-26 21:51:04,483]   DEBUG [index] k-mer length: 31
[2020-03-26 21:51:04,483]   DEBUG [index] number of targets: 838,802
[2020-03-26 21:51:04,483]   DEBUG [index] number of k-mers: 1,112,521,288
[2020-03-26 21:51:04,484]   DEBUG [index] number of equivalence classes: 5,715,566
[2020-03-26 21:51:04,484]   DEBUG [quant] will process sample 1: test_2.fq.gz
[2020-03-26 21:51:04,484]   DEBUG test_4.fq.gz
[2020-03-26 21:51:04,484]   DEBUG test_1.fq.gz
[2020-03-26 21:51:04,484]   DEBUG [quant] finding pseudoalignments for the reads ... done
[2020-03-26 21:51:04,484]   DEBUG [quant] processed 131,020,444 reads, 69,153,075 reads pseudoaligned
[2020-03-26 21:51:04,484]    INFO Sorting BUS file kallisto_bus_output/output.bus to tmp/output.s.bus
[2020-03-26 21:51:04,488]   DEBUG /home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools sort -o tmp/output.s.bus -T tmp -t 8 -m 4G kallisto_bus_output/output.bus
[2020-03-26 21:51:29,622]   DEBUG Read in 69153075 BUS records
[2020-03-26 21:51:29,623]    INFO Whitelist not provided
[2020-03-26 21:51:29,623]    INFO Copying pre-packaged INDROPSV3 whitelist to kallisto_bus_output
[2020-03-26 21:51:29,722]    INFO Inspecting BUS file tmp/output.s.bus
[2020-03-26 21:51:29,723]   DEBUG /home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools inspect -o kallisto_bus_output/inspect.json -w kallisto_bus_output/inDropsv3_whitelist.txt -e kallisto_bus_output/matrix.ec tmp/output.s.bus
[2020-03-26 21:52:05,442]    INFO Correcting BUS records in tmp/output.s.bus to tmp/output.s.c.bus with whitelist kallisto_bus_output/inDropsv3_whitelist.txt
[2020-03-26 21:52:05,445]   DEBUG /home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools correct -o tmp/output.s.c.bus -w kallisto_bus_output/inDropsv3_whitelist.txt tmp/output.s.bus
[2020-03-26 21:52:13,340]   DEBUG Found 147456 barcodes in the whitelist
[2020-03-26 21:52:13,341]   DEBUG Number of hamming dist 1 barcodes = 6872832
[2020-03-26 21:52:13,341]   DEBUG Processed 25625292 bus records
[2020-03-26 21:52:13,341]   DEBUG In whitelist = 21308242
[2020-03-26 21:52:13,341]   DEBUG Corrected = 1643761
[2020-03-26 21:52:13,341]   DEBUG Uncorrected = 2673289
[2020-03-26 21:52:13,341]    INFO Sorting BUS file tmp/output.s.c.bus to kallisto_bus_output/output.unfiltered.bus
[2020-03-26 21:52:13,342]   DEBUG /home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools sort -o kallisto_bus_output/output.unfiltered.bus -T tmp -t 8 -m 4G tmp/output.s.c.bus
[2020-03-26 21:52:26,062]   DEBUG Read in 22952003 BUS records
[2020-03-26 21:52:26,073]    INFO Capturing records from BUS file kallisto_bus_output/output.unfiltered.bus to tmp/spliced.bus with capture list /n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/cDNA_tx_to_capture.txt
[2020-03-26 21:52:26,074]   DEBUG /home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools capture -o tmp/spliced.bus -c /n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/cDNA_tx_to_capture.txt -e kallisto_bus_output/matrix.ec -t kallisto_bus_output/transcripts.txt --transcripts kallisto_bus_output/output.unfiltered.bus
[2020-03-26 21:53:10,562]   DEBUG Parsing transcripts .. done
[2020-03-26 21:53:10,563]   DEBUG Parsing ECs .. done
[2020-03-26 21:53:10,563]   DEBUG Parsing capture list .. done
[2020-03-26 21:53:10,563]   DEBUG Read in 22054602 BUS records, wrote 15358258 BUS records
[2020-03-26 21:53:10,563]    INFO Sorting BUS file tmp/spliced.bus to kallisto_bus_output/spliced.unfiltered.bus
[2020-03-26 21:53:10,566]   DEBUG /home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools sort -o kallisto_bus_output/spliced.unfiltered.bus -T tmp -t 8 -m 4G tmp/spliced.bus
[2020-03-26 21:53:19,313]   DEBUG Read in 15358258 BUS records
[2020-03-26 21:53:19,313]    INFO Generating count matrix kallisto_bus_output/counts_unfiltered/spliced from BUS file kallisto_bus_output/spliced.unfiltered.bus
[2020-03-26 21:53:19,313]   DEBUG /home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools count -o kallisto_bus_output/counts_unfiltered/spliced -g /n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/neuron10k_velocity/tr2g.tsv -e kallisto_bus_output/matrix.ec -t kallisto_bus_output/transcripts.txt --genecounts kallisto_bus_output/spliced.unfiltered.bus
[2020-03-26 21:53:19,340]   DEBUG Usage: bustools count [options] sorted-bus-files
[2020-03-26 21:53:19,340]   DEBUG 
[2020-03-26 21:53:19,340]   DEBUG Options:
[2020-03-26 21:53:19,340]   DEBUG -o, --output          File for corrected bus output
[2020-03-26 21:53:19,340]   DEBUG -g, --genemap         File for mapping transcripts to genes
[2020-03-26 21:53:19,340]   DEBUG -e, --ecmap           File for mapping equivalence classes to transcripts
[2020-03-26 21:53:19,341]   DEBUG -t, --txnames         File with names of transcripts
[2020-03-26 21:53:19,341]   DEBUG --genecounts          Aggregate counts to genes only
[2020-03-26 21:53:19,341]   DEBUG -m, --multimapping    Include bus records that pseudoalign to multiple genes
[2020-03-26 21:53:19,341]   DEBUG 
[2020-03-26 21:53:19,341]   DEBUG Error: File not found /n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/neuron10k_velocity/tr2g.tsv
[2020-03-26 21:53:19,348]   ERROR An exception occurred
Traceback (most recent call last):
  File "/home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/main.py", line 483, in main
    COMMAND_TO_FUNCTION[args.command](args)
  File "/home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/main.py", line 135, in parse_count
    nucleus=args.nucleus,
  File "/home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/count.py", line 746, in count_velocity
    bus_result['txnames'],
  File "/home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/count.py", line 181, in bustools_count
    run_executable(command)
  File "/home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/utils.py", line 147, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools count -o kallisto_bus_output/counts_unfiltered/spliced -g /n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/neuron10k_velocity/tr2g.tsv -e kallisto_bus_output/matrix.ec -t kallisto_bus_output/transcripts.txt --genecounts kallisto_bus_output/spliced.unfiltered.bus' returned non-zero exit status 1.
[2020-03-26 21:53:19,353]   DEBUG Removing tmp directory

Versions used:

  • kallisto 0.46.0
  • bustools 0.39.4
  • kb_python 0.24.4

I'd appreciate any help to push this analysis forward!

Sergey
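
The `Error: File not found` line in the bustools output above names the `-g` path, not the BUS file. A minimal pre-flight sketch is to confirm that every input path exists before launching `kb count`. It assumes nothing about kb_python internals, and `missing_inputs` is a hypothetical helper name.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [str(p) for p in paths if not Path(p).is_file()]


# Paths copied from the failing bustools command above; adjust to your layout.
inputs = [
    "/n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/neuron10k_velocity/tr2g.tsv",
    "kallisto_bus_output/matrix.ec",
    "kallisto_bus_output/transcripts.txt",
    "kallisto_bus_output/spliced.unfiltered.bus",
]
for path in missing_inputs(inputs):
    print(f"missing: {path}")
```

Running a check like this from the same working directory as the job makes it easier to tell a wrong absolute path from a relative path resolved against the wrong directory.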

ERROR kallisto: unrecognized option `--kmer'

When I ran the following command in the terminal on a Mac:

kb count /Users/khush/index.idx_cdna,/Users/khush/index.idx_intron.0,/Users/khus/index.idx_intron.1,/Users/khush/index.idx_intron.2 -g /Users/khush/t2g.txt -x 10xv3 --workflow lamanno --loom -c1 /Users/khush/cdna_t2c.txt -c2 /Users/khush/intron_t2c.txt /Users/khush/Downloads/R1.fastq.gz /Users/khush/Downloads/R2.fastq.gz /Users/khush/Downloads/I1.fastq.gz

It showed the error below:

ERROR kallisto 0.46.2
Generates BUS files for single-cell sequencing

Usage: kallisto bus [arguments] FASTQ-files

Required arguments:
-i, --index=STRING Filename for the kallisto index to be used for
pseudoalignment
-o, --output-dir=STRING Directory to write output to
-x, --technology=STRING Single-cell technology used

Optional arguments:
-l, --list List all single-cell technologies supported
-t, --threads=INT Number of threads to use (default: 1)
-b, --bam Input file is a BAM file
-n, --num Output number of read in flag column (incompatible with --bam)
--verbose Print out progress information every 1M proccessed reads
kallisto: unrecognized option `--kmer'

Error: Number of files (3) does not match number of input files required by technology 10XV3 (2)
[2021-02-10 10:47:49,967] ERROR An exception occurred
Traceback (most recent call last):
File "/Users/khush/miniconda/lib/python3.8/site-packages/kb_python/main.py", line 846, in main
COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
File "/Users/khush/miniconda/lib/python3.8/site-packages/kb_python/main.py", line 206, in parse_count
count_velocity(
File "/Users/khush/miniconda/lib/python3.8/site-packages/kb_python/count.py", line 1497, in count_velocity
bus_result = kallisto_bus_split(
File "/Users/khush/miniconda/lib/python3.8/site-packages/kb_python/count.py", line 193, in kallisto_bus_split
kallisto_bus(
File "/Users/khush/miniconda/lib/python3.8/site-packages/kb_python/validate.py", line 112, in inner
results = func(*args, **kwargs)
File "/Users/khush/miniconda/lib/python3.8/site-packages/kb_python/count.py", line 149, in kallisto_bus
run_executable(command)
File "/Users/khush/miniconda/lib/python3.8/site-packages/kb_python/dry/init.py", line 24, in inner
return func(*args, **kwargs)
File "/Users/khush/miniconda/lib/python3.8/site-packages/kb_python/utils.py", line 233, in run_executable
raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/Users/khushminiconda/lib/python3.8/site-packages/kb_python/bins/darwin/kallisto/kallisto bus -i /Users/khush/index.idx_cdna -o ./tmp/bus_part0 -x 10xv3 -t 8 --num --kmer /Users/khush/Downloads/R1.fastq.gz /Users/khush/Downloads/R2.fastq.gz /Users/khush/Downloads/I1.fastq.gz' returned non-zero exit status 1.
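
Separately from the `--kmer` message, the last kallisto error line says that 10XV3 takes two files, while three were passed (R1, R2 and I1). A hedged sketch of filtering index reads out of a FASTQ list before calling `kb count` follows. It assumes the index read is marked by an `I1` or `I2` token in the file name, as in the command above, and `drop_index_reads` is a hypothetical helper, not part of kb_python.

```python
import re
from pathlib import PurePath

# Matches an I1/I2 token delimited by '_', '.', or the start/end of the name.
_INDEX_READ = re.compile(r"(^|[_.])I[12]([_.]|$)")


def drop_index_reads(fastqs):
    """Keep only FASTQ paths whose file name has no I1/I2 token."""
    return [f for f in fastqs if not _INDEX_READ.search(PurePath(f).name)]


fastqs = [
    "/Users/khush/Downloads/R1.fastq.gz",
    "/Users/khush/Downloads/R2.fastq.gz",
    "/Users/khush/Downloads/I1.fastq.gz",
]
print(drop_index_reads(fastqs))
```

This only addresses the file-count error; it says nothing about why the bundled kallisto rejects `--kmer`.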

SureCell data crashing at bustools count

Describe the issue
Hello,

I am working with SureCell single-cell data and I am trying to use kb_python to analyze it.

I work on a SLURM system and have created an environment specifically for Python 3.
My script looks like this:

#!/bin/bash
#SBATCH --job-name=kb
#SBATCH -A SEYEDAM_LAB
#SBATCH --output=kb.out
#SBATCH --error=kb.err
#SBATCH --time=3-00:00:00
#SBATCH --partition=free
#SBATCH --ntasks=20

source conda.sh
kb count -i /share/crsp/lab/seyedam/gbalderr/scBrain/kallisto/gencodeM21_pc_long.idx -g kb_trial.txt -x SURECELL --h5ad -t 20 TLR21_C029_S1_R1.fastq.gz TLR21_C029_S1_R2.fastq.gz

It looks like it is running OK, but at the end of my log files I see the error:

data/homezvol0/gbalderr/anaconda3/lib/python3.7/site-packages/anndata/_core/anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
from pandas.core.index import RangeIndex
[2020-08-06 22:53:04,576] INFO Generating BUS file from
[2020-08-06 22:53:04,577] INFO TLR21_C029_S1_R1.fastq.gz
[2020-08-06 22:53:04,577] INFO TLR21_C029_S1_R2.fastq.gz
[2020-08-06 22:56:01,242] INFO Sorting BUS file ./output.bus to tmp/output.s.bus
[2020-08-06 22:56:13,889] INFO Whitelist not provided
[2020-08-06 22:56:13,889] INFO Generating whitelist ./whitelist.txt from BUS file tmp/output.s.bus
[2020-08-06 22:56:13,985] INFO Inspecting BUS file tmp/output.s.bus
[2020-08-06 22:56:14,653] INFO Correcting BUS records in tmp/output.s.bus to tmp/output.s.c.bus with whitelist ./whitelist.txt
[2020-08-06 22:56:15,211] INFO Sorting BUS file tmp/output.s.c.bus to ./output.unfiltered.bus
[2020-08-06 22:56:17,545] INFO Generating count matrix ./counts_unfiltered/cells_x_genes from BUS file ./output.unfiltered.bus
[2020-08-06 22:56:17,605] ERROR An exception occurred
Traceback (most recent call last):
File "/data/homezvol0/gbalderr/anaconda3/lib/python3.7/site-packages/kb_python/main.py", line 476, in main
COMMAND_TO_FUNCTION[args.command](args)
File "/data/homezvol0/gbalderr/anaconda3/lib/python3.7/site-packages/kb_python/main.py", line 148, in parse_count
h5ad=args.h5ad,
File "/data/homezvol0/gbalderr/anaconda3/lib/python3.7/site-packages/kb_python/count.py", line 453, in count
bus_result['txnames'],
File "/data/homezvol0/gbalderr/anaconda3/lib/python3.7/site-packages/kb_python/count.py", line 178, in bustools_count
run_executable(command)
File "/data/homezvol0/gbalderr/anaconda3/lib/python3.7/site-packages/kb_python/utils.py", line 114, in run_executable
raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/data/homezvol0/gbalderr/anaconda3/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools count -o ./counts_unfiltered/cells_x_genes -g kb_trial.txt -e ./matrix.ec -t ./transcripts.txt --genecounts ./output.unfiltered.bus' returned non-zero exit status 1.

I am not sure if I am missing something very obvious, but any clues on something I am doing wrong are welcome.

Thank you for your help!

Gaby B.

unrecognized "workflow" argument

Describe the issue
kb does not appear to generate outputs for RNA velocity. The error reports an unrecognized argument "workflow", and kb still does not return the velocity outputs (i.e., cDNA_tx_to_capture.txt, introns_tx_to_capture.txt) even when the deprecated argument "lamanno" is used.

What is the exact command that was run?

kb ref -i transcriptome.idx -g transcripts_to_genes.txt -d human --workflow lamanno -f2 cdna.fa -c1 cDNA_tx_to_capture.txt -c2 introns_tx_to_capture.txt

Command output (with --verbose flag)

kb: error: unrecognized arguments: --workflow

What is the expected time to run kb ref command for mm10 genome?

Describe the issue
Running the kb ref command for the mm10 genome takes 2 hours. Can I control the number of threads?

What is the exact command that was run?
From the log of kb ref

[2020-09-22 22:06:28,393]   DEBUG Namespace(c1='mm10_tx_to_capture.tsv', c2='mm10_intron_tx_to_capture.tsv', command='ref', d=None, f1='mm10_tx.fasta', f2='mm10_intron.fasta', fasta='/var/lib/cwl/stg7037066a-f5cb-4746-a6ca-f4998d573e56/mm10.fa', feature=None, g='mm10_tx_to_gene.tsv', gtf='/var/lib/cwl/stg3e05b746-afe0-454f-be5b-c7e9706be869/refgene.gtf', i='mm10_index.idx', keep_tmp=False, lamanno=False, list=False, overwrite=False, verbose=True, workflow='lamanno')

Command output (with --verbose flag)

[2020-09-22 22:29:07,650]    INFO Indexing to mm10_index.idx
[2020-09-22 22:29:07,738]   DEBUG kallisto index -i mm10_index.idx -k 31 tmp/combined.fa
[2020-09-23 00:24:27,268]   DEBUG 
[2020-09-23 00:24:27,275]   DEBUG [build] loading fasta file tmp/combined.fa
[2020-09-23 00:24:27,275]   DEBUG [build] k-mer length: 31
[2020-09-23 00:24:27,275]   DEBUG [build] warning: clipped off poly-A tail (longer than 10)
[2020-09-23 00:24:27,275]   DEBUG from 8 target sequences
[2020-09-23 00:24:27,275]   DEBUG [build] warning: replaced 2755231 non-ACGUT characters in the input sequence
[2020-09-23 00:24:27,276]   DEBUG with pseudorandom nucleotides
[2020-09-23 00:24:27,276]   DEBUG [build] counting k-mers ... done.
[2020-09-23 00:24:27,276]   DEBUG [build] building target de Bruijn graph ...  done
[2020-09-23 00:24:27,276]   DEBUG [build] creating equivalence classes ...  done
[2020-09-23 00:24:27,276]   DEBUG [build] target de Bruijn graph has 7550573 contigs and contains 954314524 k-mers
[2020-09-23 00:24:27,276]   DEBUG 
[2020-09-23 00:24:27,441]   DEBUG Removing tmp directory
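
The DEBUG timestamps around `kallisto index` in the log above bracket the indexing step, so its duration can be read off directly. A small sketch of that arithmetic follows. It assumes kb logs the tool's output only once the tool has finished, which the identical timestamps on the `[build]` lines suggest. `elapsed` is a hypothetical helper.

```python
from datetime import datetime

# Timestamp layout used in the kb log lines above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed(start, end):
    """Return the timedelta between two kb log timestamps."""
    return datetime.strptime(end, LOG_TIME_FORMAT) - datetime.strptime(start, LOG_TIME_FORMAT)


# Command logged at 22:29:07,738; first kallisto output logged at 00:24:27,268.
print(elapsed("2020-09-22 22:29:07,738", "2020-09-23 00:24:27,268"))
```

By this measure the indexing step alone accounts for roughly 1 hour 55 minutes of the run.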

Smart-seq vars in 0.25

Describe the issue
Aligning paired Smart-seq data with the command below (or similar ones with alternate human references), I get varying var types in my loom file, but never the standard "gid" that 10X data alignment produces, which makes it easy to convert back to gene symbols with t2g.txt or other means.

EDIT: aligning a human 10X dataset (changing to -x 10xv3) with 0.25 and the same reference works fine and gives appropriate vars:
HuGBMv3_10X2.var
Out[3]:
Empty DataFrame
Columns: []
Index: [ENSG00000277400.1, ENSG00000274847.1, ENSG00000276256.1, ENSG00000278198.1

HuGBMv3_10X2
Out[4]:
AnnData object with n_obs × n_vars = 98545 × 58367

Using this reference, or other (all human) references from your repositories or elsewhere, with Smart-seq FASTQs, I get:
adata
Out[56]: AnnData object with n_obs × n_vars = 384 × 227368

adata.var
Out[55]:
Empty DataFrame
Columns: []
Index: [ENST00000003583.12, ENST00000003912.7, ENST00000008440.9,....

adata
Out[50]:
AnnData object with n_obs × n_vars = 384 × 188753

adata.var
Out[33]:
Empty DataFrame
Columns: []
Index: [ENST00000631435.1, ENST00000434970.2, ENST00000448914.1, ENST00000415118.1, ENST00000632684.1,...

adata
Out[27]: AnnData object with n_obs × n_vars = 384 × 845338

adata.var
Out[28]:
Empty DataFrame
Columns: []
Index: [ENSG00000277400.1.A14056, ENSG00000277400.1.A32841, ENSG00000277400.1.A35311,
What is the exact command that was run?
kb count -i /mnt/WD2TBunderside/WorkingHumanRefKBtools081520/Patcher096release/index.idx -g /mnt/WD2TBunderside/WorkingHumanRefKBtools081520/Patcher096release/t2g.txt -x SMARTSEQ -o BT1030e BT1030*.gz --loom --verbose -t 56 -m 196G

Or are these outputs expected?
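
The Smart-seq matrices above are indexed by transcript IDs (`ENST...`) rather than gene IDs (`ENSG...`). A minimal sketch of mapping transcript IDs back to genes with the t2g file follows. It assumes a tab-separated file whose first two columns are transcript ID and gene ID. `load_t2g` and `transcripts_to_genes` are hypothetical helpers, not kb_python functions.

```python
import csv


def load_t2g(path):
    """Read a tab-separated t2g file into a {transcript_id: gene_id} dict."""
    mapping = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:
                mapping[row[0]] = row[1]
    return mapping


def transcripts_to_genes(transcript_ids, mapping):
    """Map each transcript ID to its gene ID, or None when it is unmapped."""
    return [mapping.get(tid) for tid in transcript_ids]
```

With an AnnData object this could be applied as `adata.var["gene_id"] = transcripts_to_genes(adata.var_names, load_t2g("t2g.txt"))`. Whether the third var format shown above (`ENSG....A14056`) maps this way is not something the report settles.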

Unable to read .loom file in velocyto

Hi,

I managed to run "kb count --loom" on my 10X fastqs and got 2 loom files out of it (one in the counts_filtered folder and one in the counts_unfiltered one). When trying to read them in R with the read.loom.matrices function from the velocyto.R package, I get the following error:

reading loom file via hdf5r...
Error in [[.H5File(f, "col_attrs/CellID") :
An object with name col_attrs/CellID does not exist in this group

Could you please help me solve this?

Kind regards,
Marc

--tcc is missing

Describe the issue

kb_python (v0.24.4) does not have the --tcc option available.

What is the exact command that was run?

conda activate myenv
pip install kb_python

kb

usage: kb [-h] [--list] <CMD> ...

kb_python 0.24.4

# usage: kb [-h] [--list] <CMD> ...
# 
# kb_python 0.24.4
# 
# positional arguments:
#   <CMD>
#     info      Display package and citation information
#     ref       Build a kallisto index and transcript-to-gene mapping
#     count     Generate count matrices from a set of single-cell FASTQ files
# 
# optional arguments:
#   -h, --help  Show this help message and exit
#   --list      Display list of supported single-cell technologies

kb count -i kb.idx -g kb.t2g.txt -x 10xv2 --h5ad --tcc R1.fastq.gz R2.fastq.gz

Command output (with --verbose flag)

# usage: kb [-h] [--list] <CMD> ...
# kb: error: unrecognized arguments: --tcc

kb-python of 10X v1 data dies with Signals.SIGSEGV:11

Describe the issue

I'm trying to run kb-python on 10X v1 data. I've previously run kb on Drop-seq and 10X v2 data. However, with 2 different 10X v1 experiments downloaded with fastq-dump --split-files, running kb leads to an error:

What is the exact command that was run?

kb count --verbose -i index.idx -g transcripts_to_genes.txt -x 10XV1 -o Sample2_1 -c1 cdna_transcripts_to_capture.txt -c2 intron_transcripts_to_capture.txt --lamanno --loom SRR8571786_1.fastq SRR8571786_2.fastq SRR8571786_3.fastq

Command output (with --verbose flag)


/usr/local/lib/python3.7/dist-packages/anndata/_core/anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  from pandas.core.index import RangeIndex
[2020-05-03 19:45:20,534]   DEBUG Printing verbose output
[2020-05-03 19:45:20,534]   DEBUG Creating tmp directory
[2020-05-03 19:45:20,534]   DEBUG Namespace(c1='cdna_transcripts_to_capture.txt', c2='intron_transcripts_to_capture.txt', command='count', fastqs=['SRR8571786_1.fastq', 'SRR8571786_2.fastq', 'SRR8571786_3.fastq'], filter=None, g='transcripts_to_genes.txt', h5ad=False, i='index.idx', keep_tmp=False, lamanno=True, list=False, loom=True, m='4G', nucleus=False, o='Sample2_1', overwrite=False, t=8, verbose=True, w=None, x='10XV1')
[2020-05-03 19:45:20,534]    INFO Generating BUS file from
[2020-05-03 19:45:20,534]    INFO         SRR8571786_1.fastq
[2020-05-03 19:45:20,534]    INFO         SRR8571786_2.fastq
[2020-05-03 19:45:20,534]    INFO         SRR8571786_3.fastq
[2020-05-03 19:45:20,534]   DEBUG /usr/local/lib/python3.7/dist-packages/kb_python/bins/linux/kallisto/kallisto bus -i index.idx -o Sample2_1 -x 10XV1 -t 8 SRR8571786_1.fastq SRR8571786_2.fastq SRR8571786_3.fastq

[2020-05-03 19:48:31,342]   DEBUG 
[2020-05-03 19:48:31,343]   DEBUG [index] k-mer length: 31
[2020-05-03 19:48:31,343]   DEBUG [index] number of targets: 790,418
[2020-05-03 19:48:31,343]   DEBUG [index] number of k-mers: 1,112,132,904
[2020-05-03 19:48:31,343]   DEBUG [index] number of equivalence classes: 5,486,663
[2020-05-03 19:48:31,343]   DEBUG [quant] will process sample 1: SRR8571786_1.fastq
[2020-05-03 19:48:31,343]   DEBUG SRR8571786_2.fastq
[2020-05-03 19:48:31,343]   DEBUG SRR8571786_3.fastq
[2020-05-03 19:48:31,343]   DEBUG [quant] finding pseudoalignments for the reads ...
[2020-05-03 19:48:31,343]   ERROR An exception occurred
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/kb_python/main.py", line 483, in main
    COMMAND_TO_FUNCTION[args.command](args)
  File "/usr/local/lib/python3.7/dist-packages/kb_python/main.py", line 135, in parse_count
    nucleus=args.nucleus,
  File "/usr/local/lib/python3.7/dist-packages/kb_python/count.py", line 676, in count_velocity
    fastqs, index_path, technology, out_dir, threads=threads
  File "/usr/local/lib/python3.7/dist-packages/kb_python/count.py", line 65, in kallisto_bus
    run_executable(command)
  File "/usr/local/lib/python3.7/dist-packages/kb_python/utils.py", line 147, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/usr/local/lib/python3.7/dist-packages/kb_python/bins/linux/kallisto/kallisto bus -i index.idx -o Sample2_1 -x 10XV1 -t 8 SRR8571786_1.fastq SRR8571786_2.fastq SRR8571786_3.fastq' died with <Signals.SIGSEGV: 11>.

I've redownloaded data and still have the same output. The output.bus file is empty.

Any clues?

Installing kb-python without pip

Thank you very much for this great tool!

It works nicely, but I would also like to use it on a machine where I can't use pip for several reasons. Is there any plan to include kb-python in Anaconda?
Or, what would be the best way to compile the package from source?

Issue with Combining matrices

Hi,

I am trying to generate loom files to use downstream with velocyto. The count command ran well and generated the spliced and unspliced matrices, but it seems to have failed at the point of combining them. I checked, and I can import both the anndata and loompy libraries in Python, so I am not sure what the issue is.

Thank you very much for any insight in this.

The exact command I ran was:

kb count -i Mus_musculus_index.idx -g t2g.txt -x 10xv2 -o Scell1 -- workflow lamanno --loom -c1 cdna_t2c.txt -c2 intron_t2c.txt Scell1_GEX_S1_L001_R1_001.fastq.gz Scell1_GEX_S1_L001_R2_001.fastq.gz Scell1_GEX_S1_L002_R1_001.fastq.gz Scell1_GEX_S1_L002_R2_001.fastq.gz Scell1_GEX_S2_L001_R1_001.fastq.gz Scell1_GEX_S2_L001_R2_001.fastq.gz Scell1_GEX_S2_L002_R1_001.fastq.gz Scell1_GEX_S2_L002_R2_001.fastq.gz Scell1_GEX_S3_L001_R1_001.fastq.gz Scell1_GEX_S3_L001_R2_001.fastq.gz Scell1_GEX_S3_L002_R1_001.fastq.gz Scell1_GEX_S3_L002_R2_001.fastq.gz Scell1_GEX_S4_L001_R1_001.fastq.gz Scell1_GEX_S4_L001_R2_001.fastq.gz Scell1_GEX_S4_L002_R1_001.fastq.gz Scell1_GEX_S4_L002_R2_001.fastq.gz

Command output :

[2020-09-08 21:20:37,450]    INFO Using index Mus_musculus_index.idx to generate BUS file to Scell1 from
[2020-09-08 21:20:37,450]    INFO         /Volumes/wd_elements/Sequencing_data/Singlecell_GEX_2019/Scell1_GEX_S1_L001_R1_001.fastq.gz
[2020-09-08 21:20:37,450]    INFO         /Volumes/wd_elements/Sequencing_data/Singlecell_GEX_2019/Scell1_GEX_S1_L001_R2_001.fastq.gz
[2020-09-08 21:20:37,450]    INFO         /Volumes/wd_elements/Sequencing_data/Singlecell_GEX_2019/Scell1_GEX_S1_L002_R1_001.fastq.gz
[2020-09-08 21:20:37,450]    INFO         /Volumes/wd_elements/Sequencing_data/Singlecell_GEX_2019/Scell1_GEX_S1_L002_R2_001.fastq.gz
[2020-09-08 21:20:37,450]    INFO         /Volumes/wd_elements/Sequencing_data/Singlecell_GEX_2019/Scell1_GEX_S2_L001_R1_001.fastq.gz
[2020-09-08 21:20:37,450]    INFO         /Volumes/wd_elements/Sequencing_data/Singlecell_GEX_2019/Scell1_GEX_S2_L001_R2_001.fastq.gz
[2020-09-08 21:20:37,450]    INFO         /Volumes/wd_elements/Sequencing_data/Singlecell_GEX_2019/Scell1_GEX_S2_L002_R1_001.fastq.gz
[2020-09-08 21:20:37,450]    INFO         /Volumes/wd_elements/Sequencing_data/Singlecell_GEX_2019/Scell1_GEX_S2_L002_R2_001.fastq.gz
[2020-09-08 21:20:37,450]    INFO         /Volumes/wd_elements/Sequencing_data/Singlecell_GEX_2019/Scell1_GEX_S3_L001_R1_001.fastq.gz
[2020-09-08 21:20:37,450]    INFO         /Volumes/wd_elements/Sequencing_data/Singlecell_GEX_2019/Scell1_GEX_S3_L001_R2_001.fastq.gz
[2020-09-08 21:20:37,453]    INFO         /Volumes/wd_elements/Sequencing_data/Singlecell_GEX_2019/Scell1_GEX_S3_L002_R1_001.fastq.gz
[2020-09-08 21:20:37,453]    INFO         /Volumes/wd_elements/Sequencing_data/Singlecell_GEX_2019/Scell1_GEX_S3_L002_R2_001.fastq.gz
[2020-09-08 21:20:37,454]    INFO         /Volumes/wd_elements/Sequencing_data/Singlecell_GEX_2019/Scell1_GEX_S4_L001_R1_001.fastq.gz
[2020-09-08 21:20:37,454]    INFO         /Volumes/wd_elements/Sequencing_data/Singlecell_GEX_2019/Scell1_GEX_S4_L001_R2_001.fastq.gz
[2020-09-08 21:20:37,454]    INFO         /Volumes/wd_elements/Sequencing_data/Singlecell_GEX_2019/Scell1_GEX_S4_L002_R1_001.fastq.gz
[2020-09-08 21:20:37,454]    INFO         /Volumes/wd_elements/Sequencing_data/Singlecell_GEX_2019/Scell1_GEX_S4_L002_R2_001.fastq.gz
[2020-09-08 21:37:05,017]    INFO Sorting BUS file Scell1/output.bus to Scell1/tmp/output.s.bus
[2020-09-08 21:38:26,400]    INFO Whitelist not provided
[2020-09-08 21:38:26,401]    INFO Copying pre-packaged 10XV2 whitelist to Scell1
[2020-09-08 21:38:26,496]    INFO Inspecting BUS file Scell1/tmp/output.s.bus
[2020-09-08 21:39:25,680]    INFO Correcting BUS records in Scell1/tmp/output.s.bus to Scell1/tmp/output.s.c.bus with whitelist Scell1/10xv2_whitelist.txt
[2020-09-08 21:39:46,286]    INFO Sorting BUS file Scell1/tmp/output.s.c.bus to Scell1/output.unfiltered.bus
[2020-09-08 21:40:12,302]    INFO Capturing records from BUS file Scell1/output.unfiltered.bus to Scell1/tmp/spliced.bus with capture list intron_t2c.txt
[2020-09-08 21:41:22,452]    INFO Sorting BUS file Scell1/tmp/spliced.bus to Scell1/spliced.unfiltered.bus
[2020-09-08 21:41:35,562]    INFO Inspecting BUS file Scell1/spliced.unfiltered.bus
[2020-09-08 21:42:24,438]    INFO Generating count matrix Scell1/counts_unfiltered/spliced from BUS file Scell1/spliced.unfiltered.bus
[2020-09-08 21:43:49,732]    INFO Capturing records from BUS file Scell1/output.unfiltered.bus to Scell1/tmp/unspliced.bus with capture list cdna_t2c.txt
[2020-09-08 21:44:52,796]    INFO Sorting BUS file Scell1/tmp/unspliced.bus to Scell1/unspliced.unfiltered.bus
[2020-09-08 21:44:59,056]    INFO Inspecting BUS file Scell1/unspliced.unfiltered.bus
[2020-09-08 21:45:45,018]    INFO Generating count matrix Scell1/counts_unfiltered/unspliced from BUS file Scell1/unspliced.unfiltered.bus
[2020-09-08 21:47:01,560]    INFO Reading matrix Scell1/counts_unfiltered/spliced.mtx
[2020-09-08 21:47:16,627]    INFO Reading matrix Scell1/counts_unfiltered/unspliced.mtx
[2020-09-08 21:47:27,289]    INFO Combining matrices
/Users/jln_home/.pyenv/versions/miniconda3-latest/envs/velocytoenv/lib/python3.8/site-packages/anndata/_core/anndata.py:1094: FutureWarning: is_categorical is deprecated and will be removed in a future version.  Use is_categorical_dtype instead
  if not is_categorical(df_full[k]):
[2020-09-08 21:47:27,687]    INFO Writing matrices to loom Scell1/counts_unfiltered/adata.loom
An exception occurred
Traceback (most recent call last):
  File "/Users/jln_home/.pyenv/versions/miniconda3-latest/envs/velocytoenv/lib/python3.8/site-packages/kb_python/main.py", line 727, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/Users/jln_home/.pyenv/versions/miniconda3-latest/envs/velocytoenv/lib/python3.8/site-packages/kb_python/main.py", line 179, in parse_count
    count_velocity(
  File "/Users/jln_home/.pyenv/versions/miniconda3-latest/envs/velocytoenv/lib/python3.8/site-packages/kb_python/count.py", line 1341, in count_velocity
    convert_matrices(
  File "/Users/jln_home/.pyenv/versions/miniconda3-latest/envs/velocytoenv/lib/python3.8/site-packages/kb_python/count.py", line 608, in convert_matrices
    adata.write_loom(loom_path)
  File "/Users/jln_home/.pyenv/versions/miniconda3-latest/envs/velocytoenv/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1891, in write_loom
    write_loom(filename, self, write_obsm_varm=write_obsm_varm)
  File "/Users/jln_home/.pyenv/versions/miniconda3-latest/envs/velocytoenv/lib/python3.8/site-packages/anndata/_io/write.py", line 89, in write_loom
    raise ValueError("loompy does not accept empty matrices as data")
ValueError: loompy does not accept empty matrices as data
[2020-09-08 21:47:27,696]   ERROR An exception occurred
Traceback (most recent call last):
  File "/Users/jln_home/.pyenv/versions/miniconda3-latest/envs/velocytoenv/lib/python3.8/site-packages/kb_python/main.py", line 727, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/Users/jln_home/.pyenv/versions/miniconda3-latest/envs/velocytoenv/lib/python3.8/site-packages/kb_python/main.py", line 179, in parse_count
    count_velocity(
  File "/Users/jln_home/.pyenv/versions/miniconda3-latest/envs/velocytoenv/lib/python3.8/site-packages/kb_python/count.py", line 1341, in count_velocity
    convert_matrices(
  File "/Users/jln_home/.pyenv/versions/miniconda3-latest/envs/velocytoenv/lib/python3.8/site-packages/kb_python/count.py", line 608, in convert_matrices
    adata.write_loom(loom_path)
  File "/Users/jln_home/.pyenv/versions/miniconda3-latest/envs/velocytoenv/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1891, in write_loom
    write_loom(filename, self, write_obsm_varm=write_obsm_varm)
  File "/Users/jln_home/.pyenv/versions/miniconda3-latest/envs/velocytoenv/lib/python3.8/site-packages/anndata/_io/write.py", line 89, in write_loom
    raise ValueError("loompy does not accept empty matrices as data")
ValueError: loompy does not accept empty matrices as data
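
The `ValueError` above comes from the loom writer rejecting an empty matrix. One thing that can be ruled out quickly is whether the `.mtx` files kb wrote are themselves empty, although this alone does not establish the cause. The sketch below relies only on the Matrix Market coordinate layout (comment lines starting with `%`, then a `rows cols entries` size line). `mtx_shape` and `is_empty` are hypothetical helpers.

```python
def mtx_shape(path):
    """Return (rows, cols, entries) from a Matrix Market coordinate file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("%") or not line.strip():
                continue  # skip the banner, comments and blank lines
            rows, cols, entries = (int(x) for x in line.split()[:3])
            return rows, cols, entries
    raise ValueError(f"no size line found in {path}")


def is_empty(shape):
    """Treat a matrix as empty if any dimension or the entry count is zero."""
    return any(value == 0 for value in shape)
```

For the run above, the files to inspect would be `Scell1/counts_unfiltered/spliced.mtx` and `Scell1/counts_unfiltered/unspliced.mtx`, the two paths named in the log.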

kb count Being SIGKILL'd at bustools sort Step

Describe the issue
Hi, I'm running kb count in a Docker image on Cromwell and Google Cloud Platform, and the command (below) is SIGKILL'd at the bustools sort step. It seems to me that this happens while the command for bustools sort is being built.
Docker image
WDL script

What is the exact command that was run?
For context, here are my disks (df -h)

Filesystem                         Size  Used Avail Use% Mounted on
overlay                             96G  2.3G   93G   3% /
tmpfs                               64M     0   64M   0% /dev
tmpfs                              126G     0  126G   0% /sys/fs/cgroup
shm                                 64M     0   64M   0% /dev/shm
/dev/sda1                           96G  2.3G   93G   3% /google
/dev/disk/by-id/google-local-disk  251G   26G  226G  11% /cromwell_root
tmpfs                              126G     0  126G   0% /proc/acpi
tmpfs                              126G     0  126G   0% /proc/scsi
tmpfs                              126G     0  126G   0% /sys/firmware

And here is the command:

#pwd is /cromwell_root
set -e
export TMPDIR=/tmp

kb count --verbose \
	-i /cromwell_root/shalek-lab-cromwell/kallisto-bustools/jobs/testkb4/ref/index.idx \
	-g /cromwell_root/shalek-lab-cromwell/kallisto-bustools/jobs/testkb4/ref/transcripts_to_genes.txt \
	-x DROPSEQ \
	-o count_CGP \
	 \
	--lamanno \
	-c1 /cromwell_root/shalek-lab-cromwell/kallisto-bustools/jobs/testkb4/ref/cDNA_transcripts_to_capture.txt \
	-c2 /cromwell_root/shalek-lab-cromwell/kallisto-bustools/jobs/testkb4/ref/intron_transcripts_to_capture.txt \
	 \
	--filter bustools \
	 \
	 \
	-t 32 \
	-m 256G \
	$(cat fastqs.tsv)

fastqs.tsv contains space-delimited FASTQ filenames (e.g. "CGP_R1.fastq.gz CGP_R2.fastq.gz"), which are located in the present working directory.

Command output (with --verbose flag)

CGP_R1.fastq.gz CGP_R2.fastq.gz[2020-03-30 15:58:59,055]   DEBUG Printing verbose output
[2020-03-30 15:58:59,055]   DEBUG Creating tmp directory
[2020-03-30 15:58:59,056]   DEBUG Namespace(c1='/cromwell_root/shalek-lab-cromwell/kallisto-bustools/jobs/testkb4/ref/cDNA_transcripts_to_capture.txt', c2='/cromwell_root/shalek-lab-cromwell/kallisto-bustools/jobs/testkb4/ref/intron_transcripts_to_capture.txt', command='count', fastqs=['CGP_R1.fastq.gz', 'CGP_R2.fastq.gz'], filter='bustools', g='/cromwell_root/shalek-lab-cromwell/kallisto-bustools/jobs/testkb4/ref/transcripts_to_genes.txt', h5ad=False, i='/cromwell_root/shalek-lab-cromwell/kallisto-bustools/jobs/testkb4/ref/index.idx', keep_tmp=False, lamanno=True, list=False, loom=False, m='256G', nucleus=False, o='count_CGP', overwrite=False, t=32, verbose=True, w=None, x='DROPSEQ')
[2020-03-30 15:58:59,057]    INFO Generating BUS file from
[2020-03-30 15:58:59,057]    INFO         CGP_R1.fastq.gz
[2020-03-30 15:58:59,057]    INFO         CGP_R2.fastq.gz
[2020-03-30 15:58:59,057]   DEBUG /usr/local/lib/python3.7/site-packages/kb_python/bins/linux/kallisto/kallisto bus -i /cromwell_root/shalek-lab-cromwell/kallisto-bustools/jobs/testkb4/ref/index.idx -o count_CGP -x DROPSEQ -t 32 CGP_R1.fastq.gz CGP_R2.fastq.gz
[2020-03-30 16:17:38,738]   DEBUG 
[2020-03-30 16:17:38,738]   DEBUG [index] k-mer length: 31
[2020-03-30 16:17:38,738]   DEBUG [index] number of targets: 790,418
[2020-03-30 16:17:38,738]   DEBUG [index] number of k-mers: 1,112,132,904
[2020-03-30 16:17:38,738]   DEBUG [index] number of equivalence classes: 5,486,663
[2020-03-30 16:17:38,738]   DEBUG [quant] will process sample 1: CGP_R1.fastq.gz
[2020-03-30 16:17:38,738]   DEBUG CGP_R2.fastq.gz
[2020-03-30 16:17:38,738]   DEBUG [quant] finding pseudoalignments for the reads ... done
[2020-03-30 16:17:38,738]   DEBUG [quant] processed 254,495,575 reads, 193,336,101 reads pseudoaligned
[2020-03-30 16:17:38,739]    INFO Sorting BUS file count_CGP/output.bus to tmp/output.s.bus
[2020-03-30 16:17:38,739]   DEBUG /usr/local/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools sort -o tmp/output.s.bus -T tmp -t 32 -m 256G count_CGP/output.bus
[2020-03-30 16:20:45,441]   ERROR An exception occurred
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/kb_python/main.py", line 483, in main
    COMMAND_TO_FUNCTION[args.command](args)
  File "/usr/local/lib/python3.7/site-packages/kb_python/main.py", line 135, in parse_count
    nucleus=args.nucleus,
  File "/usr/local/lib/python3.7/site-packages/kb_python/count.py", line 689, in count_velocity
    memory=memory
  File "/usr/local/lib/python3.7/site-packages/kb_python/count.py", line 97, in bustools_sort
    run_executable(command)
  File "/usr/local/lib/python3.7/site-packages/kb_python/utils.py", line 147, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/usr/local/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools sort -o tmp/output.s.bus -T tmp -t 32 -m 256G count_CGP/output.bus' died with <Signals.SIGKILL: 9>.
[2020-03-30 16:20:45,634]   DEBUG Removing tmp directory

Furthermore, although I do not think memory is the problem, the monitoring log shows that the memory usage skyrocketed to 95% around this time (perhaps to delete the tmp dir?):

[Mon Mar 30 16:20:01 UTC 2020]
* CPU usage: 2.84003%
* Memory usage: 76.31%
* Disk usage: 19%
[Mon Mar 30 16:20:31 UTC 2020]
* CPU usage: 3.04777%
* Memory usage: 95.0675%
* Disk usage: 19%

Please help me figure out why the command is being killed. Perhaps the subprocess.CalledProcessError contains a hint but is not outputting anything descriptive? Thank you!

Edit: I should also say that the command somehow exits with a return code of 0.
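
`died with <Signals.SIGKILL: 9>` means the child process was killed by signal 9 rather than exiting on its own. On Linux one common source is the kernel's out-of-memory killer, which would be consistent with the memory spike in the monitoring log, though the log alone does not prove it. A rough sketch for comparing the `-m` request with the machine's RAM follows. It assumes the `-m` value uses binary units, which is an assumption and not something confirmed from bustools. The helper names are hypothetical.

```python
import os

# Assumption: suffixes are binary multiples (1G = 1024**3 bytes).
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_memory(text):
    """Convert a string such as '256G' to bytes; plain digits are bytes."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def physical_memory_bytes():
    """Total physical RAM via sysconf; works on Linux, not on every platform."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def request_exceeds_ram(requested, total):
    """True when the requested sort memory is at least the machine's RAM."""
    return requested >= total


print(parse_memory("256G"))  # the value passed to -m in the command above
```

Inside the container, `request_exceeds_ram(parse_memory("256G"), physical_memory_bytes())` gives a quick answer. A container memory limit lower than the host's RAM would not be visible through `sysconf`, so this is a lower bound on the check rather than a complete one.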

Cannot build index using the fasta file with "chr"

Describe the issue
When I tried to build an index using genome FASTA files downloaded from UCSC and the GENCODE gene annotation GTF file, kb produced empty index files. When I removed all "chr" prefixes from the FASTA and GTF files, it worked. It would be great if you could consider adding support for files with chromosomes written as "chrXX".

kb ref doesn't support gff and has no readable message about it

Hello. Thank you for your tools.

As I understand it, kb doesn't support GFF files for now.

What is the exact command that was run?

kb ref -i transcriptome.idx -g transcripts_to_genes.txt -f1 cdna.fa dna.primary_assembly.fa.gz gtf.gz

Command output

[2020-04-15 16:55:38,254]   ERROR An exception occurred
Traceback (most recent call last):
  File "/home/my/.local/lib/python3.8/site-packages/kb_python/main.py", line 483, in main
    COMMAND_TO_FUNCTION[args.command](args)
  File "/home/my/.local/lib/python3.8/site-packages/kb_python/main.py", line 103, in parse_ref
    ref(
  File "/home/my/.local/lib/python3.8/site-packages/kb_python/ref.py", line 273, in ref
    t2g_result = create_t2g_from_gtf(gtf_path, t2g_path)
  File "/home/my/.local/lib/python3.8/site-packages/kb_python/ref.py", line 105, in create_t2g_from_gtf
    transcript_id = entry['group']['transcript_id']
KeyError: 'transcript_id'

Could you add GFF support, or at least create a more readable error message?

Regards,
Artyom.
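
The `KeyError: 'transcript_id'` above is raised when an entry's attribute group has no `transcript_id` key. A hedged sketch of the kind of check that could produce a clearer message follows. It distinguishes GTF-style attributes (`key "value";`) from GFF3-style ones (`key=value;`) by inspecting the ninth column. `guess_annotation_format` is a hypothetical helper, not part of kb_python.

```python
import re

_GTF_ATTR = re.compile(r'^\s*\w+ "[^"]*"')  # e.g. gene_id "g1";
_GFF3_ATTR = re.compile(r"^\s*\w+=")        # e.g. ID=exon1;


def guess_annotation_format(line):
    """Return 'gtf', 'gff3', or 'unknown' for one annotation line."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) < 9:
        return "unknown"
    attributes = fields[8]
    if _GTF_ATTR.match(attributes):
        return "gtf"
    if _GFF3_ATTR.match(attributes):
        return "gff3"
    return "unknown"


# Illustrative lines, not taken from any real annotation file.
gtf_line = '1\tsrc\texon\t1\t10\t.\t+\t.\tgene_id "g1"; transcript_id "t1";'
gff_line = "1\tsrc\texon\t1\t10\t.\t+\t.\tID=exon1;Parent=t1"
print(guess_annotation_format(gtf_line), guess_annotation_format(gff_line))
```

Checking the first non-comment line of the annotation this way before parsing would let a tool report "this looks like GFF3" instead of surfacing a bare `KeyError`.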

Non-zero exit status 1 when using velocity flags.

When using the velocity flags --lamanno and --nucleus, I get a non-zero exit status 1, and the analysis doesn't finish.
Note that this issue does not occur when I remove those flags. When I do that, it appears to finish without issue and the output is not empty. Also, the index is not empty, as suggested in what might be a related issue. For additional context, this is single-nucleus RNA-seq data. Running on a p3.2xlarge AWS instance.

What is the exact command that was run?

kb count -i index.idx -g transcripts_to_genes.txt -o SRR8528318 -c1 cdna_transcript_to_capture.txt -c2 intron_transcripts_to_capture.txt --filter --keep-tmp --h5ad --lamanno --verbose -x DropSeq ../../func_gen/data/raw/SRR8528318_1.fastq ../../func_gen/data/raw/SRR8528318_2.fastq

Command output (with --verbose flag)

[2019-12-09 23:10:07,776]   DEBUG Printing verbose output
[2019-12-09 23:10:07,776]   DEBUG Creating tmp directory
[2019-12-09 23:10:07,776]   DEBUG Namespace(c1='cdna_transcript_to_capture.txt', c2='intron_transcripts_to_capture.txt', command='count', fastqs=['../../func_gen/data/raw/SRR8528318_1.fastq', '../../func_gen/data/raw/SRR8528318_2.fastq'], filter='bustools', g='transcripts_to_genes.txt', h5ad=True, i='index.idx', keep_tmp=True, lamanno=True, list=False, loom=False, m='4G', nucleus=False, o='SRR8528318', overwrite=False, t=8, verbose=True, w=None, x='DropSeq')
[2019-12-09 23:10:07,776]    INFO Skipping kallisto bus because output files already exist. Use the --overwrite flag to overwrite.
[2019-12-09 23:10:07,776]    INFO Sorting BUS file SRR8528318/output.bus to tmp/output.s.bus
[2019-12-09 23:10:07,776]   DEBUG /home/ubuntu/anaconda3/lib/python3.6/site-packages/kb_python/bins/linux/bustools/bustools sort -o tmp/output.s.bus -T tmp -t 8 -m 4G SRR8528318/output.bus
[2019-12-09 23:10:11,527]   DEBUG Read in 3435664 BUS records
[2019-12-09 23:10:11,527]    INFO Whitelist not provided
[2019-12-09 23:10:11,527]    INFO Generating whitelist SRR8528318/whitelist.txt from BUS file tmp/output.s.bus
[2019-12-09 23:10:11,528]   DEBUG /home/ubuntu/anaconda3/lib/python3.6/site-packages/kb_python/bins/linux/bustools/bustools whitelist -o SRR8528318/whitelist.txt tmp/output.s.bus
[2019-12-09 23:10:11,573]   DEBUG Read in 3137746 BUS records, wrote 22397 barcodes to whitelist with threshold 13
[2019-12-09 23:10:11,573]    INFO Inspecting BUS file tmp/output.s.bus
[2019-12-09 23:10:11,573]   DEBUG /home/ubuntu/anaconda3/lib/python3.6/site-packages/kb_python/bins/linux/bustools/bustools inspect -o SRR8528318/inspect.json -w SRR8528318/whitelist.txt -e SRR8528318/matrix.ec tmp/output.s.bus
[2019-12-09 23:10:23,628]    INFO Correcting BUS records in tmp/output.s.bus to tmp/output.s.c.bus with whitelist SRR8528318/whitelist.txt
[2019-12-09 23:10:23,629]   DEBUG /home/ubuntu/anaconda3/lib/python3.6/site-packages/kb_python/bins/linux/bustools/bustools correct -o tmp/output.s.c.bus -w SRR8528318/whitelist.txt tmp/output.s.bus
[2019-12-09 23:10:24,039]   DEBUG Found 22397 barcodes in the whitelist
[2019-12-09 23:10:24,040]   DEBUG Number of hamming dist 1 barcodes = 886294
[2019-12-09 23:10:24,040]   DEBUG Processed 3137746 bus records
[2019-12-09 23:10:24,040]   DEBUG In whitelist = 414442
[2019-12-09 23:10:24,040]   DEBUG Corrected = 213006
[2019-12-09 23:10:24,040]   DEBUG Uncorrected = 2510298
[2019-12-09 23:10:24,040]    INFO Sorting BUS file tmp/output.s.c.bus to SRR8528318/output.unfiltered.bus
[2019-12-09 23:10:24,040]   DEBUG /home/ubuntu/anaconda3/lib/python3.6/site-packages/kb_python/bins/linux/bustools/bustools sort -o SRR8528318/output.unfiltered.bus -T tmp -t 8 -m 4G tmp/output.s.c.bus
[2019-12-09 23:10:27,062]   DEBUG Read in 627448 BUS records
[2019-12-09 23:10:27,063]    INFO Capturing records from BUS file SRR8528318/output.unfiltered.bus to tmp/spliced.bus with capture list cdna_transcript_to_capture.txt
[2019-12-09 23:10:27,063]   DEBUG /home/ubuntu/anaconda3/lib/python3.6/site-packages/kb_python/bins/linux/bustools/bustools capture -o tmp/spliced.bus -c cdna_transcript_to_capture.txt -e SRR8528318/matrix.ec -t SRR8528318/transcripts.txt --transcripts SRR8528318/output.unfiltered.bus
[2019-12-09 23:10:27,066]   DEBUG Usage: bustools text [options] bus-files
[2019-12-09 23:10:27,066]   DEBUG
[2019-12-09 23:10:27,066]   DEBUG Options:
[2019-12-09 23:10:27,066]   DEBUG -o, --output          File for text output
[2019-12-09 23:10:27,066]   DEBUG -p, --pipe            Write to standard output
[2019-12-09 23:10:27,067]   DEBUG
[2019-12-09 23:10:27,067]   DEBUG Error: File not found, cdna_transcript_to_capture.txt
[2019-12-09 23:10:27,067]   ERROR An exception occurred
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/kb_python/main.py", line 483, in main
    COMMAND_TO_FUNCTION[args.command](args)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/kb_python/main.py", line 135, in parse_count
    nucleus=args.nucleus,
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/kb_python/count.py", line 726, in count_velocity
    bus_result['ecmap'], bus_result['txnames']
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/kb_python/count.py", line 228, in bustools_capture
    run_executable(command)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/kb_python/utils.py", line 147, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/home/ubuntu/anaconda3/lib/python3.6/site-packages/kb_python/bins/linux/bustools/bustools capture -o tmp/spliced.bus -c cdna_transcript_to_capture.txt -e SRR8528318/matrix.ec -t SRR8528318/transcripts.txt --transcripts SRR8528318/output.unfiltered.bus' returned non-zero exit status 1.

I would really appreciate your help with this, as I am very excited to try using velocity with kb. Thank you for your time!
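The bustools output in the log above ends with "Error: File not found, cdna_transcript_to_capture.txt". Note that the two capture-list names in the command differ in spelling ("transcript" versus "transcripts"), so the path is worth double-checking. A small pre-flight check along these lines can catch a missing input before a long run; `missing_files` is a hypothetical helper, not a kb-python function.

```python
import os
import tempfile


def missing_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [path for path in paths if not os.path.isfile(path)]


# Self-contained demonstration in a temporary directory: only one of the
# two capture lists named in the command above is created.
with tempfile.TemporaryDirectory() as workdir:
    present = os.path.join(workdir, "intron_transcripts_to_capture.txt")
    absent = os.path.join(workdir, "cdna_transcript_to_capture.txt")
    open(present, "w").close()

    missing = missing_files([present, absent])
    for path in missing:
        print("Input file not found:", path)
```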

Problem with creating loom file for 10xV3 single cell velocity analysis

Hi,
I have a problem with kb count. I am trying to do a velocity analysis using FASTQ files from 10x Chromium v3. The command works until the last step (writing the loom file), where I get an error saying "loompy does not accept empty matrices as data".

This is the command I used:

kb count -i index.idx -o MAD2_Velo2 -g t2g.txt -x 10xv3 --workflow lamanno --loom -c1 cdna_t2c.txt -c2 intron_t2c.txt -m 60G -t 12 \
MAD2/MAD2_S2_L001_R1_001.fastq.gz \
MAD2/MAD2_S2_L001_R2_001.fastq.gz \
MAD2/MAD2_S2_L002_R1_001.fastq.gz \
MAD2/MAD2_S2_L002_R2_001.fastq.gz

Command output (with --verbose flag)

[2020-12-07 09:57:20,955]    INFO Using index index.idx to generate BUS file to MAD2_Velo2 from
[2020-12-07 09:57:20,955]    INFO         MAD2/MAD2_S2_L001_R1_001.fastq.gz
[2020-12-07 09:57:20,955]    INFO         MAD2/MAD2_S2_L001_R2_001.fastq.gz
[2020-12-07 09:57:20,955]    INFO         MAD2/MAD2_S2_L002_R1_001.fastq.gz
[2020-12-07 09:57:20,955]    INFO         MAD2/MAD2_S2_L002_R2_001.fastq.gz
[2020-12-07 11:23:58,066]    INFO Sorting BUS file MAD2_Velo2/output.bus to MAD2_Velo2/tmp/output.s.bus
[2020-12-07 11:26:02,310]    INFO Whitelist not provided
[2020-12-07 11:26:02,310]    INFO Copying pre-packaged 10XV3 whitelist to MAD2_Velo2
[2020-12-07 11:26:02,919]    INFO Inspecting BUS file MAD2_Velo2/tmp/output.s.bus
[2020-12-07 11:27:50,834]    INFO Correcting BUS records in MAD2_Velo2/tmp/output.s.bus to MAD2_Velo2/tmp/output.s.c.bus with whitelist MAD2_Velo2/10xv3_whitelist.txt
[2020-12-07 11:28:44,251]    INFO Sorting BUS file MAD2_Velo2/tmp/output.s.c.bus to MAD2_Velo2/output.unfiltered.bus
[2020-12-07 11:30:21,574]    INFO Capturing records from BUS file MAD2_Velo2/output.unfiltered.bus to MAD2_Velo2/tmp/spliced.bus with capture list intron_t2c.txt
[2020-12-07 11:31:54,277]    INFO Sorting BUS file MAD2_Velo2/tmp/spliced.bus to MAD2_Velo2/spliced.unfiltered.bus
[2020-12-07 11:36:45,406]    INFO Inspecting BUS file MAD2_Velo2/spliced.unfiltered.bus
[2020-12-07 12:14:39,721]    INFO Generating count matrix MAD2_Velo2/counts_unfiltered/spliced from BUS file MAD2_Velo2/spliced.unfiltered.bus
[2020-12-07 12:16:46,502]    INFO Capturing records from BUS file MAD2_Velo2/output.unfiltered.bus to MAD2_Velo2/tmp/unspliced.bus with capture list cdna_t2c.txt
[2020-12-07 12:17:52,328]    INFO Sorting BUS file MAD2_Velo2/tmp/unspliced.bus to MAD2_Velo2/unspliced.unfiltered.bus
[2020-12-07 12:18:34,412]    INFO Inspecting BUS file MAD2_Velo2/unspliced.unfiltered.bus
[2020-12-07 12:19:22,924]    INFO Generating count matrix MAD2_Velo2/counts_unfiltered/unspliced from BUS file MAD2_Velo2/unspliced.unfiltered.bus
[2020-12-07 12:20:40,802]    INFO Reading matrix MAD2_Velo2/counts_unfiltered/spliced.mtx
[2020-12-07 12:21:24,876]    INFO Reading matrix MAD2_Velo2/counts_unfiltered/unspliced.mtx
[2020-12-07 12:21:43,009]    INFO Combining matrices
[2020-12-07 12:21:44,149]    INFO Writing matrices to loom MAD2_Velo2/counts_unfiltered/adata.loom
[2020-12-07 12:21:44,157]   ERROR An exception occurred
Traceback (most recent call last):
  File "/Users/mohamedomar/opt/anaconda3/lib/python3.8/site-packages/kb_python/main.py", line 785, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/Users/mohamedomar/opt/anaconda3/lib/python3.8/site-packages/kb_python/main.py", line 195, in parse_count
    count_velocity(
  File "/Users/mohamedomar/opt/anaconda3/lib/python3.8/site-packages/kb_python/count.py", line 1573, in count_velocity
    convert_matrices(
  File "/Users/mohamedomar/opt/anaconda3/lib/python3.8/site-packages/kb_python/count.py", line 795, in convert_matrices
    adata.write_loom(loom_path)
  File "/Users/mohamedomar/opt/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1889, in write_loom
    write_loom(filename, self, write_obsm_varm=write_obsm_varm)
  File "/Users/mohamedomar/opt/anaconda3/lib/python3.8/site-packages/anndata/_io/write.py", line 90, in write_loom
    raise ValueError("loompy does not accept empty matrices as data")
ValueError: loompy does not accept empty matrices as data
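One thing that can be checked independently of the loom conversion is whether the spliced and unspliced .mtx files actually contain counts. The sketch below reads only the size line of a Matrix Market coordinate file. It is an illustration, not a diagnosis: the error above may have other causes, and `matrix_market_shape` is a hypothetical helper.

```python
import io


def matrix_market_shape(handle):
    """Return (rows, columns, nonzeros) from a Matrix Market coordinate file.

    Lines starting with '%' are header or comment lines; the first other
    non-blank line holds the three size values.
    """
    for line in handle:
        if line.startswith("%") or not line.strip():
            continue
        rows, cols, nonzeros = (int(value) for value in line.split()[:3])
        return rows, cols, nonzeros
    raise ValueError("no size line found")


# In-memory example of a matrix with 3 rows, 2 columns and no entries.
example = io.StringIO(
    "%%MatrixMarket matrix coordinate real general\n"
    "%\n"
    "3 2 0\n"
)
shape = matrix_market_shape(example)
is_empty = 0 in shape
print(shape, "empty" if is_empty else "has entries")
```

For the run above, the handles would be opened on MAD2_Velo2/counts_unfiltered/spliced.mtx and MAD2_Velo2/counts_unfiltered/unspliced.mtx.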

Tutorial doesn't correspond to the latest version on PyPI

Describe the issue
kb_python 0.24.4, installed from PyPI, shows different parameters in its help message than those mentioned in the tutorial.

What is the exact command that was run?

root@d27192e421b0:/tmp# kb ref                 
usage: kb ref [-h] [--keep-tmp] [--verbose] -i INDEX -g T2G -f1 FASTA [-f2 FASTA] [-c1 T2C] [-c2 T2C] [-d {human,mouse,linnarsson}] [--lamanno] [--overwrite] fasta gtf

Build a kallisto index and transcript-to-gene mapping

positional arguments:
  fasta                 Genomic FASTA file
  gtf                   Reference GTF file

optional arguments:
  -h, --help            Show this help message and exit
  --keep-tmp            Do not delete the tmp directory
  --verbose             Print debugging information
  -d {human,mouse,linnarsson}
                        Download a pre-built kallisto index (along with all necessary files) instead of building it locally
  --lamanno             Prepare files for RNA velocity based on La Manno et al. 2018 logic
  --overwrite           Overwrite existing kallisto index

required arguments:
  -i INDEX              Path to the kallisto index to be constructed
  -g T2G                Path to transcript-to-gene mapping to be generated
  -f1 FASTA             [Optional with -d] Path to the cDNA FASTA to be generated

required arguments for --lamanno:
  -f2 FASTA             Path to the intron FASTA to be generated
  -c1 T2C               Path to generate cDNA transcripts-to-capture
  -c2 T2C               Path to generate intron transcripts-to-capture

What is expected output?

$ kb ref
usage: kb ref [-h] [--tmp TMP] [--keep-tmp] [--verbose] -i INDEX -g T2G -f1
              FASTA [-f2 FASTA] [-c1 T2C] [-c2 T2C] [-n N]
              [-d {human,mouse,linnarsson}] [-k K]
              [--workflow {standard,lamanno,nucleus,kite}] [--lamanno]
              [--overwrite]
              fasta gtf [feature]

Build a kallisto index and transcript-to-gene mapping

positional arguments:
  fasta                 Genomic FASTA file(s), comma-delimited
  gtf                   Reference GTF file(s), comma-delimited
  feature               [`kite` workflow only] Path to TSV containing barcodes
                        and feature names.

optional arguments:
  -h, --help            Show this help message and exit
  --tmp TMP             Override default temporary directory
  --keep-tmp            Do not delete the tmp directory
  --verbose             Print debugging information
  -n N                  Number of files to split the index into. If this
                        option is specified, the FASTA that is normally used
                        to create an index is split into `N` approximately-
                        equal parts. Each of these FASTAs are indexed
                        separately.
  -d {human,mouse,linnarsson}
                        Download a pre-built kallisto index (along with all
                        necessary files) instead of building it locally
  -k K                  Use this option to override the k-mer length of the
                        index. Usually, the k-mer length automatically
                        calculated by `kb` provides the best results.
  --workflow {standard,lamanno,nucleus,kite}
                        Type of workflow to prepare files for. Use `lamanno`
                        for RNA velocity based on La Manno et al. 2018 logic.
                        Use `nucleus` for RNA velocity on single-nucleus RNA-
                        seq reads. Use `kite` for feature barcoding. (default:
                        standard)
  --lamanno             Deprecated. Use `--workflow lamanno` instead.
  --overwrite           Overwrite existing kallisto index

required arguments:
  -i INDEX              Path to the kallisto index to be constructed. If `-n`
                        is also specified, this is the prefix for the n
                        indices to construct.
  -g T2G                Path to transcript-to-gene mapping to be generated
  -f1 FASTA             [Optional with -d] Path to the cDNA FASTA (lamanno,
                        nucleus) or mismatch FASTA (kite) to be generated

required arguments for `lamanno` and `nucleus` workflows:
  -f2 FASTA             Path to the intron FASTA to be generated
  -c1 T2C               Path to generate cDNA transcripts-to-capture
  -c2 T2C               Path to generate intron transcripts-to-capture

Error in generating RNA count matrices

Hello, I'm mostly following kb_single_nucleus.ipynb to analyze mouse 10x3 single-nucleus sequencing data. 1) I appended a Cre recombinase sequence and an EGFP sequence to Mus_musculus.GRCm38.dna.primary_assembly.fa. 2) I appended both CRE and EGFP entries to Mus_musculus.GRCm38.100.gtf.
Everything seemed to work fine until the last step of generating the RNA count matrices. In output/counts_unfiltered, six files were generated (spliced.mtx, spliced.genes.txt, spliced.barcodes.txt, unspliced.mtx, unspliced.genes.txt, unspliced.barcodes.txt) after ~3 hours. Then it suddenly stopped with a ValueError: could not convert integer scalar
I think it somehow can't complete the last few steps of writing the matrices needed to make the .h5ad file.
Would you please take a look at the traceback and help me?

image

ensembl ID to gene symbol for kb output

Hi there,

I was using the read_count_output function in BUSpaRse to analyze the output of kb.

The output files of kb are as follows:
cells_x_genes.barcodes.txt cells_x_genes.genes.txt cells_x_genes.mtx

The head of cells_x_genes.genes.txt is as follows:

ENSG00000243485.5
ENSG00000237613.2
ENSG00000186092.6
ENSG00000238009.6
ENSG00000239945.1
ENSG00000239906.1
ENSG00000241599.1
ENSG00000236601.2
ENSG00000284733.1
ENSG00000235146.2

I was able to generate the dgCMatrix with barcodes in columns and genes in rows. The row names are versioned Ensembl gene IDs.

Is there any way to convert the Ensembl gene ID to the gene symbol?
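One possible approach is to use the transcript-to-gene mapping passed to kb count with -g. The sketch below assumes that file is tab-separated, with the gene ID in the second column and the gene name in the third; please verify this against your own t2g.txt. The transcript IDs in the example are placeholders, and `load_gene_names` and `to_symbols` are hypothetical helpers.

```python
import io


def load_gene_names(t2g_handle):
    """Build a gene ID -> gene name dict from a tab-separated t2g file.

    Assumed column layout: transcript ID, gene ID, gene name, ...
    """
    mapping = {}
    for line in t2g_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # no gene name column on this line
        gene_id, gene_name = fields[1], fields[2]
        if gene_name:
            mapping.setdefault(gene_id, gene_name)
    return mapping


def to_symbols(gene_ids, mapping):
    """Replace each gene ID by its name, keeping the ID when no name is known."""
    return [mapping.get(gene_id, gene_id) for gene_id in gene_ids]


# Placeholder transcript IDs; gene IDs taken from the listing above.
t2g = io.StringIO(
    "TRANSCRIPT_A\tENSG00000186092.6\tOR4F5\n"
    "TRANSCRIPT_B\tENSG00000237613.2\tFAM138A\n"
)
gene_ids = ["ENSG00000243485.5", "ENSG00000237613.2", "ENSG00000186092.6"]

symbols = to_symbols(gene_ids, load_gene_names(t2g))
print(symbols)
```

Falling back to the ID keeps the row count unchanged, so the result can be used directly as replacement row names. Gene symbols are not always unique, so duplicates may need handling afterwards.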

Thank you!

Create transcriptome index for both coding and non-coding genes

Hey,

First, thanks a lot for implementing this software, it is super helpful.

I am trying to build the transcriptome index for hg38, and I was wondering whether it includes the transcripts of both coding and non-coding genes, or only those of protein-coding genes.

Specifically I have run

kb ref -d human -i <index.idx> -g <t2g.txt>

Does the output index.idx include the indexes for coding and non-coding genes?

Thanks in advance,

Best,

Kike

Issue with sorting BUS files

Hello,

I am following the RNA velocity tutorial (https://www.kallistobus.tools/kb_velocity_tutorial.html); I have successfully built the Linnarsson index, but am running into an issue when generating the velocity matrices. When running the command:

! kb count -i index.idx -g transcripts_to_genes.txt -x 10xv3 -o output
-c1 cdna_transcripts_to_capture.txt -c2 intron_transcripts_to_capture.txt --lamanno
/Users/seandelao/Desktop/85dpc_FASTQ_files/85dpc_1_S8_L001_R1_001.fastq.gz
/Users/seandelao/Desktop/85dpc_FASTQ_files/85dpc_1_S8_L001_R2_001.fastq.gz
/Users/seandelao/Desktop/85dpc_FASTQ_files/85dpc_1_S8_L002_R1_001.fastq.gz
/Users/seandelao/Desktop/85dpc_FASTQ_files/85dpc_1_S8_L002_R2_001.fastq.gz

it runs through a few of the steps, but fails at sorting the BUS files. Here is the output and traceback:

[2019-11-10 16:26:44,892] INFO Skipping kallisto bus because output files already exist. Use the --overwrite flag to overwrite.
[2019-11-10 16:26:44,892] INFO Sorting BUS file output/output.bus to tmp/output.s.bus
[2019-11-10 16:27:39,756] INFO Whitelist not provided
[2019-11-10 16:27:39,756] INFO Copying pre-packaged 10XV3 whitelist to output
[2019-11-10 16:27:40,835] INFO Inspecting BUS file tmp/output.s.bus
[2019-11-10 16:28:53,119] INFO Correcting BUS records in tmp/output.s.bus to tmp/output.s.c.bus with whitelist output/10xv3_whitelist.txt
[2019-11-10 16:32:34,900] INFO Sorting BUS file tmp/output.s.c.bus to output/output.unfiltered.bus
[2019-11-10 16:33:23,774] INFO Capturing records from BUS file output/output.unfiltered.bus to tmp/spliced.bus with capture list cdna_transcripts_to_capture.txt
[2019-11-10 16:33:37,991] INFO Sorting BUS file tmp/spliced.bus to output/spliced.unfiltered.bus
[2019-11-10 16:33:41,346] ERROR An exception occurred
Traceback (most recent call last):
  File "/anaconda3/lib/python3.7/site-packages/kb_python/main.py", line 483, in main
    COMMAND_TO_FUNCTION[args.command](args)
  File "/anaconda3/lib/python3.7/site-packages/kb_python/main.py", line 135, in parse_count
    nucleus=args.nucleus,
  File "/anaconda3/lib/python3.7/site-packages/kb_python/count.py", line 733, in count_velocity
    memory=memory
  File "/anaconda3/lib/python3.7/site-packages/kb_python/count.py", line 97, in bustools_sort
    run_executable(command)
  File "/anaconda3/lib/python3.7/site-packages/kb_python/utils.py", line 147, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/anaconda3/lib/python3.7/site-packages/kb_python/bins/darwin/bustools/bustools sort -o output/spliced.unfiltered.bus -T tmp -t 8 -m 4G tmp/spliced.bus' died with <Signals.SIGSEGV: 11>.

Any idea what might be happening? Thanks!
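For context on the last line of the traceback: Python's subprocess module reports a child that was terminated by a signal with a negative return code, so "died with <Signals.SIGSEGV: 11>" means the bustools process itself crashed with a segmentation fault rather than exiting with an error status. A small sketch of how such return codes can be decoded follows; `describe_returncode` is a hypothetical helper.

```python
import signal


def describe_returncode(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode == 0:
        return "exited successfully"
    if returncode < 0:
        # On POSIX, a negative value -N means the child was killed by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


for code in (0, 1, -int(signal.SIGSEGV)):
    print(code, "->", describe_returncode(code))
```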

Process multiple SRRs; BrokenPipe error

Dear kallisto team,

Describe the issue
I want to process a sample with several SRRs (for example, this one) using pipes.
I used bash brace expansion to combine several links into one argument. However, I got a BrokenPipeError. Any advice? Is my command right for my purpose?

What is the exact command that was run?

# version (installed via conda): kb_python 0.25.1
# prepare reference
kb ref -d mouse -i index.idx -g t2g.txt -f1 transcriptome.fasta
# run count command
kb count -i index.idx -g t2g.txt -x 10XV2 -o SRS7040866 --filter bustools -t 4 -m 40G --verbose ftp://{ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/068/SRR12264568/SRR12264568_1.fastq.gz,ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/069/SRR12264569/SRR12264569_1.fastq.gz} ftp://{ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/068/SRR12264568/SRR12264568_2.fastq.gz,ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/069/SRR12264569/SRR12264569_2.fastq.gz}

Command output

/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/site-packages/anndata/_core/anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  from pandas.core.index import RangeIndex
[2021-02-24 17:37:27,551]   DEBUG Printing verbose output
[2021-02-24 17:37:27,552]   DEBUG kallisto binary located at /nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/site-packages/kb_python/bins/linux/kallisto/kallisto
[2021-02-24 17:37:27,552]   DEBUG bustools binary located at /nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/site-packages/kb_python/bins/linux/bustools/bustools
[2021-02-24 17:37:27,553]   DEBUG Creating SRS7040866/tmp directory
[2021-02-24 17:37:27,554]   DEBUG Namespace(c1=None, c2=None, cellranger=False, command='count', dry_run=False, fastqs=['ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/068/SRR12264568/SRR12264568_1.fastq.gz', 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/069/SRR12264569/SRR12264569_1.fastq.gz', 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/068/SRR12264568/SRR12264568_2.fastq.gz', 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/069/SRR12264569/SRR12264569_2.fastq.gz'], filter='bustools', g='t2g.txt', h5ad=False, i='index.idx', keep_tmp=False, lamanno=False, list=False, loom=False, m='40G', mm=False, no_inspect=False, no_validate=False, nucleus=False, o='SRS7040866', overwrite=False, report=False, t=4, tcc=False, tmp=None, verbose=True, w=None, workflow='standard', x='10XV2')
[2021-02-24 17:37:27,554]    INFO Piping ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/068/SRR12264568/SRR12264568_1.fastq.gz to SRS7040866/tmp/SRR12264568_1.fastq.gz
[2021-02-24 17:37:27,557]    INFO Piping ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/069/SRR12264569/SRR12264569_1.fastq.gz to SRS7040866/tmp/SRR12264569_1.fastq.gz
[2021-02-24 17:37:27,557]    INFO Piping ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/068/SRR12264568/SRR12264568_2.fastq.gz to SRS7040866/tmp/SRR12264568_2.fastq.gz
[2021-02-24 17:37:27,558]    INFO Piping ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/069/SRR12264569/SRR12264569_2.fastq.gz to SRS7040866/tmp/SRR12264569_2.fastq.gz
[2021-02-24 17:37:27,560]    INFO Using index index.idx to generate BUS file to SRS7040866 from
[2021-02-24 17:37:27,560]    INFO         SRS7040866/tmp/SRR12264568_1.fastq.gz
[2021-02-24 17:37:27,560]    INFO         SRS7040866/tmp/SRR12264569_1.fastq.gz
[2021-02-24 17:37:27,560]    INFO         SRS7040866/tmp/SRR12264568_2.fastq.gz
[2021-02-24 17:37:27,560]    INFO         SRS7040866/tmp/SRR12264569_2.fastq.gz
[2021-02-24 17:37:27,560]   DEBUG kallisto bus -i index.idx -o SRS7040866 -x 10XV2 -t 4 SRS7040866/tmp/SRR12264568_1.fastq.gz SRS7040866/tmp/SRR12264569_1.fastq.gz SRS7040866/tmp/SRR12264568_2.fastq.gz SRS7040866/tmp/SRR12264569_2.fastq.gz
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/urllib/request.py", line 280, in urlretrieve
    tfp.write(block)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/urllib/request.py", line 283, in urlretrieve
    reporthook(blocknum, bs, size)
BrokenPipeError: [Errno 32] Broken pipe
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/urllib/request.py", line 280, in urlretrieve
    tfp.write(block)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/urllib/request.py", line 283, in urlretrieve
    reporthook(blocknum, bs, size)
BrokenPipeError: [Errno 32] Broken pipe
[2021-02-24 17:47:24,583]   DEBUG 
[2021-02-24 17:47:24,583]   DEBUG [index] k-mer length: 31
[2021-02-24 17:47:24,583]   DEBUG [index] number of targets: 142,446
[2021-02-24 17:47:24,583]   DEBUG [index] number of k-mers: 120,632,459
[2021-02-24 17:47:24,583]   DEBUG [index] number of equivalence classes: 512,299
[2021-02-24 17:47:24,583]   DEBUG [quant] will process sample 1: SRS7040866/tmp/SRR12264568_1.fastq.gz
[2021-02-24 17:47:24,584]   DEBUG SRS7040866/tmp/SRR12264569_1.fastq.gz
[2021-02-24 17:47:24,584]   DEBUG [quant] will process sample 2: SRS7040866/tmp/SRR12264568_2.fastq.gz
[2021-02-24 17:47:24,584]   DEBUG SRS7040866/tmp/SRR12264569_2.fastq.gz
[2021-02-24 17:47:24,584]   DEBUG [quant] finding pseudoalignments for the reads ... done
[2021-02-24 17:47:24,584]   DEBUG [quant] processed 29,498,682 reads, 11,599,332 reads pseudoaligned
[2021-02-24 17:47:26,907]   DEBUG SRS7040866/output.bus passed validation
[2021-02-24 17:47:26,907]    INFO Sorting BUS file SRS7040866/output.bus to SRS7040866/tmp/output.s.bus
[2021-02-24 17:47:26,907]   DEBUG bustools sort -o SRS7040866/tmp/output.s.bus -T SRS7040866/tmp -t 4 -m 40G SRS7040866/output.bus
[2021-02-24 17:47:58,300]   DEBUG Read in 11599332 BUS records
[2021-02-24 17:47:59,440]   DEBUG SRS7040866/tmp/output.s.bus passed validation
[2021-02-24 17:47:59,440]    INFO Whitelist not provided
[2021-02-24 17:47:59,440]    INFO Copying pre-packaged 10XV2 whitelist to SRS7040866
[2021-02-24 17:47:59,550]    INFO Inspecting BUS file SRS7040866/tmp/output.s.bus
[2021-02-24 17:47:59,550]   DEBUG bustools inspect -o SRS7040866/inspect.json -w SRS7040866/10xv2_whitelist.txt -e SRS7040866/matrix.ec SRS7040866/tmp/output.s.bus
[2021-02-24 17:48:03,253]    INFO Correcting BUS records in SRS7040866/tmp/output.s.bus to SRS7040866/tmp/output.s.c.bus with whitelist SRS7040866/10xv2_whitelist.txt
[2021-02-24 17:48:03,254]   DEBUG bustools correct -o SRS7040866/tmp/output.s.c.bus -w SRS7040866/10xv2_whitelist.txt SRS7040866/tmp/output.s.bus
[2021-02-24 17:48:05,145]   DEBUG Found 737280 barcodes in the whitelist
[2021-02-24 17:48:05,146]   DEBUG Processed 10969696 BUS records
[2021-02-24 17:48:05,146]   DEBUG In whitelist = 5736
[2021-02-24 17:48:05,146]   DEBUG Corrected    = 49327
[2021-02-24 17:48:05,146]   DEBUG Uncorrected  = 10914633
[2021-02-24 17:48:05,169]   DEBUG SRS7040866/tmp/output.s.c.bus passed validation
[2021-02-24 17:48:05,169]    INFO Sorting BUS file SRS7040866/tmp/output.s.c.bus to SRS7040866/output.unfiltered.bus
[2021-02-24 17:48:05,169]   DEBUG bustools sort -o SRS7040866/output.unfiltered.bus -T SRS7040866/tmp -t 4 -m 40G SRS7040866/tmp/output.s.c.bus
[2021-02-24 17:48:32,969]   DEBUG Read in 55063 BUS records
[2021-02-24 17:48:32,992]   DEBUG SRS7040866/output.unfiltered.bus passed validation
[2021-02-24 17:48:32,992]    INFO Generating count matrix SRS7040866/counts_unfiltered/cells_x_genes from BUS file SRS7040866/output.unfiltered.bus
[2021-02-24 17:48:32,992]   DEBUG bustools count -o SRS7040866/counts_unfiltered/cells_x_genes -g t2g.txt -e SRS7040866/matrix.ec -t SRS7040866/transcripts.txt --genecounts SRS7040866/output.unfiltered.bus
[2021-02-24 17:48:34,997]   DEBUG SRS7040866/counts_unfiltered/cells_x_genes.mtx passed validation
[2021-02-24 17:48:34,997]    INFO Filtering with bustools
[2021-02-24 17:48:34,997]    INFO Generating whitelist SRS7040866/filter_barcodes.txt from BUS file SRS7040866/output.unfiltered.bus
[2021-02-24 17:48:34,997]   DEBUG bustools whitelist -o SRS7040866/filter_barcodes.txt SRS7040866/output.unfiltered.bus
[2021-02-24 17:48:35,012]   DEBUG Read in 55053 BUS records, wrote 100 barcodes to whitelist with threshold 63
[2021-02-24 17:48:35,013]    INFO Correcting BUS records in SRS7040866/output.unfiltered.bus to SRS7040866/tmp/output.unfiltered.c.bus with whitelist SRS7040866/filter_barcodes.txt
[2021-02-24 17:48:35,013]   DEBUG bustools correct -o SRS7040866/tmp/output.unfiltered.c.bus -w SRS7040866/filter_barcodes.txt SRS7040866/output.unfiltered.bus
[2021-02-24 17:48:35,869]   DEBUG Found 100 barcodes in the whitelist
[2021-02-24 17:48:35,869]   DEBUG Processed 55053 BUS records
[2021-02-24 17:48:35,870]   DEBUG In whitelist = 14486
[2021-02-24 17:48:35,870]   DEBUG Corrected    = 0
[2021-02-24 17:48:35,870]   DEBUG Uncorrected  = 40567
[2021-02-24 17:48:35,885]   DEBUG SRS7040866/tmp/output.unfiltered.c.bus passed validation
[2021-02-24 17:48:35,885]    INFO Sorting BUS file SRS7040866/tmp/output.unfiltered.c.bus to SRS7040866/output.filtered.bus
[2021-02-24 17:48:35,885]   DEBUG bustools sort -o SRS7040866/output.filtered.bus -T SRS7040866/tmp -t 4 -m 40G SRS7040866/tmp/output.unfiltered.c.bus
[2021-02-24 17:49:04,372]   DEBUG Read in 14486 BUS records
[2021-02-24 17:49:04,384]   DEBUG SRS7040866/output.filtered.bus passed validation
[2021-02-24 17:49:04,384]    INFO Generating count matrix SRS7040866/counts_filtered/cells_x_genes from BUS file SRS7040866/output.filtered.bus
[2021-02-24 17:49:04,385]   DEBUG bustools count -o SRS7040866/counts_filtered/cells_x_genes -g t2g.txt -e SRS7040866/matrix.ec -t SRS7040866/transcripts.txt --genecounts SRS7040866/output.filtered.bus
[2021-02-24 17:49:06,588]   DEBUG SRS7040866/counts_filtered/cells_x_genes.mtx passed validation
[2021-02-24 17:49:06,605]   DEBUG Removing SRS7040866/tmp directory

I also downloaded these files manually and ran kb count with the same parameters. As expected, an output with a higher number of barcodes was generated.

[2021-02-24 18:12:13,375]   DEBUG Found 737280 barcodes in the whitelist
[2021-02-24 18:12:13,376]   DEBUG Processed 19498506 BUS records
[2021-02-24 18:12:13,376]   DEBUG In whitelist = 18935130
[2021-02-24 18:12:13,376]   DEBUG Corrected    = 157841
[2021-02-24 18:12:13,376]   DEBUG Uncorrected  = 405535
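Since the manually downloaded files worked, one workaround is to download the FASTQs to disk first and pass local paths to kb count. The sketch below also orders the files as R1/R2 pairs per run. As far as I know, kallisto bus expects the files of each read set to be listed together, whereas the brace expansion in the command above yields R1, R1, R2, R2; whether that ordering contributed to the low whitelist match here is an assumption, not something the log confirms. All helper names are hypothetical.

```python
import os
import urllib.request
from urllib.parse import urlparse


def local_name(url, directory):
    """Path inside `directory` named after the last component of the URL."""
    return os.path.join(directory, os.path.basename(urlparse(url).path))


def interleave_pairs(read1_files, read2_files):
    """Order files as R1, R2, R1, R2, ... with one pair per run."""
    if len(read1_files) != len(read2_files):
        raise ValueError("need the same number of R1 and R2 files")
    ordered = []
    for read1, read2 in zip(read1_files, read2_files):
        ordered.extend([read1, read2])
    return ordered


def download_all(urls, directory):
    """Download each URL into `directory`, skipping files that already exist."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for url in urls:
        path = local_name(url, directory)
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
        paths.append(path)
    return paths


base = "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122"
read1_urls = [
    f"{base}/068/SRR12264568/SRR12264568_1.fastq.gz",
    f"{base}/069/SRR12264569/SRR12264569_1.fastq.gz",
]
read2_urls = [
    f"{base}/068/SRR12264568/SRR12264568_2.fastq.gz",
    f"{base}/069/SRR12264569/SRR12264569_2.fastq.gz",
]

ordered_urls = interleave_pairs(read1_urls, read2_urls)
# "fastqs" is a placeholder directory; nothing is downloaded at this point.
local_paths = [local_name(url, "fastqs") for url in ordered_urls]
print(" ".join(local_paths))
```

Calling `download_all(ordered_urls, "fastqs")` performs the actual transfer; the printed paths can then be appended to the kb count command in place of the FTP links.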

Passing --make-unique argument into kb

Dear kb team,

I am trying to build a reference using an axolotl genome, AmexG_v6.0-DD, from this website, and the corresponding annotation file, to quantify scRNA-seq data.

When I try to run the following command:

kb ref -i transcriptome_axolotl.idx -g t2g_axolotl.txt -f1 cdna_axolotl.fa \
AmexG_v6.0-DD.fa.gz \
AmexT_v47-AmexG_v6.0-DD.gtf.gz

I get this output (after passing --verbose but it was so long I dramatically shortened it):

[2021-02-14 18:04:16,288]   DEBUG Printing verbose output
[2021-02-14 18:04:16,288]   DEBUG kallisto binary located at /home/axela/.local/lib/python3.6/site-packages/kb_python/bins/linux/kallisto/kallisto
[2021-02-14 18:04:16,288]   DEBUG bustools binary located at /home/axela/.local/lib/python3.6/site-packages/kb_python/bins/linux/bustools/bustools
[2021-02-14 18:04:16,288]   DEBUG Creating tmp directory
[2021-02-14 18:04:16,288]   DEBUG Namespace(c1=None, c2=None, command='ref', d=None, f1='cdna_axolotl.fa', f2=None, fasta='AmexG_v6.0-DD.fa.gz', feature=None, flank=None, g='t2g_axolotl.txt', gtf='AmexT_v47-AmexG_v6.0-DD.gtf.gz', i='transcriptome_axolotl.idx', k=None, keep_tmp=False, lamanno=False, list=False, n=1, no_mismatches=False, overwrite=False, tmp=None, verbose=True, workflow='standard')
[2021-02-14 18:04:16,288]    INFO Preparing AmexG_v6.0-DD.fa.gz, AmexT_v47-AmexG_v6.0-DD.gtf.gz
[2021-02-14 18:04:16,288]    INFO Decompressing AmexT_v47-AmexG_v6.0-DD.gtf.gz to tmp
[2021-02-14 18:04:17,711]    INFO Creating transcript-to-gene mapping at /mnt/axela/scRNASeqData/axolotl/tmp/tmpt560t2dd
[2021-02-14 18:04:31,236]    INFO Decompressing AmexG_v6.0-DD.fa.gz to tmp
[2021-02-14 18:07:18,039]    INFO Sorting tmp/AmexG_v6.0-DD.fa to /mnt/axela/scRNASeqData/axolotl/tmp/tmp27hucco4
[2021-02-14 18:07:18,039]   DEBUG Found FASTA entry chr10p
...
[2021-02-14 18:59:59,712]    INFO Concatenating 1 transcript-to-gene mappings to t2g_axolotl.txt
[2021-02-14 18:59:59,780]    INFO Concatenating 1 cDNAs to cdna_axolotl.fa
[2021-02-14 19:00:00,451]    INFO Indexing cdna_axolotl.fa to transcriptome_axolotl.idx
[2021-02-14 19:00:00,452]   DEBUG kallisto index -i transcriptome_axolotl.idx -k 31 cdna_axolotl.fa
[2021-02-14 19:00:00,470]   DEBUG 
[2021-02-14 19:00:00,471]   DEBUG [build] loading fasta file cdna_axolotl.fa
[2021-02-14 19:00:00,471]   DEBUG [build] k-mer length: 31
[2021-02-14 19:00:00,471]   DEBUG Error: repeated name in FASTA file cdna_axolotl.fa
[2021-02-14 19:00:00,471]   DEBUG LOC111947635
[2021-02-14 19:00:00,471]   DEBUG 
[2021-02-14 19:00:00,471]   DEBUG Run with --make-unique to replace repeated names with unique names
[2021-02-14 19:00:00,471]   ERROR 
[build] loading fasta file cdna_axolotl.fa
[build] k-mer length: 31
Error: repeated name in FASTA file cdna_axolotl.fa
LOC111947635

Run with --make-unique to replace repeated names with unique names
[2021-02-14 19:00:00,471]   ERROR An exception occurred
Traceback (most recent call last):
  File "/home/axela/.local/lib/python3.6/site-packages/kb_python/main.py", line 785, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/home/axela/.local/lib/python3.6/site-packages/kb_python/main.py", line 159, in parse_ref
    temp_dir=temp_dir
  File "/home/axela/.local/lib/python3.6/site-packages/kb_python/ref.py", line 439, in ref
    cdna_fasta_path, index_path, k=k or 31
  File "/home/axela/.local/lib/python3.6/site-packages/kb_python/ref.py", line 212, in kallisto_index
    run_executable(command)
  File "/home/axela/.local/lib/python3.6/site-packages/kb_python/dry/__init__.py", line 24, in inner
    return func(*args, **kwargs)
  File "/home/axela/.local/lib/python3.6/site-packages/kb_python/utils.py", line 232, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/home/axela/.local/lib/python3.6/site-packages/kb_python/bins/linux/kallisto/kallisto index -i transcriptome_axolotl.idx -k 31 cdna_axolotl.fa' returned non-zero exit status 1.
[2021-02-14 19:00:00,480]   DEBUG Removing tmp directory

I understand that --make-unique is an argument to kallisto index, but kb ref does not recognise it. I could run kallisto index directly, but I am unsure how I would then generate the transcript-to-gene mapping file, i.e. t2g_axolotl.txt. Any advice on how to overcome this issue would be greatly appreciated. Thank you!

Best wishes,
Axel.
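
One possible workaround, sketched here rather than confirmed against kb-python itself, is to give the repeated FASTA record names unique suffixes before indexing. The function name make_unique_headers and the _1, _2 suffix scheme are assumptions made for illustration; they are not part of kb-python or kallisto. If the names are changed this way, the transcript-to-gene mapping would need matching rows for the renamed records.

```python
from collections import Counter


def make_unique_headers(lines):
    """Yield FASTA lines, suffixing repeated record names with _1, _2, ...

    Hypothetical helper: the first occurrence of a name is left unchanged,
    later occurrences get a numeric suffix. It does not check whether a
    suffixed name collides with a name already present in the file.
    """
    seen = Counter()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            # The record name is the text up to the first space.
            name, _, rest = header.partition(" ")
            count = seen[name]
            seen[name] += 1
            if count:
                name = f"{name}_{count}"
            line = ">" + name + (" " + rest if rest else "") + "\n"
        yield line


# Tiny in-memory example using the name reported in the error above.
example = [">LOC111947635\n", "ACGT\n", ">LOC111947635\n", "GGCC\n"]
for out_line in make_unique_headers(example):
    print(out_line, end="")
```

On a real file the same generator could be fed an open file handle and its output written to a new FASTA, which keeps memory use flat for a large transcriptome.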

Support for CELSeq2 384 barcode format

Hi,

Thank you for developing these tools. I find them very useful.
We have CELSeq2 scRNA-seq datasets that use a 384-barcode combination, which differs slightly from the usual CELSeq2 configuration.

R1 contains a 7 bp UMI + 7 bp barcode instead of the usual 6 + 6 combination. Would it be possible to add support for this kind of dataset as well?

Thanks in advance.
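
A possible route that needs no new built-in technology: kallisto bus accepts a custom technology string of the form bc:umi:seq, where each part is a file,start,stop triplet and a stop of 0 means the end of the read. The sketch below builds such a string for the layout described above. The helper name technology_string is made up for illustration, the UMI-then-barcode order is taken from the description in this post, and whether a given kb count version forwards a custom -x string to kallisto bus is an assumption to check against the installed version.

```python
def technology_string(barcode, umi, cdna):
    """Join (file, start, stop) triplets into kallisto's 'bc:umi:seq' form.

    Hypothetical helper, not part of kb-python or kallisto.
    """
    parts = (barcode, umi, cdna)
    return ":".join(",".join(str(value) for value in triplet) for triplet in parts)


# Assumed layout for the 384-barcode protocol described above:
# file 0 (R1) holds a 7 bp UMI followed by a 7 bp barcode,
# file 1 (R2) holds the cDNA read, used in full (start 0, stop 0).
celseq2_384 = technology_string(barcode=(0, 7, 14), umi=(0, 0, 7), cdna=(1, 0, 0))
print(celseq2_384)  # 0,7,14:0,0,7:1,0,0
```

The resulting string would be passed in place of a named technology, for example as the value of -x.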

5' 10x scRNA-seq

Good afternoon! I am very interested in trying kallistobustools, and really appreciate the in-depth tutorials. I would like to use kallistobustools on 5' scRNA-seq data generated with 10x to get unspliced and spliced reads. Is this supported? If so, do you have recommendations for settings to use to adapt kallisto to 5' data?

Thank you!!

Jennifer
