
hts's Introduction

bíogo

Installation

    $ go get github.com/biogo/biogo/...

Overview

bíogo is a bioinformatics library for the Go language.

Getting help

Requests for help should be directed to the biogo-user Google Group.

https://groups.google.com/forum/#!forum/biogo-user

Contributing

If you find any bugs, feel free to file an issue on the GitHub issue tracker. Pull requests are welcome, though if they involve changes to the API or the addition of features, please first open a discussion at the biogo-dev Google Group.

https://groups.google.com/forum/#!forum/biogo-dev

Citing

If you use bíogo, please cite Kortschak, Snyder, Maragkakis and Adelson "bíogo: a simple high-performance bioinformatics toolkit for the Go language", doi:10.21105/joss.00167, and Kortschak and Adelson "bíogo: a simple high-performance bioinformatics toolkit for the Go language", doi:10.1101/005033.

The Purpose of bíogo

bíogo stems from the need to address the size and structure of modern genomic and metagenomic data sets. These properties enforce requirements on the libraries and languages used for analysis:

  • speed - size of data sets
  • concurrency - problems often embarrassingly parallelisable

In addition to the computational burden of massive data sets, modern genomics increasingly requires complex pipelines to resolve questions in a tightening problem space, and the ability to develop new algorithms that allow novel approaches to interesting questions. These issues suggest the need for simplicity in syntax to facilitate:

  • ease of coding
  • checking for correctness in development and particularly in peer review

Related to the second issue is the reluctance of some researchers to release code because of quality concerns.

The issue of code release is the first of the principles formalised in the Science Code Manifesto.

Code  All source code written specifically to process data for a published
      paper must be available to the reviewers and readers of the paper.

A language with a simple, yet expressive, syntax should facilitate development of higher quality code and thus help reduce this barrier to research code release.

Articles

bíogo: a simple high-performance bioinformatics toolkit for the Go language

Analysis of Illumina sequencing data using bíogo

Using and extending types in bíogo

Yet Another Bioinformatics Library

It seems that nearly every language has its own bioinformatics library, some of which are very mature, for example BioPerl and BioPython. Why add another one?

The different libraries excel in different fields, acting as scripting glue for applications in a pipeline (much of [1, 2, 3]) and interacting with external hosts [1, 2, 4, 5], wrapping lower level high performance languages with more user friendly syntax [1, 2, 3, 4] or providing bioinformatics functions for high performance languages [5, 6].

The intended niche for bíogo lies somewhere between the scripting libraries and high performance language libraries in being easy to use for both small and large projects while having reasonable performance with computationally intensive tasks.

The intent is to reduce the level of investment required to develop new research software for computationally intensive tasks.

  1. BioPerl http://genome.cshlp.org/content/12/10/1611.full http://www.springerlink.com/content/pp72033m171568p2

  2. BioPython http://bioinformatics.oxfordjournals.org/content/25/11/1422

  3. BioRuby http://bioinformatics.oxfordjournals.org/content/26/20/2617

  4. PyCogent http://genomebiology.com/2007/8/8/R171

  5. BioJava http://bioinformatics.oxfordjournals.org/content/24/18/2096

  6. SeqAn http://www.biomedcentral.com/1471-2105/9/11

Library Structure and Coding Style

The bíogo library structure is influenced by the structure of the Go core libraries.

The coding style should be aligned with normal Go idioms as represented in the Go core libraries.

Quality Scores

Quality scores are supported for all sequence types, including protein. Both Phred and Solexa scoring systems can be read from files, but the internal representation of quality scores is Phred, so converting from Solexa incurs precision loss. A Solexa quality score type is provided for use where this loss would be a problem.
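
To make the precision issue concrete, here is a minimal sketch of the standard Phred/Solexa interconversion arithmetic (pure math in Go, not the bíogo API); the loss appears when the converted value is rounded to an integral Phred score:

package sketch

import "math"

// Phred and Solexa qualities encode error probabilities differently:
// Qphred = -10*log10(p) and Qsolexa = -10*log10(p/(1-p)).

// solexaToPhred converts a Solexa quality to the equivalent Phred value.
func solexaToPhred(q float64) float64 {
    return 10 * math.Log10(math.Pow(10, q/10)+1)
}

// phredToSolexa converts a Phred quality to the equivalent Solexa value.
// Round-tripping low Solexa scores through an integer Phred store does
// not recover them exactly, hence the separate Solexa type.
func phredToSolexa(q float64) float64 {
    return 10 * math.Log10(math.Pow(10, q/10)-1)
}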

Copyright and License

Copyright ©2011-2013 The bíogo Authors except where otherwise noted. All rights reserved. Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.

The bíogo logo is derived from Bitstream Charter, Copyright ©1989-1992 Bitstream Inc., Cambridge, MA.

BITSTREAM CHARTER is a registered trademark of Bitstream Inc.

hts's People

Contributors

bloveless, brentp, csw, egonelbre, kortschak, xuweixw, zhsj


hts's Issues

Bam reader error on date format (different format)

Ran into a similar error running smoove > goleft > biogo

panic: parsing time "2018-08-02 11:44:21" as "2006-01-02T15:04:05": cannot parse " 11:44:21" as "T": line 8: @RG\tID:H333CDSXX.1\tLB:EXPT_ID:180351;PREP_ID:182579\tPL:ILLUMINA\tPU:H333CDSXX.1\tSM:sample064\tCN:IGM\tDT:2018-08-02 11:44:21"

It is a NovaSeq, DRAGEN-aligned BAM.

bgzf/index: chunk reader may over-run chunk by one buffer length when end block offset is 0

See logged cases in the #14 tests. It is not possible for the chunk reader to know that a read will step into another BGZF block, so there is no way, short of reading, to progress the last chunk interval beyond the desired chunk end and trigger termination.

The two options for additional API are a method to return whether the next read will move to another block or to return the remaining number of bytes in the current block. The latter seems more sensible.

Zero length bam records have invalid bin values

I encountered a problem with biogo's bam output for records that have a zero length alignment, but a non-zero position and reference. This case can happen when the data has a read pair where one read is mapped, and the mate is unmapped. The unmapped mate will have the same position and reference as the mapped mate. In this case, the unmapped mate will have a zero length alignment.

When I output these records to a bam file using biogo's bam writer, and validate the output bam using "picard ValidateSamFile", I get errors like this:

ERROR: Record 105026, Read name FOOBAR, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned

"sambamba_v0.6.6 index -c" reports

sambamba-index: Bin in read with name 'FOOBAR' is set incorrectly (810 instead of expected 6485)

I traced this problem back to a difference between picard and biogo's handling of bin calculation. When picard (and I assume sambamba) encounters a read with 0 length alignment, it adds 1 to the end position before calling reg2bin(), thus considering the alignment to have length 1. I'm guessing this is a special case that was not mentioned in the SAM specification. See https://github.com/samtools/htsjdk/blob/c20e0edd361189c3f2bc3718b018dac2ec90530b/src/main/java/htsjdk/samtools/SAMRecord.java#L1554

Here's a diff I made to record.go that addresses the problem in my use case:

@@ -132,10 +132,18 @@ func (r *Record) Bin() int {
                return 4680 // reg2bin(-1, 0)
        }
        end := r.End()
+
+       // If the alignment length is zero (for example, if the read is
+       // unmapped), then increment end by 1 and treat the read as length
+       // 1 for binning purposes.
+       if end == r.Pos {
+               end++
+       }
+
        if !internal.IsValidIndexPos(r.Pos) || !internal.IsValidIndexPos(end) {
                return -1
        }
-       return int(internal.BinFor(r.Pos, r.End()))
+       return int(internal.BinFor(r.Pos, end))
 }
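
For reference, here is the SAM spec's reg2bin translated to Go (a sketch for illustration; biogo keeps its own version in an internal package). With a zero-length alignment beg == end, so the effective interval is empty; padding end by one, as in the diff above and in htsjdk, bins the record as if it covered a single base:

package sketch

// reg2bin computes the smallest UCSC binning-scheme bin containing the
// zero-based half-open interval [beg, end), per the SAM specification.
func reg2bin(beg, end int) int {
    end--
    switch {
    case beg>>14 == end>>14:
        return ((1<<15)-1)/7 + (beg >> 14)
    case beg>>17 == end>>17:
        return ((1<<12)-1)/7 + (beg >> 17)
    case beg>>20 == end>>20:
        return ((1<<9)-1)/7 + (beg >> 20)
    case beg>>23 == end>>23:
        return ((1<<6)-1)/7 + (beg >> 23)
    case beg>>26 == end>>26:
        return ((1<<3)-1)/7 + (beg >> 26)
    }
    return 0
}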

tabix misses intervals at start of file

With this setup:

echo $'chr1\t1\t100' | bgzip -c > t.bed.gz; tabix t.bed.gz

And with the script below, tabix misses all intervals. Even if I have thousands of intervals at < 10KB positions, it will return empty chunks. This change seems to fix it:

diff --git a/internal/index.go b/internal/index.go
index 1287db0..baaf656 100644
--- a/internal/index.go
+++ b/internal/index.go
@@ -293,6 +293,7 @@ func OverlappingBinsFor(beg, end int) []uint32 {
        for _, r := range []struct {
                offset, shift uint32
        }{
+               {level0, level0Shift},
                {level1, level1Shift},
                {level2, level2Shift},
                {level3, level3Shift},
package main

import (
    "compress/gzip"
    "io/ioutil"
    "log"
    "os"

    "github.com/biogo/hts/bgzf"
    "github.com/biogo/hts/bgzf/index"
    "github.com/biogo/hts/tabix"
)

func check(err error) {
    if err != nil {
        panic(err)
    }
}

type location struct {
    chrom string
    start int
    end   int
}

func (s location) RefName() string {
    return s.chrom
}
func (s location) Start() int {
    return s.start
}
func (s location) End() int {
    return s.end
}

func main() {

    path := os.Args[1]

    fh, err := os.Open(path + ".tbi")
    check(err)

    gz, err := gzip.NewReader(fh)
    check(err)
    defer gz.Close()

    idx, err := tabix.ReadFrom(gz)
    check(err)

    b, err := os.Open(path)
    check(err)
    bgz, err := bgzf.NewReader(b, 2)
    check(err)

    chunks, err := idx.Chunks(location{"chr1", 1, 19999999})
    check(err)
    log.Println(chunks)

    cr, err := index.NewChunkReader(bgz, chunks)
    buf, _ := ioutil.ReadAll(cr)

    log.Println(len(buf))

}

bgzf: race?

This might not be a biogo/hts issue, but I'm trying to track down a race that only occurs when rd > 1 is passed to bgzf.NewReader.

I'm thinking that in this part of the code, there could be a race if the user called Close() on the bgzf reader and then Close() on the reader backing it.

So the race I'm wondering about is if dec is pulled from bg.waiting at line 403, but before dec.nextBlockAt(next, nil) is called at 419, the underlying file handle is closed.

bgzf output not compatible with htslib

if I bgzip a bed file with biogo, it's not usable from tabix. Here's a test that demonstrates the problem:

func TestHeader(t *testing.T) {
    buf := new(bytes.Buffer)
    gz := NewWriter(buf, 1)
    gz.Comment = "comment"
    gz.Extra = []byte("extra")

    _, err := gz.Write([]byte("payload"))
    if err != nil {
        t.Fatal("error writing bgzf")
    }
    if err := gz.Close(); err != nil {
        t.Fatal("error closing bgzf")
    }

    b := buf.Bytes()[:16]
    expected := []byte(MagicBlock[:16])
    if !bytes.Equal(b, expected) {
        t.Fatalf("bgzf: incorrect header.\n got    %v\n wanted %v", b, expected)
    }
}

The output is:

--- FAIL: TestHeader (0.00s)
	bgzf_test.go:87: bgzf: incorrect header.
		 got    [31 139 8 20 0 0 0 0 0 0 11 0 66 67 2 0]
		 wanted [31 139 8 4 0 0 0 0 0 255 6 0 66 67 2 0]
FAIL

This is the line failing htslib checks: https://github.com/samtools/htslib/blob/6d927dfa3192492f3015b1bc9a0026f585e93acc/bgzf.c#L1494

same signature for Chunks() csi and tabix

Having an external library support both CSI and tabix would be simpler if they both had the same signature for Chunks(). Would you consider changing the CSI Chunks signature to

func (i *Index) Chunks(r Record) ([]bgzf.Chunk, error)

to match tabix?
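
For concreteness, a sketch of the shared shape this would allow (the interface names here are hypothetical, not part of either package):

package sketch

import "github.com/biogo/hts/bgzf"

// Record is the query interface the tabix Chunks method accepts.
type Record interface {
    RefName() string
    Start() int
    End() int
}

// ChunkIndex is a hypothetical common interface that both the tabix and
// CSI index types could satisfy if the CSI signature were changed.
type ChunkIndex interface {
    Chunks(r Record) ([]bgzf.Chunk, error)
}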

sam: renaming references, read groups and programs

The work on the bam.Merger API has exposed some limitations of the Header API. The limitations are not harmful in the normal consume-a-bam-or-sam-file-to-read-some-data kind of use, but impact on the ability to mutate bam and sam files in an efficient way. This kind of mutation is potentially needed when merging bams.

Background

There are two broad cases for merging:

  1. re-merging a collection of sub-sorted bams to get an overall sort and
  2. joining together a collection of files from distinct origins.

In the first case the headers of the input files all match and so there is no work to do. In the second case references, read groups and programs may need to be added to the header, and since read groups and programs must have unique identifiers (the ID field in both), collisions must be handled - either by accepting that the read group or program is already recorded, or by de-colliding the identifiers. We cannot know which is correct in general, so we should leave this up to the user.

The way the code I have at present (not yet merged into the merger branch) works is to blithely rename all read groups and programs to "<old-id>|<n>", where <old-id> is the previous ID and <n> is the index into the list of merged headers. It is then up to the user to delete all the read groups/programs that are not needed, and to accept that "<old-id>" is unrecoverable because we do not allow name changes in any sensible kind of way. This is horrible.

What I'm proposing here is that users should be able to change the names of read groups, programs and, while we are here, references. Currently there is no way to add this API to these types given the structure of the types representing these data; changing the name of one of these will break an invariant that is depended on. There are two possible approaches.

  1. Add a name changing API to Header in the form of RenameX(old, new string) error where X is {Reference|ReadGroup|Program} and non-nil errors are returned if old doesn't exist or new already does, or
  2. Add a name setting API to each type in the form of SetName(n string) error where a non-nil error is returned if a name n already exists.

The first option adds noise to the API and ties behaviour that seems like it should belong to the type to Header instead. The second option requires that the types know who owns them (this is allowed under the current invariants) by adding a *Header field to each type so that the owner's seen maps can be updated.

@brentp Do you have a preference? I am leaning toward option 2.
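A minimal sketch of what option 2 could look like (the type shapes and field names below are hypothetical; the real sam types differ):

package sketch

import "errors"

type Header struct {
    seenGroups map[string]*ReadGroup
}

type ReadGroup struct {
    owner *Header
    name  string
}

// SetName renames the read group, keeping the owning Header's seen map
// consistent; a non-nil error is returned if the name is already taken.
func (rg *ReadGroup) SetName(n string) error {
    if rg.owner != nil {
        if _, exists := rg.owner.seenGroups[n]; exists {
            return errors.New("sam: read group name exists")
        }
        if rg.owner.seenGroups == nil {
            rg.owner.seenGroups = make(map[string]*ReadGroup)
        }
        delete(rg.owner.seenGroups, rg.name)
        rg.owner.seenGroups[n] = rg
    }
    rg.name = n
    return nil
}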

cram support

we are starting to have a push to shift to CRAM over BAM for storage reasons. I know this will be a huge development effort, but perhaps we can start with a few discrete tasks. Here are some obvious ones:

  • implement golomb-rice en/decoding
  • implement itf8 en/decoding
  • Elias gamma en/decoding
  • sub-exponential en/decoding

I could prototype some of those if we agree on an API and if there is interest in having cram.
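
As a starting point for discussion, here is a sketch of ITF8 decoding following htslib's itf8_get (the Go function name and API shape are assumptions): the count of leading 1-bits in the first byte gives the number of extra bytes, with only the low nibble of the fifth byte used in the 5-byte form.

package sketch

import "io"

// readITF8 decodes a single CRAM ITF8-encoded int32 from r.
func readITF8(r io.ByteReader) (int32, error) {
    var b [5]byte
    var err error
    b[0], err = r.ReadByte()
    if err != nil {
        return 0, err
    }
    var n int // number of extra bytes, from the first byte's prefix
    switch {
    case b[0] < 0x80:
        n = 0
    case b[0] < 0xc0:
        n = 1
    case b[0] < 0xe0:
        n = 2
    case b[0] < 0xf0:
        n = 3
    default:
        n = 4
    }
    for i := 1; i <= n; i++ {
        b[i], err = r.ReadByte()
        if err != nil {
            return 0, err
        }
    }
    switch n {
    case 0:
        return int32(b[0]), nil
    case 1:
        return int32(b[0]&0x3f)<<8 | int32(b[1]), nil
    case 2:
        return int32(b[0]&0x1f)<<16 | int32(b[1])<<8 | int32(b[2]), nil
    case 3:
        return int32(b[0]&0x0f)<<24 | int32(b[1])<<16 | int32(b[2])<<8 | int32(b[3]), nil
    default:
        return int32(b[0]&0x0f)<<28 | int32(b[1])<<20 | int32(b[2])<<12 |
            int32(b[3])<<4 | int32(b[4]&0x0f), nil
    }
}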

bam: reader over allocates in some cases

Identified by go-fuzz with 8 slaves on a 20GB workstation. Running the individual cases below does not crash:

==> fuzzbam/crashers/890d061bfb86c19d89175f5eb38e4a94f7c3ae2d.quoted <==
	"BAM\x01\x00\x00\x00\x00\x00\x00\x10M6328125\x01" +
	"\xfd\xff\xff\xff\xff\xff\xff.\x00"

==> fuzzbam/crashers/8d367e53a22f146175b890f7e9e5daa36ee2c71c.quoted <==
	"BAM\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x12wline" +
	" in \x00\x00\x00it doesnut do" +
	"es not tch format516" +
	"171819\f"

==> fuzzbam/crashers/9d00139331a7d548cc5cf411deec26f3a82c9288.quoted <==
	"BAM\x01\x02\x00\xff\u007f\x00@\x0f"

==> fuzzbam/crashers/ca01e8adf89554a81d5fe98a65b075c7e2377689.quoted <==
	"BAM\x01\x00\x00\x00\x00\x00\x00\x10n5r27836!" +
	"14>\xae1\\n5r2781\x97Zgn5r2" +
	"7836!14;\xd7\xd2st\xc5\xef\x05\x12519r" +
	"5228366\x97Z\\n5r2783614" +
	";\xd7\xd2st\xc5\xef\x05\x12>\xae1\\n5r2781"

==> fuzzbam/crashers/d8a6eed420bd1b2eb5c5179989802d89784ec460.quoted <==
	"BAM\x01\xab\xeb\x82r27145149ab\x81d" +
	"efghijkl\x8fnopqrstuvwx" +
	"yz141516171819l"

==> fuzzbam/crashers/da5fb2fbea8a1ed21a85c4121c8284848658476d.quoted <==
	"BAM\x01\x00\x00\x00\x00\x00\x1a\x10M\x01\x00\x00\x00\x00\x00\x00A" +
	"M\x01\x00\x00[\x00\x06\x1aA\x00\x01\x00\x00\x00\x00\x00\x1a\x10M\x01" +
	"\x00\x00\x00\x00\x00\x00AM\x01\x00\x00\x00\x00\x06\x1a\x00M\x01\x00\x00" +
	"\x00\x00\x06\x1aA\x00\x01\x00\x00\x00\x00\x00\x1a\x10M\x01\x00\x00\x00\x00" +
	"\x00\x00AM\x01\x00\x00\x00\x00\x06\x1a\x00M\x01\x00\x00\x00\x00\x00\x00" +
	"\x00/\x01\x00\x00M\x00\x00\x1a\x10\x00\x01\x00\x00\x00\x00\x00\x00AM" +
	"\x01\x00\x00\x00\x00\x06\x1aAM\x01\x00\x00\x00\x00 \x00AM\x01\x00" +
	"\x00\x00\x00\x00\x1a\x10M\x01\x00\x00\x00\x00\x00\x00AM\x01\x00\x00\x00" +
	"\x00\x06\x1a\x10M\x01\x00\x00\x00\x00\x00\x00\x00/\x01\x00\x00\x00\x00\x06" +
	"\x1aAM\x01\x00\x00\x00\x00\x00\x1a\x10M\x01\x00\x00\x00\x00\x00\x00\x00" +
	"/\x01\x00\x00\x00\x00\x00\x1a\x10M\x01\x00\x00\x00\x00\x00\x00AM\x01" +
	"\x00\x00\x00\x00\x06\x1aAM\x01\x00\x00\x00\x00 \x00AM\x01\x00\x00" +
	"\x00\x00\x00\x1a\x10M\x01\x00\x00\x00\x00\x00\x00AM\x01\x00\x00\x00\x00" +
	"\x06\x1a\x10M\x01\x00\x00\x00\x00\x00\x00\x00/\x01\x00\x00\x00\x00\x06\x1a" +
	"AM\x01\x00\x00\x00\x00\x00\x1a\x10M\x01\x00\x00\x00\x00\x00\x00AM" +
	"\x01\x00\x00\x00\x01"

==> fuzzbam/crashers/e22770c25532dadc8a249e9c2b1ac10c05bb60e3.quoted <==
	"BAM\x01519r52283663\xb5n5r" +
	"27836!14>n5r24914M\xe8#" +
	"8518!14>\xae1n5r2781491" +
	"4M\xe8#85\xc9j\xffٷ185519r52" +
	"28366\xb5\xec\xd0ė\xaen5r27836!" +
	"14\xd7\xd2\xc5\xef\xe6\xf2n5r27814914"

==> fuzzbam/crashers/ee833561203f1f8bd4b3dd0af3b40f888f57b6a7.quoted <==
	"BAM\x01\x00\x00\x00\x00\x00\x1a\x10M\x01\x00\x00\x00\x00\x00\x00A" +
	"M\x01\x00\x00\x00\x00\x06\x1aAM\x01\x00\x00\x00\x00\x00\x1a\x10M\x01" +
	"\x00\x00\x00\x00\x00\x00AM\x01\x00\x00\x00\x00\x06\x1a\x10M\x01\x00\x00" +
	"\x00\x00\x00\x00\x00/\x01\x00\x00\x00\x00\x00\x1a\x10M\x01\x00\x00\x00\x00" +
	"\x00\x00AM\x01\x00\x00\x00\x00\x00\x1a\x10M\x01\x00\x00\x00\x00\x00\x00" +
	"AM\x01\x00\x00\x00\x00\x06\x1a\x10M\x01\x00\x00\x00\x00\x00\x00\x006" +
	"!w\xb5>+\xecZ314;\xd7\xd2s/\x00\x00\x00\xa3\x91" +
	"_t\xe5\xef\x05\x13\x04 \x85abefx47ABCD" +
	"EF\x85\xafB\x00\x01\x1911bd\xa3\xfd\f\n\x9b\xfb|d" +
	" of efghijklmnopqrst" +
	"uvwxyz7\x11-50016952667" +
	"\xc5\xef\x05\x12\xe6\x14uct \x80\x91\xbclėZn5r" +
	"27836\xbd\x87\xe0\xe3'\xe2!1;\xd7\xd2st\xc5\xef" +
	"\x05\x12\xe611112127272111344" +
	"4101n2\x00\x01\x00-9281330455" +
	"15283666\x01\xb5>%(ADTn5r7" +
	"!\xe6\xf2ct \x85>\x01r781911885"

Failure to read reference stats from index

I am trying to read reference stats from a BAM index using index.ReferenceStats but I get back a false status for all reference IDs (which I assume are integers from 0 to numRef-1).
The index is valid and samtools idxstats works on it. Also, I have managed to read out the number of unmapped reads.
Any ideas what I am doing wrong?

Thanks,
Botond

bam: consider adding Skip method

Consider adding a Skip method that reads to the next record without parsing data beyond what is necessary for this.

This allows fast record counting.
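
The record layout makes this cheap: each BAM alignment is prefixed by a little-endian int32 block_size, so skipping is a read-and-discard. A sketch of the idea (not the bam package's method):

package sketch

import (
    "encoding/binary"
    "io"
    "io/ioutil"
)

// skipRecord discards one BAM alignment record by reading its 4-byte
// block_size prefix and dropping that many following bytes.
func skipRecord(r io.Reader) error {
    var size int32
    if err := binary.Read(r, binary.LittleEndian, &size); err != nil {
        return err
    }
    _, err := io.CopyN(ioutil.Discard, r, int64(size))
    return err
}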

csi: malformed dummy bin header

On csi files created by htslib (with bcftools index or with tabix), I see this error.

I think there must be something wrong with the check if bins[i].bin == statsDummyBin in csi_read.go, which causes this error (given enough data) to trigger when n != 2. I can't find in the spec or htslib code how you came up with that. Is there any other (currently missing) constraint for this check?

For most files, I do not have this problem, but for 1 large file I do.

This is 1 of 2 issues related to CSI that I've found. Opening the other issue presently.

Invalid .bam output using hts.bam.Writer

Hi all,
I have been using the biogo library, and it's really great for what I need, but when I produce some output bam files using hts.bam.Writer, I encounter some errors. When I run the output bam file through a checker, picard reports INVALID_INDEXING_BIN and INVALID_INDEX_FILE_POINTER errors.

If I then read in the same bam file through biogo's Reader using an index, some of the records do not come through.

I've attached some code that seems to reproduce some of the problems. What's interesting is that if I change refLen and readRange to be 10x smaller, I no longer see the errors.
main.go.txt

When I run the code and then check the bam output, these are the errors I get from picard and sambamba (you can ignore the platform errors):

ayip@ayip:~/fun/go/src/main$ go run main.go

ayip@ayip:~/fun/go/src/main$ picard ValidateSamFile I=testoutput.bam MAX_OUTPUT=5
[Thu Jul 27 11:25:03 PDT 2017] picard.sam.ValidateSamFile INPUT=testoutput.bam MAX_OUTPUT=5 MODE=VERBOSE IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Thu Jul 27 11:25:03 PDT 2017] Executing as ayip@ayip on Linux 4.10.0-27-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11; Picard version: 2.10.3-SNAPSHOT
ERROR: Read name name, The platform (PL) attribute (plat) + was not one of the valid values for read group
ERROR: Record 4194304, Read name r:536870657, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
ERROR: Record 4194305, Read name r:536870913, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
ERROR: Record 4194306, Read name r:536870913, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
ERROR: Record 4194307, Read name r:536871169, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
Maximum output of [5] errors reached.
[Thu Jul 27 11:25:13 PDT 2017] picard.sam.ValidateSamFile done. Elapsed time: 0.18 minutes.
Runtime.totalMemory()=959447040
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp

ayip@ayip:~/fun/go/src/main$ sambamba_v0.6.6 index -c testoutput.bam
sambamba-index: Bin in read with name 'r:536870657' is set incorrectly (65535 instead of expected 0)

Faster bgzf compression/decompression with libdeflate?

If the latest htslib is built on a system with libdeflate (https://github.com/ebiggers/libdeflate), .bam and .vcf.gz compression and decompression speed is roughly doubled over stock zlib, and substantially better than what you get with Intel and Cloudflare zlib as well. I've put together a simple cgo wrapper for libdeflate (see https://godoc.org/github.com/grailbio/base/compress/libdeflate), and confirmed that modifying hts/bgzf to use its functions over compress/gzip (or klauspost/compress/gzip) produces a similar speedup.

The catch is that the libdeflate package currently requires cgo, and it would take an unreasonable amount of work (from my perspective, anyway) to change this. I see that there's currently no cgo dependency anywhere in biogo, though it appears to have existed in the past. Does this disqualify the proposal, or is there a way to introduce a cgo dependency that you'd consider acceptable?

If the latter is true, I'll go ahead and create a pull request of the appropriate form.

hts/tabix: add name -> id func to api

ReferenceStats takes an int but nameMap is not exposed to the user. I guess it might be guaranteed that the index in the Index.Names() slice is the id to send to ReferenceStats, but it would be nice to have a function like

func (i *Index) Id(name string) (id int, ok bool) {
    id, ok = i.nameMap[name]
    return id, ok
}

so that the Id could then be sent to ReferenceStats

paper/examples/flagstat/README.md misleading?

(The following doesn't affect my review of the paper, because paper.md makes no performance claims, and strictly speaking flagstat/README.md doesn't say anything too unreasonable)

flagstat/README.md compares performance of biogo/hts with samtools, demonstrating that biogo becomes faster than samtools as more cores are used. However, it is misleading that you don't compare to samtools using multiple threads.

(Also, in my hands, 8 cores is not enough to make biogo faster than single-threaded samtools; the core scaling I see is very bad - could it be a bug?)

My own timings:

samtools 1.3.1-45-g30bc5e2 (using htslib 1.3.2-204-g255863e)

real 1m58.898s
user 1m56.319s
sys 0m2.576s

samtools flagstat --input-fmt-option nthreads=8

real 0m22.954s
user 2m40.638s
sys 0m14.881s

(5x faster for 8x more cores, 0.6 scaling)

biogo 1.0.1 (using go v 1.7.5) GOMAXPROCS=1

real 7m26.520s
user 7m22.924s
sys 0m4.904s

biogo GOMAXPROCS=8

real 4m2.774s
user 24m8.423s
sys 0m44.927s

(1.8x faster for 8x more cores, 0.2 scaling)
(half the speed of samtools using 7 more cores)

Feature request: Random access BAM reading by genomic positions

Hi,

I have a use case where I need to fetch all reads overlapping a given genomic interval in an indexed BAM (e.g., all reads on chr3 between position 123 and 456). It doesn't seem like there's an easy way to do this in the current code, unless I am missing something. I can probably contribute some code for this if it would be helpful. Thanks!
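
For concreteness, here is a sketch of what this could look like built from pieces mentioned in other issues on this tracker (bam.ReadIndex, Index.Chunks and bam.NewIterator); treat the exact names and signatures as assumptions:

package main

import (
    "log"
    "os"

    "github.com/biogo/hts/bam"
)

func main() {
    f, err := os.Open("reads.bam")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    br, err := bam.NewReader(f, 1)
    if err != nil {
        log.Fatal(err)
    }

    idxf, err := os.Open("reads.bam.bai")
    if err != nil {
        log.Fatal(err)
    }
    defer idxf.Close()
    idx, err := bam.ReadIndex(idxf)
    if err != nil {
        log.Fatal(err)
    }

    // Assume chr3 is the third reference in the header.
    ref := br.Header().Refs()[2]
    chunks, err := idx.Chunks(ref, 123, 456)
    if err != nil {
        log.Fatal(err)
    }
    it, err := bam.NewIterator(br, chunks)
    if err != nil {
        log.Fatal(err)
    }
    for it.Next() {
        _ = it.Record() // reads overlapping chr3:123-456
    }
    if err := it.Error(); err != nil {
        log.Fatal(err)
    }
}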

Bam reader error on date format

Hello.

I am trying to parse a bam file but I get an error when the lib parses the header's date.
Here is the error:

"parsing time "2017-05-10T21:02:29" as "2006-01-02T15:04:05-0700": cannot parse "" as "-0700"

I assume that my sequencer's output does not match the lib's available formats:
const (
    iso8601Date      = "2006-01-02"
    iso8601TimeDateZ = "2006-01-02T15:04:05Z"
    iso8601TimeDateN = "2006-01-02T15:04:05-0700"
)

Would you add something that allows the date to be left unparsed when the format is unknown, or just reformat it with time.Parse in readGroupLine() from parse_header.go?
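
A sketch of a tolerant fallback (the space-separated layout matches the DT field from the other date issue above; whether to accept it is an open question):

package sketch

import "time"

// parseHeaderTime tries each known layout in turn and reports failure
// instead of aborting the whole header parse.
func parseHeaderTime(s string) (time.Time, bool) {
    for _, layout := range []string{
        "2006-01-02",
        "2006-01-02T15:04:05Z",
        "2006-01-02T15:04:05-0700",
        "2006-01-02T15:04:05",
        "2006-01-02 15:04:05",
    } {
        if t, err := time.Parse(layout, s); err == nil {
            return t, true
        }
    }
    return time.Time{}, false
}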

Thanks.

header parsing error with CL.

reported in brentp/goleft#44

A user has a BAM header with:

@PG	ID:bwa	PN:bwa	VN:0.7.12-r1039	CL:/commun/data/packages/bwa/bwa-0.7.12/bwa mem -t 10 -M -H @CO	20180608.isidor.: Mapping de bams pour B Isidor/ CHU-Nantes. Les Bams viennent de  -R @RG\tID:18D0609\tLB:18D0609\tSM:18D0609\tPL:illumina\tCN:Nantes /commun/data/pubdb/broadinstitute.org/bundle/1.5/b37/index-bwa-0.7.12/human_g1k_v37.fasta /mnt/beegfs/lindenb/WORK/2018/20180607.ISIDOR.MEDEXOME/FASTQS/18D0609_S1_R1_001.fastq.gz /mnt/beegfs/lindenb/WORK/2018/20180607.ISIDOR.MEDEXOME/FASTQS/18D0609_S1_R2_001.fastq.gz

This causes header parsing to return an error at this check, because of the tabs in the CL field. I don't see an obvious fix given that quoting doesn't appear to be required.

hts/internal: Index.Read does not update LastRecord

This prevents updating an index when appending new records in sorted order. This may not be what is wanted, but at the moment adding a new record will not fail because LastRecord is 0. Either set it to the maximum int value (preventing addition) or to the maximum record seen during the read (allowing addition of valid records).

Possible issue with SAM record validation

The line below compares the length of the sequence stored in a sam.Record to record.Len(). However, Len() calculates the number of bases consumed on the reference and not on the query.

if cigarLen := r.Len(); cigarLen < 0 || (r.Seq.Length != 0 && r.Seq.Length != cigarLen) {
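
For contrast, here is a sketch of the query-consuming length from the CIGAR, which is presumably the quantity the validation should compare against Seq.Length (a sketch only, assuming the sam package's exported CIGAR types):

package sketch

import "github.com/biogo/hts/sam"

// queryLen sums the CIGAR operations that consume query sequence.
func queryLen(c sam.Cigar) int {
    var l int
    for _, co := range c {
        switch co.Type() {
        case sam.CigarMatch, sam.CigarInsertion, sam.CigarSoftClipped,
            sam.CigarEqual, sam.CigarMismatch:
            l += co.Len()
        }
    }
    return l
}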

bam.index.Chunks() to take record

as with the change for tabix (d68ff49)

the bam Index.Chunks method,

func (i *Index) Chunks(r *sam.Reference, beg, end int) ([]bgzf.Chunk, error)

should take a Record interface so it's easy to query a bam with just an interval.

If you agree, I can open a PR

sam: header parsing does not conform to the SAM spec

This is the result of incorrectly reading the spec. The reading I made was that tags were a closed set. This is not the case, and so bam and sam will not read some valid BAM/SAM files in the wild.

The fix is not immediately obvious (there are three reasonable options - none good).

  1. add OtherTags map[Tag]string fields to the four types above and add values to those when the statically defined field does not exist. This is bad because there are two ways to get a tag, and when a new SAM tag is promoted to predefined in the spec we cannot promote it to a static field without breaking backwards compatibility.
  2. add Tags map[Tag]string fields to the four types above and add all values to those even when the statically defined field does exist. This is bad because there are two places to store a record value and so there needs to be a way to reconcile conflicts.
  3. move all tag record values to a map[Tag]string field in each type above and provide accessors for the required tags. This is bad because it breaks compatibility.

I think 3. is the least bad at this stage.

csi: missing chunks from CSI index

I've updated vcfanno to use CSI when available. I did some exhaustive testing and found that in very rare cases, there are missed annotations.
I'm sure this is due to logic in the Chunks() method. I have not been able to make a test-case, but I will make a PR that fixes this shortly.

I think it may be related to the logic change that you did for tabix (internal/index) here https://github.com/biogo/hts/pull/20/files

panic from slice out of bounds in bgzf.index

In func (r *ChunkReader) Read(p []byte) (int, error), I get:

panic: runtime error: slice bounds out of range

goroutine 1 [running]:
github.com/biogo/hts/bgzf/index.(*ChunkReader).Read(0xc208994270, 0xc20883d20c, 0xdf4, 0xdf4, 0x7f80307beaa8, 0x0, 0x0)
    /usr/local/src/gocode/src/github.com/biogo/hts/bgzf/index/index.go:64 +0x714
bufio.(*Reader).fill(0xc208e391a0)
    /usr/local/go/src/bufio/bufio.go:97 +0x1ce
bufio.(*Reader).ReadSlice(0xc208e391a0, 0xc20883d80a, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:295 +0x257
bufio.(*Reader).ReadBytes(0xc208e391a0, 0xc2085c780a, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:374 +0xd2
bufio.(*Reader).ReadString(0xc208e391a0, 0xa, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:414 +0x58
...

I printed out:

        log.Println(r.chunks[0].End.Block, r.r.LastChunk().End.Block, len(p))

in the offending if statement and see:

2015/09/06 07:41:48 64366 3148 3572

I'm hoping you see an obvious solution (maybe using vOffset() is required in the if?), otherwise, I'll try to make a small test-case.

iso8601

AFAICT, 2010-10-19T00:00:00.000+00:00 is a valid iso8601 date, but it fails to parse with master.

package main

import "github.com/biogo/hts/sam"

const hdr = `@HD	VN:1.0	SO:coordinate
@SQ	SN:1	LN:249250621	M5:1b22b98cdeb4a9304cb5d48026a85128	UR:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz	AS:NCBI37	SP:Human
@RG	ID:ERR184232	LB:5916154	SM:HG01311	PI:311	CN:SC	PL:ILLUMINA	DS:SRP001525
@RG	ID:DKFZ:100630_SN143_0256_A15006043_5	PL:ILLUMINA	CN:DKFZ	PI:353	DT:2010-10-19T00:00:00.000+00:00	LB:WGS:DKFZ:ICGC_BL12	SM:52c198b4-7bda-4f81-8101-a322787a10a6	PU:DKFZ:100630_SN143_0256_A15006043_5	PG:fastqtobam`

func main() {
	s, err := sam.NewHeader([]byte(hdr), nil)
	if err != nil {
		panic(err)
	}
	_ = s
}
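
For what it's worth, Go can parse this form with a layout carrying fractional seconds and a colon-separated zone; a sketch of an additional layout the header parser could try (the constant name is an assumption):

package main

import (
    "fmt"
    "time"
)

const iso8601TimeDateFracZone = "2006-01-02T15:04:05.000-07:00"

func main() {
    t, err := time.Parse(iso8601TimeDateFracZone, "2010-10-19T00:00:00.000+00:00")
    fmt.Println(t, err) // parses cleanly
}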

sam: adding a reference cloned from a header fails with "duplicate reference name"

This test fails. It probably shouldn't.

func (s *S) TestAddClonedReference(c *check.C) {
	sr, err := NewReader(bytes.NewReader(specExamples.data))
	c.Assert(err, check.Equals, nil)
	h := sr.Header()
	ref := h.Refs()[0].Clone()
	c.Check(h.AddReference(ref), check.Equals, nil)
}

The failure is due to the comparison between the reference ID which is -1 in the cloned ref. I think we should special-case a -1 ID.

The fix is trivial:

diff --git a/sam/reference.go b/sam/reference.go
index c6a7117..875b356 100644
--- a/sam/reference.go
+++ b/sam/reference.go
@@ -270,7 +270,7 @@ func equalRefs(a, b *Reference) bool {
        if a == b {
                return true
        }
-       if a.id != b.id ||
+       if (a.id != -1 && b.id != -1 && a.id != b.id) ||
                a.name != b.name ||
                a.lRef != b.lRef ||
                a.md5 != b.md5 ||

The merge process panics when the bam is small

For a small bam file in which the number of records is less than the chunk size, the process panics, as in the godoc example. I first found that one of the reasons is that the record count n is not updated when the first chunk is not filled up:

for {
    for it.Next() {
        recs = append(recs, it.Record())
        if len(recs) == cap(recs) {
            r, err := writeChunk(dir, h, recs)
            if err != nil {
                log.Panic(err)
            }
            t = append(t, r)
            n, recs = len(recs), recs[:0]     // when there are fewer records than the chunk size, n is never updated here
        }
    }
    err = it.Error()
    if n == 0 || err != nil {    // breaks because n was not updated, so the reader slice t is still empty, with no reader appended
        break
    }
    if len(recs) != 0 {
        r, err := writeChunk(dir, h, recs)
        if err != nil {
            log.Panic(err)
        }
        t = append(t, r)
    }
}

I think that can be fixed by simply changing it to "n++" for every record in the iterator. When I did this, the reader slice t was updated with one reader as expected. However, it panics again when it reaches:

m, err := bam.NewMerger(nil, t...)
if err != nil {
    log.Panicf("failed to created merger: %v", err)
}
sorted := sam.NewIterator(m)
for sorted.Next() {                              //here
    // Operate on coordinate sorted stream.
    fn(sorted.Record())
}

I don't know what happens here. I'm guessing it's something in the merger constructor when there is only one underlying reader.

BTW, that does not happen on big bam files, where there is more than one sub-reader.

sam: add Omit method to Reader

I am processing a sam stream and most of the CPU time is spent in sam.ParseAux, which I don't need parsed.
But I do need to keep it unchanged to print out. It would be good if we had an Omit() for SAM that left the AuxFields unparsed so the record could still be printed as valid SAM.
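
A sketch of the lazy alternative: keep the raw aux bytes and parse only on demand, so unmodified records can be reprinted verbatim (an illustration of the idea, not a proposed API):

package sketch

import (
    "bytes"

    "github.com/biogo/hts/sam"
)

// lazyAux holds the tab-separated aux fields as read, deferring
// sam.ParseAux until the fields are actually needed.
type lazyAux struct {
    raw    []byte
    parsed []sam.Aux
}

func (l *lazyAux) fields() ([]sam.Aux, error) {
    if l.parsed != nil || len(l.raw) == 0 {
        return l.parsed, nil
    }
    for _, f := range bytes.Split(l.raw, []byte{'\t'}) {
        a, err := sam.ParseAux(f)
        if err != nil {
            return nil, err
        }
        l.parsed = append(l.parsed, a)
    }
    return l.parsed, nil
}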

Here's an example profile:

(pprof) list UnmarshalSAM
Total: 1.61mins
ROUTINE ======================== github.com/biogo/hts/sam.(*Record).UnmarshalSAM in /home/brentp/go/src/github.com/biogo/hts/sam/record.go
     3.16s     43.96s (flat, cum) 45.57% of Total
         .          .    204:// UnmarshalSAM parses a SAM format alignment line in the provided []byte, using
         .          .    205:// references from the provided Header. If a nil Header is passed to UnmarshalSAM
         .          .    206:// and the SAM data include non-empty refence and mate reference names, fake
         .          .    207:// references with zero length and an ID of -1 are created to hold the reference
         .          .    208:// names.
      50ms       50ms    209:func (r *Record) UnmarshalSAM(h *Header, b []byte) error {
      70ms     12.88s    210:	f := bytes.SplitN(b, []byte{'\t'}, 20)
      10ms       10ms    211:	if len(f) < 11 {
         .          .    212:		return errors.New("sam: missing SAM fields")
         .          .    213:	}
     130ms      880ms    214:	*r = Record{Name: string(f[0])}
         .          .    215:	// TODO(kortschak): Consider parsing string format flags.
      80ms      940ms    216:	flags, err := strconv.ParseUint(string(f[1]), 0, 16)
      30ms       30ms    217:	if err != nil {
         .          .    218:		return fmt.Errorf("sam: failed to parse flags: %v", err)
         .          .    219:	}
      20ms       20ms    220:	r.Flags = Flags(flags)
     100ms      1.15s    221:	r.Ref, err = referenceForName(h, string(f[2]))
         .          .    222:	if err != nil {
         .          .    223:		return fmt.Errorf("sam: failed to assign reference: %v", err)
         .          .    224:	}
      70ms      840ms    225:	r.Pos, err = strconv.Atoi(string(f[3]))
      30ms       30ms    226:	r.Pos--
         .          .    227:	if err != nil {
         .          .    228:		return fmt.Errorf("sam: failed to parse position: %v", err)
         .          .    229:	}
      60ms      410ms    230:	mapQ, err := strconv.ParseUint(string(f[4]), 10, 8)
         .          .    231:	if err != nil {
         .          .    232:		return fmt.Errorf("sam: failed to parse map quality: %v", err)
         .          .    233:	}
         .          .    234:	r.MapQ = byte(mapQ)
      50ms      1.06s    235:	r.Cigar, err = ParseCigar(f[5])
         .          .    236:	if err != nil {
         .          .    237:		return fmt.Errorf("sam: failed to parse cigar string: %v", err)
         .          .    238:	}
     180ms      400ms    239:	if bytes.Equal(f[2], f[6]) || bytes.Equal(f[6], []byte{'='}) {
      10ms       10ms    240:		r.MateRef = r.Ref
         .          .    241:	} else {
         .       60ms    242:		r.MateRef, err = referenceForName(h, string(f[6]))
         .          .    243:		if err != nil {
         .          .    244:			return fmt.Errorf("sam: failed to assign mate reference: %v", err)
         .          .    245:		}
         .          .    246:	}
      60ms      590ms    247:	r.MatePos, err = strconv.Atoi(string(f[7]))
         .          .    248:	r.MatePos--
      10ms       10ms    249:	if err != nil {
         .          .    250:		return fmt.Errorf("sam: failed to parse mate position: %v", err)
         .          .    251:	}
      70ms      560ms    252:	r.TempLen, err = strconv.Atoi(string(f[8]))
      30ms       30ms    253:	if err != nil {
         .          .    254:		return fmt.Errorf("sam: failed to parse template length: %v", err)
         .          .    255:	}
         .          .    256:	if !bytes.Equal(f[9], []byte{'*'}) {
      40ms      2.67s    257:		r.Seq = NewSeq(f[9])
      50ms      210ms    258:		if !(len(r.Cigar) == 0 || r.Cigar.IsValid(r.Seq.Length)) {
         .          .    259:			return errors.New("sam: sequence/CIGAR length mismatch")
         .          .    260:		}
         .          .    261:	}
      30ms       60ms    262:	if !bytes.Equal(f[10], []byte{'*'}) {
     160ms      850ms    263:		r.Qual = append(r.Qual, f[10]...)
     210ms      210ms    264:		for i := range r.Qual {
     830ms      830ms    265:			r.Qual[i] -= 33
         .          .    266:		}
         .          .    267:	} else if r.Seq.Length != 0 {
         .          .    268:		r.Qual = make([]byte, r.Seq.Length)
         .          .    269:		for i := range r.Qual {
         .          .    270:			r.Qual[i] = 0xff
         .          .    271:		}
         .          .    272:	}
      40ms       40ms    273:	if len(r.Qual) != 0 && len(r.Qual) != r.Seq.Length {
         .          .    274:		return errors.New("sam: sequence/quality length mismatch")
         .          .    275:	}
     150ms      150ms    276:	for _, aux := range f[11:] {
      80ms        14s    277:		a, err := ParseAux(aux)
      20ms       20ms    278:		if err != nil {
         .          .    279:			return err
         .          .    280:		}
     480ms      4.95s    281:		r.AuxFields = append(r.AuxFields, a)
         .          .    282:	}
      10ms       10ms    283:	return nil
         .          .    284:}

Possible bug in internal/index Chunks()

I have what I believe is a valid indexed BAM file, but I am getting "index: invalid interval" errors when trying to read it. I tracked the problem down to line 169 of internal/index.go. I believe I am querying a reference that happens to have no reads aligned to it in the BAM, so ref.Intervals is empty; therefore, no matter what begin position is passed to Chunks(), there will be an "invalid interval" error. It seems to me a better thing to do in this case would be to return an empty slice of chunks with no error. Otherwise, the behavior should at least be documented so the caller knows this condition must be checked for (it's not intuitive).
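
A sketch of the suggested behaviour (the names here are assumptions based on the description above):

package sketch

import "github.com/biogo/hts/bgzf"

// chunksFor illustrates the proposed guard: a reference with no index
// intervals yields an empty chunk slice and no error rather than
// "index: invalid interval".
func chunksFor(refIntervals []bgzf.Offset, beg, end int) ([]bgzf.Chunk, error) {
    if len(refIntervals) == 0 {
        return nil, nil // nothing aligned to this reference
    }
    // ... the existing interval/bin lookup would continue here.
    return nil, nil
}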

Thanks for taking a look!

bgzf tests failing on go version go1.8beta1 linux/amd64

I can also get an error to occur in 1.7.3 on the same test if I add a Flush(); that may be a different error.

diff --git a/bgzf/bgzf_test.go b/bgzf/bgzf_test.go
index 683d69e..7543460 100644
--- a/bgzf/bgzf_test.go
+++ b/bgzf/bgzf_test.go
@@ -155,6 +155,7 @@ func TestRoundTrip(t *testing.T) {
        if _, err := w.Write([]byte("payload")); err != nil {
                t.Fatalf("Write: %v", err)
        }
+       fmt.Println(w.Flush())
        if err := w.Close(); err != nil {
                t.Fatalf("Writer.Close: %v", err)
        }

Here is the error for 1.8beta:

$ go test
--- FAIL: TestRoundTrip (0.00s)
bgzf_test.go:186: comment is "", want "comment"
bgzf_test.go:195: mtime is -62135596800, want 100000000
bgzf_test.go:198: name is "", want "name"
FAIL
FAIL github.com/biogo/hts/bgzf 0.137s

double read of end region for some cases in bgz.index

In bix I was seeing some of the intervals I was querying appear 2x. I think the logic error in bgzf.index.ChunkReader.Read() is apparent. This change fixes the problem in my test-cases:

diff --git a/bgzf/index/index.go b/bgzf/index/index.go
index 12ae501..9d2ddc0 100644
--- a/bgzf/index/index.go
+++ b/bgzf/index/index.go
@@ -70,8 +70,10 @@ func (r *ChunkReader) Read(p []byte) (int, error) {
                return n, err
        }
        if len(r.chunks) != 0 && vOffset(r.r.LastChunk().End) >= vOffset(r.chunks[0].End) {
-               err = r.r.Seek(r.chunks[0].Begin)
                r.chunks = r.chunks[1:]
+               if len(r.chunks) != 0 {
+                       err = r.r.Seek(r.chunks[0].Begin)
+               }
        }
        return n, err
 }

I'm not sure if record.End() returns the right position.

for example, a record from a sam file:
ST-E00205:338:HJMLJALXX:3:1116:29000:35942 161 6 97894792 60 150M = 97894892 250 TGAACTACTCTTCACTTGAGAACTAGAATTTCATTATTTCTTCTTTATTCTCAGTCTTCATTAGTTTTTATCTTTTCAGCAACATTTGAAACAACAGAACACACCCTCCCTCCTTTTTGAAATATTGTCTTTATTTTTAGGACACCACTG AAAFFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFJJJJJJJJJJJJJJJJJJJJJJJJFJJAFJFJJJJAAAJJJJJJJJ7-<FFJ<JFJFJJ7JJJJJJA<AFFJJ<FFJFFJJFFJJFFFFAFJF<AFJ NM:i:0 MD:Z:150 AS:i:150 XS:i:0 RG:Z:bwa

Some output using the hts package:
Rec Start from record.Start(): 97894791
Rec End from record.End(): 97894941 //here, 97894940 or 97894941?
Rec Len from record.Len(): 150

The sequencing strategy is paired-end 150. Thus, I think the end position of the record should be 97894940, making the record length 97894940-97894791+1=150. Otherwise, if the end position were 97894941, the length would be 97894941-97894791+1=151.

Am I right?

bam: add Merge function

This package is a great piece of work in Go for NGS data processing, and I love using it.
Is there an API for sorting bam files, similar to "samtools sort"?

README.md improvements

Hi, in reviewing for openjournals/joss-reviews#168, I find that the "statement of need" requirement is not met by README.md. I think it would be satisfied if you just copied over the more comprehensive "Summary" paragraph from paper.md.

README.md also lacks example usage. It would be great to see a trivial example of using the API to do a basic operation, like ranging through a (particular part of a) bam file and outputting to sam - anything to give a friendly starting point for using the API and exploring the methods further from there.

(The go docs for each package could also benefit from a friendly intro in a package comment, providing a starting point or method in to making use of that package.)

README.md may also be an appropriate place to include a brief sentence to satisfy the "Community guidelines" requirement of the joss review.

sam: invalid aux tag field: sam: unknown type float64

Hi, I encountered an error message when reading sam files generated by NextGenMap:
sam: invalid aux tag field: sam: unknown type float64

I think the reason is the following code at line 176 of auxtags.go:
f, err := strconv.ParseFloat(string(tf[2]), 32)

ParseFloat returns f as float64 regardless of whether bit size 64 or 32 is used, as explained in the strconv documentation. So the subsequent type check in the NewAux function will always fail.
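
A sketch of the fix: narrow the parsed value to float32 before the type switch sees it (the helper name here is hypothetical):

package sketch

import "strconv"

// parseFloatAux parses an aux float field; strconv.ParseFloat always
// returns a float64, so the result is narrowed to float32 explicitly.
func parseFloatAux(text []byte) (float32, error) {
    f, err := strconv.ParseFloat(string(text), 32)
    if err != nil {
        return 0, err
    }
    return float32(f), nil
}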

tabix query missing bytes.

Apologies for the case, but it is reproducible. We are using a normalized version of ExAC for annotation, and I ran through and intersected every variant with itself to make sure each got a hit. There is a single failure.

The file is here: http://s3.amazonaws.com/gemini-annotations/ExAC.r0.3.sites.vep.tidy.vcf.gz (and .tbi)

and the code is below. Run with:

go run main.go ExAC.r0.3.sites.vep.tidy.vcf.gz

It is querying for the location: location{"X", 229379 - 2, 229382}
And the output includes the variant of interest (X:229379), but appended to its end is another line (X:229380).

Here is the output from htslib tabix:

$ tabix  /usr/local/src/gemini_install/data/gemini/data/ExAC.r0.3.sites.vep.tidy.vcf.gz X:229378-229380 | cut -f 1-5
X   229275  .   CACACGGCGACCATGGGAACCCCCTCTCCTGGGCACGTGCTCACCGCAGCTGTCGTACGGCACCACTGAGACGACAGGGACCCCCTGCCCTCCCCCGGGCGAGTCCTCACCGGTG C
X   229379  .   CCTCACCGGTGACACGGAGACCGCGGAAGGCCCCTCCCCTGGGCGCGTG   C
X   229380  .   C   G
package main

import (
    "bufio"
    "compress/gzip"
    "fmt"
    "io"
    "io/ioutil"
    "os"
    "strings"

    "github.com/biogo/hts/bgzf"
    "github.com/biogo/hts/bgzf/index"
    "github.com/biogo/hts/tabix"
)

func check(err error) {
    if err != nil {
        panic(err)
    }
}

type location struct {
    chrom string
    start int
    end   int
}

func (s location) RefName() string {
    return s.chrom
}
func (s location) Start() int {
    return s.start
}
func (s location) End() int {
    return s.end
}

func main() {

    path := os.Args[1]

    fh, err := os.Open(path + ".tbi")
    check(err)

    gz, err := gzip.NewReader(fh)
    check(err)
    defer gz.Close()

    idx, err := tabix.ReadTabix(gz)
    check(err)

    b, err := os.Open(path)
    check(err)
    bgz, err := bgzf.NewReader(b, 1)
    //  bgz.Blocked = false

    check(err)

    chunks, err := idx.Chunks(location{"X", 229379 - 1, 229380})
    check(err)

    cr, err := index.NewChunkReader(bgz, chunks)
    check(err)
    br := bufio.NewReaderSize(cr, 32768/8)
    var j int
    for {
        line, err := br.ReadString('\n')
        if err == io.EOF {
            break
        }
        if strings.Contains(line, "\t229379\t") {
            fmt.Println(line + "\n")
        }

        j += 1
        _ = line
    }

    cr, err = index.NewChunkReader(bgz, chunks)
    buf, _ := ioutil.ReadAll(cr)

    for _, line := range strings.Split(string(buf), "\n") {
        if strings.Contains(line, "\t229379\t") {
            fmt.Println("read:" + line + "\n")
        }
    }
}
