
long-read-pipelines's Introduction


Long read pipelines

This repository contains pipelines for processing long-read data from the PacBio and Oxford Nanopore platforms. The pipelines are written in WDL 1.0 and intended for use on Google Cloud Platform via the Cromwell scientific workflow engine. Processing is designed to be as consistent as possible across the two long-read platforms, using platform-specific options or tasks only where necessary.

High level workflows can be found in the wdl/ directory.

Documentation: Documentation for each workflow can be found at the repository site.

External Contributors: Please see the Contributing Guidelines for information on how to contribute to the repository.

long-read-pipelines's People

Contributors

bshifaw, ericsong, eviewan, jonn-smith, kvg, shuang-broad, tedsharpe

long-read-pipelines's Issues

Dealing with strange (and sometimes wrong) CIGARs in alignments

The strange CIGAR could be something like this in minimap2 (MM2) output.

Or it could be something that Picard's SortSam complains about:

Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: WARNING: Read name m64020_190419_185501/103416136/ccs, No M or N operator between pair of D operators in CIGAR

when sorting the bamout results from GATK HaplotypeCaller run on a CCS-corrected BAM (where the upstream aligner could be MM2).
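One way to surface such records before they break downstream tools is to validate the BAM upfront. A minimal sketch using Picard's ValidateSamFile (the input path is illustrative):

picard ValidateSamFile \
    I=input.bam \
    MODE=SUMMARY

MODE=VERBOSE would list each offending record instead of just the error counts.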

Generate graph visualization of the pipelines

As we clean up and document the WDLs, it'll be good to have a graph/DAG visualization of each workflow (see the sketch below).

This helps

  • new users understand the structure of the pipeline, and
  • developers keep the repo as clean as possible (e.g. there are pipelines that we may not need anymore, and repeated tasks).
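womtool, which ships alongside Cromwell, can already emit a Graphviz DOT file for a workflow; rendering it is one more step. A minimal sketch, assuming womtool.jar and Graphviz's dot are available locally, with an illustrative workflow path:

java -jar womtool.jar graph wdl/PBCCSWholeGenomeSingleFlowcell.wdl > workflow.dot
dot -Tpng workflow.dot -o workflow.png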

How to handle sex chromosomes with GATK

The PR that will bring in GATK has a limitation: unless one carefully modifies the scattering scheme, the pipeline will fail, because when sex chromosomes and autosomes are mixed in one scatter, the ploidy-detection logic errors out.

Currently we can get away with separate runs that limit the input intervals to the workflow (i.e. one run for autosomes, one for sex chromosomes).
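A minimal sketch of that workaround, splitting a BED file of calling intervals into the two runs (file names are illustrative, and chr-prefixed contig names are assumed):

grep -vE '^(chrX|chrY)\b' intervals.bed > autosome_intervals.bed
grep -E  '^(chrX|chrY)\b' intervals.bed > sex_chromosome_intervals.bed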

Reproducible test case for the Intel codec bug

Right now we're using the JDK codec for bgzip because of an apparent intermittent bug in the Intel codec. Because the JDK codec comes with a huge speed penalty, we write temporary files out in uncompressed form. Producing a reproducible test case for the Intel codec bug would hopefully allow Intel to fix the issue, which would in turn allow us to make this pipeline leaner and faster.

Auto-evaluation of resource usage of our WDLs

The resource monitoring script gets run for all of our tasks, but I have yet to analyze the data. Let's write an R script that examines resource usage per task over time, compares it against that task's requested resources, and aggregates data from multiple invocations of the task across workflows. That way we'll know where to scale back our resource requests and optimize the pipeline.

Remove ProcessReads

With the recent changes, we now have some redundant functionality across the codebase. For example, AlignReads, PBUtils, ONTUtils, and Utils now substantially overlap with their progenitor WDL, ProcessReads. We should remove ProcessReads and update anything that depended on it.

Investigate if we should `tar` some of the result folders when using cromwell

Relying on Cromwell to delocalize a folder gives you names like

glob-4474304f3e3392228593ba19c5cd74e8,
glob-de84a38a722f40d0a73061d4179b1788

which take unnecessary effort to decode, especially when two folders hold contents with highly similar structures.

tar without compression avoids the compression time, so it may be a good compromise between ease of comprehension and speed.
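A minimal sketch of what a task's command section could do instead of relying on glob() (names are illustrative):

# bundle the output directory so the delocalized artifact keeps a readable name
tar -cf sample1_results.tar results_dir/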

Set up script(s) for processing resource monitoring log file

E.g. bare minimum:

# summarize memory usage (GiB) recorded in the monitoring log
grep -F 'Memory usage:' resources.log | \
    grep -Eo '[0-9.]+ GiB' | \
    sed 's/ GiB//' | \
    Rscript -e 'summary(as.numeric(readLines("stdin")))'

# summarize CPU usage (%) recorded in the monitoring log
grep -F 'CPU usage:' resources.log | \
    awk -F ':' '{print $2}' | \
    sed 's/ //g; s/%//' | \
    Rscript -e 'summary(as.numeric(readLines("stdin")))'

Check upfront in the PacBio CCS and CLR pipelines whether a PacBio flowcell actually represents CCS or CLR data

The metadata.xml file that accompanies PacBio data is unreliable with regard to whether a flowcell represents CCS or CLR data. However, it's pretty easy to look at the first few hundred reads of a subreads file and make this determination by counting the frequency of ZMW numbers (in the uniq -c output below, each line is a subreads-per-ZMW count followed by the ZMW number). Here's a quick example:

Likely CCS data:
$ gsutil cat gs://broad-gp-pacbio/r64020_20190507_173946/4_D01/**.subreads.bam | samtools view | awk -F"/" '{ print $2 }' | uniq -c | head
14 0
2 1
15 2
22 5
2 7
20 8
15 9
2 12
5 13
2 15

Likely CLR data:
$ gsutil cat gs://broad-gp-pacbio/r64020_20190507_173946/1_A01/**.subreads.bam | samtools view | awk -F"/" '{ print $2 }' | uniq -c | head
1 4
1 6
1 7
1 12
1 15
1 16
1 20
1 25
1 30
1 31

TODO: Write a tool that runs very early in the PBCCS and PBCLR workflows, checks (say) the first 10000 reads, and determines whether the run is actually appropriate for that workflow. Throw an error if not.
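A minimal sketch of such a check in shell; the threshold of 2 subreads per ZMW is an illustrative assumption, not a validated cutoff:

# mean number of subreads per ZMW across the first 10000 reads
mean=$(samtools view input.subreads.bam | head -n 10000 | \
    awk -F'/' '{print $2}' | uniq -c | \
    awk '{sum += $1; n++} END {print sum / n}')

# many subreads per ZMW suggests CCS; close to one per ZMW suggests CLR
if awk -v m="$mean" 'BEGIN {exit !(m >= 2)}'; then
    echo "likely CCS"
else
    echo "likely CLR"
fi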

Official Sniffles docker seems to have issues with Cromwell

See here.
The server is Cromwell version 36, though, so updating to version 39 or newer should fix this issue, as claimed in the ticket above.
We also see the following messages, even though the (sub)workflow is marked as "Done"; apparently the image's BusyBox find and xargs do not support the -empty and -I options that the wrapper script uses:

find: unrecognized: -empty
xargs: invalid option -- 'I'
BusyBox v1.22.1 (2014-05-23 01:24:27 UTC) multi-call binary.

Usage: find [-HL] [PATH]... [OPTIONS] [ACTIONS]

Search for files and perform actions on them.
First failed action stops processing of current file.
Defaults: PATH is current directory, action is '-print'

	-L,-follow	Follow symlinks
	-H		...on command line only
	-xdev		Don't descend directories on other filesystems
	-maxdepth N	Descend at most N levels. -maxdepth 0 applies
			actions to command line arguments only
	-mindepth N	Don't act on first N levels
	-depth		Act on directory *after* traversing it

Actions:
	( ACTIONS )	Group actions for -o / -a
	! ACT		Invert ACT's success/failure
	ACT1 [-a] ACT2	If ACT1 fails, stop, else do ACT2
	ACT1 -o ACT2	If ACT1 succeeds, stop, else do ACT2
			Note: -a has higher priority than -o
	-name PATTERN	Match file name (w/o directory name) to PATTERN
	-iname PATTERN	Case insensitive -name
	-path PATTERN	Match path to PATTERN
	-ipath PATTERN	Case insensitive -path
	-regex PATTERN	Match path to regex PATTERN
	-type X		File type is X (one of: f,d,l,b,c,...)
	-perm MASK	At least one mask bit (+MASK), all bits (-MASK),
			or exactly MASK bits are set in file's mode
	-mtime DAYS	mtime is greater than (+N), less than (-N),
			or exactly N days in the past
	-mmin MINS	mtime is greater than (+N), less than (-N),
			or exactly N minutes in the past
	-newer FILE	mtime is more recent than FILE's
	-user NAME/ID	File is owned by given user
	-group NAME/ID	File is owned by given group
	-size N[bck]	File size is N (c:bytes,k:kbytes,b:512 bytes(def.))
			+/-N: file size is bigger/smaller than N
	-prune		If current file is directory, don't descend into it
If none of the following actions is specified, -print is assumed
	-print		Print file name
	-print0		Print file name, NUL terminated
	-exec CMD ARG ;	Run CMD with all instances of {} replaced by
			file name. Fails if CMD exits with nonzero

BusyBox v1.22.1 (2014-05-23 01:24:27 UTC) multi-call binary.

Usage: xargs [OPTIONS] [PROG ARGS]

Run PROG on every item given by stdin

	-r	Don't run command if input is empty
	-0	Input is separated by NUL characters
	-t	Print the command on stderr before execution
	-e[STR]	STR stops input processing
	-n N	Pass no more than N args to PROG
	-s N	Pass command line of no more than N bytes
	-x	Exit if size is exceeded

Two workflows in an invalid state

womtool validate tells me these two are invalid.

PB10xSingleFlowcell.wdl
Failed to import 'tasks/HiFi.wdl' (reason 1 of 4): Failed to resolve 'tasks/HiFi.wdl' using resolver: 'relative to directory [...]/wdl (escaping allowed)' (reason 1 of 1): File not found: tasks/HiFi.wdl
Failed to import 'tasks/HiFi.wdl' (reason 2 of 4): Failed to resolve 'tasks/HiFi.wdl' using resolver: 'entire local filesystem (relative to '/')' (reason 1 of 1): File not found: tasks/HiFi.wdl
Failed to import 'tasks/HiFi.wdl' (reason 3 of 4): Failed to resolve 'tasks/HiFi.wdl' using resolver: 'relative to directory [...]/wdl (escaping allowed)' (reason 1 of 1): File not found: tasks/HiFi.wdl
Failed to import 'tasks/HiFi.wdl' (reason 4 of 4): Failed to resolve 'tasks/HiFi.wdl' using resolver: 'http importer (no 'relative-to' origin)' (reason 1 of 1): Relative path
PB10xSingleProcessedSample.wdl
Failed to import 'tasks/HiFi.wdl' (reason 1 of 4): Failed to resolve 'tasks/HiFi.wdl' using resolver: 'relative to directory [...]/wdl (escaping allowed)' (reason 1 of 1): File not found: tasks/HiFi.wdl
Failed to import 'tasks/HiFi.wdl' (reason 2 of 4): Failed to resolve 'tasks/HiFi.wdl' using resolver: 'entire local filesystem (relative to '/')' (reason 1 of 1): File not found: tasks/HiFi.wdl
Failed to import 'tasks/HiFi.wdl' (reason 3 of 4): Failed to resolve 'tasks/HiFi.wdl' using resolver: 'relative to directory [...]/wdl (escaping allowed)' (reason 1 of 1): File not found: tasks/HiFi.wdl
Failed to import 'tasks/HiFi.wdl' (reason 4 of 4): Failed to resolve 'tasks/HiFi.wdl' using resolver: 'http importer (no 'relative-to' origin)' (reason 1 of 1): Relative path

Task for routine trio-assembly quality assessment

Now that we've shown trio-assembly costs can be brought below $10, it makes sense to have a (sub-)pipeline for routine quality assessment of trio assemblies.

Currently I'm experimenting with BUSCO and U50.
Other suggestions/ideas are welcome.
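For scoping, a minimal sketch of the kind of BUSCO invocation such a task might wrap (the lineage, thread count, and paths are illustrative):

busco -i assembly.fasta -l primates_odb10 -m genome -c 8 -o busco_out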

DeepVariant optimization

We need two optimizations for DV:

  • Separate out the three steps: make_examples, call_variants, and postprocess_variants. make_examples is CPU-intensive and currently takes ~50 CPU-hours for our EAP data; call_variants can be hugely improved with a GPU and takes roughly 1-2 hours (and is relatively more expensive on a per-hour basis); postprocessing is a mere one hour.
  • We need an AVX512F-optimized DV docker to shorten make_examples, but this is more easily done by the DV team.

Altogether this could bring the cost per WGS down to ~$3, and ultimately we can bring it below $1. A sketch of the three stages run separately follows.
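A minimal sketch of the split, assuming the standard DeepVariant entry points, with illustrative paths and sharding flags omitted:

make_examples --mode calling --ref ref.fasta --reads input.bam --examples examples.tfrecord.gz
call_variants --examples examples.tfrecord.gz --checkpoint model.ckpt --outfile call_variants_out.tfrecord.gz
postprocess_variants --ref ref.fasta --infile call_variants_out.tfrecord.gz --outfile output.vcf.gz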

Longshot taking a long time

Currently the scattering scheme is per chromosome, which leads to jobs running 10 hours or longer.

We should be using Picard's IntervalListTools to produce more, shorter intervals.
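A minimal sketch of generating a finer scatter with Picard (the scatter count and paths are illustrative):

picard IntervalListTools \
    I=whole_genome.interval_list \
    SCATTER_COUNT=200 \
    O=scattered_intervals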

Inventory codebase to identify and remove redundant functionality

We now have a few WDLs that have some tasks that are slightly or entirely redundant with other WDLs. We should inventory the codebase to see where we have multiple tasks that do nearly the same thing, and then make a plan to consolidate those tasks into one canonical task.

Change to dockerhub

Right now the pipeline Docker images all live in our personal Docker registries. Let's move them to Docker Hub.

Pipeline regression testing

How are we going to verify that this pipeline continues to function the way we expect as we add things to it over time?

Keeping track of which dockers we use

So that we know which images to clean up, and what the dependency chain is.
I'm not sure where to keep this documentation, so I just created this ticket.


3rd party:

DOCKER                                          TAG                WDL
quay.io/biocontainers/mosdepth                  0.2.4--he527e40_0  AlignedMetrics.wdl
quay.io/biocontainers/nanoplot                  1.28.0--py_0       NanoPlot.wdl
gcr.io/deepvariant-docker/deepvariant           0.8.0-gpu          DeepVariantLR.wdl
us.gcr.io/broad-gatk/gatk                       latest             GATKBestPractice.wdl
us.gcr.io/broad-gotc-prod/genomes-in-the-cloud  2.4.1-1540490856   Utils.wdl

We have control over these, but need time to migrate:

DOCKER                                       TAG                        WDL
quay.io/broad-long-read-pipelines/canu       v1.9_wdl_patch_varibale_k  AssignChildLongReads.wdl
quay.io/broad-long-read-pipelines/canu       v1.9_wdl_patch_varibale_k  CollectParentsKmerStats.wdl
us.gcr.io/broad-dsde-methods/samtools-cloud  v1.clean                   GATKBestPractice.wdl

We have control over these, but need to get the versions right:

DOCKER                                        TAG     WDL
us.gcr.io/broad-dsp-lrma/lr-10x               0.1.9   AnnotateAdapters.wdl, ONT10xSingleFlowcell.wdl
us.gcr.io/broad-dsp-lrma/lr-align             0.1.26  AlignReads.wdl, PhaseReads.wdl, Utils.wdl, PB10xSingleProcessedSample.wdl, TestCromwell.wdl
us.gcr.io/broad-dsp-lrma/lr-asm               0.1.12  AssembleTarget.wdl
us.gcr.io/broad-dsp-lrma/lr-c3poa             0.1.4   C3POa.wdl
us.gcr.io/broad-dsp-lrma/lr-canu              0.1.0   Canu.wdl
us.gcr.io/broad-dsp-lrma/lr-cloud-downloader  0.2.1   DownloadFromSRA.wdl
us.gcr.io/broad-dsp-lrma/lr-finalize          0.1.2   Finalize.wdl
us.gcr.io/broad-dsp-lrma/lr-gatk              0.1.1   GATKBestPractice.wdl
us.gcr.io/broad-dsp-lrma/lr-guppy             4.0.14  Guppy.wdl
us.gcr.io/broad-dsp-lrma/lr-longshot          0.1.1   CallSmallVariants.wdl
us.gcr.io/broad-dsp-lrma/lr-medaka            0.1.0   Medaka.wdl
us.gcr.io/broad-dsp-lrma/lr-metrics           0.1.8   AlignedMetrics.wdl, UnalignedMetrics.wdl, Utils.wdl
us.gcr.io/broad-dsp-lrma/lr-nanopolish        0.3.0   Nanopolish.wdl
us.gcr.io/broad-dsp-lrma/lr-pb                0.1.5   PBUtils.wdl
us.gcr.io/broad-dsp-lrma/lr-peregrine         0.1.6   Peregrine.wdl
us.gcr.io/broad-dsp-lrma/lr-quast             0.1.0   Quast.wdl
us.gcr.io/broad-dsp-lrma/lr-racon             0.1.0   Racon.wdl
us.gcr.io/broad-dsp-lrma/lr-sv                0.1.2   CallSVs.wdl
us.gcr.io/broad-dsp-lrma/lr-utils             0.1.6   Guppy.wdl, ONTUtils.wdl, PBUtils.wdl, Utils.wdl

Decide on code formatting standards

We now have code written in

  • WDL
  • shell
  • python

And sooner or later there could be more.
We need to think about picking a style guide for each.

For WDL, there's the WDL Guidelines for the GATK repo;
for shell scripts, I generally use Sublime + SublimeLinter + shellcheck;
for Python, there's the generally accepted PEP 8.

Find empirical formula on batch size for racon

Racon is showing some strange behavior in splitting the data: with the batch size set to 80 it complains about not having enough data, but set to 90, huzzah!

And batch sizes that work for 3000-contig assemblies may be too large for 4000-contig assemblies, leading to out-of-memory (OOM) errors.

The task is to find an empirical formula.

Who should take this on?

@kvg

Implement a task for the SNV caller for raw Nanopore data, Clair

Clairvoyante is a new deep-learning approach to SNP and indel calling designed for long-read sequencing (https://www.nature.com/articles/s41467-019-09025-z). We should implement this in our pipeline ASAP. It is particularly important for Nanopore and PacBio CLR sequencing, as we don't currently have a solution for SNP calling on such data (the pipeline can only call SNPs on PacBio CCS data, using DeepVariant and, very soon, GATK HaplotypeCaller).

Clair is the successor to Clairvoyante, with apparently a ~5% bump in sensitivity for ONT data (https://github.com/HKU-BAL/Clair). The usage is apparently identical to Clairvoyante's (https://github.com/aquaskyline/Clairvoyante). This is probably where we should start, rather than with Clairvoyante itself.
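If the CLI does mirror Clairvoyante's, a task command might look roughly like the following; the entry point, flags, and paths here are unverified assumptions to be checked against the Clair docs:

clair.py callVarBam \
    --chkpnt_fn model \
    --ref_fn ref.fasta \
    --bam_fn input.bam \
    --ctgName chr20 \
    --sampleName sample1 \
    --call_fn sample1.chr20.vcf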

Abstract
The accurate identification of DNA sequence variants is an important, but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5–15%. Meeting this demand, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieves 99.67, 95.78, 90.53% F1-score on 1KP common variants, and 98.65, 92.57, 87.26% F1-score for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows Clairvoyante is sample agnostic and finds variants in less than 2 h on a standard server. Furthermore, we present 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source (https://github.com/aquaskyline/Clairvoyante), with modules to train, utilize and visualize the model.

Improve the pipeline monitoring unit

A wish list

  • more metrics monitored in the docker (e.g. I/O ops, GPU)
  • a more robust querying command (I can imagine many ways to fool the current one; e.g. the Python script does not check a job's status, so it may fail on failed jobs)

Anyone interested in implementing a feature can cross the item out.

More wishes welcome.

Watchout for alignments with > 65535 CIGAR operations

As reads and assemblies get longer, we will get there.
In fact, we might already be there for some assemblies with NG50 > 10 Mb.

See minimap2's comments on how to handle this (the BAM format caps a record at 65535 CIGAR operations; the hts-specs workaround is to move the long CIGAR into the CG tag).

A tricky thing is how up to date downstream tools are when it comes to adhering to hts-specs.
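A quick way to check whether a given BAM is affected, by counting CIGAR operators per record (the path is illustrative):

samtools view aln.bam | \
    awk '{n = gsub(/[MIDNSHP=X]/, "", $6); if (n > 65535) print $1, n}'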

Too few reads in bam for Sniffles

In the PBCCSWholeGenomeSingleFlowcell workflow's CallSVs.Sniffles task, I got an error suggesting there are too few reads in the BAM for Sniffles to estimate a parameter it needs. Log file excerpt follows:

++ samtools view -H /cromwell_root/fc-408ac13e-b06c-4835-b747-4258321b9a9b/8aaaf651-b226-426c-9580-40be7525723e/PBCCSWholeGenomeSingleFlowcell/8a3ed254-64b3-4266-8f57-08e5e1e0766b/call-MergeRuns/SM-JOTZQ_RW.m64020_200118_025318.bam
++ grep -m1 '^@RG'
++ sed 's/\t/\n/g'
++ grep '^SM:'
++ sed s/SM://g
+ SM=SM-JOTZQ_RW
+ sniffles -t 8 -m /cromwell_root/fc-408ac13e-b06c-4835-b747-4258321b9a9b/8aaaf651-b226-426c-9580-40be7525723e/PBCCSWholeGenomeSingleFlowcell/8a3ed254-64b3-4266-8f57-08e5e1e0766b/call-MergeRuns/SM-JOTZQ_RW.m64020_200118_025318.bam -v SM-JOTZQ_RW.m64020_200118_025318.sniffles.pre.vcf -s 3 -r 1000 -q 20 --genotype --report_seq --report_read_strands
Estimating parameter...
Too few reads detected in /cromwell_root/fc-408ac13e-b06c-4835-b747-4258321b9a9b/8aaaf651-b226-426c-9580-40be7525723e/PBCCSWholeGenomeSingleFlowcell/8a3ed254-64b3-4266-8f57-08e5e1e0766b/call-MergeRuns/SM-JOTZQ_RW.m64020_200118_025318.bam

This is using Terra, with the method imported from Dockstore:
github.com/broadinstitute/long-read-pipelines/PBCCSWholeGenomeSingleFlowcell, version: 2.0-dockstore-test-2

Let me know if any other details are needed.

Implement and evaluate a new long-read-specialized tandem repeat finder

Implement a WDL task for, and evaluate the use of, the Noise-Cancelling Repeat Finder (NCRF). From the manuscript (https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz484/5530597):

Abstract
Summary
Tandem DNA repeats can be sequenced with long-read technologies, but cannot be accurately deciphered due to the lack of computational tools taking high error rates of these technologies into account. Here we introduce Noise-Cancelling Repeat Finder (NCRF) to uncover putative tandem repeats of specified motifs in noisy long reads produced by Pacific Biosciences and Oxford Nanopore sequencers. Using simulations, we validated the use of NCRF to locate tandem repeats with motifs of various lengths and demonstrated its superior performance as compared to two alternative tools. Using real human whole-genome sequencing data, NCRF identified long arrays of the (AATGG)n repeat involved in heat shock stress response.

Availability and implementation
NCRF is implemented in C, supported by several python scripts, and is available in bioconda and at https://github.com/makovalab-psu/NoiseCancellingRepeatFinder.
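For scoping the task, a minimal sketch of the kind of invocation the WDL would wrap, following the usage shown in the NCRF README (the motif and paths are illustrative):

cat reads.fasta | NCRF AATGG > reads.AATGG.ncrf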
