
mindee / doctr


docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Home Page: https://mindee.github.io/doctr/

License: Apache License 2.0

Python 99.67% Dockerfile 0.22% Makefile 0.11%
ocr deep-learning document-recognition tensorflow2 text-detection-recognition text-detection text-recognition optical-character-recognition pytorch

doctr's People

Contributors

aminemindee, atomme1, carl-krikorian, charlesmindee, chunyuan-w, dependabot[bot], eikaramba, eltociear, felixdittrich92, felixt2k, ffalkenberg, fg-mindee, fharper, fmobrj, frgfm, hamzagbada, ianardee, jsn5, kforcodeai, khalidmindee, khanfarhan10, mara004, mtvch, mzeidhassan, odulcy-mindee, osanseviero, rbmindee, rob192, siddhantbahuguna, skaarfacee


doctr's Issues

[models] Implement an Artefact object detector

Following up on #15, I believe we should consider artefact detection as a major feature. For now, the different artefacts we would consider are:

  • Check boxes
  • QR Codes (#259)
  • Bar codes (#260)
  • Signature / Signed initials
  • Pictures (faces #258)
  • Logo / watermarks

Training an object detection model might be a good option to start from!

[models] Add text recognition module

Design a model subpart responsible for identifying text strings inside the regions of interest of an image.

Input

  • images: Numpy-style encoded (cropped) images (already read), expected to hold a single character sequence

Output

  • text: list of N strings, where N = number of cropped input images

The following components would be required:

  • Preprocessor (#20, #33)
  • RecognitionModel (#35)
  • RecognitionProcessor (#37)
  • RecognitionPredictor (#39)
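To make the expected interface concrete, here is a minimal sketch (not the actual implementation) of how such a predictor could chain these components; the component names mirror the list above and the call signatures are assumptions:

import numpy as np
from typing import List

class RecognitionPredictor:
    # Hypothetical wrapper chaining pre-processing, model and post-processing
    def __init__(self, pre_processor, model, post_processor):
        self.pre_processor = pre_processor
        self.model = model
        self.post_processor = post_processor

    def __call__(self, crops: List[np.ndarray]) -> List[str]:
        # Batch & normalize the crops, run the model, then decode the outputs into strings
        batches = self.pre_processor(crops)
        raw = [self.model(batch) for batch in batches]
        return [word for batch in raw for word in self.post_processor(batch)]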

[utils] Move doctr dependencies out of doctr.utils

The doctr.utils.visualization module introduces a dependency on other modules, which might be troublesome later. Several options are at our disposal:

  • remove these imports, and implement the specific version in the modules of the former dependency
  • use exported dictionary versions of the elements to plot (typing would then not require imports from doctr.documents)

Any other suggestion is welcome!

[models] detect page orientation

We should be able to detect the rotation of a page (angle from 0 to 359°) so as to straighten it before sending it to the OCR.
This would greatly improve our predictor on "tricky" datasets where pages are often rotated.

Some documents are very complex and have areas of text with different orientations, but even for these documents we can define a main orientation for the page (most of the lines would be oriented this way).

This leads us to define 2 levels of orientation:

  • Page orientation: for most documents, the orientation shared by all text lines; for tricky documents, the main orientation of the lines (the one most lines follow).
  • Box/line/block-level orientation: once the document has been rotated (after page orientation detection), find the areas that are still rotated and straighten only those areas.

Any suggestion is welcome! I think we should first implement page orientation, which should be relatively easy, and then focus on the other part, which is far trickier.

@fg-mindee
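As a starting point for the first level, here is a rough OpenCV sketch of small-skew estimation (an assumption, not a retained approach): it only recovers angles close to 0°, so the full 0-359° case would still need a dedicated model.

import cv2
import numpy as np

def estimate_small_skew(page: np.ndarray) -> float:
    # Threshold the page so text pixels become foreground
    gray = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Fit a minimum-area rectangle around all foreground pixels
    coords = np.column_stack(np.where(thresh > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    # Map the rectangle angle to a deskew angle (OpenCV < 4.5 angle convention)
    return -(90 + angle) if angle < -45 else -angle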

[ci] Setup basic PR checks

The following should be setup:

  • lint checking using flake8
  • typing annotation check with mypy
  • unittests using pytest and coverage
  • package installation verification

[models] Models should accept list of pages as inputs and not list of documents

The current behaviour overcomplicates things since a document is already a list of pages.
The only advantage of the current behaviour is to save potentially 1 forward. Assuming we have 1 document with A pages and a second document with B pages, and our batch size N:

  • currently, we do math.ceil((A + B) / N) forwards
  • the proposition would have math.ceil(A / N) + math.ceil(B / N) forwards

The real advantage only shows when A and B are smaller than N: for instance, with A = B = 2 and N = 4, the current behaviour needs a single forward while the proposition needs two.

The interface would be much cleaner though.

Write a conversion function from Image + bounding boxes to cropped images

The detection block is currently returning a list of bounding boxes, while the recognition block is actually using cropped images. The recognition pre-processing needs to handle this.

Input

  • images: Numpy-style encoded images (already read)
  • bounding boxes: list of tensors of predictions of size N*4 (xmin, ymin, xmax, ymax)

Output

  • cropped images: N cropped images
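A minimal sketch of such a conversion, assuming the boxes are already expressed in absolute pixel coordinates (relative coordinates would simply need to be rescaled first):

import numpy as np
from typing import List

def extract_crops(image: np.ndarray, boxes: np.ndarray) -> List[np.ndarray]:
    # boxes is expected to be of shape (N, 4) with (xmin, ymin, xmax, ymax) rows
    crops = []
    for xmin, ymin, xmax, ymax in boxes.round().astype(int):
        crops.append(image[ymin:ymax, xmin:xmax])
    return crops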

[models] BN layers call behaviour depends on a 2nd argument

According to the TF documentation, BN layers have a second argument in their call method which changes their behaviour. This raises the following questions:

  • how do we pass down this information in a Sequential for instance?
  • if possible, shouldn't this be a layer attribute that we can switch (bn.training = False), since we won't want to change this behaviour upon each call?
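For reference, the behaviour in question can be illustrated with tf.keras as below; whether a Sequential propagates the flag cleanly to every sub-layer is exactly the open question above.

import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = tf.random.normal((4, 8))

# The `training` argument of call() switches the behaviour: batch statistics
# are used when training=True, the moving averages otherwise.
y_train = bn(x, training=True)
y_eval = bn(x, training=False)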

[conda] Unable to make a conda build

Unfortunately, one of the project dependencies does not have any conda release or any way to make one. I opened an issue on their repo pymupdf/PyMuPDF#938 to track this, but so far I haven't found any way to release the project on anaconda with this dependency.

ImportError: cannot import name 'DocumentFile' from 'doctr.documents'

Whenever I try to execute the same code presented in the main page, I get this error : "ImportError: cannot import name 'DocumentFile' from 'doctr.documents' (/Users/Aksol/miniconda3/lib/python3.8/site-packages/doctr/documents/init.py)".
I can't find a solution around it, I tried cloning the repo in Google Colab and I still get the same error.
Has anyone come across the same problem and found a solution for it?
Thank you

[documents] Harmonize file reading

A package user has to import different functions to read a file depending on its extension and reading means (from path, from bytes). This needs serious refactoring.

  • Handle reading means (#172)
  • Handle extensions (#172)

[models] Profiling models for temporal optimization

Line Profiler (https://github.com/pyutils/line_profiler) is used to perform model profiling, highlighting the most time-consuming lines when our models are running.
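For context, a typical way to use it looks like the sketch below; the profiled function here is a toy placeholder, but any model call can be wrapped the same way.

from line_profiler import LineProfiler

def toy_workload(n: int) -> int:
    # Stand-in for a model call; profile any function the same way
    total = 0
    for i in range(n):
        total += i * i
    return total

lp = LineProfiler()
profiled = lp(toy_workload)  # wrap the function to inspect
profiled(100_000)            # run the workload as usual
lp.print_stats()             # per-line hit counts and timings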

For an OCRPredictor with a sar_vgg16_bn coupled with a db_resnet50, we have the following analysis:

  • the whole recognition model (SAR) accounts for 85% of the total execution time (OCR predictor)
  • the whole detection model (DB) accounts for 15% of the total execution time

More precisely:

  • Recognition task: almost 100% of the time is spent inside the model (without pre/post-processing). The distribution inside the model is as follows: 22.5% for the feature extractor (VGG16 with BN), 17.5% for the encoder (2 LSTM layers with 512 hidden units), and 60% for the decoder. The decoder has 2 main time-consuming tasks: 31% goes to the LSTM decoder (2 layers of stacked LSTM cells with 512 hidden units), and 62% goes to the attention module; inside this module, 87% of the time is spent in the conv2D operation (kernel 3x3, stride=1) which encodes the feature map (N, H, W, feature_units) --> (N, H, W, attention_units=512). We can conclude for the recognition task that:
  1. 27.5% of the total execution time (OCR end-to-end) is spent in this conv2D layer, which can be reduced by decreasing the number of attention units from 512 to 256, maybe less.
  2. 15% of the total execution time is spent in the encoder and another 16% in the LSTM decoder. Thus, LSTM layers account for more than 30% of the total execution time. This can be improved by reducing the number of encoding/decoding layers from 2 to 1, and by reducing the number of hidden RNN units inside the LSTM cells from 512 to 256 for instance.

It is important to highlight that we have significant leverage on the execution time of the whole model by tuning only 2 hyper-parameters, the attention units and the hidden units: almost 60% of the end-to-end execution time is directly impacted.

  • Detection task: 99% of the time is spent inside the model and 1% in the post-processing. Inside the model, 65% is spent in the feature extractor (ResNet50), which accounts for almost 10% of the whole pipeline. This time can be reduced by using a lighter ResNet such as ResNet18 for instance. The remaining 35% is spent in the pyramidal module and in the computation of the probability map.

TypeError: Expected Ptr<cv::UMat> for argument 'array' when using read_pdf()

Can't execute read_pdf() function.

See the pdf file sent over slack to reproduce.

python 3.6.9

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order)
Traceback (most recent call last):
  File "/home/jonathan/mindee/dev/client_tests//test_classifier/main.py", line 17, in <module>
    result = model([doc])
  File "/home/jonathan/mindee/dev/client_tests//test_classifier/doctr/doctr/models/core.py", line 51, in __call__
    boxes = self.det_predictor(pages, **kwargs)
  File "/home/jonathan/mindee/dev/client_tests//test_classifier/doctr/doctr/models/detection/core.py", line 140, in __call__
    out = [self.post_processor(batch) for batch in out]
  File "/home/jonathan/mindee/dev/client_tests//test_classifier/doctr/doctr/models/detection/core.py", line 140, in <listcomp>
    out = [self.post_processor(batch) for batch in out]
  File "/home/jonathan/mindee/dev/client_tests//test_classifier/doctr/doctr/models/detection/differentiable_binarization.py", line 173, in __call__
    boxes = self.bitmap_to_boxes(pred=p_, bitmap=bitmap_)
  File "/home/jonathan/mindee/dev/client_tests//test_classifier/doctr/doctr/models/detection/differentiable_binarization.py", line 140, in bitmap_to_boxes
    _box = self.polygon_to_box(points)
  File "/home/jonathan/mindee/dev/client_tests//test_classifier/doctr/doctr/models/detection/differentiable_binarization.py", line 106, in polygon_to_box
    x, y, w, h = cv2.boundingRect(expanded_points)  # compute a 4-points box from expanded polygon
TypeError: Expected Ptr<cv::UMat> for argument 'array'

ValueError on model call

🐛 Bug

When calling the model on a document, I get a ValueError.

To Reproduce

Steps to reproduce the behavior:

from doctr.documents import read_pdf 
from doctr.models import ocr_db_crnn 
model = ocr_db_crnn(pretrained=True)                                                                                                                                                                
doc = read_pdf("path/to/pdf") # write your path to the pdf 
result = model([doc]) 
~/tensorflow_2/lib/python3.6/site-packages/doctr/models/core.py in __call__(self, documents, **kwargs)
     53         # Reorganize
     54         num_pages = [len(doc) for doc in documents]
---> 55         results = self.doc_builder(boxes, char_sequences, num_pages, [page.shape[:2] for page in pages])
     56 
     57         return results

~/tensorflow_2/lib/python3.6/site-packages/doctr/models/core.py in __call__(self, boxes, char_sequences, num_pages, page_shapes)
    204                         self._build_blocks(
    205                             page_boxes[:num_crops[page_idx]],
--> 206                             char_sequences[crop_idx: crop_idx + num_crops[page_idx]]
    207                         ),
    208                         page_idx,

~/tensorflow_2/lib/python3.6/site-packages/doctr/models/core.py in _build_blocks(self, boxes, char_sequences)
    164                         ((boxes[idx, 0], boxes[idx, 1]), (boxes[idx, 2], boxes[idx, 3]))
    165                     ) for idx in line]
--> 166                 ) for line in lines]
    167             )
    168         ]

~/tensorflow_2/lib/python3.6/site-packages/doctr/models/core.py in <listcomp>(.0)
    164                         ((boxes[idx, 0], boxes[idx, 1]), (boxes[idx, 2], boxes[idx, 3]))
    165                     ) for idx in line]
--> 166                 ) for line in lines]
    167             )
    168         ]

~/tensorflow_2/lib/python3.6/site-packages/doctr/documents/elements.py in __init__(self, words, geometry)
    108         # Resolve the geometry using the smallest enclosing bounding box
    109         if geometry is None:
--> 110             geometry = resolve_enclosing_bbox([w.geometry for w in words])
    111 
    112         super().__init__(words=words)

~/tensorflow_2/lib/python3.6/site-packages/doctr/utils/geometry.py in resolve_enclosing_bbox(bboxes)
     20 
     21 def resolve_enclosing_bbox(bboxes: List[BoundingBox]) -> BoundingBox:
---> 22     x, y = zip(*[point for box in bboxes for point in box])
     23     return ((min(x), min(y)), (max(x), max(y)))

ValueError: not enough values to unpack (expected 2, got 0)

Example of pdf causing the error

000f51e2-b734-48d9-968f-e77e3b022844.pdf

Add __repr__ to main classes of the package

Existing classes of the package don't have any __repr__ set, which makes it difficult for new users to understand what each object is composed of.

The following classes would greatly benefit from adding a repr:

  • Elements and all classes inheriting from it (#102)
  • Models (#102)
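As an illustration only (the class and attribute names are hypothetical), a repr along these lines would make printed objects self-describing:

class Word:
    # Hypothetical element carrying a value and a confidence score
    def __init__(self, value: str, confidence: float):
        self.value = value
        self.confidence = confidence

    def __repr__(self) -> str:
        # Expose the main attributes so that print(word) is informative
        return f"Word(value='{self.value}', confidence={self.confidence:.2f})"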

Pb: unittest test_export_sizes not passing on tf 2.3.1

The unittest test_export_sizes fails locally on tf 2.3.1:

def test_export_sizes(test_convert_to_tflite, test_convert_to_fp16, test_quantize_model):
        assert sys.getsizeof(test_convert_to_tflite) > sys.getsizeof(test_convert_to_fp16)
>       assert sys.getsizeof(test_convert_to_fp16) > sys.getsizeof(test_quantize_model)
E       AssertionError: assert 3041 > 3041

[scripts] Add a script for environment collection

To be able to systematically identify sources of reported issues, each bug report should come with a description of the user's environment. A script would be required to collect among others:

  • Python version
  • Tensorflow version
  • NVIDIA driver version
  • CUDA version

The user would only have to paste the result in the issue description.
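A minimal sketch of what such a script could look like (the exact fields and commands are assumptions; CUDA detection is omitted here):

import platform
import subprocess

def collect_env() -> str:
    lines = [f"Python: {platform.python_version()}"]
    try:
        import tensorflow as tf
        lines.append(f"TensorFlow: {tf.__version__}")
    except ImportError:
        lines.append("TensorFlow: not installed")
    try:
        # Query the NVIDIA driver version through nvidia-smi
        smi = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        lines.append(f"NVIDIA driver: {smi.stdout.strip()}")
    except (FileNotFoundError, subprocess.CalledProcessError):
        lines.append("NVIDIA driver: not found")
    return "\n".join(lines)

if __name__ == "__main__":
    print(collect_env())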

[encoding] Unify string encoding for datasets and models

As of the latest commit, models are trained with a string encoder that is not particularly ordered (TF records). To clean all of this up, I suggest the following:

  • Define a pre-established vocab that we will use as the encoder for our future models (#116)
  • Address the topic of vocab mapping (converting an accented character to its raw version for instance)
  • Standardize dataset label encoding using a given vocab (#116)

On the topic of vocab definition, there are two aspects to consider: the character selection and their order in the vocab.

cc @charlesmindee
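To make the proposal concrete, here is a minimal sketch of vocab-based label encoding; the vocab content and its order are placeholders, since they are precisely what this issue proposes to standardize.

from typing import List

VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789"  # placeholder character set & order

def encode_sequence(text: str, vocab: str = VOCAB) -> List[int]:
    # Map every character of a label onto its index in the shared vocab
    return [vocab.index(char) for char in text]

def decode_sequence(indices: List[int], vocab: str = VOCAB) -> str:
    # Inverse mapping, so that datasets and models agree on the same encoding
    return "".join(vocab[idx] for idx in indices)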

[documents] Check input PDFs with PyMuPDF when they contain source content

So far, we have used the content reading feature of PyMuPDF. When a source PDF is read, the library actually extracts all the localization and text information from the document. Two options are then available:

  • skip the model inference and use this information directly
  • combine it with the model inference to improve our predictions
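For reference, a short sketch of how the text layer could be read with PyMuPDF (assuming a recent release where get_text is available); the words come back with their bounding boxes, which could either replace or refine the model predictions.

import fitz  # PyMuPDF

doc = fitz.open("path/to/source.pdf")  # placeholder path
for page in doc:
    # Each entry is (x0, y0, x1, y1, word, block_no, line_no, word_no)
    for x0, y0, x1, y1, word, *_ in page.get_text("words"):
        print(word, (x0, y0, x1, y1))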

[documents] Page elements could also be QR Code, Pictures, etc.

The current design for page elements only considers text, while actual documents contain a much wider variety of page elements. Among others, the structure should integrate non-text elements such as:

  • QR Codes
  • Bar codes
  • Pictures
  • Signature / Signed initials
  • Watermarks / logo

In the end, we would need to:

  • Select the artefact types to be supported by the library
  • Implement the integration within the existing structure (#26)

[models] Add full OCR module

Design an object that will wrap all DL model components and be responsible for localizing and identifying all text elements in documents.

Inputs

  • a collection of documents, where each document is a list of pages, themselves expressed as numpy-encoded images.

Outputs

  • a collection of Document objects

The following components will be required:

  • DetectionPredictor (#39)
  • RecognitionPredictor (#39)
  • OCRPredictor (#39)
  • ElementExporter (#16, #26, #40)
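To fix ideas, here is a minimal sketch of how these components could be chained; the component behaviours and signatures are assumptions, and the crop extraction relies on a helper like the one sketched in the cropping issue above.

from typing import List
import numpy as np

class OCRPredictor:
    # Hypothetical wrapper around the detection & recognition predictors
    def __init__(self, det_predictor, reco_predictor, doc_builder):
        self.det_predictor = det_predictor
        self.reco_predictor = reco_predictor
        self.doc_builder = doc_builder

    def __call__(self, documents: List[List[np.ndarray]]):
        pages = [page for doc in documents for page in doc]  # flatten into pages
        boxes = self.det_predictor(pages)                    # localize text regions
        crops = [crop for page, page_boxes in zip(pages, boxes)
                 for crop in extract_crops(page, page_boxes)]
        words = self.reco_predictor(crops)                   # read the character sequences
        num_pages = [len(doc) for doc in documents]
        return self.doc_builder(boxes, words, num_pages, [p.shape[:2] for p in pages])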

[models] Add a pretrained checkpoint loading mechanism

Architecture definitions are already available in the repo, but it still lacks a way to load a set of pretrained parameters. This should be tackled for the upcoming release. Here are the requirements:

  • Select a checkpoint format: .ckpt
  • Implement necessary methods/functions to save and load the checkpoint: keras inherited (#49)
  • Ensure data integrity (SHA256 hash most likely) (#49)
  • Add a first pretrained model (#49)
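A minimal sketch of such a loading mechanism, assuming the pretrained parameters ship as a single file whose SHA256 is published alongside it (the real format and hashing granularity are still to be decided):

import hashlib
from pathlib import Path

import tensorflow as tf

def load_pretrained(model: tf.keras.Model, ckpt_path: str, expected_sha256: str) -> None:
    # Verify the integrity of the downloaded file before restoring the weights
    digest = hashlib.sha256(Path(ckpt_path).read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise ValueError("Checkpoint hash mismatch, the file may be corrupted")
    # Keras-inherited loading, as suggested in the checklist above
    model.load_weights(ckpt_path)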

[metrics] Benchmark python options for text-distance computing

Options for computing distance between 2 character sequences in python:

  • textdistance: full python lib
  • jellyfish: full python lib
  • strsimpy: full python lib
  • python-Levenshtein: C lib
  • edlib: C++ lib (python binding)
  • RapidFuzz: C++/python lib
  • polyleven: C/python lib

All these libs provide algorithms such as Levenshtein distance, Hamming distance, Jaro-Winkler distance, etc. to compute the distance between 2 strings (character sequences).

This graph taken from the RapidFuzz documentation highlights the strong dependence of runtime on string length:


We can see, for example, that python-Levenshtein is faster than edlib on short sequences, but much slower on long ones. In our case, we typically use character sequences of 0 to 30 characters, very often between 0 and 15 characters and almost never more than 30. According to these plots, polyleven, python-Levenshtein and RapidFuzz seem to be the fastest solutions on these typical lengths.

I conducted a short study: I called the Levenshtein distance 1000 times for each of these packages, on 2 strings of between 15 and 20 characters. Here are the results for 1000 iterations:

  • RapidFuzz / python-Levenshtein: 0.0002 s
  • Polyleven: 0.0003 s
  • Jellyfish: 0.0009 s
  • edlib: 0.002 s
  • textdistance: 0.0064 s
  • strsimpy: 0.09 s

Conclusion:
Jellyfish is fast for a full-Python lib. However, as expected, C libs are faster, especially RapidFuzz and python-Levenshtein on these sequence lengths. In terms of dependencies, python-Levenshtein is really light whereas RapidFuzz is heavier (C++ files).
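For reproducibility, the timing study roughly amounts to the sketch below (assuming a recent RapidFuzz release where the Levenshtein helpers live in rapidfuzz.distance, and the python-Levenshtein package installed):

import time

import Levenshtein as py_levenshtein         # python-Levenshtein (C)
from rapidfuzz.distance import Levenshtein as rf_levenshtein

a, b = "reconnaissance", "recognition"       # sequences of ~15 characters

for name, dist in [("rapidfuzz", rf_levenshtein.distance),
                   ("python-Levenshtein", py_levenshtein.distance)]:
    start = time.perf_counter()
    for _ in range(1000):
        dist(a, b)
    print(f"{name}: {time.perf_counter() - start:.4f} s")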

@fg-mindee

[documents] Add basic export module

The output object type of a document analysis should be defined as follows:

  • a structured hierarchy as stated in the design doc
  • export methods to different formats

[utils] Add visualization utilities

Users cannot easily visualize the results of OCR models. Some steps would have to be taken to handle this issue properly:

  • Establish a visualization design (considering the density of predictions to display)
  • Implement it in the doctr.utils module (#54)

Here is a nice thread about matplotlib dynamic display: https://stackoverflow.com/questions/7908636/possible-to-make-labels-appear-when-hovering-over-a-point-in-matplotlib
And a matplotlib-compatible library: https://mplcursors.readthedocs.io/en/stable/examples/hover.html
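As a static starting point (the dynamic/hover part would build on mplcursors), here is a minimal matplotlib sketch overlaying boxes on a page; the relative-coordinate convention is an assumption.

import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

def show_boxes(image, boxes):
    # boxes are assumed relative: (xmin, ymin, xmax, ymax) in [0, 1]
    height, width = image.shape[:2]
    fig, ax = plt.subplots()
    ax.imshow(image)
    for xmin, ymin, xmax, ymax in boxes:
        ax.add_patch(Rectangle((xmin * width, ymin * height),
                               (xmax - xmin) * width, (ymax - ymin) * height,
                               fill=False, edgecolor="red", linewidth=1))
    plt.show()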

[models] Implement HRGAN or MASTER

This paper suggests a new architecture: a Holistic Representation Guided Attention Network for text recognition, inspired by transformers, which outperforms SAR in both accuracy & speed.

We should implement this model, but the impressive speed results should be handled carefully (8x speed-up compared to SAR), since the experiments were conducted on a GPU and this model is highly parallelizable (no recurrence). Is this new model as fast on CPU?

[models] Add model compression utils

Add a doctr.models.utils module to compress existing models and improve their latency / memory load for inference purposes on CPU. Some interesting leads to investigate:

Optional: TensorRT export (cf. https://developer.nvidia.com/blog/speeding-up-deep-learning-inference-using-tensorflow-onnx-and-tensorrt/)
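One such lead, sketched below with a placeholder model, is TFLite post-training (dynamic-range) quantization; the exact compression strategy for doctr.models.utils is still open.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),   # placeholder model, any Keras model works
    tf.keras.layers.Dense(10),
])

# Convert to TFLite with default (dynamic-range) quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_bytes = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_bytes)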

[api] Add a minimal API setup with DocTR

Since the library can be used to build an API, we could add a minimal codebase for users to deploy a light API for OCR on documents.

Here is a more detailed suggestion:

  • put everything in an api folder
  • implement it with FastAPI
  • implement a POST method "/analyze" that returns the OCR results

spread into 3 PRs:

  • Text recognition route (#242)
  • Text detection route (#245)
  • OCR route (#247)
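A minimal sketch of what the analyze route could look like with FastAPI; the OCR call itself is stubbed out, since the exact doctr entry point is part of the design discussion.

from fastapi import FastAPI, File, UploadFile

app = FastAPI(title="docTR API")

def run_ocr(content: bytes) -> dict:
    # Placeholder for the actual doctr pipeline call
    return {"pages": []}

@app.post("/analyze")
async def analyze(file: UploadFile = File(...)):
    # Read the uploaded document and return the OCR result as JSON
    return run_ocr(await file.read())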

[models] Properly design the preprocessing for detection & recognition

Let's review the raw inputs, and all the transformations we need to apply for the model to process them correctly.
Inputs

  • List of images, where each image is expressed as a numpy.ndarray of arbitrary shape (H, W, 3) and encoded in np.uint8

Transformations
Please note that the order below is important:

Detection

  • convert to tf.Tensor
  • resize each image to a fixed height & width by bilinear interpolation using TF
  • batch
  • cast to tf.float32
  • divide by 255
  • normalize

Recognition

  • convert to tf.Tensor
  • resize each image to a fixed height using bilinear interpolation using TF
  • pad with zeros
  • batch
  • cast to tf.float32
  • divide by 255
  • normalize
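Sketched below for the detection branch only (the target size and normalization statistics are placeholders), the listed transforms map to TF operations quite directly; the recognition branch would add an aspect-ratio-preserving resize and zero padding before batching.

import tensorflow as tf

def preprocess_detection(images, out_size=(1024, 1024), mean=0.5, std=1.0):
    # images: list of np.uint8 arrays of arbitrary (H, W, 3) shapes
    tensors = [tf.image.resize(tf.convert_to_tensor(img), out_size, method="bilinear")
               for img in images]
    batch = tf.stack(tensors)                    # batch once all images share a shape
    batch = tf.cast(batch, tf.float32) / 255.0   # cast & rescale to [0, 1]
    return (batch - mean) / std                  # normalize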

Let's proceed as follows:

  • Discuss & select the list of transforms
  • Implement the modifications / additions to the existing codebase (#50)

[models] Add detection module

Design a model subpart responsible for localizing regions of interest in the document.

Input

  • Numpy-style encoded images (already read)

Output

  • localization: list of tensors of predictions. Each prediction is of size 5 (xmin, ymin, xmax, ymax, objectness)

Core components
DetectionPredictor is the main class, built from the following components:

  • Preprocessor (#20)
  • DetectionModel (#32)
  • Postprocessor (#24)
  • DetectionPredictor (#39)

[data] Handle different cases of vertical text

We need to keep in mind that we will come across 2 cases of vertical text:

  • "Rotated" vertical text: horizontal text with a +/-90° rotation (rotated letters)

(example image of rotated vertical text)

  • "Truly" vertical text: a text with horizontal letters (unrotated letters), written from top to bottom

(example image of truly vertical text)

[docs] Add documentation building dependencies

Some basic documentation with proper installation and usage instructions would be greatly beneficial to the library. This would be the primary means of communication with non-developer audiences.

Having it built automatically using something similar to Sphinx would be efficient, considering all docstrings will have a compatible format.

[docs] Add performance benchmark for all pretrained models

There are a few tables in the documentation that still need filling:

  • Text detection (#143)
  • Text recognition (#143)
  • End-to-End OCR (#143)
  • Comparison with similar solutions (#149)

We need to select a public dataset for each task and run the evaluation on their respective test sets, then report the results back in these tables.

[documents] PDF & image input page have different dimensions data formats

If we consider that the document analysis output is used for document reconstruction, a problem arises for pages. Simply put, image pages have their dimensions in pixels, while PDFs have theirs in inches/centimeters and hold a DPI parameter.

Two questions have to be tackled:

  • uniformity: should we enforce some uniformity of dimensioning for pages whatever the format?
  • export: either way, which export format do we use to avoid information loss?

Demo app error when analyzing my first document

🐛 Bug

I tried to analyze a PNG and a PDF and got the same error. I tried changing the model, but it didn't change anything.

To Reproduce

Steps to reproduce the behavior:

  1. Upload a PNG
  2. Click on analyze document
KeyError: 0
Traceback:
File "/Users/thibautmorla/opt/anaconda3/lib/python3.8/site-packages/streamlit/script_runner.py", line 337, in _run_script
    exec(code, module.__dict__)
File "/Users/thibautmorla/Downloads/doctr/demo/app.py", line 93, in <module>
    main()
File "/Users/thibautmorla/Downloads/doctr/demo/app.py", line 77, in main
    seg_map = predictor.det_predictor.model(processed_batches[0])[0]

Additional context

First image upload

[documents] improve line detection

An OCR predictor must detect lines, and our current version is too weak:

  • many lines overlap
  • some lines stop in the middle of a dense block whereas other lines are kind of "bridging" between separated blocks.

(example image of faulty line detection)

[models] Add minimum pretrained models

For the library to be used as an end-to-end tool, some assets need to be provided:

  • detection pretrained model (#62)
  • recognition pretrained model (#62)

Only then will we be able to consider this as a potential MVP.

[utils] Add visualization capabilities for independent tasks

Visualization is currently dynamic and end-to-end only, which means that no static version is available, nor is there a visualization option for text detection or text recognition alone. We should discuss and add visualization for the following blocks:

  • Text detection: display bounding boxes of detected items over the image
  • Text recognition: display the label and confidence in a corner of the crop
