Content Addressable Storage

This project implements a simple and pragmatic approach to Content Addressable Storage (CAS). It was heavily influenced by Perkeep (aka Camlistore) and Git.

For more details, see the concepts document and the comparison with other systems.
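To make the core idea concrete, here is a minimal sketch in Go of content addressing in general, not of this project's API or on-disk format. Each blob is stored under the hex SHA-256 of its bytes, so identical content is deduplicated and every read can be verified against its key. The `Store` type and its `Put`/`Get` methods are hypothetical names chosen for illustration, and the map stands in for real storage.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// Store is a hypothetical in-memory content-addressable store:
// every blob is kept under the hex-encoded SHA-256 of its bytes.
type Store struct {
	blobs map[string][]byte
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{blobs: make(map[string][]byte)}
}

// Put stores data and returns its content address. Storing the same
// bytes twice yields the same key and keeps a single copy.
func (s *Store) Put(data []byte) string {
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	if _, ok := s.blobs[key]; !ok {
		// Copy so callers cannot mutate a stored blob afterwards.
		s.blobs[key] = append([]byte(nil), data...)
	}
	return key
}

// Get returns the blob for key and re-checks its hash, so corruption
// is reported instead of silently returned.
func (s *Store) Get(key string) ([]byte, error) {
	data, ok := s.blobs[key]
	if !ok {
		return nil, errors.New("blob not found")
	}
	sum := sha256.Sum256(data)
	if hex.EncodeToString(sum[:]) != key {
		return nil, errors.New("hash mismatch")
	}
	return data, nil
}

func main() {
	s := NewStore()
	k1 := s.Put([]byte("hello"))
	k2 := s.Put([]byte("hello"))
	fmt.Println(k1 == k2, len(s.blobs)) // true 1
	data, err := s.Get(k1)
	fmt.Println(string(data), err) // hello <nil>
}
```

Because the key is derived from the content, the same bytes always map to the same address no matter who stores them or when.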

Status

The project is under active development. Both the API and the on-disk format may change.

Check the Quick start guide for a list of basic commands.

Goals

  • Simplicity: the core specification should be trivial to implement.

  • Interop: CAS should play nicely with existing tools and technologies, whether content-addressable or not.

  • Easy to use: CAS should be a single command away, similar to git init.

Use cases

  • Immutable and versioned archives: CAS supports files with multiple TBs of data and folders with millions of files, and can index and use remote data without storing it locally.

  • Data processing pipelines: CAS's caching capabilities allow it to be used for incremental data pipelines.

  • Git for large files: CAS stores files on the assumption that they can be multiple TBs in size and is optimized for this use case, while still supporting tags and branches, like Git.
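The combination of immutable content with Git-like tags and branches can be illustrated with a small, hypothetical sketch: blobs never change, and a mutable name (a "pin", in this project's terminology) simply points at the current content address. The `Pins` type, its `Set` method and the in-memory maps are assumptions for illustration only; they are not how cas stores pins.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// key computes a content address (hex SHA-256), as a CAS would.
func key(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// Pins is a hypothetical table of mutable names pointing at immutable
// content addresses, similar to Git branches and tags.
type Pins map[string]string

// Set points name at ref and returns the previous ref (if any), so a
// caller could keep a history of versions.
func (p Pins) Set(name, ref string) (prev string, had bool) {
	prev, had = p[name]
	p[name] = ref
	return prev, had
}

func main() {
	blobs := map[string][]byte{}
	pins := Pins{}

	v1 := []byte("report v1")
	blobs[key(v1)] = v1
	pins.Set("latest", key(v1))

	v2 := []byte("report v2")
	blobs[key(v2)] = v2
	old, _ := pins.Set("latest", key(v2))

	// The pin moved, but the old version is still addressable by its hash.
	fmt.Println(string(blobs[pins["latest"]])) // report v2
	fmt.Println(string(blobs[old]))            // report v1
}
```

Moving a pin is a cheap metadata update; the multi-TB content it refers to is never copied or rewritten.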

Features and the roadmap

Implemented:

  • Fast file hashing
    • SHA-256; other hash functions can also be used
    • Caches hashing results in file attributes
  • Support for large archives
    • Large contiguous files (> TB)
    • Large multipart files (> TB)
    • Large directories (> millions of files)
    • Zero-copy file fetch (BTRFS)
  • Integrations
    • Can index and sync web content
    • HTTP(S) caching (as a Go library)
  • Remote storage
    • Self-hosted HTTP CAS server (read-only)
    • Google Cloud Storage
  • Usability
    • Mutable objects (pins)
    • Git-style local storage
  • Data pipelines
    • Extensible
    • Caches results
    • Incremental

Planned:

  • Support for large multipart files (> TB)
    • Support multilevel parts
    • Support blob splitters (rolling checksum, newline, etc.)
  • Remote storage
    • AWS, etc
    • Self-hosted HTTP CAS server (read-write)
  • Integration with Git
    • Zero-copy fetch from Git (either remote or local)
    • LFS integration
  • Integration with Docker
    • Zero-copy fetch of an image from Docker
    • Unpack FS images to CAS
    • Use containers in pipelines
  • Integration with BitTorrent
    • Store torrent files
    • Download torrent data directly to CAS
    • To consider: expose CAS as a peer
  • Integration with other CAS systems
    • Perkeep
    • Upspin
    • IPFS
  • Windows and OSX support
  • Better support for pipelines
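The planned blob splitters would choose part boundaries from the content itself rather than at fixed offsets, so inserting data near the start of a file tends not to shift every later boundary. A common way to do this is content-defined chunking with a rolling checksum. Below is a minimal sketch using a Rabin-Karp style polynomial hash over a sliding window. The window size, base, mask and min/max chunk sizes are arbitrary assumptions, and this is not the project's implementation.

```go
package main

import "fmt"

const (
	window  = 16       // bytes covered by the rolling hash
	base    = 257      // polynomial base (arbitrary choice)
	mask    = 1<<6 - 1 // cut when the low 6 bits are zero (~1 in 64 positions for random input)
	minSize = 32       // never cut before this many bytes
	maxSize = 256      // always cut at this many bytes
)

// splitRolling returns chunk end offsets chosen by a rolling hash over
// the last `window` bytes of the current chunk. Arithmetic wraps
// modulo 2^32 via uint32 overflow.
func splitRolling(data []byte) []int {
	var pow uint32 = 1 // base^(window-1): weight of the oldest byte
	for i := 0; i < window-1; i++ {
		pow *= base
	}
	var cuts []int
	var h uint32
	start := 0
	for i, b := range data {
		if i-start >= window {
			// Window is full: drop the byte that slides out.
			h -= uint32(data[i-window]) * pow
		}
		h = h*base + uint32(b)
		size := i - start + 1
		if (size >= minSize && h&mask == 0) || size >= maxSize {
			cuts = append(cuts, i+1)
			start = i + 1
			h = 0 // restart the hash for the next chunk
		}
	}
	if start < len(data) {
		cuts = append(cuts, len(data)) // trailing partial chunk
	}
	return cuts
}

func main() {
	data := make([]byte, 4096)
	x := uint32(2463534242)
	for i := range data {
		// xorshift32: deterministic pseudo-random input for the demo
		x ^= x << 13
		x ^= x >> 17
		x ^= x << 5
		data[i] = byte(x)
	}
	cuts := splitRolling(data)
	fmt.Println(len(cuts), "chunks, last ends at", cuts[len(cuts)-1])
}
```

The min/max limits bound chunk sizes, while the hash test makes boundaries depend only on nearby bytes, which is what lets identical regions in different files produce identical chunks.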
