greglook / blocks

Clojure content-addressable data storage.

License: The Unlicense

Clojure 96.66% Java 3.34%

clojure storage content-addressable-storage

blocks's Issues

babashka compatibility

Hi, it would be cool to be able to use this library from babashka. Sadly, the dependency on manifold breaks that. Do you think it would be possible to replace manifold with core.async (which works in babashka)?

Streaming storage protocol

In some use cases it is desirable to stream the block content into a store rather than having to stage it locally. This should be implemented as an enhancement protocol that the store! function can use if available. This would be most useful on stores that can store the data in a temporary location and atomically move it to the correct hash id once the stream completes.
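
A minimal sketch of the temp-file-then-atomic-move idea, assuming a directory-backed store and plain hex SHA-256 file names rather than the library's multihash ids. The function name stream-to-dir! is hypothetical and not part of blocks:

```clojure
(import '(java.io InputStream)
        '(java.nio.file CopyOption Files Path StandardCopyOption)
        '(java.nio.file.attribute FileAttribute)
        '(java.security DigestInputStream MessageDigest))

(defn stream-to-dir!
  "Hypothetical helper: streams `in` into a temp file under `root` while
  hashing it, then atomically renames the file to its hex SHA-256 digest.
  Returns the final Path."
  [^Path root ^InputStream in]
  (let [tmp (Files/createTempFile root "incoming-" ".tmp"
                                  (make-array FileAttribute 0))
        ^MessageDigest digest (MessageDigest/getInstance "SHA-256")]
    ;; The digest is updated as a side effect of copying the stream.
    (with-open [din (DigestInputStream. in digest)]
      (Files/copy din tmp
                  (into-array CopyOption [StandardCopyOption/REPLACE_EXISTING])))
    (let [hex (apply str (map #(format "%02x" %) (.digest digest)))
          target (.resolve root hex)]
      ;; The content only becomes visible under its id once it is complete.
      (Files/move tmp target
                  (into-array CopyOption [StandardCopyOption/ATOMIC_MOVE]))
      target)))
```

The temp file lives in the same directory as the final location so that the rename stays on one filesystem, which is what makes an atomic move possible.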

Logical stores

Add some logical block stores which use other stores as subcomponents. Two ideas come to mind:

Caching block store which keeps up to a certain total size of block content in a cache store, backed by some authoritative primary store.
Replica store which stores blocks in multiple backing stores, ensuring at least some number succeed.
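
The replica idea could be sketched roughly as follows. This assumes each backing store is represented by a plain "put" function, which is a simplification for illustration; put-replicated! is a hypothetical name, not an existing blocks function:

```clojure
(defn put-replicated!
  "Hypothetical sketch: calls every function in `put-fns` with `block` and
  throws unless at least `min-success` of them return without an exception.
  Returns the results of the successful writes."
  [put-fns min-success block]
  (let [results (mapv (fn [put!]
                        (try
                          {:ok (put! block)}
                          (catch Exception ex
                            {:error ex})))
                      put-fns)
        successes (filterv #(contains? % :ok) results)]
    (when (< (count successes) min-success)
      (throw (ex-info "Too few replicas stored the block"
                      {:required min-success
                       :succeeded (count successes)})))
    (mapv :ok successes)))
```

A real implementation would probably write to the replicas concurrently; this version is sequential to keep the success-counting logic easy to follow.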

Unsupported block-store URI scheme: "mem:-"

Following the examples from the README, it should be possible to create a store like this:

(require '[blocks.core :as block])
(require '[blocks.store :as store])
(def store (block/->store "mem:-"))

But instead it throws the following error:

Syntax error (IllegalArgumentException) compiling at (form-init11663713394850559011.clj:1:12).
Unsupported block-store URI scheme: "mem:-"

Running (store/parse-uri "mem:-") returns {:scheme "mem", :name "-"}, so URI parsing seems to work fine, on the surface at least.

Support block subranges

Figure out the right way to support reading a subrange of block content. For literal blocks, this is easy - just need to open a stream, skip some portion of the content, and wrap a bounded stream around that. Both file and s3-backed blocks support subrange reading.

Maybe the stat metadata can include a marker for whether the reader function supports being invoked with range arguments. If so, call it like (reader start end).
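
As a rough illustration of the (reader start end) calling convention for in-memory content, here is a sketch that assumes an exclusive end offset. It uses the offset/length constructor of ByteArrayInputStream in place of an explicit skip-and-bound wrapper, and range-reader is a hypothetical name:

```clojure
(import '(java.io ByteArrayInputStream))

(defn range-reader
  "Hypothetical sketch: returns a reader function over `content`. With no
  args it opens the whole content; with `start` and `end` (end exclusive,
  an assumption) it opens only that byte range."
  [^bytes content]
  (fn reader
    ([]
     (ByteArrayInputStream. content))
    ([start end]
     (ByteArrayInputStream. content (int start) (int (- end start))))))
```

For example, ((range-reader (.getBytes "hello world" "UTF-8")) 6 11) opens a stream over just the bytes of "world".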

IPFS block store

This should actually be in a separate repo (blocks-ipfs?), but there's a fairly straightforward mapping from the BlockStore protocol to the IPFS API.

stat maps to /block/stat
list is a bit of a question mark, since it doesn't make sense to try to enumerate every block in the IPFS network.
get maps to /block/get
put! maps to /block/put
delete! has no equivalent, and should throw an exception.

IPFS would be pretty useful as a fallback lookup store, so that public IPFS data could be referenced transparently from blocks in a private store.

is this a blockchain library?

This is going to sound pretty ignorant... I've been googling 'blockchain' technology for about half a day and I still don't know what it means. Does this library have anything to do with it?

Protocol for lazy block source

Instead of relying on lazy blocks' source function providing a stream when called with zero or two args, it would be better to implement a protocol with more explicit methods, which could fall back to assuming the source was a stream-returning function.
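
One way this could look, as a sketch only: the protocol and method names below (BlockSource, open-content, open-range) are invented for illustration, and the fallback treats any plain function as a stream-returning source with the existing zero-or-two-arg convention:

```clojure
(import '(java.io ByteArrayInputStream))

(defprotocol BlockSource
  "Hypothetical protocol for lazy block content sources."
  (open-content [source]
    "Open a stream over the full content.")
  (open-range [source start end]
    "Open a stream over bytes from start (inclusive) to end (exclusive)."))

;; Fallback: assume a function source follows the zero-or-two-arg convention.
(extend-protocol BlockSource
  clojure.lang.Fn
  (open-content [f] (f))
  (open-range [f start end] (f start end)))

;; An explicit implementation over an in-memory byte array.
(defrecord BytesSource [^bytes content]
  BlockSource
  (open-content [_]
    (ByteArrayInputStream. content))
  (open-range [_ start end]
    (ByteArrayInputStream. content (int start) (int (- end start)))))
```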

Storage utilities

There are a few utilities for working with block stores which would be really nice to be able to use programmatically or from the command line:

A sync tool which takes a source store and a destination store and ensures that all the blocks in the source are stored in the destination.
A validation tool which will scan every block in a store (possibly using EnumeratingStore?) and validate the id, size, and hash.
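
The core of the sync tool can be sketched as below. To stay self-contained, this models each store as an atom holding a map of id to content, which is an assumption for illustration and not how blocks stores are actually represented:

```clojure
(defn sync-stores!
  "Hypothetical sketch: copies every entry present in `source` but absent
  from `dest`. Both stores are atoms holding a map of id -> content.
  Returns the number of entries copied."
  [source dest]
  (let [missing (vec (remove #(contains? @dest %) (keys @source)))]
    (doseq [id missing]
      (swap! dest assoc id (get @source id)))
    (count missing)))
```

Because content-addressed blocks are immutable, an id that already exists in the destination never needs to be rewritten, so running the sync twice copies nothing the second time.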

HTTP store

Write a library implementing a ring handler and compatible client to serve block data over HTTP.

Store versions with blocks

Both file-store and s3-store would benefit from keeping version information with the stored block content. This can live under the configured root prefix, because a filename like version is not valid hexadecimal and so cannot collide with block content.

Minimally, the version should denote breaking compatibility so we can warn the user at start-time, for example if file-store changed to store blocks in a flatter structure (e.g. root/111493dc/48a5... vs root/1114/93/dc/48a5...). A version would let the user know that the content on disk was incompatible with the current code.

Later, this also enables migrations from one version to another.

Add debug/trace logging to stores

There's opportunity for a lot of on-demand debugging messages in the existing block stores. Minimally, this would involve adding TRACE level statements on every API operation and DEBUG statements wherever else it makes sense.

Use cases

Hi @greglook.

This library looks amazing, but I can't come up with good use cases for it.
Do you have some good links to read on projects that use content-addressable storage?

Metrics wrapper for block stores

Idea: a logical block store which is given a name and a reporting function and emits Riemann event maps to measure block store operations. These would look like:

{:time #inst "2015-11-18T19:01:45-00:00"
 :host "TBD"
 :service "block-store <store-name> <operation>"
 :metric 38.234 ; ms elapsed
 :state "ok"}

The map conversion could even be punted on, and the store could just call the function with the service name and metric. That would leave filling in the time, host, state, etc to the metrics backend.
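
A sketch of the timing wrapper described above, assuming the reporting function takes a single event map. The name measure-op and the "error" state value are assumptions; only the "ok" state appears in the example event:

```clojure
(defn measure-op
  "Hypothetical sketch: runs the zero-arg function `f`, then calls `report!`
  with an event map describing how long the operation took in milliseconds.
  Rethrows any exception after reporting it."
  [store-name operation report! f]
  (let [start (System/nanoTime)
        event (fn [state]
                {:service (str "block-store " store-name " " (name operation))
                 :metric (/ (- (System/nanoTime) start) 1e6) ; ms elapsed
                 :state state})]
    (try
      (let [result (f)]
        (report! (event "ok"))
        result)
      (catch Exception ex
        ;; "error" is an assumed state value, not taken from the issue text.
        (report! (event "error"))
        (throw ex)))))
```

The :time and :host fields are left out here, in line with the suggestion that the metrics backend can fill them in.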

Block enumerator protocol

Per this TODO comment:

Protocol which returns a lazy sequence of every block in the store, along with an opaque marker which can be used to resume the stream in the same position. Blocks are explicitly not returned in any defined order; it is assumed the store will enumerate them in the most efficient order available.

This is mostly useful for synchronizing two block stores, as the data will be read out of the source in an efficient order. For example, if using a diskpacked-like store you don't want to have to skip between packfiles frequently just to access the blocks in ascending identifier order.

Use Instant for times

Currently the :stored-at metadata on blocks uses java.util.Date - instead, the library should use java.time.Instant as the superior time representation. This would be a breaking change.
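
For callers that need to bridge the two representations during such a migration, the conversion itself is short. The helper names here are hypothetical; the underlying Date.toInstant and Date.from methods are standard Java:

```clojure
(import '(java.time Instant)
        '(java.util Date))

(defn date->instant
  "Converts a java.util.Date to a java.time.Instant."
  [^Date d]
  (.toInstant d))

(defn instant->date
  "Converts a java.time.Instant back to a java.util.Date. Date only has
  millisecond precision, so finer-grained Instants are truncated."
  [^Instant i]
  (Date/from i))
```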

puget dependency version update required for Clojure 1.12 compatibility

Blocks has a dependency on [mvxcvi/puget "1.3.1"]. This older version depends on [fipp "0.6.23"], which uses the private clojure.instant/thread-local-utc-date-format var that was removed in Clojure 1.12.

This is no longer an issue in [mvxcvi/puget "1.3.4"].

Project logo

Create or find a logo for the blocks family of libraries. Themes could include:

Abstract cubic geometry like this.
Separate cubes of different colors?
Could focus on the immutable nature of the blocks somehow, or that this is both for the blocks and the storage.
Something adaptable to the different backing stores would be great, e.g. reusing an element of the main logo and pairing it with the s3 logo.

Protocol for batch operations

It may be useful to have an optional protocol (BatchingStore?) that defines methods for reading and writing multiple blocks at once. These would be 'private' methods (similar to -get and -list) and wrapped by a function which checks for the protocol. If the store in question doesn't implement BatchingStore, it would fall back to operating on the blocks one-by-one and returning a collection.

get-batch retrieves blocks identified by a collection of multihash ids, returning a sequence of all the blocks identified which exist in the store.
put-batch! stores a collection of blocks, returning a sequence of stored block versions.
delete-batch! removes blocks identified by a collection of multihash ids.
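
The protocol-check-with-fallback pattern might look like the sketch below. The minimal BlockStore protocol and MemoryStore record are stand-ins defined only for this example and do not mirror the library's real definitions; the return value of delete-batch! is also an assumption:

```clojure
(defprotocol BlockStore
  "Stand-in for the single-block store protocol."
  (-get [store id])
  (-put! [store block])
  (-delete! [store id]))

(defprotocol BatchingStore
  "Optional protocol for stores with native batch operations."
  (-get-batch [store ids])
  (-put-batch! [store blocks])
  (-delete-batch! [store ids]))

(defn get-batch
  "Returns the blocks that exist for `ids`, one-by-one if needed."
  [store ids]
  (if (satisfies? BatchingStore store)
    (-get-batch store ids)
    (into [] (keep #(-get store %)) ids)))

(defn put-batch!
  "Stores `blocks` and returns the stored versions."
  [store blocks]
  (if (satisfies? BatchingStore store)
    (-put-batch! store blocks)
    (mapv #(-put! store %) blocks)))

(defn delete-batch!
  "Removes `ids`; the fallback returns the ids actually deleted (assumed)."
  [store ids]
  (if (satisfies? BatchingStore store)
    (-delete-batch! store ids)
    (filterv #(-delete! store %) ids)))

;; A store that only implements the single-block protocol.
(defrecord MemoryStore [state]
  BlockStore
  (-get [_ id] (get @state id))
  (-put! [_ block] (swap! state assoc (:id block) block) block)
  (-delete! [_ id]
    (let [found? (contains? @state id)]
      (swap! state dissoc id)
      found?)))
```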
