greglook / blocks
Clojure content-addressable data storage.
License: The Unlicense
Hi, it would be cool to be able to use this library from babashka. Sadly, the dependency on manifold breaks that. Do you think it would be possible to replace manifold with core.async (which works in babashka)?
In some use cases it is desirable to stream the block content into a store rather than having to stage it locally. This should be implemented as an enhancement protocol that the store! function can use if available. This would be most useful on stores that can store the data in a temporary location and atomically move it to the correct hash id once the stream completes.
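One way the optional-protocol idea could look is sketched below. Everything here is an assumption for illustration: `StreamingStore`, `-store-stream!`, `store-stream!`, `read-all`, and the `put-fn` argument are hypothetical names, not part of the blocks API, and the fallback simply buffers the stream in memory as a stand-in for local staging.

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream InputStream))

;; Hypothetical optional protocol; not part of the blocks API.
(defprotocol StreamingStore
  (-store-stream! [store in]
    "Write the content of input stream `in` directly into the store."))

(defn read-all
  "Fallback staging step: buffer the whole stream in memory."
  ^bytes [^InputStream in]
  (let [out (ByteArrayOutputStream.)
        buf (byte-array 4096)]
    (loop []
      (let [n (.read in buf)]
        (when (pos? n)
          (.write out buf 0 n)
          (recur))))
    (.toByteArray out)))

(defn store-stream!
  "Use the streaming path when the store supports it; otherwise stage the
  content and hand it to `put-fn`, a stand-in for the normal write path."
  [store put-fn ^InputStream in]
  (if (satisfies? StreamingStore store)
    (-store-stream! store in)
    (put-fn store (read-all in))))

;; Toy store that takes the streaming path and counts the bytes it saw.
(defrecord CountingStore [seen]
  StreamingStore
  (-store-stream! [_ in]
    (swap! seen + (alength (read-all in)))
    :streamed))
```

A store that does not implement the protocol, such as a plain map in this sketch, falls through to the staged path, so callers would not need to know which kind of store they hold.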
Add some logical block stores which use other stores as subcomponents. Two ideas come to mind:
Following the examples from the README, something like this should be possible to create the store:
(require '[blocks.store :as store])
(def store (block/->store "mem:-"))
But instead it throws the following error:
Syntax error (IllegalArgumentException) compiling at (form-init11663713394850559011.clj:1:12).
Unsupported block-store URI scheme: "mem:-"
Running (store/parse-uri "mem:-") returns {:scheme "mem", :name "-"} so seems to work fine on the surface at least.
Figure out the right way to support reading a subrange of block content. For literal blocks, this is easy - just need to open a stream, skip some portion of the content, and wrap a bounded stream around that. Both file and s3-backed blocks support subrange reading.
Maybe the stat metadata can include a marker for whether the reader function supports being invoked with range arguments. If so, call it like (reader start end).
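For the literal-block case, the skip-then-bound approach might look like the following sketch. The helper names `skip-fully` and `read-range` are hypothetical, `open-fn` stands in for whatever opens the block's content stream, and the range is treated as the half-open interval [start, end), which is an assumption the issue does not settle.

```clojure
(import '(java.io ByteArrayInputStream InputStream)
        '(java.util Arrays))

(defn skip-fully
  "Skip up to n bytes, since a single .skip call may skip fewer than asked."
  [^InputStream in n]
  (loop [remaining (long n)]
    (when (pos? remaining)
      (let [skipped (.skip in remaining)]
        (cond
          (pos? skipped) (recur (- remaining skipped))
          ;; No progress from .skip: consume one byte, stopping at EOF.
          (not (neg? (.read in))) (recur (dec remaining)))))))

(defn read-range
  "Read bytes [start, end) from the stream returned by calling open-fn."
  ^bytes [open-fn start end]
  (with-open [^InputStream in (open-fn)]
    (skip-fully in start)
    (let [buf (byte-array (- end start))]
      (loop [off 0]
        (if (< off (alength buf))
          (let [n (.read in buf off (- (alength buf) off))]
            (if (neg? n)
              (Arrays/copyOf buf (int off)) ; stream ended before `end`
              (recur (+ off n))))
          buf)))))

;; Usage with an in-memory stream standing in for block content.
(def content (.getBytes "hello world" "UTF-8"))
(def tail (read-range #(ByteArrayInputStream. content) 6 11))
```

This version reads the range eagerly into a byte array; a real implementation would more likely return a bounded stream so that large ranges are not held in memory.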
This should actually be in a separate repo (blocks-ipfs?), but there's a fairly straightforward mapping from the BlockStore protocol to the IPFS API.
stat maps to /block/stat
list is a bit of a question mark, since it doesn't make sense to try to enumerate every block in the IPFS network.
get maps to /block/get
put! maps to /block/put
delete! has no equivalent, and should throw an exception.
IPFS would be pretty useful as a fallback lookup store, so that public IPFS data could be referenced transparently from blocks in a private store.
This is going to sound pretty ignorant... I've been googling 'blockchain' technology for about half a day and I still don't know what it means. Does this library have anything to do with it?
Instead of relying on lazy blocks' source function providing a stream when called with zero or two args, it would be better to implement a protocol with more explicit methods, which could fall back to assuming the source was a stream-returning function.
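A rough sketch of what such a protocol could look like follows. The names `ContentSource`, `open-stream`, `open-range`, and `BytesSource` are made up for illustration and are not the library's; the fallback extends the protocol to plain functions so the existing zero-arg and two-arg convention would keep working.

```clojure
(import '(java.io ByteArrayInputStream))

;; Hypothetical protocol; the names are illustrative, not the library's.
(defprotocol ContentSource
  (open-stream [source] "Open a stream over the full content.")
  (open-range [source start end] "Open a stream over bytes [start, end)."))

;; Fallback: treat any plain function as a stream-returning source,
;; preserving the zero-arg / two-arg calling convention.
(extend-protocol ContentSource
  clojure.lang.Fn
  (open-stream [f] (f))
  (open-range [f start end] (f start end)))

;; Explicit implementation backed by an in-memory byte array.
(defrecord BytesSource [data]
  ContentSource
  (open-stream [_] (ByteArrayInputStream. data))
  (open-range [_ start end]
    (ByteArrayInputStream. data (int start) (int (- end start)))))

;; Both kinds of source are used the same way by callers.
(def from-record (slurp (open-range (->BytesSource (.getBytes "hello world")) 6 11)))
(def from-fn (slurp (open-stream (fn [] (ByteArrayInputStream. (.getBytes "hi"))))))
```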
There are a few utilities for working with block stores which would be really nice to be able to use programmatically or from the command line:
EnumeratingStore?) and validate the id, size, and hash.
Write a library implementing a ring handler and compatible client to serve block data over HTTP.
Both file-store and s3-store would benefit from keeping version information with the stored block content. We can do this under the configured root prefix because the filename (version?) isn't valid hexadecimal.
Minimally, the version should denote breaking compatibility so we can warn the user at start-time. For example, file-store might change to store blocks in a flatter structure (e.g. root/111493dc/48a5... vs root/1114/93/dc/48a5...). A version would let the user know that the content on disk was incompatible with the current code.
Later, this also enables migrations from one version to another.
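As a sketch of the start-time check for a file-backed store, assuming the marker is a plain text file named `version` directly under the root: the function name `check-version!` and the version strings are hypothetical, and throwing on a mismatch is just one choice; logging a warning would fit the issue equally well.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn check-version!
  "Compare the `version` marker under root with the expected layout version.
  Writes the marker when it is absent; throws when it differs."
  [root expected]
  (let [marker (io/file root "version")]
    (if (.exists marker)
      (let [found (str/trim (slurp marker))]
        (when (not= found expected)
          (throw (ex-info (str "Store layout " found
                               " is incompatible with " expected)
                          {:found found, :expected expected})))
        found)
      (do
        (io/make-parents marker)
        (spit marker expected)
        expected))))
```

The `ex-info` data map carries both versions, which is the information a later migration step would need to decide what to do.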
There's opportunity for a lot of on-demand debugging messages in the existing block stores. Minimally, this would involve adding TRACE level statements on every API operation and DEBUG statements wherever else it makes sense.
Hi @greglook.
This library looks amazing. But I can't come up with some good use cases when one should use it.
Do you have some good links to read on projects that use content-addressable storage?
Idea: a logical block store which is given a name and a reporting function and emits Riemann event maps to measure block store operations. These would look like:
{:time #inst "2015-11-18T19:01:45-00:00"
:host "TBD"
:service "block-store <store-name> <operation>"
:metric 38.234 ; ms elapsed
:state "ok"}
The map conversion could even be punted on, and the store could just call the function with the service name and metric. That would leave filling in the time, host, state, etc to the metrics backend.
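The simpler variant, where the store only calls the reporting function with a service name and a metric, could be sketched like this. `measure` is a hypothetical helper, and the thunk stands in for a real store operation; this version reports only operations that return normally.

```clojure
(defn measure
  "Run thunk, then call report-fn with the service name and elapsed ms."
  [store-name operation report-fn thunk]
  (let [start (System/nanoTime)
        result (thunk)
        elapsed-ms (/ (- (System/nanoTime) start) 1e6)]
    (report-fn (str "block-store " store-name " " operation) elapsed-ms)
    result))

;; Usage: collect events in an atom instead of sending them to Riemann.
(def events (atom []))

(def result
  (measure "memory" "get"
           (fn [service metric]
             (swap! events conj {:service service, :metric metric}))
           #(+ 1 2)))
```

Reporting a failed operation with a different state would need a try/catch around the thunk, which is left out here.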
Per this TODO comment:
Protocol which returns a lazy sequence of every block in the store, along with an opaque marker which can be used to resume the stream in the same position. Blocks are explicitly not returned in any defined order; it is assumed the store will enumerate them in the most efficient order available.
This is mostly useful for synchronizing two block stores, as the data will be read out of the source in an efficient order. For example, if using a diskpacked-like store you don't want to have to skip between packfiles frequently just to access the blocks in ascending identifier order.
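One possible shape for this is sketched below as a paged variant: each call returns a batch of blocks plus a marker to resume from. The names `EnumeratingStore`, `-enumerate`, `enumerate-all`, and `VectorStore` are hypothetical, and the marker here is just a vector index, whereas a real store would use whatever opaque position suits its layout.

```clojure
;; Hypothetical protocol; paging is an assumption of this sketch.
(defprotocol EnumeratingStore
  (-enumerate [store marker limit]
    "Return {:blocks [...] :marker m}, continuing after `marker` (nil to start).
    A nil :marker in the result means the store is exhausted."))

;; Toy store: blocks live in a vector and the marker is an index.
(defrecord VectorStore [blocks]
  EnumeratingStore
  (-enumerate [_ marker limit]
    (let [start (or marker 0)
          end (min (count blocks) (+ start limit))]
      {:blocks (subvec blocks start end)
       :marker (when (< end (count blocks)) end)})))

(defn enumerate-all
  "Lazily walk the store in pages of `limit`, starting from `marker`."
  [store marker limit]
  (lazy-seq
    (let [{:keys [blocks marker]} (-enumerate store marker limit)]
      (concat blocks
              (when marker
                (enumerate-all store marker limit))))))
```

A sync process could persist the last marker it saw and pass it back in later to resume from the same position.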
Currently the :stored-at metadata on blocks uses java.util.Date. Instead, the library should use java.time.Instant as the superior time representation. This would be a breaking change.
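To soften the breaking change, callers could coerce legacy values during a transition. The helper below is a hypothetical sketch, not part of the library; it relies only on the standard `Date.toInstant` conversion.

```clojure
(import '(java.time Instant)
        '(java.util Date))

(defn stored-at-instant
  "Coerce a :stored-at value to java.time.Instant, accepting legacy Dates."
  [stored-at]
  (cond
    (instance? Instant stored-at) stored-at
    (instance? Date stored-at) (.toInstant ^Date stored-at)
    :else (throw (IllegalArgumentException.
                   (str "Unsupported :stored-at value: " (pr-str stored-at))))))

;; Usage: both representations normalize to the same Instant.
(def from-date (stored-at-instant (Date. 0)))
(def from-instant (stored-at-instant (Instant/ofEpochMilli 0)))
```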
Blocks has a dependency on [mvxcvi/puget "1.3.1"]. This older version has a dependency on [fipp "0.6.23"], which uses the private clojure.instant/thread-local-utc-date-format that is removed in Clojure 1.12.
This is no longer an issue in [mvxcvi/puget "1.3.4"].
Create or find a logo for the blocks family of libraries. Themes could include:
It may be useful to have an optional protocol (BatchingStore?) that defines methods for reading and writing multiple blocks at once. These would be 'private' methods (similar to -get and -list) and wrapped by a function which checks for the protocol. If the store in question doesn't implement BatchingStore, it would fall back to operating on the blocks one-by-one and returning a collection.
get-batch retrieves blocks identified by a collection of multihash ids, returning a sequence of all the blocks identified which exist in the store.
put-batch! stores a collection of blocks, returning a sequence of stored block versions.
delete-batch! removes blocks identified by a collection of multihash ids.
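The wrapper-with-fallback pattern for the read side might be sketched as follows. This is illustrative only: the `-get-batch` signature and the `get-fn` argument (a stand-in for the store's single-block lookup) are assumptions, and the usage example uses a plain map with string keys in place of a real store and multihash ids.

```clojure
;; Hypothetical optional protocol, following the naming in the proposal.
(defprotocol BatchingStore
  (-get-batch [store ids]
    "Return the blocks for `ids` which exist in the store."))

(defn get-batch
  "Use the store's batch method when available; otherwise look up each id
  with `get-fn` and drop the ones that are missing."
  [store get-fn ids]
  (if (satisfies? BatchingStore store)
    (-get-batch store ids)
    (into [] (keep #(get-fn store %)) ids)))

;; Usage: a plain map does not implement BatchingStore, so this takes
;; the one-by-one fallback path and skips the missing id "c".
(def found (get-batch {"a" 1, "b" 2} get ["a" "c" "b"]))
```

put-batch! and delete-batch! would follow the same shape, each checking for the protocol before falling back to per-block calls.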