
ipfs-cluster's Introduction

IPFS Cluster


Pinset orchestration for IPFS


IPFS Cluster provides data orchestration across a swarm of IPFS daemons by allocating, replicating and tracking a global pinset distributed among multiple peers.

There are 3 different applications:

  • A cluster peer application: ipfs-cluster-service, to be run along with kubo (go-ipfs) as a sidecar.
  • A client CLI application: ipfs-cluster-ctl, which allows easily interacting with the peer's HTTP API.
  • An additional "follower" peer application: ipfs-cluster-follow, focused on simplifying the process of configuring and running follower peers.

Are you using IPFS Cluster?

Please participate in the IPFS Cluster user registry.


Documentation

Please visit https://ipfscluster.io/documentation/ to access user documentation, guides and any other resources, including detailed download and usage instructions.

News & Roadmap

We regularly post project updates to https://ipfscluster.io/news/ .

The most up-to-date Roadmap is available at https://ipfscluster.io/roadmap/ .

Install

Instructions for different installation methods (including from source) are available at https://ipfscluster.io/download .

Usage

Extensive usage information is provided at https://ipfscluster.io/documentation/ .

Contribute

PRs accepted. As part of the IPFS project, we have some contribution guidelines.

License

This library is dual-licensed under Apache 2.0 and MIT terms.

© 2022. Protocol Labs, Inc.

ipfs-cluster's People

Contributors

alekswn, arthurgavazza, dependabot-preview[bot], dependabot[bot], dgrisham, hsanjuan, iand, ipfs-mgmt-read-write[bot], jbenet, jmank88, jonasrmichel, jorropo, kishansagathiya, laevos, lanzafame, magik6k, mateon1, michaelmure, olizilla, omkarprabhu-98, qiwaa, roignpar, rubenkelevra, s1na, sona1111, te0d, vasco-santos, whilei, whyrusleeping, zenground0


ipfs-cluster's Issues

CLI tooling (flags)

  • ipfscluster-server

    • See if it is possible to have -config PATH and not -config string (not possible with vanilla flag I think)
    • Study use of https://github.com/urfave/cli
    • Make ipfscluster-server -init a subcommand too: ipfscluster-server init
    • Add -f to overwrite config file rather than asking the user to do it
  • ipfscluster

    • Longer general help
    • See also about urfave/cli
    • Probably should be able to launch the server
    • ipfscluster id

Hangs when shutting down + graceful disconnects

- also ^c on one node, then ^c on the other hangs; looks like it's trapping the exit signal and waiting for the other members to respond, so it's stuck. Can't kill it, only kill -9.

Ctrl-C in one node causes some errors. Raft is for sure going to complain about this (as it should). We need to investigate whether there are further problems.

Authentication

The coreos/etcd rafthttp implementation (https://github.com/coreos/etcd/tree/master/rafthttp or https://godoc.org/github.com/coreos/etcd/rafthttp) can use TLS to secure communication, which in turn can mandate that clients be authenticated, allowing you to specify a CA used to validate client certificates (https://godoc.org/crypto/tls#Config). We could use this to our advantage to add authentication to ipfs-cluster by having the cluster accept a configured certificate and a certificate authority that it uses to keep the cluster tight. Kubernetes or similar would be responsible for CA and certificate generation.
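A minimal sketch (using only the Go standard library, not actual ipfs-cluster code) of the kind of setup described above: a TLS listener that only accepts clients whose certificates are signed by a cluster-controlled CA. The file names and the port are placeholders.

```
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net"
	"os"
)

func main() {
	// Hypothetical CA bundle distributed to all cluster peers.
	caPEM, err := os.ReadFile("cluster-ca.pem")
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		log.Fatal("invalid CA certificate")
	}

	// This peer's own certificate and key (placeholder paths).
	cert, err := tls.LoadX509KeyPair("peer-cert.pem", "peer-key.pem")
	if err != nil {
		log.Fatal(err)
	}

	cfg := &tls.Config{
		Certificates: []tls.Certificate{cert},
		ClientCAs:    pool,
		// Reject any client whose certificate is not signed by the cluster CA.
		ClientAuth: tls.RequireAndVerifyClientCert,
	}

	ln, err := tls.Listen("tcp", ":9096", cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()

	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Print(err)
			continue
		}
		go handle(conn)
	}
}

// handle is a placeholder for the real connection handler.
func handle(c net.Conn) { c.Close() }
```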

Use case: home directories in ipfs-cluster

I read in the meeting notes that you're asking for use cases. Here is my very personal use case and wishlist for ipfs-cluster. I'm not 100% sure if this doesn't go beyond the scope of ipfs-cluster, but I'll just write it down here anyway.

I want to replace any distributed filesystem I currently have in use with ipfs. I'm using XtreemFS (because it works well over WAN) and have used AndrewFS, GlusterFS and HDFS (without WANdisco) in the past.

My first use case is to store home directories in ipfs clusters and these are the features that I would really like to have:

  • Auth: Every user gets authenticated and has access only to the parts of the cluster that he has permissions for (authorization).
  • Encryption: Yes, one could roll his own. But having it already integrated could come with a lot of goodies. Like per-user keys and key management in general. This could maybe be done with the excellent gocryptfs.
  • Admin management web interface: It would be great to have a management interface where you can set the above 2 things (auth and encryption) and also manage the whole of the cluster: which replication scheme to use, managing sub-clusters, setting different replication schemes for different sub-clusters. Kinda like Amazon S3 -> Amazon Glacier: one cluster is the high-availability store, another is less well replicated (effect on speed through striping). Also, in the interface the admin should be able to control the traffic and disk space quotas of the participants of the cluster (nodes, sub-clusters).
  • Client-side quota limit: Although the actual quotas that are used should be handled by the system/admin, each participant should be able to set a hard limit to traffic and diskspace that cannot be exceeded.
  • Replication schemes: RAID schemes are well understood, but RAID5 or RAID6 isn't enough. Especially when there are many participants who need high availability of the files that are on the cluster, which is over WAN. The replication isn't just about data safety, it's also about speed. So just going up the RAID level with an increasing number of participants, RAIDX? It would be great to have replication schemes that would take into account the popularity and age of a file. Have old documents only replicated for example on 5 nodes, while new and popular documents that get opened and used a lot are replicated on 20 nodes.
  • Balancer: Every distributed filesystem needs a balancer. Different levels of aggressiveness for the balancer would be great. Especially since the cluster could be over WAN, with nodes or sub-clusters going offline a lot. So there should be balancing schemes for LAN and WAN.

So how would I use the above?
I would have a company-wide cluster where everyone can access their home directories from everywhere. The cluster would have sub-clusters which represent the different sites. Those would be connected over WAN. Inside each sub-cluster there would be nodes which are locally connected over LAN. I don't want to just have dedicated machines building up the cluster, but also each and every "client", which is why the client-side quota limit is important in my opinion.

I hope this is the same vision that you have for ipfs-cluster. I think it's a pretty common use case for a distributed filesystem.

Reading List

I figured we should put a reading-list together to cover a lot of the concepts relevant to cluster. LINKS ONLY PLEASE, don't add files.

We'll want to touch on:

  • erasure coding/FEC
    • reed-solomon codes, tornado codes, raptor codes, etc
  • replication models UX
    • RAID models
    • geographical partitions
  • rebalancing contents
    • self-healing (erasure coding, etc)
    • tuning for demand (measured, expected by predictive model, or expected externally)
    • to deal with speed delays
  • consensus
    • distributed consensus
      • paxos, raft, etc.
      • intra-datacenter (<1ms latencies)
      • inter-datacenter (>10ms latencies, regional data locality needs)
      • global (>100ms latencies, ...)
    • byzantine agreement (consensus)
      • traditional: byzantine paxos, pbft, etc.
      • modern: blockchains (PoW, PoS, ...), FBA, HBA.
    • resource-heterogeneous consensus networks
      • latencies, message complexity
      • handling slow peers (demoting to follow, expulsion, etc)
  • representing pinsets
    • HAMT/CHAMP pinset
    • Pin an IPLD Selector

Sharness tests

CLI apps should be tested by sharness, at least to check that they are not utterly broken.

Upgrading and backups

There are a number of pain points if the consensus state format changes upon an upgrade.

Currently, persistence is obtained via Raft snapshots, which are loaded on boot and written on shutdown (at least). The Raft snapshot format comes from the go-libp2p-raft FSMSnapshot implementation, which is just a serialization of the state using msgpack.

If the state changes, loading the snapshot is likely to break. Also, this format is unreadable to the user and hard to work with.

A few thoughts about tackling this:

  • ipfs-cluster should store a readable, easy-to-work-with JSON backup of the state on shutdown (see the sketch after this list). [DONE]
  • There should be a pain-free procedure (at the very least documented) to load an old backup onto a newer State (a migration). This is potentially tricky on a large cluster:
    • Raft snapshots need to be removed [manually?] or Raft will attempt deserializing them on top of a newer version of the state.
    • Raft may also replay log entries on boot, which does not make sense if snapshots have been removed, so probably all Raft data needs to be cleaned up upon upgrade.
    • If Raft starts with a clean state after an upgrade, the new state could load the old backup and migrate it to the new format. With that state, we could directly run a rollback operation (which replaces the whole clean Raft state with the migrated version). However go-libp2p-consensus rollbacks are not very specific and this would work only because it's the way go-libp2p-raft does it at the moment.
    • An alternative is to let the cluster replay and commit every entry of the old state. This might be slightly cleaner. Replaying would need to make sure allocations remain the same as in the imported state.
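As an illustration of the first point in the list above, here is a hedged sketch of dumping a pinset to human-readable JSON on shutdown. The Pin type and its fields are hypothetical stand-ins, not the real ipfs-cluster state format.

```
package main

import (
	"encoding/json"
	"os"
)

// Pin is a hypothetical, simplified pinset entry.
type Pin struct {
	Cid               string   `json:"cid"`
	ReplicationFactor int      `json:"replication_factor"`
	Allocations       []string `json:"allocations"` // peer IDs in charge of pinning
}

// backupState writes the pinset as pretty-printed JSON so a human (or a
// migration tool) can read and edit it independently of Raft snapshots.
func backupState(path string, pins []Pin) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	enc := json.NewEncoder(f)
	enc.SetIndent("", "  ")
	return enc.Encode(pins)
}

func main() {
	pins := []Pin{
		{Cid: "Qm...", ReplicationFactor: 2, Allocations: []string{"peerA", "peerB"}},
	}
	if err := backupState("state-backup.json", pins); err != nil {
		panic(err)
	}
}
```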

Onboarding Question Collection

Here are a few questions I have had at various points while working out how ipfs-cluster works.

What exactly are peers communicating to each other?

What is the division of labor between an ordinary peer and the cluster leader? What extra work does the cluster leader do?

What is the purpose of bootstrapping in ipfs-cluster-service? Is this the way for a single node to begin its own cluster?

What is the comment "// The only way I could make this work" in ipfs-cluster/main.go's init function referring to?

What is the purpose of the --leave flag and why is it possible for a node to be considered part of the cluster when its ipfs-cluster-service process is no longer running? Why is --leave not default? It sounds like if a node does not leave on shutdown it can lead to problems, so what is the beneficial use case of leaving it in?

From context clues it seems like ipfs-cluster in the future hopes to provide options for consensus, ipfs connections and monitors? Is this the case? If so what is the purpose of having these options?

ipfshttp appears to communicate with an ipfs daemon listening on a local port through an HTTP API. Does the ipfs daemon have other methods of communication and it's just a matter of cluster not implementing clients for them? I suspect this is the case because the ipfsconn folder seems to abstract ipfs connections away from an HTTP API; however, the service binary directly calls ipfshttp, so maybe the ipfsconn/ipfshttp hierarchy was intended to mean something else.

Thanks, and more to follow!

Use-case: data archival on a large scale with a volatile network.

At Climate Mirror, we deal with a large amount of data on a few, unfortunately centralized servers. However, a new initiative, Our Data Our Hands (ourdataourhands.org), hopes to shift the burden of storing data to everybody and their grandmothers' computers. We hope to accomplish this by providing docker containers that, once started, join a cluster of similar peers and contribute storage space. We will also sell pre-rolled hardware with large hard drives that can help stabilize the network.

In order to accomplish this, we need a few things:

  1. Restricting the cluster pin function by using some form of signed commands, so that only team members can pin things.
  2. Ideally some form of a (t,n)-threshold signature scheme when pinning, so a threshold of authorized team members must approve of a pin before it's added.
  3. We are counting on a significant number of nodes dropping out/rejoining with some frequency (up to say a quarter of nodes could randomly drop and then rejoin), so replication should be high.
  4. We need to be able to identify bad actors and drop them (I understand this can be done with Merkle DAGs, y'all's favorite).
  5. We need to be able to scale this cluster, ehem, bigly, and be able to handle peers dropping and rejoining with frequency. Raft might be a problem.
  6. Some peers will be able to contribute a gigabyte of storage, others two terabytes.
  7. Older peers = more reliable = more assignments--this should be possible via consensus.

Some of these are dreamy (threshold signatures to keep us from creating a massive botnet under one person's control) but others are fairly important, like 1, 3, 5, and 6.

Peer allocation strategies: Disk-space Informer and Allocator and other strategies

The replication factor feature (#46) is ready (as described in the Captain's log). This opens the door to different pinning strategies.

Currently we only have a dummy numpin Informer and a numpinalloc PinAllocator for it.

It would be really useful to have other informers, which fetch different metrics, and other allocators which implement different strategies, for example a disk-space metric and allocator.

They just need to implement the Informer interface and the Allocator interface respectively. The existing examples show how this is done in a simple way.
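To make the idea concrete, here is a rough sketch of a disk-space informer and allocator. The Informer and Allocator interfaces below are simplified stand-ins, not the real ipfs-cluster interfaces, which have different signatures.

```
package main

import (
	"fmt"
	"sort"
)

// Metric is a simplified stand-in for a metric pushed by an informer.
type Metric struct {
	Peer  string
	Value uint64 // free disk space in bytes, for this example
}

// Informer produces a metric for the local peer (hypothetical interface).
type Informer interface {
	Name() string
	GetMetric() Metric
}

// Allocator orders candidate peers given their latest metrics (hypothetical interface).
type Allocator interface {
	Allocate(metrics []Metric) []string
}

// DiskInformer would report free disk space; here it returns a canned value.
type DiskInformer struct{ free uint64 }

func (d DiskInformer) Name() string      { return "freespace" }
func (d DiskInformer) GetMetric() Metric { return Metric{Peer: "self", Value: d.free} }

// DiskAllocator prefers the peers with the most free space.
type DiskAllocator struct{}

func (DiskAllocator) Allocate(metrics []Metric) []string {
	sort.Slice(metrics, func(i, j int) bool { return metrics[i].Value > metrics[j].Value })
	peers := make([]string, 0, len(metrics))
	for _, m := range metrics {
		peers = append(peers, m.Peer)
	}
	return peers
}

func main() {
	inf := DiskInformer{free: 250 << 30}
	fmt.Println(inf.Name(), inf.GetMetric().Value)

	metrics := []Metric{{"peerA", 100 << 30}, {"peerB", 10 << 30}, {"peerC", 500 << 30}}
	fmt.Println(DiskAllocator{}.Allocate(metrics)) // [peerC peerA peerB]
}
```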

IPFS cluster applications namings

at some point, move ipfs-clusterd (that should be the tool name) into its own repo (ipfs/go-ipfs-clusterd). this doesn't have to be now, but let's keep this repo for prototyping and general discussion. ipfs-cluster will mean a few diff repos.

Improve docs

This wraps documentation in general:

  • Document log levels and how to debug
  • Extend README with instructions on how to set up a Cluster
  • See if it would help to make a video
  • Ascii diagram of components
  • Make sure it is clear how to add new members

Pins should support a replication factor

A Pin should not need to be pinned in every cluster member. We should be able to say that a pin needs to be pinned in 2 or 3 cluster members.

We will start with a general replication factor for all pins, then maybe transition to replication factor per-pin.

These are thoughts for the first approach.

A replication factor of -1 means pin everywhere. If the replication factor is larger than the number of cluster peers, it is treated as if it were equal to that number.
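A tiny sketch of that rule, assuming the cluster size is known:

```
package main

import "fmt"

// effectiveReplication returns how many peers a pin should be allocated to,
// following the rule above: -1 means "pin everywhere", and a factor larger
// than the cluster size is capped at the cluster size.
func effectiveReplication(replicationFactor, clusterSize int) int {
	if replicationFactor < 0 || replicationFactor > clusterSize {
		return clusterSize
	}
	return replicationFactor
}

func main() {
	fmt.Println(effectiveReplication(-1, 5)) // 5: pin everywhere
	fmt.Println(effectiveReplication(3, 5))  // 3
	fmt.Println(effectiveReplication(8, 5))  // 5: capped at cluster size
}
```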

Pinning

We need a PeerMonitor component which is able to decide, when a pin request arrives, which peer comes next. The decision should be based on pluggable modules: to begin with, one which attempts to evenly distribute the pins, although it should easily support other metrics like disk space etc.

Every commit log entry asking to Pin something must be tagged with the peers which are in charge of it. The Pin Tracker will receive the task and, if it is itself tagged on the pin, it will pin it. Otherwise it will store the pin and mark it as remote.

If the PinTracker receives a Pin which is already known, it should unpin if it is no longer tagged among the hosts that are in charge of pinning. Somewhere in the pipeline we should probably detect re-pinnings and avoid changing the pinning peers needlessly.
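The following sketch (not the real PinTracker) illustrates the basic decision: pin locally when this peer appears among the allocations, otherwise record the pin as remote.

```
package main

import "fmt"

// PinEntry is a hypothetical, simplified commit log entry.
type PinEntry struct {
	Cid         string
	Allocations []string // peers in charge of actually pinning
}

type trackStatus string

const (
	statusPinned trackStatus = "pinned"
	statusRemote trackStatus = "remote"
)

// track decides what a peer does with a committed pin entry.
func track(self string, p PinEntry) trackStatus {
	for _, peer := range p.Allocations {
		if peer == self {
			// In the real system this would trigger a pin request to the local ipfs daemon.
			return statusPinned
		}
	}
	// Not in charge: remember the pin but do not hold the content locally.
	return statusRemote
}

func main() {
	entry := PinEntry{Cid: "Qm...", Allocations: []string{"peerA", "peerB"}}
	fmt.Println(track("peerA", entry)) // pinned
	fmt.Println(track("peerC", entry)) // remote
}
```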

Unpinning

Unpinning works as usual removing the pin only where it is pinned.

Re-pinning on peer failure

The peer monitor should detect hosts which are down (or hosts whose ipfs daemon is down). After a certain time threshold (say 5 minutes, configurable), it should grep the status for pins assigned to that host and re-pin them to new hosts.

The peer monitor should also receive updates from the peer manager and make sure that there are no pins assigned to hosts that are no longer in the cluster.

For the moment there is no re-balancing when a node comes back online.

This assumes there is a single peer monitor for the whole cluster. While monitoring the local ipfs daemon could be done by each peer (and triggering rebalances for that), if all nodes watch each other, this will cause havoc when triggering rebalances. The Raft cluster leader should probably be in charge then. But this conflicts with being completely abstracted from the consensus algorithm below. If we had a non-leader-based consensus we could assume a distributed lottery to select someone. It makes no sense to re-implement code to choose a peer from the cluster when Raft has it all. Also, running the rebalance process in the Raft leader saves a redirection for every new pin request.

UX

We need to work on ipfs-cluster-ctl to provide more human-readable output as the API formats become more stable. status should probably show succinctly which pins are under-replicated or which peers are in error, one line per pin.

Meeting Notes - 2016-07-01 17:30Z

ipfs cluster meeting on 2016-07-01 17:30Z

Participants:

Agenda

  • rough requirements
  • rough design
  • list out interfaces/components
  • figure out next steps

We itemized all the pieces we need to build, and figured out next steps.

Interfaces / Components

These are all the components that we need to define or build. We need to:

  • translate these lines into issues describing them
  • do the design for all the interfaces
  • expand out what building them entails
  • figure out dependencies between them
  • figure out priorities
  • figure out time estimates

Links:

Tools

  • ipfs-clusterd tool
    • exposes IPFS API
    • API?
    • what does the CLI look like?
  • ipfs-clusterctl tool
    • what does the CLI look like?
    • look at analogs in kubernetes, docker, and other systems
    • figure out authentication (look at how others do it)

APIs

  • ipfs api state changes (figure out what api commands change state, consensus)
    • read/write
  • state replication configuration (UI)
    • RAID
    • (look for other analogs)
    • network/erasure coding
  • state replication data structures (implementation)
    • what does the distributed pinset look like
  • clusterd-to-clusterd protocol
    • what needs to be defined? (beyond consensus)
    • what do we put on consensus? (is it just replicating an op-log?)

libp2p pieces

  • libp2p consensus modules interface
    • raft
    • pbft
    • ethereum
  • libp2p transports
    • p0: clusterd can talk to other clusterd's directly
    • p1: mountable transports / pipes
    • p2: exo transport

Dev/Sys Ops

  • containerization
    • containers? pods?
    • load balancing
  • test harnesses
    • scenario driven tests
    • large network tests
    • CI to test clusters
      • team city?
      • snap-ci?

Other Things

Use Cases

We need to create use case scenarios

  • deployment scenarios
    • write out 6 different scenarios that we want to cover
      (along with the tech to use)
      (translate into modes of operation)
  • business use case scenarios
    get feedback on how people would use this tooling
  • Look at other tools for inspiration
    • Cluster managers (hashicorp, coreos, docker stuff)
    • FileSystems (GlusterFS, Ceph)
    • Replication (RAID)
    • d/ctl tool UX (coreos, hashicorp)

PMing

We need to do the following

  • task breakdown
    • knowledge requirements explicit
  • task dependencies
  • Roadmap / timeline
  • allocations

Next Steps

  • translate pieces into own spec issues
  • do the PM work (tasks, dependencies, allocations)
  • figure out roadmap for pieces
  • figure out other people interested
  • meet again next week 2016-07-06 17:00Z

What is peer behaviour on cluster shutdown/restart?

When there is a cluster with, let's say, two peers and one or both of them get shut down for some reason, do the peers stay remembered and reconnect automatically on startup?

I suppose they are automatically saved into the service.json file, in the cluster peers section?

Thanks !

Pinning rings

Users should be able to start an ipfs-cluster node and have it join a pinning ring, that is, an existing set of nodes. These nodes would be archiving some interesting material for the participants. The newcomer should have an easy way to join the effort.

For this to work:

  • The user should just provide a peer ID and the multiaddress where the node is available
  • A ring administrator (or just let an existing participant do it?) adds a new peer to the cluster
  • The new member should be able to autodiscover all members of the ring and join the consensus
  • ipfs-cluster starts storing content on the new node
  • Interesting to explore the idea of observing members (ipfs-cluster nodes which can observe and track pins but do not have the right to become leaders or perform remote rpc requests).

Current state and considerations:

  • It is assumed that an ipfs-cluster is a set of nodes fully managed by the same administrator, while this proposal implies regular John Does providing nodes to a pinning ring.
  • ipfs-cluster cannot add new nodes dynamically, but this is coming soon
  • An ipfs-cluster node cannot autodiscover other nodes, it picks them up from the configuration file.
  • ipfs-cluster updates would be difficult to perform across the pinning ring if nodes depend on different individuals.

The key here is to understand the trust model in a pinning ring: how a pinning ring member gains and loses trust, and who can take those actions.

Connectivity graph

Need an easy way to list all libp2p nodes involved in cluster (members and IPFS) and see what's connected to what (ideally everything is connected to everything).
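As a starting point, something like the following could walk a libp2p host's connections and print one line per link. This is a hedged sketch only: the import paths and the libp2p.New signature correspond to recent go-libp2p releases and may not match the version in use here; a freshly created host will of course show no connections until it dials peers.

```
package main

import (
	"fmt"

	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/core/host"
)

// printConnectivity lists which peers this host is connected to and over
// which addresses, one line per connection.
func printConnectivity(h host.Host) {
	fmt.Println("peer:", h.ID())
	for _, p := range h.Network().Peers() {
		for _, conn := range h.Network().ConnsToPeer(p) {
			fmt.Printf("  connected to %s via %s\n", p, conn.RemoteMultiaddr())
		}
	}
}

func main() {
	h, err := libp2p.New() // recent go-libp2p; older versions take a context argument
	if err != nil {
		panic(err)
	}
	defer h.Close()
	printConnectivity(h)
}
```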

End to End testing and benchmarks

We need to automatically bring up a real cluster, on real cloud/hardware, with real IPFS daemons, and perform a number of standard cluster workloads.

Any measures extracted from these tests can be used as future reference regarding the performance of the Cluster.

Users should be able to lend their disk space.

One way I'd help with IPFS' adoption would be to lend 100-500GB of my spare hard disk space. I'd like to simply be able to start up an IPFS software piece and instruct it to be in "lending mode" -- I don't care what gets hosted on my machine if it helps the network, to put it bluntly.

EDIT: I can't see a way how to apply the user-story label here.

Configuration files location and format

  • Study configuration location change to $(pwd)/.ipfs-cluster

I don't like this. It's relative, prone to user error, and departs from IPFS convention. If anything, configurations live in /etc/ on any standard system. Local configs are usually in .config/<app-name> these days.

  • See if others care about using Go CamelCased keys, as IPFS does, for configuration entries.

I prefer the JavaScript style because that's what is usual with JSON. This is also related to how API responses look.

  • Handle panics/errors when configuration is really invalid (empty node ID etc).

  • Rename ipfs_port to ipfs_node_port and so on

  • Use multiaddress format like:

    ```
    "ipfs_cluster_api": "/ip4/127.0.0.1/tcp/9095/http",
    "ipfs_node_api":    "/ip4/127.0.0.1/tcp/5001/http",
    ```
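    For illustration, multiaddrs in that format can be parsed back into their parts with the go-multiaddr library; a small sketch:

```
package main

import (
	"fmt"

	ma "github.com/multiformats/go-multiaddr"
)

func main() {
	// One string captures address, transport and protocol, unlike separate host/port fields.
	addr, err := ma.NewMultiaddr("/ip4/127.0.0.1/tcp/5001/http")
	if err != nil {
		panic(err)
	}
	ip, _ := addr.ValueForProtocol(ma.P_IP4)
	port, _ := addr.ValueForProtocol(ma.P_TCP)
	fmt.Println(ip, port) // 127.0.0.1 5001
}
```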
    

Dependency management

  • Avoid relying on a global go get -u gx: use $(shell which gx) or download gx to a local installation.
  • Fork dependencies and lock versions with gx. Don't go get deps.

Autodiscovery - autosetup of cluster members

Given a single existing cluster member, a new cluster node should be able to set itself up, retrieve and connect to all members of the cluster.

Note the trickiness of this:

  • At least some components are not ready to work or should not attempt to work (consensus) before receiving the list of peers
  • It's not as easy as connecting to someone and retrieving current cluster members (i.e. would fail when starting a bunch of new members at the same time). Also involves broadcasting that a new node is available.
  • Need to give more thought. Should aim for something simple and straightforward. Errors during automated setup are the worst.

Eliminate random test failures in Travis

Travis tests fail sometimes in random places (usually tests around replication). Increasing delays has usually helped, but we should really look into it more closely.

Some user feedback from a test session

Here's some feedback from a user session. We tried it a bit -- got stuck, then we moved on to something else, but at least we got some info.

  • Trying out ipfs-cluster
    • ipfs-cluster git repo should point to binaries
    • there is no install.sh script
      • maybe we should follow our standard thing (like go-ipfs and other tools do it)
      • but may be fine, after all this is meant more for servers
      • could call it install-local.sh if install.sh is misleading
  • got a cluster up super easily, great!
  • ctl := ipfs-cluster-ctl
  • service := ipfs-cluster-service
  • ctl id
    • should multiaddrs end with ipfs-cluster?
      • no, they shouldn't, just a libp2p endpoint
      • this should probably be mounted on the child node's libp2p node
      • go-ipfs does not yet support this. so this is fine as is now.
  • ctl status may want to be ctl pin status
  • ctl peers ls looks good
    • 2 cluster nodes connected
    • wish it gave me more info on connectivity (who's connected to who?)
      • but this may be a security issue-- meaning, in byzantine setting it doesn't. (leaks info, etc)
    • child ipfs nodes directly connected (and to the public network)
    • it errors if cluster peer not online.
  • ctl pin
    • 20s delay on pinning a small file... not clear why. (how would we diagnose?)
  • INFO log on pin may want to say when it has been replicated to everyone
    • consensus replicating the entry is not enough, we want to replicate the data too
    • looks like right now consensus is clearing (committed and done) after the pin entry
      • should maybe wait for data to be replicated / committed before clearing it in consensus
      • or maybe use 2 phases/ops:
          1. pin (commits the pin to the whole cluster), and
          2. pin complete (the pin has actually completed on enough of the cluster to count as consensus)
  • added another peer, bad times
    • somehow got into a bad state, cannot connect peers
    • ctl peers ls on the leader takes a while
    • ctl peers ls does not say what address using to connect to peer.
      • important for debugging
    • we couldn't get the peers all talking to each other
  • tested losing a network interface (cable)
    • not connecting to peers over other interface
    • finally connected
    • looks to be libp2p issues.

Meeting notes - 2016-07-08 17:00Z

Participants:
@whyrusleeping
@hermanjunge
@jbenet
@christianlundkvist

Notes

daemon startup:

  • read config (whats in the config? do we even need a config?)
  • join cluster (clusterID)
    • autodiscovery mechanisms?
      • mdns
      • ipfs service discovery
    • manually specify hosts
  • get pinset (how? ask master?)
    • question: are pinsets replicated across all nodes?
      • pinset could just be a single hash, consensus to agree on that hash
    • when new nodes join, how are existing pinsets re-striped?
      • at first, this could be static. no restriping
  • once you have pinset, ensure you have all content specified
  • serve API

Api endpoints:

  • control port
  • 'ipfs' API port
  • inter-cluster rpc port

dev timeline

  • step zero:
    • skeletons of clusterd and cluster-control
  • step one:
    • single node cluster
  • step two:
    • many nodes, fully mirrored
  • step three:
    • fancier striping option

Tasks:

#1 (comment)

Repo Breakdown

We'll need these other NEW repos:

ipfs

libp2p

  • https://github.com/libp2p/consensus-interface - the consensus interface
  • wrap ethereum (geth?) in libp2p consensus interface
  • wrap raft in libp2p consensus interface
  • libp2p exo transport interface
  • libp2p mounted transport/pipe interface

may need others.

Erasure Coding Layer

I thought up a possible architecture for a Reed-Solomon (or other erasure coding algorithm) layer on top of IPFS. My notes are here. @hsanjuan informed me that ipfs-cluster was already a tool that was planned, and that this architecture could slot into ipfs-cluster, so I should raise an issue here. One of the key points of the system is that there would be IPFS nodes that provide IPFS files that they do not have locally, but instead have to generate by accessing other files from the IPFS network and re-combining them.
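For readers unfamiliar with the idea, here is a small standalone illustration of Reed-Solomon coding using the third-party github.com/klauspost/reedsolomon package (not part of ipfs-cluster): data is split into data shards plus parity shards, and missing shards can be regenerated from the survivors, which is the property the proposed layer would rely on.

```
package main

import (
	"fmt"

	"github.com/klauspost/reedsolomon"
)

func main() {
	const dataShards, parityShards = 4, 2
	enc, err := reedsolomon.New(dataShards, parityShards)
	if err != nil {
		panic(err)
	}

	data := []byte("some content that would normally be an IPFS block")
	shards, err := enc.Split(data) // 4 data shards + 2 (empty) parity shards
	if err != nil {
		panic(err)
	}
	if err := enc.Encode(shards); err != nil { // fills in the parity shards
		panic(err)
	}

	// Simulate losing two shards (e.g. two nodes going offline).
	shards[0], shards[5] = nil, nil
	if err := enc.Reconstruct(shards); err != nil {
		panic(err)
	}
	ok, _ := enc.Verify(shards)
	fmt.Println("reconstructed:", ok)
}
```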

Notes from trying out ipfs-cluster for the first time

This is a quick rundown of a first-ever-user trying the ipfscluster tooling. Take it with a pile of salt: i noted down everything i ran into, including a bunch of problems that are obviously just polish not meant to be there yet, and other stuff which is probably meant to be experimental or a shortcut for now. I noted everything i ran into to give feedback on what was intuitive and what wasn't, first reactions, etc. My proper review is just beginning-- this is just a first stab at playing with it.

Connectivity: the good feedback so far.

Already, I think some work can be done on the connectivity side of things. Here's the basic points. maybe these can be extracted to separate issues later, but keeping these here to retain the context.

  • check connection to child ipfs node, and see what's going on there. (it wasn't clear to me that the communication was working. it was, i just didn't know how or what feedback to expect for correct or incorrect behavior)
  • ideally ipfs members ls or something like it would show connectivity status, particularly both:
    • list cluster nodes ids (this is done) #15
    • list cluster nodes' child ipfs nodes' ids #15
    • whether the cluster nodes are currently connected (or disconnected) #15
    • whether the child ipfs node of this cluster is connected to the child ipfs node of the other cluster
    • should have an option to view multiaddrs of the connected entities. (this could just be done by supporting the ipfs swarm command. in reality, that should be a libp2p thing, so maybe we can lift that command out of go-ipfs and into go-libp2p, so that ipfscluster may take advantage of it). #15
  • help ensure connectivity:
    • should discover other cluster nodes from each other-- i.e. if i connect: A -> C, and B -> C, A and B should find out about each other and connect to each other. #16
    • given A is connected to B and C (cluster nodes), A should tell its child node (Achild) to try to connect to Bchild and Cchild. this can be done with ipfs swarm connect <multiaddr>. #16
    • in practice, this should bust through nats. as long as one cluster node behind a nat manages to connect to a cluster node outside, it should be able to find other cluster nodes, and their child ipfs nodes and connect to everyone.
  • we need some command to output the connectivity graph of all the components, to see who is connected to whom and what's not connected. that way we can determine why something may not be working. Basically, figure out all libp2p nodes involved, and who they're connected to. Can use that to check connectivity in the graph of nodes (cluster nodes, child ipfs nodes, etc). #17

Also, some docs on what log levels or log modules i should listen to to figure things out would be good. Eg if i want to debug connectivity, or the consensus stuff, or the interactions with the ipfs connector, what level should i --loglevel. may be good to support per-module support (i think go-ipfs has this with an ENV var or something, i dont recall. it's useful to isolate a module and hear its debug output only). #18 #19

Trace of trying out ipfscluster

  • good looking readme.
    • could be more extensive. dont see how to setup a proper cluster #19
    • maybe could have a video or something #19
  • good architecture doc. (will review properly later)
  • naming
    • ipfscluster-server = awkward
      • ipfs-cluster-server or ipfsclusterserver?
      • ipfs-cluster-service? ipfs-cluster-daemon?
  • install
    • git co devel -- but that's fine. PR to master.
    • make install failed:
      • go get -u gx. use $(shell which gx)? or download to a local installation. #20
      • go get -u the pkg deps. i had older versions of some packages which did not build.
        • if it had built but been an older, buggy, wrong version and failed silently, it could have been potentially much worse and time waste for me. lucky it did not compile. #20
      • also, don't use a global go get -u -- why should i mess up my system for you? #20
        • i know this is idiomatic to Go, but it's not idiomatic to IPFS.[0] #20
        • we are the people that use hash-linking to make sure we have what we intended to have. we should use that to our advantage here.
      • go get's on other packages do not use gx. we should version lock. ideally with gx.
      • gx install failed:
        • go-ipfs got stuck... i think it's a bitswap bug.
        • failed 3 times, wow... :c -- but hey, reproducible go-ipfs failures! yay! \o/
  • ipfscluster-server -h
    • well written, clear, helpful
    • yay for config path
    • maybe use --config PATH or --config FILE. it's not just any string. #21
    • ideally would fit our other tooling help style. (USAGE, COMMANDS, OPTIONS, etc). maybe see the cli help tool gx uses (i don't recommend the one go-ipfs uses) #21
  • i think we probably should default the config path to $(pwd)/.ipfs-cluster instead of $(whoami)/.ipfs-cluster, because in server side installations (more typical for clusters), users are not the typical place stuff is installed/stored. This is departing from the convention of go-ipfs and IPFS_PATH, but i think that's fine. #22
  • ipfscluster-server init
    • failed because it's lacking a dir, i think:
      > ipfscluster-server init
      error loading configuration: open /Users/earth/.ipfs-cluster/server.json: no such file or directory
      
    • ipfscluster-server init --config .ipfs-cluster/server.json did not work, took me a bit to realize the global flag had to be before the subcommand name.
    • ipfscluster-server --config .ipfs-cluster/server.json init worked as before (back to failing same way as above).
    • mkdir .ipfs-cluster is not enough. still same error.
    • touch .ipfs-cluster/server.json gets further, but crashes the process:
      panic: multihash too short. must be > 3 bytes #22
      
      goroutine 1 [running]:
      panic(0x682b60, 0xc420074b20)
        /usr/local/go/src/runtime/panic.go:500 +0x1a1
      github.com/ipfs/ipfs-cluster.NewMapPinTracker(0xc4201ef130, 0xc42016baa0)
        /Users/earth/go/src/github.com/ipfs/ipfs-cluster/map_pin_tracker.go:45 +0x370
      main.main()
        /Users/earth/go/src/github.com/ipfs/ipfs-cluster/ipfscluster-server/main.go:134 +0x221
      
    • maybe it's because it's not json? but echo '{}' >.ipfs-cluster/server.json did not work. same error.
    • Ohhh init is an option, not a command! (ipfscluster-server -init, not ipfscluster-server init). that took a while.
      • not sure why i didn't notice this before. i'm biased to init being a subcommand. (git init, ipfs init, etc.), and to "commands" being in subcommand notation, not options. (i know golang flags isn't good about this.)
      • it probably should be a subcommand. #21
      • not a big deal anyway, this is a minor detail. but probably important to figure out for other subcommands too (looks like just version).
    • "try deleting it first." maybe ask user to use -f to overwrite. having to rm manually is annoying. and it's less automation friendly. #21
    • ok yay it worked!
    • maybe the config file keys should be Go-style (CamelCased), as our other config files do it.
      • this is a stylistic choice. i dont care much about it. others may. #22
      • i do prefer consistency though in style #22
    • ok on to setup a cluster.
  • ipfscluster-server init & run on 3x machines.
    • looks like i have to edit the configs manually to add members?
    • wait let's look at ipfscluster
  • ipfscluster has a short command listing, nice.
    • but the help is not too long. and it should have it too.
    • maybe this belongs under COMMANDS section in -h.
  • adding a member
    • looks like ipfscluster member lacks ipfscluster member add. #23
    • hmm i guess it's just a multiaddr? but to what, go-ipfs?
    • it's not clear to me whether i should point to the go-ipfs node or directly to the other ipfscluster-server instances. yeah probably the latter, the config file shows a "api_port" which is different from "ipfs_api_port".
    • Ok, so it looks like ipfscluster-server is going to use whatever available ipfs node it finds.
      • maybe it should manage its own? (i originally thought it would work this way).
      • it should be configurable either way. i probably should be able to just use the ipfscluster tool to do everything. (launch the ipfscluster-server, launch ipfs daemon, a local one or a global one.).
  • these names are confusing, see comments: #21
    ```js
    {
      // ...
      "ipfs_api_addr": "127.0.0.1",
      "ipfs_api_port": 9095,
      "ipfs_addr": "127.0.0.1",
      "ipfs_port": 5001 // this is an ipfs_api, so hard to distinguish from ipfs_api_port.
                        // maybe it should be "ipfs_node_port". #22
      // ...
    }
    ```
    • Also, don't use separate addrs and ports, use a multiaddr. The IPFS API is not at all guaranteed to be served over HTTP or TCP. it could be over gRPC over a unix domain socket, or utp, or whatever. maybe use:
      "ipfs_cluster_api": "/ip4/127.0.0.1/tcp/9095/http",
      "ipfs_node_api":    "/ip4/127.0.0.1/tcp/5001/http",
      
    • this isn't good yet-- something is missing. a distinguishing word instead of "node". (because the cluster one is a node, and the cluster one is really the cluster api).
    • "underlying_ipfs_node_api" is more clear, but long. and "underlying" is not that good of a word. maybe we should clarify the relationship between the cluster (and the ipfs-node it represents) and the sub ipfs-node. maybe "parent/child" works for this, because it works with the tree recursive structure?
      "cluster_node_api": "/ip4/127.0.0.1/tcp/9095/http",
      "child_node_api":    "/ip4/127.0.0.1/tcp/5001/http", 
      // or
      "parent_cluster_api": "/ip4/127.0.0.1/tcp/9095/http",
      "child_node_api":    "/ip4/127.0.0.1/tcp/5001/http", 
      
    • We should consider having an ascii diagram (like the illustration we have from keynote) to convey the components and tree structure in the help text, so that users know what's going on. (@diasdavid is pretty good at ascii diagrams.) #19
    • ok back to adding a member
      • I think I have to link to it manually from the config, in cluster_peers. but now i'm not sure how.
        • If the config doesnt take multiaddrs for the api addresses, what does it take for the cluster_peers array? #24 #19
        • oh, code says multiaddrs. yay.
      • Ok, but what format of multiaddr? /ip4/127.0.0.1/tcp/9095/http? or /ip4/127.0.0.1/tcp/9095? or what?
        • Code looks to be parsing an IPFS multiaddr. (so like /ip4/127.0.0.1/tcp/9095/ipfs/Qmfoobarbazpk...).
        • But is this supposed to point to an ipfs node, or to the peer's cluster node? Oh, is it already using libp2p to mount itself as a libp2p protocol? (dont think so as the corenet stuff isnt merged yet...).
        • Oh got it, cluster creates its own libp2p Node, and connects between the peers using libp2p.
    • reading code to get a better understanding.
      • go-libp2p-rpc is pretty cool
      • ok so it seems we have a libp2p network stack, and a go-libp2p-rpc protocol mounted on top, and that's how cluster nodes talk to each other.
      • hm there's some abstraction/byzantine questions there (eg why does the node call a func on the leader instead of sending a message, etc). #25
      • maybe this is an rpc for clients, and not between cluster nodes? -- No, the docs say "enables components and members of the cluster to communicate and request actions from each other"
      • [got distracted; went to sleep]
      • [return next morning]
      • we must make sure the cluster node to cluster node RPC model fits in a byzantine environment. right now it seems that directly calling a function on the leader is setup to fail in the byzantine case. Also what about leaderless consensus protocols? Many consensus protocols do not use leaders. It seems there's a conflation here between what is the role of the consensus protocol and what is the role of the ipfs-cluster protocol (which right now includes both consensus protocol updates AND direct node-to-node RPC calls). I think we should be able to abstract out all inter-cluster-node communication to ONLY operating on the consensus log. I may be wrong about this, i need to dig into understanding the need for the RPCs better.
      • why does the consensus interface export a Leader() function? that should be a subclass of "Leader-based Consensus Protocols" if anything. #25
      • Is this intended as a shortcut for now (fine)? or is this meant for the long run? (not fine, we need ipfs-cluster to operate on top of leaderless consensus protocols) #25
  • ok, back to adding a member (third time's a charm!)
    • I think i need to construct a multiaddr by combining server.json's "/ip4/%s/tcp/%s/ipfs/%s", config.api_addr, config.api_port, config.id.
    • would be nice if there was a command like ipfscluster id similar to ipfs id. #21
    • i wonder what the ipfs peer id of the entire cluster is, or where that is defined.
      • the id listed in the config seems to be the id of the libp2p ipfs cluster node, which is neither the id of the child ipfs node, NOR the simulated cluster ipfs node, it's a third libp2p node (which is fine), just clarifying for myself/readers).
      • reminder: switch libp2p over to use /p2p protocol prefix, not /ipfs, would be less confusing here.
    • ok, so wrote this tool: https://gist.github.com/jbenet/7007202501fc1c4eb623327e5328cb9d
      • i think it's actually config.cluster_addr, config.cluster_port that we want for this.
      • works: /ip4/0.0.0.0/tcp/9096/ipfs/QmbGvizLZHVWto8ZWU2tbkNcV6W92G6AggKdPfx5gFbLZz
      • oops, 0.0.0.0. need the actual ip addr. nvm, should've added this to ipfscluster. let's do it manually for now. #21
    • yay! got them to connect. sweet.
    • --debug logs are sweet, shows tons of activity.
    • btw i dont think disconnections and reconnections are graceful. killing one node and starting it again shows some errors. not sure if just notices, or actually problematic. we'll see. #26
    • also ^c on one node, then ^c on the other hangs, looks like it's trapping the exit signal and waiting for the other members to respond, so it's stuck. cant kill it, only kill -9.
    • ipfscluster member ls sweet.
      • yeah, killing one node and then doing member ls still shows both, but should only show one now. (or signal who is online and who is not) #15
  • connected, ok now let's pin.
    • ipfs pin add fails:
      // ipfscluster
      Error 500: leader unknown or not existing yet
      ---
      // ipfscluster-server logs
      15:14:24.627  INFO    cluster: pinning:QmcskskhwkUFh1vvZbGFhBJhVMvzg6Hx44niysaoiiQGVt cluster.go:275
      15:14:24.627 ERROR libp2p-rpc: leader unknown or not existing yet client.go:125
      15:14:24.627 ERROR    cluster: sending error response: 500: leader unknown or not existing yet rest_api.go:396
      
    • ok then, maybe i need a third node for there to be a raft leader. let's add a third.
    • can we just add a third to one node and that will propagate info about the cluster? ie, will the other two nodes find each other? #24
    • nope. adding a third only adds the third to that one node who knows about the other. ipfscluster members ls shows (2, 2, 3), instead of (3, 3, 3).
      • this makes startup tricky, particularly because of reconnection correctness uncertainty.
      • ok link like this:. 1->[], 2->[1], 3->[1,2].
      • looks like they all have each other. ipfscluster members ls shows (3, 3, 3)
      • looks like we'll need a "inspect connectivity" command that figures out the cluster connectivity and prints it (possibly graphs it in d3), to make sure our clusters are finding each other and well connected (n^2). #17
      • and we'll have to figure out how node discovery should work. could plug a node discovery protocol directly into the cluster's libp2p node. OR do it at a higher level (probably safer...). #24
  • ok let's try pinning again.
    • seems to have worked:
      > ipfscluster pin add <cid>
      Request accepted
      ---
      // ipfscluster-server logs
      15:21:08.581 ERROR libp2p-raf: QmbGvizLZHVWto8ZWU2tbkNcV6W92G6AggKdPfx5gFbLZz: Pipeline error: EOF transport.go:716
      // is this bad? o/
      15:24:19.916  INFO    cluster: pinning:QmcskskhwkUFh1vvZbGFhBJhVMvzg6Hx44niysaoiiQGVt cluster.go:275
      15:24:19.963  INFO    cluster: pin commited to global state: QmcskskhwkUFh1vvZbGFhBJhVMvzg6Hx44niysaoiiQGVt consensus.go:267
      15:24:20.348  INFO    cluster: IPFS object is already pinned: QmcskskhwkUFh1vvZbGFhBJhVMvzg6Hx44niysaoiiQGVt ipfs_http_connector.go:205
      
    • now, let's verify.
    • ipfscluster pin ls. #2 and #3 have it, #1 does not. probably that pipeline error, got disconnected... but ipfscluster members ls still shows 3 for everyone, but i think #1 disconnected.
    • killed and restarted #1. #1> ipfscluster members ls shows just 1. ok need to reconnect #2 and #3.
    • killed #3 (connected to #1 and #2) and restarted it, oops panic: https://gist.github.com/jbenet/55697b749e9f99d2ebf59ae51083bf51
    • killed #2. 2016/12/30 18:30:23 [INFO] snapshot: Creating new snapshot at /home/jbenet/.ipfs-cluster/data/snapshots/421-8-1483140623423.tmp took a while.
    • started #2 (#1 and #2 on, #3 off). ipfscluster members ls shows (3, 2). probably from #3 before it panicked. ok restart everything.
      • start #1. #1> members ls -> (1)
      • start #2. {#1, #2}> members ls -> (1, 2), #2> members ls -> (1, 2).
      • start #3. {#1, #2, #3}> members ls -> (1, 2, 3), ok all set. (3, 3, 3).
    • ok, i THINK #1 should automatically catch up with the others, and get the pin, no?
      • Woooh! yeah! \o/ it does!. let's inspect the ipfs nodes manually, and verify the pins.
        • ipfs pin ls <cid> ... stuck in 2 machines. looks like it's iterating over the entire damn pinset, hanging the machine...
        • ipfs refs local | grep <cid> shows it in #3 (where it was added), but not on #2 nor on #1. looks like the cluster server knows about the pin, but it did not translate to the child ipfs node i had running in those machines... so it did not pin it.
        • hmm the ipfscluster-server logs should maybe show whether it found + can connect to the child ipfs node.
          > ipfscluster status
            cid: QmcskskhwkUFh1vvZbGFhBJhVMvzg6Hx44niysaoiiQGVt
            status:
                QmTHEzZHGTSiVFFM2h3TgFCSsp2Ecq82U6heAxK7jJRijF: (#2)
                    ipfs: pinning
                QmUmQ2DRe2keGN8meXXLWjUgGbyiBLPWJFXGi4c2kfDGJb: (#1)
                    ipfs: pin_error
                QmbGvizLZHVWto8ZWU2tbkNcV6W92G6AggKdPfx5gFbLZz: (#3)
                    ipfs: pinned
          
        • ok i have verified that the child ipfs #3 is directly connected to the child of #1, but not #2.
        • ok manually connected #3 to #2 (using ipfs ping).
        • something happened in #2's logs. ipfscluster status now shows pinning -> pin_error on #2.
        • okay, i wonder if it's stuck on pin_error forever, or whether the cluster will try to get the node to repin.
        • right now the cluster is in a failed state: the consensus log advanced to track the cid, but 2/3 of the nodes have not pinned. not sure if they will retry, or just get stuck in the failure.
        • let's try issuing it again...
        • same thing. #2 says: 18:46:41.588 WARNI cluster: IPFS unsuccessful: 500: Path 'QmcskskhwkUFh1vvZbGFhBJhVMvzg6Hx44ed' not pinned
        • ok reboot #2.
        • woah, ipfscluster status panicked. then #2 panicked: https://gist.github.com/jbenet/e04c59731ce33a3522603efa7a22f3d3
        • very flaky.
        • now ipfscluster-server wont start. what? https://gist.github.com/jbenet/e04c59731ce33a3522603efa7a22f3d3
        • no idea why. maybe a dir is locked. or something.
        • http://grokbase.com/t/gg/golang-nuts/15315a1hhs/go-nuts-getting-runtime-cgo-pthread-create-failed-resource-temporarily-unavailable-crashes
        • no, maybe the os thread is hosed? or some other resource? may need reboot?
        • ok waited 2min and the os recovered.
      • killed everything, started everything.
      • still nothing, pins wont cross. maybe there's a bitswap bug here. let me reboot the ipfs nodes.
      • stopped all 3 ipfs nodes. stopped all 3 ipfscluster-server nodes.
      • started ipfs nodes. started ipfscluster-server nodes. ipfscluster members ls shows (3, 3, 3).
      • ok, check the pins. still say pinning...
      • ok manually connect #3 to #2 and #1, because #3 is behind a nat ...
      • woah great, now everything says "pinned"!
      • ipfs pin ls <cid> shows the pin on all 3! \o/ Yay.
      • that's awesome, it:
        • kept trying to pin the thing.
        • once it got it, the cluster figured it out, and finished getting the pin. \o/
        • i wont try messing with it (ipfs pin rm <cid> in the child manually, hoping the cluster will notice the pin fail).
    • ok let's try a whole directory.
      • yay! it worked just fine!
      • looks like ipfs connectivity problems caused the initial issues, which made ipfscluster-server connectivity problems worse.
      • and ipfs refs local | grep <cid> shows the pin. yay!
    • ok that's sweet. that's great to see.
    • how can i inspect the raft log manually? i guess ipfscluster pin ls
    • wait, ipfscluster pin ls shows the second pin, but no longer the first... the 2nd contains the 1st, but these should not be coalesced.
      • why? i may pin A, pin B, unpin B, and I expect pin A to remain, whether or not B contains A is irrelevant.
    • ok i pinned unrelated file C and now pin ls shows both B and C.
  • ok i'm going to stop for now.

Summary

  • total time playing with ipfs-cluster: around 4-5 hours.
  • i managed to pin things to the cluster. \o/ yay.
  • found some panics :0
  • ipfscluster recovered pinsets after rebooting. sweet!!
  • some feedback on some of the abstractions and tooling construction
  • lots of feedback on connectivity, seems to be the main source of problems i ran into.
  • lots remains to be reviewed + studied
  • i am already very excited about using it to track my personal content!! 😄
    • a bunch of tooling comes to mind that i want (get size of pinset, etc), but all that comes later.
  • intense correctness testing comes to mind. we need tests that do all sorts of things, particularly:
    • particularly messing with the underlying child ipfs nodes (i.e. making them drop pins, or removing content without removing the pin to simulate data loss in the child) to make sure the cluster can self-heal.
    • we need tests with harsh connectivity settings: certain connections being impossible, certain connections being flakey, bandwidth limitations, etc. another thing to add to the testing lab queue.

Me when the pins succeeded:


[0] Notes on go packaging. (TL;DR: use gx-go. this is the expanded why use gx-go). Warning: this is a contrarian view with respect to the Go language. And this is a standard, sane view from package management, version control, and secure open source. Go packaging is designed for monolithic codebases (well-tended sequoia) not open source (haphazard expansive brush forest). Go was designed at Google, baking into the language many of the software engineering practices of Google. In general this is a great thing. In the cases where open source != how google develops, it is not. Google has a single, huge tree of code, with atomic safe updates. You cannot merge something into the tree if ANYTHING across all (most) of google fails to compile/errors. Open Source is fundamentally different. There is no such atomic-safe-update gating. We cannot assume other people's systems are set up like ours, or that they want to update their tree to the version we require (running go get -u for the user may be harmful to them). Or that we know who is depending on our code (lots of private code may depend on our package). Or that whoever is running a package we depend on won't screw everything up by moving something or breaking an API. Go uses location addressing for package identification... not just inside a single diligent org (which works really well) but in the broader internet (which can fail catastrophically). Despite years of heated arguments on this, the Go team has not yet understood this is a real problem (they washed their hands of it by having an external committee handle it). But that's ok because we are the people who use hash-linking to securely address everything. Let's use it to our advantage!

Cluster should detect peer failure and re-allocate affected content

Currently Cluster can allocate content to a number of peers but will not detect failures and re-allocate in that case.

Allocation is based on metrics which are regularly pushed to the Leader. If the last metric from a peer is expired or invalid then the peer is not considered as an available allocation when pinning. When re-pinning content, this situation is also detected and a new allocation will be found, so that part is done.

The idea is then to give the PeerMonitor the task of producing Alerts on a channel that the main component listens on. When the PeerMonitor detects that a peer is down (because, e.g., its last metric has expired), it sends an alert. Cluster will then find which Cids are allocated to the problematic peer and re-trigger Pin operations for each.

This implies PeerMonitors should be made aware of current clusterPeers (or be aware themselves with the RPC API to the pinManager).
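A simplified sketch of that alert flow (types and names are hypothetical, not the real PeerMonitor API): a goroutine watches metric expiry and pushes alerts on a channel that the main component consumes to trigger re-pins.

```
package main

import (
	"fmt"
	"time"
)

// Alert signals that a peer looks unhealthy.
type Alert struct {
	Peer   string
	Metric string
}

// monitor periodically checks whether the last metric from each peer has
// expired and emits an alert for every stale peer.
func monitor(lastSeen map[string]time.Time, ttl time.Duration, alerts chan<- Alert, done <-chan struct{}) {
	ticker := time.NewTicker(ttl / 2)
	defer ticker.Stop()
	for {
		select {
		case <-done:
			return
		case now := <-ticker.C:
			for peer, seen := range lastSeen {
				if now.Sub(seen) > ttl {
					alerts <- Alert{Peer: peer, Metric: "ping"}
				}
			}
		}
	}
}

func main() {
	alerts := make(chan Alert, 8)
	done := make(chan struct{})
	lastSeen := map[string]time.Time{
		"peerA": time.Now(),
		"peerB": time.Now().Add(-time.Second), // stale: will trigger an alert
	}
	go monitor(lastSeen, 200*time.Millisecond, alerts, done)

	a := <-alerts
	close(done)
	// Here the cluster would look up pins allocated to a.Peer and re-pin them elsewhere.
	fmt.Println("re-allocating pins from", a.Peer)
}
```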

Pin IPFS Proxy API /add results in the same node where they are submitted

Some thoughts about it.

Currently we let the allocation strategy dictate where content should be pinned, even when it comes from an intercepted /add request in the ipfs proxy. That means that it could be allocated somewhere else, and thus content might need to be transferred one more time than if one of those allocations was the peer where it was added.

If we forced one of the allocations to be the same peer where it was added:

  • We would save some bandwidth in the cases where the node is not allocated the content
  • We would override the pinning strategy, disregarding allocation strategies
  • There would not be danger of GC running at the same time the content is being pinned somewhere else because it was just added (this is not good at the moment).
  • There would not be an unpinned copy of the content taking disk space in another node.

Note that this can also be part of a pinning strategy in an allocator, where candidate peers are allocated content they are already pinning.
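A small sketch of what "forcing one of the allocations to be the adding peer" could look like; the function and its parameters are hypothetical.

```
package main

import "fmt"

// ensureLocalAllocation puts the receiving peer first in the allocation list,
// dropping a lower-priority candidate if the list is already at the
// replication factor.
func ensureLocalAllocation(local string, allocations []string, replFactor int) []string {
	for _, p := range allocations {
		if p == local {
			return allocations // already allocated here
		}
	}
	out := append([]string{local}, allocations...)
	if replFactor > 0 && len(out) > replFactor {
		out = out[:replFactor]
	}
	return out
}

func main() {
	fmt.Println(ensureLocalAllocation("peerC", []string{"peerA", "peerB"}, 2))
	// [peerC peerA]
}
```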
