A Pin should not need to be pinned in every cluster member. We should be able to say that a pin needs to be pinned in, say, 2 or 3 cluster members.
We will start with a general replication factor for all pins, then maybe transition to replication factor per-pin.
These are thoughts for the first approach.
Replication factor -1
means "pin everywhere". If the replication factor is larger than the number of cluster peers, it is treated as equal to that number.
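A minimal sketch of that rule (the function name `effectiveReplication` and its signature are our assumption for illustration, not cluster code):

```go
package main

import "fmt"

// effectiveReplication resolves a requested replication factor against
// the current cluster size: -1 means "pin everywhere", and any factor
// larger than the cluster is capped to the cluster size.
func effectiveReplication(rf, clusterSize int) int {
	if rf == -1 || rf > clusterSize {
		return clusterSize
	}
	return rf
}

func main() {
	fmt.Println(effectiveReplication(-1, 5)) // pin everywhere -> 5
	fmt.Println(effectiveReplication(3, 5))  // -> 3
	fmt.Println(effectiveReplication(9, 5))  // capped -> 5
}
```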
Pinning
We need a PeerMonitor
component which is able to decide, when a pin request arrives, which peer comes next. The decision should be based on pluggable modules: to start, we will provide one which attempts to distribute pins evenly, although it should easily support other metrics such as available disk space.
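A pluggable decision module could look roughly like this in Go; the `Allocator` interface, `Metric` type, and even-distribution module are hypothetical names for illustration, not the actual component API:

```go
package main

import (
	"fmt"
	"sort"
)

// Metric is one measurement per peer, e.g. the number of pins it
// currently holds, or (for another module) its free disk space.
type Metric struct {
	Peer  string
	Value float64
}

// Allocator is the pluggable part: it orders candidate peers
// best-first, so the first N can be taken for replication factor N.
type Allocator interface {
	Allocate(candidates []Metric) []string
}

// evenPinAllocator spreads pins by preferring peers with fewest pins.
type evenPinAllocator struct{}

func (evenPinAllocator) Allocate(candidates []Metric) []string {
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].Value < candidates[j].Value
	})
	peers := make([]string, len(candidates))
	for i, m := range candidates {
		peers[i] = m.Peer
	}
	return peers
}

func main() {
	var a Allocator = evenPinAllocator{}
	metrics := []Metric{{"peerA", 10}, {"peerB", 2}, {"peerC", 7}}
	fmt.Println(a.Allocate(metrics)) // [peerB peerC peerA]
}
```

Swapping in a disk-space module would only mean implementing `Allocate` with a descending sort over a different metric.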
Every commit log entry asking to pin something must be tagged with the peers in charge of it. The PinTracker will receive the task; if it is itself tagged on a pin, it will pin it. Otherwise it will store the pin and mark it as remote.
If the PinTracker
receives a Pin which is already known, it should unpin it if it is no longer tagged among the hosts in charge of pinning. Somewhere in the pipeline we should probably detect re-pinnings and avoid reassigning pinning peers needlessly.
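The tagging and tracker behaviour above could be sketched as follows (all type, constant, and function names here are ours for illustration, not the actual PinTracker API):

```go
package main

import "fmt"

// PinEntry stands in for a commit log entry: the CID to pin plus the
// peers tagged as being in charge of actually pinning it.
type PinEntry struct {
	CID         string
	Allocations []string
}

type action int

const (
	pinLocal action = iota // this peer pins via its ipfs daemon
	remote                 // tracked here, but pinned elsewhere
	unpin                  // was pinned here, no longer tagged
)

// track decides what this peer does with an incoming entry.
func track(self string, e PinEntry, alreadyPinned bool) action {
	for _, p := range e.Allocations {
		if p == self {
			return pinLocal
		}
	}
	if alreadyPinned {
		// Known pin, but we are no longer among the tagged hosts.
		return unpin
	}
	return remote
}

func main() {
	e := PinEntry{CID: "cid1", Allocations: []string{"peerA", "peerB"}}
	fmt.Println(track("peerA", e, false)) // 0 -> pin locally
	fmt.Println(track("peerC", e, false)) // 1 -> mark as remote
	fmt.Println(track("peerC", e, true))  // 2 -> unpin
}
```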
Unpinning
Unpinning works as usual, removing the pin only from the peers where it is pinned.
Re-pinning on peer failure
The peer monitor should detect hosts which are down (or hosts whose ipfs
daemon is down). After a configurable time threshold (say, 5 minutes), it should grep the status for pins assigned to that host and re-pin them to new hosts.
The peer monitor should also receive updates from the peer manager and make sure that there are no pins assigned to hosts that are no longer in the cluster.
For the moment there is no re-rebalancing when a node comes back online.
This assumes there is a single peer monitor for the whole cluster. While monitoring the local ipfs daemon could be done by each peer (triggering rebalances for that), if all nodes watch each other this will cause havoc when triggering rebalances. The Raft cluster leader should probably be in charge then, but this conflicts with being completely abstracted from the consensus algorithm below. If we had a non-leader-based consensus, we could use a distributed lottery to select someone. It makes no sense to re-implement code to choose a peer from the cluster when Raft has it all. Also, running the rebalance process in the Raft leader saves a redirection for every new pin request.
UX
We need to attack ipfs-cluster-ctl
to provide more human-readable output as the API formats become more stable. status
should probably show succinctly which pins are underreplicated or which peers are in error, one line per pin.
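As a rough illustration of such one-line-per-pin output, here is a hypothetical formatter (field layout and names are invented, not the actual ipfs-cluster-ctl format):

```go
package main

import "fmt"

// statusLine renders one pin as a single line: CID, pinned/wanted
// counts, an UNDERREPLICATED flag, and any peers in error.
func statusLine(cid string, pinned, wanted int, errPeers []string) string {
	line := fmt.Sprintf("%s %d/%d", cid, pinned, wanted)
	if pinned < wanted {
		line += " UNDERREPLICATED"
	}
	if len(errPeers) > 0 {
		line += fmt.Sprintf(" errors:%v", errPeers)
	}
	return line
}

func main() {
	fmt.Println(statusLine("QmFoo", 1, 3, []string{"peerC"}))
	fmt.Println(statusLine("QmBar", 3, 3, nil))
}
```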