Comments (31)
We do want to move away from supporting cidv0. Adding new features to it seems unwise if thats really our goal.
I'm not really sure what a valid usecase for this is, as @diasdavid said earlier, we should just upgrade them to cidv1 and provide an internal conversion layer in cases where it might be needed.
from go-cid.
However, to keep GC working correctly, we'd need to introduce a separate "linkstore" for keeping track of the local DAG.
I do not see why this will be necessary. Pinned hashes will still be stored by the full CID. The GC does not care about the structure of anything not pinned.
from go-cid.
The rules for displaying CidV0 in an alternative base can be as follows:
If the base is in 'base58btc' than just encode the multihash without a base prefix, otherwise prepend a byte of value '0x00' (the version 0 string) to the binary representation and encode that with a base prefix.
This should be fairly easy to implement, I can create a p.r. later today if it will make my idea clearer.
from go-cid.
@diasdavid Thanks for this. Was this tidbit of information should of been documented somewhere?
Agreed and I believe it was, at least 2 times. Post Lisbon Treaty and somewhere in a go-ipfs PR. It should have been, however, done through the IPFS Specs Repo.
Based on what you are saying all network requests for blocks should be made using the multihash only and not the full CID.
The CID can go on the wire (because of in the future, with graphswap that will be useful), however, the assertions should be done in the hash level. If the Repo was checking by Hash, that will be granted by default.
from go-cid.
@diasdavid you replied before I edited my sentence 😄
from go-cid.
I'm not sure, but maybe this is something that we can work through in the upcoming developers meeting. See ipfs-inactive/developer-meetings#16.
from go-cid.
I've been thinking about this too, for the case where we want to represent a CIDv0 in base32 in a URL. The less obvious thing here is that we probably need to be able to recognize such CIDs and convert them back to CIDv0, so that DHT and Bitswap return the expected results.
Do you agree this is a concern? @whyrusleeping was skeptical when I first brought it up a while ago.
It could be done with a multicodec header that says "what's in here is CIDv0".
from go-cid.
for the case where we want to represent a CIDv0 in base32 in a URL.
For those cases, we upgrade the CID to CIDv1 and base encoded it with base32.
from go-cid.
It could be done with a multicodec header that says "what's in here is CIDv0
Yes, and I think that should just be the byte '0x00' which is the variable-length encoding of the number 0. Hence the equivalent of setting the version number to 0.
For those cases, we upgrade the CID to CIDv1 and base encoded it with base32.
The ideas is we want to be able to retrieve a Cidv0 object using base32. Upgrading the CID will not work in this case.
from go-cid.
The ideas is we want to be able to retrieve a Cidv0 object using base32. Upgrading the CID will not work in this case.
@diasdavid what Kevin and I are thinking about is the next step after receiving a gateway request with a CIDv0-in-CIDv1: content routing and bitswap, where we need the original CIDv0 because that's what provider records are published for, and what's used as a key in the blockstore of the provider nodes. It's kind of a matter of backward compatibility: there will always be nodes who continue to create CIDv0 stuff.
from go-cid.
@kevina I'd go a step further and start using the CIDv1 formatted CIDv0s by default. I believe that will actually be less work and, given that we're planning on switching to base32 by default anyways, shouldn't cause any additional breakage. Does that make sense or am I missing something?
from go-cid.
@Stebalien What exactly will be a CIDv1 formatted CIDv0?
The base used to encode a CID is for the most part an interface issue. The base is is not stored in the key of objects stored as part of a link or in the blockstore. The Cid version and hash used is, on the other hand, part of the key.
You can't retrieve an Cidv0 object if it is reformatted to be Cidv1 and sometimes you may still want to store a Cidv0 key in another object.
from go-cid.
What exactly will be a CIDv1 formatted CIDv0?
The multibase-cidv0-multihash
sequence. That is, if we switch to base32 as the default, we'll now be returning base32-cidv0-multihash
from interfaces so we might as well start returning base58btc-cidv0-multihash
when the user requests base58btc.
from go-cid.
@Stebalien sorry I misinterpreted your reply (as I am prone to do :(
).
The idea behind not returning base58btc-cidv0-multihash
was (1) to provide only one way to represent a Cid in a given base and (2) to simplify the U.I. If we do switch to base58btc-cidv0-multihash
then we probably need to provide a way to get to the old style hash (for compatibility, if no other reason) and will need to provide an extra option for that. If we have the rule that base58btc
implies the old style hash string (no multibase-version) prefix we avoid this problem.
from go-cid.
I kind of agree... I'd just like to phase out non-multibase strings as much as possible as soon as possible. But, I guess, if we move to base32 by default we'll be doing that anyways.
from go-cid.
It turns out I made this more complicated then necessary. The CidV0 bit is NOT needed. Just prefixing a CidV0 string with a multibase prefix will already work as expected. For example
QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco
zQmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco
BCIQIZOL2ACTMSCQU5TYYFWBWGWB5IAXWTEM6HDYETDW5TWJKLQBKPNA
will all work already.
The only problem is this little bit of code (which gave me the impression that a CidV0 are incompatible with multibase)
func (c *Cid) StringOfBase(base mbase.Encoding) (string, error) {
switch c.version {
case 0:
if base != mbase.Base58BTC {
return "", ErrInvalidEncoding
}
return c.hash.B58String(), nil
case 1:
return mbase.Encode(base, c.bytesV1())
default:
panic("not possible to reach this point")
}
}
which should likely work on CidV0 string and return a string with the multibase prefix if the code is not 'z'.
Thus I propose we fix this function and call it a day. 😄
from go-cid.
Here's an update to the spec to take this into account: multiformats/cid#14
from go-cid.
@diasdavid what Kevin and I are thinking about is the next step after receiving a gateway request with a CIDv0-in-CIDv1: content routing and bitswap, where we need the original CIDv0 because that's what provider records are published for, and what's used as a key in the blockstore of the provider nodes. It's kind of a matter of backward compatibility: there will always be nodes who continue to create CIDv0 stuff.
There are two issues here. One is:
- a) where we need the original CIDv0 because that's what provider records.
- b) what's used as a key in the blockstore of the provider nodes.
a)
This was accepted as kind of fine since that is also a valid concern for when someone picks different hashes or bases for their content. Essentially, the content should be identified by it's multihash and not by a key that includes type and multibase. If the hash matches, it is the same content. If an app wants to interpret content "asdldasjdkla" as multicodec A and another app as multicodec B, that's fine. This way we ensure deduplication
b)
This is what I think will be the be crux.
From memory, the Lisbon Treaty (the 2~3 hack nights that me, @jbenet and @whyrusleeping spent figuring all of the details out), the actual solution for this problem was not to use CIDs as Keys in the Repo, instead, keep using the multihashes as it was before. This would solve this issue from the bottom because independently of reading CIDv0 or CIDv1, the repo would always pick up the block from the multihash and that would be the same for both.
Right now, go-ipfs uses base32.encode(rawCID) to get the keys for storing blocks in the repo. Eventually, js-ipfs followed so that both repos are compatible. However, I can't recall if there was a valid reason for this change. Please help me refresh my and everyone's memory if you know where that was discussed and approved.
Using base32.encode(rawCID)
also means that if I have a block that for an IPLD graph, one of the node is considered _Raw) (because, for that graph, it really doesn't care for those bytes nor it knows what it is) and for another graph, it consuming the same content but now it is a git Object, now we effectively have to store the block 2x, one with CID having the multicodec for Raw and another with multicodec for CID.
PS: This brings a old subject where we had similar issues for having updates on how the Repo works (exactly the same module) without having those updates first proposed to the Repo Spec on https://github.com/ipfs/specs/tree/master/repo
from go-cid.
@diasdavid Thanks for this. Was this tidbit of information documented somewhere?
Based on what you are saying all network requests for blocks should be made using the multihash only and not the full CID.
I agree that we should be storing blocks using just the multihash. The change to base32 was made before the CID proposal to fix ipfs/kubo#2601, some additional discussion was done here ipfs/go-datastore#39, and the change was made in ipfs/kubo#2903.
It should be rather simple to store just the multihash component in the datastore. The biggest pain will be the repo migration.
from go-cid.
Right now, go-ipfs uses base32.encode(rawCID) to get the keys for storing blocks in the repo.
This was also cleared up but after Lisbon (I think this point might not have been considered , without this it would be impossible to enumerate blocks given only repo for example as in case of pinset loss, it has saved us once already). This also means that the data isn't self descriptive in storage in blockstore.
It might be possible to store it aside (in other part of the datastore) but it would require us to reevaluate the leveldb (it might just not scale to number of entries we keep).
EDIT: I had this comment sitting in comment windows for ~1h
from go-cid.
@diasdavid I fully agree with all your points. Nevertheless, I still think it will be useful to be able to represent CidV0 in an alternative base. For example if we switch to Base32 users might still have a need to store a CidV0 string as part of a link in a dag. Also converting a CidV0 link in a dag to a Cidv1 for display in same cases can cause problems. If we don't allow multibase prefixed for CidV0 then it will be base32 for everything but the case where CidV0 is still required. That seams odd to me.
from go-cid.
Could I have a link to the "Post Lisbon Treaty"?
from go-cid.
Could I have a link to the "Post Lisbon Treaty"?
I want too
from go-cid.
I would like to renew this discussion. I don't think that supporting CIDv0 in multibase is wise, it will linger forever everywhere.
We should flesh out clear upgrade path that will lead to us not using CIDv0 internally anymore.
Quick example of possible upgrade path:
- Finally upgrade to outputting CIDv1
- Create repo migration of v0 -> v1
- If we ever decode CIDv0, upgrade it to v1
- Push out both CIDv0 and CIDv1 where compatible to the network.
- In places where legacy CIDv0 is need (network, DHT, so on) make dual requests and track metrics on how many times CIDv0 exclusively allowed us to access content (so making requests to CIDv0 saved us)
- When this number falls low enough (enough nodes upgraded), stop providing compatibility layer of publishing CIDv0.
This way we can get CIDv0 mostly out of our codebases (apart from UI layer) and upgrade to new protocol. If we don't do this soon enough we will be stuck with it forever and it decreases performance, increases complexity and reduces our velocity (complexity is bane of moving protocols forward).
We need a clear way to move forward with CIDv1 and how finally embrace it fully.
from go-cid.
I don't think that supporting CIDv0 in multibase is wise
I do not agree with this and don't see the argument as "it will linger forever then everywhere" very convincing.
from go-cid.
@Kubuxu ideally, the datastore wouldn't care about the CID, just the multihash. However, to keep GC working correctly, we'd need to introduce a separate "linkstore" for keeping track of the local DAG. @lgierth indicated he was interested in working on this. (#34 (comment))
from go-cid.
I'm not really sure what a valid usecase for this is, as @diasdavid said earlier, we should just upgrade them to cidv1 and provide an internal conversion layer in cases where it might be needed.
I will repeat by earlier comment #34 (comment) so its not lost in the noise:
Nevertheless, I still think it will be useful to be able to represent CidV0 in an alternative base. For example if we switch to Base32 users might still have a need to store a CidV0 string as part of a link in a dag. Also converting a CidV0 link in a dag to a Cidv1 for display in same cases can cause problems. If we don't allow multibase prefixed for CidV0 then it will be base32 for everything but the case where CidV0 is still required. That seams odd to me.
from go-cid.
Also, as a side note, we can not fully convert CidV0 Dag objects to CidV1 without changing the hash of the object as the links inside will also need to be converted to CidV1. If we do this it will no longer be possible to look up old content. If we don't do this we will need to support CidV0 forever, at least to some capacity.
from go-cid.
@kevina you're right.
from go-cid.
In addition to all the points I said above I would also like to add that not allowing CidV0 to be represented in Base32 breaks abstraction and creates this weird special case. As I see it multibase is something that is independent of CID. That is it can be used to encode any sort of (short) binary string. If we first convert to binary and then decode the CID then any sort of base will be permissible (as the multibase code has no idea it is decoding a CID, and the CID decoder just sees the binary string), however if we do it all at once then the CID decoder will reject it.
I fully agree with should work to migrate away from CIDv0, however for this reason and reasons in previous comments I think not allowing CIDv0 to be encoded using multibase creates unnecessary complications.
In all fairness I did think of one valid use case for not allowing multibase prefixed for CIDv0 that is to distinguish raw multihashes from multihashes that are also CIDv0. That is we switch to storing raw multihashes in the datastore we will need to list them to the user. We could in this case list them with a multibase encoding and define any multihash with a multibase prefix as a raw multihash and not a CIDv0. However, as I see it this is a fragile distinction and it still doesn't solve the problem on how to handle working with existing CIDv0 stored as link in objects.
from go-cid.
Closing:
- Current consensus is that when alternative base needs to be used,
one needs to convert it to CIDv1 - Libp2p is adopting support for CIDv1 as text representation too.
See https://github.com/libp2p/specs/blob/master/RFC/0001-text-peerid-cid.md and libp2p/specs#216.
To follow migration to CIDv1 (default base32) see: https://github.com/ipfs/ipfs/issues/337
from go-cid.
Related Issues (20)
- Implement 'K'-encoding (base36 cids) HOT 2
- Inaccurate godoc comment for uvarint HOT 1
- Possible to calculate cid using Writer interface instead of reading full file into memory? HOT 3
- add CidFromReader HOT 9
- cid Undef cannot be marshalled HOT 3
- Support parse (github.com/ipfs/go-path).Path HOT 16
- Provide an efficient API to check whether a CID has `IDENTITY` multihash code HOT 3
- CodecToStr and Codecs in v0.3.1 could cause data corruption HOT 3
- CidFromReader wraps valid io.EOF in ErrInvalidCid HOT 1
- Bloom filter backed version of Cid.Set HOT 1
- Extract non-core functionaly into new package HOT 5
- LRU cache HOT 3
- protobuf vs dag-pb HOT 2
- Should the JSON represenetation of CID always be formatted as links. HOT 6
- Generated CIDv0 differs from the one generated by IPFS HOT 13
- Proposal: Moving CID to multiformats Org HOT 1
- Prefix.Sum creates bad inline CIDs
- Qm hash identifiers duplicated within the IPFS ecosystem HOT 1
- Harden varint encoding. HOT 1
- Support for reading concatenated CIDs
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from go-cid.