It may be useful to represent a CidV0 object in an alternative base. For example if w

<a class="user-mention notranslate" data-hovercard-type="user" data-hover

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

<a class="user-mention notranslate" data-hovercard-type="organization" data-hovercard-

Provide Way of Representing CidV0 objects in an alternative base,about ipfs/go-cid

Comments (31)

whyrusleeping commented on July 21, 2024 2

We do want to move away from supporting cidv0. Adding new features to it seems unwise if thats really our goal.

I'm not really sure what a valid usecase for this is, as @diasdavid said earlier, we should just upgrade them to cidv1 and provide an internal conversion layer in cases where it might be needed.

from go-cid.

kevina commented on July 21, 2024 2

However, to keep GC working correctly, we'd need to introduce a separate "linkstore" for keeping track of the local DAG.

I do not see why this will be necessary. Pinned hashes will still be stored by the full CID. The GC does not care about the structure of anything not pinned.

from go-cid.

kevina commented on July 21, 2024 1

The rules for displaying CidV0 in an alternative base can be as follows:

If the base is in 'base58btc' than just encode the multihash without a base prefix, otherwise prepend a byte of value '0x00' (the version 0 string) to the binary representation and encode that with a base prefix.

This should be fairly easy to implement, I can create a p.r. later today if it will make my idea clearer.

from go-cid.

daviddias commented on July 21, 2024 1

@diasdavid Thanks for this. Was this tidbit of information should of been documented somewhere?

Agreed and I believe it was, at least 2 times. Post Lisbon Treaty and somewhere in a go-ipfs PR. It should have been, however, done through the IPFS Specs Repo.

Based on what you are saying all network requests for blocks should be made using the multihash only and not the full CID.

The CID can go on the wire (because of in the future, with graphswap that will be useful), however, the assertions should be done in the hash level. If the Repo was checking by Hash, that will be granted by default.

from go-cid.

kevina commented on July 21, 2024 1

@diasdavid you replied before I edited my sentence 😄

from go-cid.

kevina commented on July 21, 2024 1

I'm not sure, but maybe this is something that we can work through in the upcoming developers meeting. See ipfs-inactive/developer-meetings#16.

from go-cid.

commented on July 21, 2024

I've been thinking about this too, for the case where we want to represent a CIDv0 in base32 in a URL. The less obvious thing here is that we probably need to be able to recognize such CIDs and convert them back to CIDv0, so that DHT and Bitswap return the expected results.

Do you agree this is a concern? @whyrusleeping was skeptical when I first brought it up a while ago.

It could be done with a multicodec header that says "what's in here is CIDv0".

from go-cid.

daviddias commented on July 21, 2024

for the case where we want to represent a CIDv0 in base32 in a URL.

For those cases, we upgrade the CID to CIDv1 and base encoded it with base32.

from go-cid.

kevina commented on July 21, 2024

@lgierth

It could be done with a multicodec header that says "what's in here is CIDv0

Yes, and I think that should just be the byte '0x00' which is the variable-length encoding of the number 0. Hence the equivalent of setting the version number to 0.

@diasdavid

For those cases, we upgrade the CID to CIDv1 and base encoded it with base32.

The ideas is we want to be able to retrieve a Cidv0 object using base32. Upgrading the CID will not work in this case.

from go-cid.

commented on July 21, 2024

The ideas is we want to be able to retrieve a Cidv0 object using base32. Upgrading the CID will not work in this case.

@diasdavid what Kevin and I are thinking about is the next step after receiving a gateway request with a CIDv0-in-CIDv1: content routing and bitswap, where we need the original CIDv0 because that's what provider records are published for, and what's used as a key in the blockstore of the provider nodes. It's kind of a matter of backward compatibility: there will always be nodes who continue to create CIDv0 stuff.

from go-cid.

Stebalien commented on July 21, 2024

@kevina I'd go a step further and start using the CIDv1 formatted CIDv0s by default. I believe that will actually be less work and, given that we're planning on switching to base32 by default anyways, shouldn't cause any additional breakage. Does that make sense or am I missing something?

from go-cid.

kevina commented on July 21, 2024

@Stebalien What exactly will be a CIDv1 formatted CIDv0?

The base used to encode a CID is for the most part an interface issue. The base is is not stored in the key of objects stored as part of a link or in the blockstore. The Cid version and hash used is, on the other hand, part of the key.

You can't retrieve an Cidv0 object if it is reformatted to be Cidv1 and sometimes you may still want to store a Cidv0 key in another object.

from go-cid.

Stebalien commented on July 21, 2024

What exactly will be a CIDv1 formatted CIDv0?

The multibase-cidv0-multihash sequence. That is, if we switch to base32 as the default, we'll now be returning base32-cidv0-multihash from interfaces so we might as well start returning base58btc-cidv0-multihash when the user requests base58btc.

from go-cid.

kevina commented on July 21, 2024

@Stebalien sorry I misinterpreted your reply (as I am prone to do :( ).

The idea behind not returning base58btc-cidv0-multihash was (1) to provide only one way to represent a Cid in a given base and (2) to simplify the U.I. If we do switch to base58btc-cidv0-multihash then we probably need to provide a way to get to the old style hash (for compatibility, if no other reason) and will need to provide an extra option for that. If we have the rule that base58btc implies the old style hash string (no multibase-version) prefix we avoid this problem.

from go-cid.

Stebalien commented on July 21, 2024

I kind of agree... I'd just like to phase out non-multibase strings as much as possible as soon as possible. But, I guess, if we move to base32 by default we'll be doing that anyways.

from go-cid.

kevina commented on July 21, 2024

It turns out I made this more complicated then necessary. The CidV0 bit is NOT needed. Just prefixing a CidV0 string with a multibase prefix will already work as expected. For example
QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco zQmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco BCIQIZOL2ACTMSCQU5TYYFWBWGWB5IAXWTEM6HDYETDW5TWJKLQBKPNA will all work already.

The only problem is this little bit of code (which gave me the impression that a CidV0 are incompatible with multibase)

func (c *Cid) StringOfBase(base mbase.Encoding) (string, error) {
	switch c.version {
	case 0:
		if base != mbase.Base58BTC {
			return "", ErrInvalidEncoding
		}
		return c.hash.B58String(), nil
	case 1:
		return mbase.Encode(base, c.bytesV1())
	default:
		panic("not possible to reach this point")
	}
}

which should likely work on CidV0 string and return a string with the multibase prefix if the code is not 'z'.

Thus I propose we fix this function and call it a day. 😄

from go-cid.

Stebalien commented on July 21, 2024

Here's an update to the spec to take this into account: multiformats/cid#14

from go-cid.

daviddias commented on July 21, 2024

@diasdavid what Kevin and I are thinking about is the next step after receiving a gateway request with a CIDv0-in-CIDv1: content routing and bitswap, where we need the original CIDv0 because that's what provider records are published for, and what's used as a key in the blockstore of the provider nodes. It's kind of a matter of backward compatibility: there will always be nodes who continue to create CIDv0 stuff.

There are two issues here. One is:

a) where we need the original CIDv0 because that's what provider records.
b) what's used as a key in the blockstore of the provider nodes.

This was accepted as kind of fine since that is also a valid concern for when someone picks different hashes or bases for their content. Essentially, the content should be identified by it's multihash and not by a key that includes type and multibase. If the hash matches, it is the same content. If an app wants to interpret content "asdldasjdkla" as multicodec A and another app as multicodec B, that's fine. This way we ensure deduplication

This is what I think will be the be crux.

From memory, the Lisbon Treaty (the 2~3 hack nights that me, @jbenet and @whyrusleeping spent figuring all of the details out), the actual solution for this problem was not to use CIDs as Keys in the Repo, instead, keep using the multihashes as it was before. This would solve this issue from the bottom because independently of reading CIDv0 or CIDv1, the repo would always pick up the block from the multihash and that would be the same for both.

Right now, go-ipfs uses base32.encode(rawCID) to get the keys for storing blocks in the repo. Eventually, js-ipfs followed so that both repos are compatible. However, I can't recall if there was a valid reason for this change. Please help me refresh my and everyone's memory if you know where that was discussed and approved.

Using base32.encode(rawCID) also means that if I have a block that for an IPLD graph, one of the node is considered _Raw) (because, for that graph, it really doesn't care for those bytes nor it knows what it is) and for another graph, it consuming the same content but now it is a git Object, now we effectively have to store the block 2x, one with CID having the multicodec for Raw and another with multicodec for CID.

PS: This brings a old subject where we had similar issues for having updates on how the Repo works (exactly the same module) without having those updates first proposed to the Repo Spec on https://github.com/ipfs/specs/tree/master/repo

from go-cid.

kevina commented on July 21, 2024

@diasdavid Thanks for this. Was this tidbit of information documented somewhere?

Based on what you are saying all network requests for blocks should be made using the multihash only and not the full CID.

I agree that we should be storing blocks using just the multihash. The change to base32 was made before the CID proposal to fix ipfs/kubo#2601, some additional discussion was done here ipfs/go-datastore#39, and the change was made in ipfs/kubo#2903.

It should be rather simple to store just the multihash component in the datastore. The biggest pain will be the repo migration.

from go-cid.

Kubuxu commented on July 21, 2024

Right now, go-ipfs uses base32.encode(rawCID) to get the keys for storing blocks in the repo.

This was also cleared up but after Lisbon (I think this point might not have been considered , without this it would be impossible to enumerate blocks given only repo for example as in case of pinset loss, it has saved us once already). This also means that the data isn't self descriptive in storage in blockstore.

It might be possible to store it aside (in other part of the datastore) but it would require us to reevaluate the leveldb (it might just not scale to number of entries we keep).

EDIT: I had this comment sitting in comment windows for ~1h

from go-cid.

kevina commented on July 21, 2024

@diasdavid I fully agree with all your points. Nevertheless, I still think it will be useful to be able to represent CidV0 in an alternative base. For example if we switch to Base32 users might still have a need to store a CidV0 string as part of a link in a dag. Also converting a CidV0 link in a dag to a Cidv1 for display in same cases can cause problems. If we don't allow multibase prefixed for CidV0 then it will be base32 for everything but the case where CidV0 is still required. That seams odd to me.

from go-cid.

kevina commented on July 21, 2024

Could I have a link to the "Post Lisbon Treaty"?

from go-cid.

commented on July 21, 2024

Could I have a link to the "Post Lisbon Treaty"?

I want too

from go-cid.

Kubuxu commented on July 21, 2024

I would like to renew this discussion. I don't think that supporting CIDv0 in multibase is wise, it will linger forever everywhere.

We should flesh out clear upgrade path that will lead to us not using CIDv0 internally anymore.

Quick example of possible upgrade path:

Finally upgrade to outputting CIDv1
Create repo migration of v0 -> v1
If we ever decode CIDv0, upgrade it to v1
Push out both CIDv0 and CIDv1 where compatible to the network.
In places where legacy CIDv0 is need (network, DHT, so on) make dual requests and track metrics on how many times CIDv0 exclusively allowed us to access content (so making requests to CIDv0 saved us)
When this number falls low enough (enough nodes upgraded), stop providing compatibility layer of publishing CIDv0.

This way we can get CIDv0 mostly out of our codebases (apart from UI layer) and upgrade to new protocol. If we don't do this soon enough we will be stuck with it forever and it decreases performance, increases complexity and reduces our velocity (complexity is bane of moving protocols forward).

We need a clear way to move forward with CIDv1 and how finally embrace it fully.

from go-cid.

kevina commented on July 21, 2024

I don't think that supporting CIDv0 in multibase is wise

I do not agree with this and don't see the argument as "it will linger forever then everywhere" very convincing.

from go-cid.

Stebalien commented on July 21, 2024

@Kubuxu ideally, the datastore wouldn't care about the CID, just the multihash. ~~However, to keep GC working correctly, we'd need to introduce a separate "linkstore" for keeping track of the local DAG. @lgierth indicated he was interested in working on this.~~ (#34 (comment))

from go-cid.

kevina commented on July 21, 2024

I'm not really sure what a valid usecase for this is, as @diasdavid said earlier, we should just upgrade them to cidv1 and provide an internal conversion layer in cases where it might be needed.

I will repeat by earlier comment #34 (comment) so its not lost in the noise:

Nevertheless, I still think it will be useful to be able to represent CidV0 in an alternative base. For example if we switch to Base32 users might still have a need to store a CidV0 string as part of a link in a dag. Also converting a CidV0 link in a dag to a Cidv1 for display in same cases can cause problems. If we don't allow multibase prefixed for CidV0 then it will be base32 for everything but the case where CidV0 is still required. That seams odd to me.

from go-cid.

kevina commented on July 21, 2024

Also, as a side note, we can not fully convert CidV0 Dag objects to CidV1 without changing the hash of the object as the links inside will also need to be converted to CidV1. If we do this it will no longer be possible to look up old content. If we don't do this we will need to support CidV0 forever, at least to some capacity.

from go-cid.

Stebalien commented on July 21, 2024

@kevina you're right.

from go-cid.

kevina commented on July 21, 2024

In addition to all the points I said above I would also like to add that not allowing CidV0 to be represented in Base32 breaks abstraction and creates this weird special case. As I see it multibase is something that is independent of CID. That is it can be used to encode any sort of (short) binary string. If we first convert to binary and then decode the CID then any sort of base will be permissible (as the multibase code has no idea it is decoding a CID, and the CID decoder just sees the binary string), however if we do it all at once then the CID decoder will reject it.

I fully agree with should work to migrate away from CIDv0, however for this reason and reasons in previous comments I think not allowing CIDv0 to be encoded using multibase creates unnecessary complications.

In all fairness I did think of one valid use case for not allowing multibase prefixed for CIDv0 that is to distinguish raw multihashes from multihashes that are also CIDv0. That is we switch to storing raw multihashes in the datastore we will need to list them to the user. We could in this case list them with a multibase encoding and define any multihash with a multibase prefix as a raw multihash and not a CIDv0. However, as I see it this is a fragile distinction and it still doesn't solve the problem on how to handle working with existing CIDv0 stored as link in objects.

from go-cid.

lidel commented on July 21, 2024

Closing:

Current consensus is that when alternative base needs to be used,
one needs to convert it to CIDv1
Libp2p is adopting support for CIDv1 as text representation too.
See https://github.com/libp2p/specs/blob/master/RFC/0001-text-peerid-cid.md and libp2p/specs#216.

To follow migration to CIDv1 (default base32) see: https://github.com/ipfs/ipfs/issues/337

from go-cid.

Provide Way of Representing CidV0 objects in an alternative base about go-cid HOT 31 CLOSED

Comments (31)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent