index-provider's Issues

Provider engine panic on explore recursive edge with no parent

2022-01-18T14:04:13.499+0200	INFO	provider/engine	engine/linksystem.go:45	Retrieved advertisement from datastore	{"cid": "baguqeeqqj2huagbnkzkxipqqi2j2vce7ya", "size": 958}
panic: Traversed Explore Recursive Edge Node With No Parent

goroutine 153112616 [running]:
github.com/ipld/go-ipld-prime/traversal/selector.ExploreRecursiveEdge.Decide(...)
	/root/code/pkg/mod/github.com/ipld/[email protected]/traversal/selector/exploreRecursiveEdge.go:31
github.com/ipld/go-ipld-prime/traversal/selector.ExploreRecursive.Decide(...)
	/root/code/pkg/mod/github.com/ipld/[email protected]/traversal/selector/exploreRecursive.go:165
github.com/ipld/go-ipld-prime/traversal.Progress.walkAdv(0xc038df92c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc00cfc0310, ...)
	/root/code/pkg/mod/github.com/ipld/[email protected]/traversal/walk.go:159 +0x83
github.com/ipld/go-ipld-prime/traversal.Progress.WalkAdv(0xc038df92c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc00cfc0310, ...)
	/root/code/pkg/mod/github.com/ipld/[email protected]/traversal/walk.go:147 +0xd9
github.com/ipfs/go-graphsync/ipldutil.(*traverser).start.func1(0xc011069900)
	/root/code/pkg/mod/github.com/ipfs/[email protected]/ipldutil/traverser.go:230 +0x3d1
created by github.com/ipfs/go-graphsync/ipldutil.(*traverser).start
	/root/code/pkg/mod/github.com/ipfs/[email protected]/ipldutil/traverser.go:198 +0x55

tracking of new cars

Currently the MVP runs as a daemon.

We should add a code interface path for noticing a new CAR file, and then implement one or both of:

  • P1: a client command pointing to a CAR, e.g. `reference-provider add my.carv2`
  • P2: a filesystem inotify watcher that notices a new CAR added to a folder (see the sketch below)
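
A minimal sketch of the P2 option, using the third-party github.com/fsnotify/fsnotify package (which wraps inotify on Linux); importCar and the .carv2 extension check are placeholders for whatever hook the daemon ends up exposing:

package carwatch

import (
	"log"
	"path/filepath"

	"github.com/fsnotify/fsnotify"
)

// watchCarDir watches dir for newly created CAR files and hands each one to
// the hypothetical importCar callback.
func watchCarDir(dir string, importCar func(path string) error) error {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	defer watcher.Close()

	if err := watcher.Add(dir); err != nil {
		return err
	}

	for {
		select {
		case event, ok := <-watcher.Events:
			if !ok {
				return nil
			}
			// Only react to newly created files with the expected extension.
			if event.Op&fsnotify.Create != 0 && filepath.Ext(event.Name) == ".carv2" {
				if err := importCar(event.Name); err != nil {
					log.Printf("failed to import %s: %v", event.Name, err)
				}
			}
		case err, ok := <-watcher.Errors:
			if !ok {
				return nil
			}
			log.Printf("watch error: %v", err)
		}
	}
}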

Specify "No Longer Published" content in advertisement

We need a way to specify that some content is "no longer published" by a provider, to prevent indexers from having to fetch the full chain of advertisements to sync. It may be that fetching the few latest updates is already enough to fully sync to the latest state.

This is linked to the idea, discussed off-band, of having a "snapshot scheme" (or something similar) to allow providers to "prune their chain" when it is getting too long, signalling that syncing up to N=some index is enough to reach the latest state (as the rest is no longer provided).

Related: ipni/storetheindex#53

ReferenceProvider MVP

Draft task list for the MVP of the indexer reference provider. It will be extended as new tasks are identified.
Feel free to take any of the tasks from the list, create an individual issue for it, and assign it to yourself.

  • Indexing

    • Advertisement generation from Indexed data.
      • List of CIDs
      • CARs
      • Hook/trigger for new advertisement generation.
    • Local index/datastore with the chain of advertisements linked to the corresponding provided data. This will be used by the ingestion protocol (according to the specific ingestion protocol used, we may want to wrap go-indexer-core in a datastore interface).
  • Ingestion

    • Advertisement publication
      • Publish to pubsub channel
      • PublishLocal to make an advertisement available through an endpoint for indexers to proactively request it. Publish always assumes a subsequent PublishLocal (see the sketch after this list).
      • /advertisement/latest and /advertisement/<id> endpoints for indexers to be able to request new/missed advertisements for a provider.
        • HTTP
        • Libp2p
    • IPLD-aware ingestion
      • go-legs subscriber (or similar) to serve go-data-transfers of indexed data through advertisement selectors.
      • Proactive sync by indexers
    • Push single update. Provider pushes update directly to indexer.
    • IPLD-unaware ingestion (out of scope for MVP?)
      • Request response protocol for paginated sync of indexed data (see design)
  • Process

    • Config
      • Identity (required to sign advertisements)
    • Start from CLI
    • Docker deployment
      • deferred due to K8S deployment
    • Embedded instance.
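
A rough sketch, not the actual index-provider API, of what the publication surface described in the Ingestion list above might look like; Advertisement here is just a placeholder type:

package sketch

import (
	"context"

	"github.com/ipfs/go-cid"
)

// Advertisement stands in for the signed advertisement record produced by
// the indexing tasks above.
type Advertisement struct{}

// Publisher captures the two publication paths from the task list.
type Publisher interface {
	// PublishLocal stores the advertisement so indexers can later request it
	// via the /advertisement/latest and /advertisement/<id> endpoints.
	PublishLocal(ctx context.Context, adv Advertisement) (cid.Cid, error)

	// Publish announces the advertisement on the pubsub channel; it always
	// assumes a corresponding PublishLocal has happened.
	Publish(ctx context.Context, adv Advertisement) (cid.Cid, error)
}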

Graphsync / data-transfer metadata fields

We expect that metadata for a storage provider needs at least:

format string  // "filecoin/1"
pieceCID CID
free bool
fastRetrieval bool

Consider putting this in a new repo/module.
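
A hedged sketch of those fields as a Go struct; names and types are illustrative rather than a finalized wire format:

package metadata

import "github.com/ipfs/go-cid"

// GraphsyncFilecoinV1 sketches the fields listed above for the "filecoin/1" format.
type GraphsyncFilecoinV1 struct {
	Format        string  // e.g. "filecoin/1"
	PieceCID      cid.Cid // CID of the piece containing the data
	Free          bool    // retrievable without payment
	FastRetrieval bool    // an unsealed copy is kept for fast retrieval
}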

Import UX slow / needs improvement

$ ./index-provider import car -l http://localhost:3102 -i bafybeihh.carv2
Post "http://localhost:3102/admin/import/car": EOF
  • This is a CARv2, but the length of time for import seems to be the same as for a CARv1.
  • The 'EOF' does not seem like a useful response.

Intermittent failure to retrieve advertised content

Sometimes after publishing an advertisement, when the indexer asks the reference provider for the content of that advertisement, the reference provider cannot find the content. It seems like there is some delay before it is available from the linksystem cache.

This can be forced to work by re-importing the car file and making the reference provider re-publish its advertisement, which triggers the indexer to ask for content again… and it usually works after one or two retries. Of course, this is not acceptable.

This problem appears to be on the reference provider side, because it appears that the indexer is asking for exactly the same content, in exactly the same way/order for both the success and failure case.

Set defaults in config closest to where they are applied

When the engine is instantiated directly with an empty config, defaults don't get set. As a result the chunk size ends up being zero, which results in single-sized chunks.

Make sure that wherever constructors take a config, the config is checked for sanity (see the sketch below).
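
A minimal sketch of such a check, assuming illustrative default values and a trimmed-down Ingest config with only the two fields exercised by the test elsewhere on this page:

package engine

// Ingest mirrors the two sub-config fields the engine consumes here; the
// real type lives in the project's config package.
type Ingest struct {
	LinkedChunkSize int
	LinkCacheSize   int
}

// Illustrative defaults; the real values belong wherever the project defines them.
const (
	defaultLinkedChunkSize = 100
	defaultLinkCacheSize   = 1024
)

// sanitize fills in zero-valued fields right where the config is consumed,
// so an engine instantiated directly with an empty config no longer ends up
// with a zero chunk size.
func sanitize(cfg Ingest) Ingest {
	if cfg.LinkedChunkSize == 0 {
		cfg.LinkedChunkSize = defaultLinkedChunkSize
	}
	if cfg.LinkCacheSize == 0 {
		cfg.LinkCacheSize = defaultLinkCacheSize
	}
	return cfg
}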

Add cache manager for ingestion in LinkSystem

We currently generate the entries linked list used for ingestion synchronously, by triggering the callback in the linksystem if the entries are not available in the cache. The exchange can't proceed until the full linked list for the entries is generated and stored in the cache.
https://github.com/filecoin-project/indexer-reference-provider/blob/78768e2546a7f81fe7c6237644f5598a83ea5258/core/engine/linksystem.go#L68

The aim of this is to avoid having to keep redundant storage persisting the list of advertised entries. We only generate the linked list for a list of entries when they are requested. However, this process of "generating on-the-fly" is currently quite inefficient.
https://github.com/filecoin-project/indexer-reference-provider/blob/78768e2546a7f81fe7c6237644f5598a83ea5258/core/engine/linksystem.go#L76

My suggestions to make this faster:

  • Add a cacheManager that keeps track of what has been asked for ingestion, what linked lists need to be generated, and which ones are already in cache and ready to be served. This would also allow us to parallelize the generation of linked lists, so that as soon as we have the first few nodes of a linked list, we can start serving them through the link system.
  • The cacheManager should also include an offline algorithm to garbage-collect linked lists (we currently don't have any garbage collection strategy). As a first approach we could be naïve and keep only a small number of linked lists in cache at a time (assuming that after an advertisement publication, all indexers will be looking to ingest the same few sets of entries in order to sync).

Can't start market node after upgrade

Issue: Market node crashing when starting up. ERROR:

2022-02-15T10:28:56.826+0100    INFO    dagstore.migrator       dagstore/wrapper.go:354 deal not ready; skipping        {"deal_id": 0, "piece_cid": "baga6ea4seaqk54tcailxwsaqkfsmacu2h43e4z2ytn2xqwj762guubryalaaega"}
2022-02-15T10:28:56.826+0100    INFO    dagstore.migrator       dagstore/wrapper.go:354 deal not ready; skipping        {"deal_id": 0, "piece_cid": "baga6ea4seaqhmohnt4uukhxf3enrgoyn5yldhe6hk3ebltr7jan57j6kkytokfq"}
2022-02-15T10:28:56.826+0100    INFO    dagstore.migrator       dagstore/wrapper.go:358 registering deal in dagstore with lazy init     {"deal_id": 2985231, "piece_cid": "baga6ea4seaqgj2d6h7pzqlvld5qdc5kfnjuxsbxagn2qk253vwvhpxbwehrukmq"}
ERROR: creating node: starting node: failed to connect index provider host with the full node: failed to connect index provider host with the full node: failed to dial 12D3KooWNCcog7KWPsjWa1FmKyqeTeBxxc5cJf27X6vKBVqm1mW3:
  * [/ip4/127.0.0.1/tcp/10231] dial tcp4 127.0.0.1:10231: connect: connection refused
  * [/ip4/192.168.x.x/tcp/10231] failed to negotiate security protocol: read tcp4 192.168.x.x:35397->192.168.x.x:10231: read: connection reset by peer
  * [/ip4/x.x.x.x/tcp/10231] failed to negotiate security protocol: read tcp4 192.168.x.x:35397->x.x.x.x:10231: read: connection reset by peer
  * [/ip4/x.x.x.x/tcp/10231] dial tcp4 0.0.0.0:35397->x.x.x.x:10231: i/o timeout

OBS: My market node runs on a separate physical server and needs to connect to the daemon over the network.
Network connectivity is fine and the system was running perfectly on v1.14.0.
It is only allowed to connect on the internal network, so I believe the error is on these lines:

[/ip4/x.x.x.x/tcp/10231] failed to negotiate security protocol: read tcp4 192.168.x.x:35397->x.x.x.x:10231: read: connection reset by peer

The Lotus daemon and miner are running the same version as the market node: master-spx.idxprov.rc-1

$ lotus-miner version
Daemon:  1.15.0-dev+mainnet+git.1bf7e6a40+api1.3.0
Local: lotus-miner version 1.15.0-dev+mainnet+git.1bf7e6a40 

The lotus-miner also connects to the daemon externally, so I know for sure the daemon is reachable and currently connected to the lotus-miner, as it is running.

Add dealID (or alike) to Notify interface

Providers will need to include additional context information about the CIDs/metadata being notified (such as a dealID or any other unique identifier) so they can inform indexers of the specific entry an update/removal refers to.

This would mean changing the reference-provider interface to something like:

	NotifyPutCids(ctx context.Context, dealID cid.Cid, metadata []byte) (cid.Cid, error)

	NotifyRemoveCids(ctx context.Context, dealID cid.Cid) (cid.Cid, error)

	// Callback that goes from dealID to a carv2.Index used to fetch the list of CIDs to index.
	type CidCallback func(dealID cid.Cid) carv2.Index

	// Register the callback used to fetch CIDs when triggering `NotifyPut`.
	RegisterCidCallback(cb CidCallback)

If this ID ends up not being the dealID we should maybe choose some other name for it (ctxID?)

//cc @aarshkshah1992

Handle `remove` Advertisements in LinkSystem when list of entries aren't available

A provider may not have the list of multihashes available when receiving a request for ingestion of a remove advertisement. In this case, instead of generating the linked list of entries that need to be removed, the linksystem should check whether it is a remove advertisement and not follow the link for the list of entries (as they won't be available, and following it would make the exchange fail).
https://github.com/filecoin-project/indexer-reference-provider/blob/78768e2546a7f81fe7c6237644f5598a83ea5258/core/engine/linksystem.go#L100

For advertisements with isRm == true we should avoid triggering the callback and traversing the Entries link: https://github.com/filecoin-project/indexer-reference-provider/blob/78768e2546a7f81fe7c6237644f5598a83ea5258/core/engine/linksystem.go#L109
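
A minimal sketch of that guard; Advertisement, ErrNoEntriesForRemoval and the generate callback are illustrative stand-ins for the engine's real types:

package sketch

import (
	"errors"
	"io"
)

// Advertisement is a placeholder carrying only the flag that matters here.
type Advertisement struct {
	IsRm bool
}

// ErrNoEntriesForRemoval signals that the entries link must not be followed.
var ErrNoEntriesForRemoval = errors.New("no entries for removal advertisement")

// entriesFor is the proposed check: for removal advertisements, never trigger
// the CID callback or traverse the Entries link, since the multihashes may
// already be gone and the exchange would fail.
func entriesFor(ad Advertisement, generate func() (io.Reader, error)) (io.Reader, error) {
	if ad.IsRm {
		return nil, ErrNoEntriesForRemoval
	}
	return generate()
}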

This issue is related to ipni/storetheindex#78

Avoid using storetheindex specific config data structures for public clients

IngestClient expects an Identity config data structure as an argument. This means that we need to generate an ad-hoc storetheindex identity config in the reference-provider to make an IndexContent call from the provider. We should consider removing this dependency on storetheindex-specific config data structures.

https://github.com/filecoin-project/indexer-reference-provider/blob/78768e2546a7f81fe7c6237644f5598a83ea5258/core/engine/engine.go#L174

Expose the ability to construct sub config with default values

The only way to construct a config with defaults right now is to initialise a full one with an identity, take the sub-config needed and throw the rest away. Default values are not exported.

Expose the ability to construct an instance of a config/sub-config with default values set.
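
A sketch of what exposing this could look like; the default values and the trimmed-down Ingest type are placeholders:

package config

// Exported defaults so embedders can see and reuse them; the values shown
// here are placeholders.
const (
	DefaultLinkedChunkSize = 100
	DefaultLinkCacheSize   = 1024
)

// Ingest is a trimmed-down stand-in for the real sub-config.
type Ingest struct {
	LinkedChunkSize int
	LinkCacheSize   int
}

// NewIngest returns an Ingest sub-config with defaults set, without
// requiring a full config (or an identity) to be initialised first.
func NewIngest() Ingest {
	return Ingest{
		LinkedChunkSize: DefaultLinkedChunkSize,
		LinkCacheSize:   DefaultLinkCacheSize,
	}
}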

removal generates invalid advertisement

when removing by metadata context ID:

2022-01-03T08:52:29.283Z        INFO    provider/engine engine/engine.go:331    Generating removal list for advertisement
2022-01-03T08:52:29.287Z        ERROR   provider/engine engine/engine.go:380    Error generating new advertisement: storetheindex: invalid metadata: encountered protocol ID 0 on encode
2022-01-03T08:52:29.287Z        ERROR   adminserver     http/removecar_handler.go:47    failed to remove CAR: storetheindex: invalid metadata: encountered protocol ID 0 on encode      {"err": "storetheindex: invalid metadata: encountered protocol ID 0 on encode", "key": "NQe/ch+eG2aP3DE1xSMcbGF+mJGLH6WmiA8z4xB5QXg="}

Fix flaky tests

The goal of this issue is to track the list of unit tests that are flaky and need some extra care to make them work.
If you see any other flaky test that needs to be investigated, please add it to the list.

  • TestNotifyPublish from engine_test.go (fails on macOS, Go 1.17)
    • This one is probably due to the use of time.Sleep instead of an async approach in the test. It fails under go test -race, which runs slower than vanilla go test (a polling sketch follows below).
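
A sketch of the async approach, using testify's require.Eventually; receivedAd is a hypothetical stand-in for whatever condition the real test waits on:

package engine_test

import (
	"testing"
	"time"

	"github.com/stretchr/testify/require"
)

func TestNotifyPublish_polling(t *testing.T) {
	// receivedAd is a hypothetical stand-in for the condition the real test
	// currently waits for with time.Sleep.
	receivedAd := func() bool { return true }

	// Poll instead of sleeping a fixed duration, so the test stays stable
	// when `go test -race` runs slower than a vanilla `go test`.
	require.Eventually(t, receivedAd, 10*time.Second, 100*time.Millisecond,
		"advertisement was never received")
}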

Multiple protocol support in metadata

Just a note, but I think we can, in a protocol-compatible way, say that metadata is not simply <protocol ID><payload> but rather [<protocol id><protocol-specific payload>], so e.g.
uvarint{graphsync filecoin} || cbor{<cid>, true, true} || uvarint{bitswap} to indicate that both protocols are supported.
Since the end of the graphsync CBOR payload can already be determined, any trailing bytes can be interpreted as the next protocol.

Originally posted by @willscott in #187 (comment)
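
A sketch of parsing metadata laid out this way; the decoder map is illustrative, and each protocol-specific decoder is assumed to know where its own payload ends (as the graphsync CBOR record does):

package metadata

import (
	"encoding/binary"
	"fmt"
)

// decodePayload is a hypothetical per-protocol decoder; it reports how many
// bytes of buf its payload consumed (e.g. by decoding the self-delimiting
// graphsync CBOR record).
type decodePayload func(buf []byte) (consumed int, err error)

// parseMetadata walks [<uvarint protocol id><protocol-specific payload>]...
// entries: after one payload ends, any trailing bytes are interpreted as the
// next protocol entry. It returns the protocol IDs found.
func parseMetadata(buf []byte, decoders map[uint64]decodePayload) ([]uint64, error) {
	var protocols []uint64
	for len(buf) > 0 {
		id, n := binary.Uvarint(buf)
		if n <= 0 {
			return nil, fmt.Errorf("invalid protocol id varint")
		}
		buf = buf[n:]

		dec, ok := decoders[id]
		if !ok {
			return nil, fmt.Errorf("unknown protocol id %d", id)
		}
		used, err := dec(buf)
		if err != nil {
			return nil, err
		}
		protocols = append(protocols, id)
		buf = buf[used:]
	}
	return protocols, nil
}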

Entries chain cache grows indefinitely

During chunk generation, if a single linked list is larger than the cache's current capacity, the cache is resized to hold the entire linked list.

However, the final resize is always to the length of the cache itself, which means none of the entries are ever evicted and the cache grows indefinitely.

Test:

func Test_CacheIsEvicted(t *testing.T) {
	cfg := config.NewIngest()
	cfg.LinkedChunkSize = 10
	cfg.LinkCacheSize = 50
	engine := mkEngineWithConfig(t, cfg)

	mhCount := 100
	expectedChunkCount := mhCount / cfg.LinkedChunkSize

	for i := 0; i < 10; i++ {
		idx := index.NewMultihashSorted()
		cids, err := testutil.RandomCids(mhCount)
		require.NoError(t, err)
		var records []index.Record
		for i, c := range cids {
			records = append(records, index.Record{
				Cid:    c,
				Offset: uint64(i + 1),
			})
		}

		err = idx.Load(records)
		require.NoError(t, err)
		iterator, err := provider.CarMultihashIterator(idx)
		require.NoError(t, err)

		_, err = engine.generateChunks(iterator)
		require.NoError(t, err)
	}

	store, ok := engine.cache.(*lrustore.LRUStore)
	require.True(t, ok)
	require.Equal(t, expectedChunkCount, store.Cap())
}

remove by car doesn't work

provider:index-provider> ./index-provider rm car -l http://localhost:3102 -i /data/snip/deal-cars/bafybeihh.carv2
Not Found: no CAR file found for the given Key

provider:index-provider> ./index-provider import car -l http://localhost:3102 -i /data/snip/deal-cars/bafybeihh.carv2
Conflict: CAR already advertised

Importing the same car file with the same metadata should not generate a new advertisement

Each time a car file is imported, this results in a different advertisement CID. This is preventing the indexer from determining that it has already ingested the advertised indexes.

The reason that the CID is different is that each new advertisement links to the previous and the CID is calculated over the advertisement data and its link to the previous. What needs to happen is that the previous advertisement needs to be looked up by context ID, and if a previous advertisement exists for that context ID (coming from car supplier) and the previous advertisement metadata is equal to the current metadata, then the import request should be ignored.

Note: The context ID comes from the car supplier, which comes from the import request, which is a hash of the car file path. The metadata also originates from the import request.
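
A minimal sketch of that check, with a hypothetical previousMetadata lookup standing in for the engine's datastore:

package sketch

import "bytes"

// shouldAdvertise reports whether a new advertisement is needed for an
// import: if the previous advertisement stored for the same context ID
// carries identical metadata, the import request should be ignored.
func shouldAdvertise(ctxID, md []byte, previousMetadata func(ctxID []byte) ([]byte, bool)) bool {
	prev, found := previousMetadata(ctxID)
	return !found || !bytes.Equal(prev, md)
}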

Remove now redundant libp2p server from provider

Remove the libp2p server exposed by the provider: the server predates go-legs, and what it provides is now satisfiable via alternative, already-exposed endpoints:

  • GET_LATEST is now exposed via go-legs' head publisher over go stream
  • GET_AD is available via graphsync.

The server is used in end-to-end tests, which need to be refactored to use graphsync instead.

Once removed, the client library for it can also go away here.

Note that storetheindex uses the libp2p server to get the latest advertisement instead of the go-legs head publisher; it also needs to be updated, as captured in ipni/storetheindex#137.

Add endpoint to admin server to remove content by context ID

The admin server needs a new endpoint for removing car content. This new admin/remove/car endpoint will complement the existing admin/import/car endpoint, and will take a contextID or a file path as input. The file path is used to generate the contextID if contextID is not supplied.

Invoking this endpoint must result in a remove advertisement when previously imported content is removed from the provider.

Simplify HTTP server responses

The admin server delivers error responses using an errRes data structure that gets serialized by the respond() function. This was originally to have a common way of delivering errors over both libp2p and http, but is not needed here.

Here is an example of the current error response:

errRes := newErrorResponse("failed to supply CAR. %v", err)                                                                                                                             
respond(w, http.StatusInternalServerError, errRes)                                                                                                                                       

I would prefer to see:

http.Error(w, fmt.Sprintf("failed to supply CAR: %s", err), http.StatusInternalServerError)

Even that can be refined, so that if the file is not found it returns http.StatusBadRequest.

Restarted engine does not restore latest advertisement as head in legs publisher

When the provider engine is restarted, the advertisement head exposed via the legs head publisher is cid.Undef until the next advertisement is published. Internally, the provider engine maintains a mapping to the latest advertisement in the datastore.

The legs publisher API, however, does not allow the head to be set without also publishing it onto gossipsub or HTTP (depending on the publisher implementation).

  1. Expose an API in the go-legs publisher to allow the initial head CID to be set without publishing anything.
  2. On engine.Start, check whether the mapping to the latest advertisement is present and, if so, set the current head to it.
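
A sketch of the two steps; SetRoot is the proposed go-legs addition described in point 1, not an API that exists today:

package sketch

import (
	"context"

	"github.com/ipfs/go-cid"
)

// LegsPublisher captures step 1: the proposed go-legs API that would allow
// an initial head to be set without publishing anything.
type LegsPublisher interface {
	// SetRoot records c as the current head served to syncing indexers,
	// without announcing it over gossipsub or HTTP.
	SetRoot(ctx context.Context, c cid.Cid) error
}

// restoreHead is step 2: on engine.Start, if a latest advertisement CID was
// persisted in the datastore mapping, make it the publisher's head again.
func restoreHead(ctx context.Context, pub LegsPublisher, latestAd cid.Cid) error {
	if latestAd == cid.Undef {
		// Nothing was published before the restart; head stays cid.Undef.
		return nil
	}
	return pub.SetRoot(ctx, latestAd)
}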

Provider daemon shuts down ungracefully due to cancelled context

The engine shutdown call always returns a context-cancelled error when the daemon is stopped via a single SIGINT:

./provider daemon
2021-10-15T14:17:15.180+0100    INFO    command/reference-provider      provider/daemon.go:87   libp2p host initialized {"host_id": "12D3KooWK9NsyCndVed4QR71wNayaXHcLnRRTq11QPEe8n1w45Jy", "multiaddr": "/ip4/0.0.0.0/tcp/3103"}
2021-10-15T14:17:15.268+0100    INFO    dt-impl impl/impl.go:145        start data-transfer module
2021-10-15T14:17:15.268+0100    INFO    provider/engine engine/engine.go:64     Retrieval address not configured, using /ip4/192.168.68.105/tcp/3103
2021-10-15T14:17:15.268+0100    INFO    command/reference-provider      provider/daemon.go:134  libp2p servers initialized      {"host_id": "12D3KooWK9NsyCndVed4QR71wNayaXHcLnRRTq11QPEe8n1w45Jy"}
2021-10-15T14:17:15.268+0100    INFO    command/reference-provider      provider/daemon.go:154  admin server initialized        {"address": "127.0.0.1:3102"}
Starting admin server on 127.0.0.1:3102 ...2021-10-15T14:17:15.268+0100 INFO    adminserver     http/server.go:58       admin http server listening     {"addr": "127.0.0.1:3102"}
^CReceived interrupt signal, shutting down...
(Hit CTRL-C again to force-shutdown the daemon.)
2021-10-15T14:17:16.584+0100    INFO    pubsub  [email protected]/pubsub.go:608   pubsub processloop shutting down
2021-10-15T14:17:16.584+0100    INFO    command/reference-provider      provider/daemon.go:170  Shutting down daemon
2021-10-15T14:17:16.584+0100    ERROR   command/reference-provider      provider/daemon.go:184  Error closing provider core     {"err": "context canceled"}
2021-10-15T14:17:16.585+0100    INFO    adminserver     http/server.go:63       admin http server shutdown
2021-10-15T14:17:16.586+0100    INFO    command/reference-provider      provider/daemon.go:199  node stopped
daemon did not stop gracefully

I believe the cause is that the context used during shutdown is the parent context from the CLI, and the shutdown function does not use the passed context since the underlying go-legs closer does not accept one.

When shutdown is called, the CLI context is already cancelled (otherwise shutdown would not have been triggered), hence the error.
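
A sketch of one way to address this, assuming a hypothetical Shutdown(ctx) method on the engine: derive a fresh, deadline-bound context for shutdown instead of reusing the already-cancelled CLI context:

package sketch

import (
	"context"
	"time"
)

// engineCloser is a hypothetical view of the engine's shutdown method.
type engineCloser interface {
	Shutdown(ctx context.Context) error
}

// shutdownEngine gives the engine its own deadline-bound context, since the
// CLI context is by definition already cancelled once SIGINT triggers the
// shutdown path.
func shutdownEngine(e engineCloser) error {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	return e.Shutdown(ctx)
}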

Removal advertisements are not syncable

Syncing removal advertisements fails because the link system tries to look up an internal mapping of the advertisement context ID, which is removed as part of publishing the removal advertisement.

This makes all syncs for such advertisements fail.

Consider checking whether the advertisement is a removal ad and, if so, not attempting to look up its entries from the callback, since they may no longer be there.

For "remove all" advertisements the check is straightforward: check whether the entries link is NoEntries and, if so, do not attempt to look up entries.
For removal advertisements with an explicit list of multihashes, rework in the engine is needed to store that list permanently, since the engine must be able to serve the advertisements it publishes.

Implement a utility for verifying that expected multihashes are ingested by `storetheindex`

Write a utility that verifies multihashes are known by an indexer node, more specifically the storetheindex implementation of it.

In its first iteration the utility is to be used by miners for verification. It will accept as input:

  • miner peer ID
  • CAR index files
  • storetheindex endpoint

and, for each multihash found in the index, check that storetheindex has it associated with a provider that has the same peer ID.

The tool should also be able to get the list of advertisements from the provider's graphsync API as a source of multihashes to verify.
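
A sketch of the verification loop; Finder is a hypothetical stand-in for a storetheindex find client:

package verify

import (
	"context"
	"fmt"

	"github.com/libp2p/go-libp2p-core/peer"
	"github.com/multiformats/go-multihash"
)

// Finder is a hypothetical stand-in for a storetheindex find client.
type Finder interface {
	// Providers returns the peer IDs the indexer has associated with mh.
	Providers(ctx context.Context, mh multihash.Multihash) ([]peer.ID, error)
}

// verifyIngested checks that every multihash taken from the CAR index files
// is associated by the indexer with the miner's peer ID.
func verifyIngested(ctx context.Context, f Finder, miner peer.ID, mhs []multihash.Multihash) error {
	for _, mh := range mhs {
		provs, err := f.Providers(ctx, mh)
		if err != nil {
			return err
		}
		found := false
		for _, p := range provs {
			if p == miner {
				found = true
				break
			}
		}
		if !found {
			return fmt.Errorf("multihash %s is not indexed for provider %s", mh.B58String(), miner)
		}
	}
	return nil
}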

Code reorganization for readability and reusability

This issue records proposed changes to make the reference provider more easily consumable by parties wishing to implement their own provider based on the reference provider, and by reusing packages from the reference provider.

  • Remove provider subdirectory in cmd/provider
  • Move engine/linksystem into a separate package
  • Consider moving engine/engine.go to top-level directory.
  • Expose some of the packages in internal
  • Move engine/dscache.go into a separate package. Possibly move it into a package under go-datastore.

need `list` command

there's currently no way to query which files / items are being provided by the current provider daemon.

limit acceptable selectors

the legs handler should filter which selector requests are allowed, and limit them to:

  • selection rooted at an individual known car index
  • selection of metadata items (field recursive of known format)
