index-provider's Issues

Provider engine panic on explore recursive edge with no parent

2022-01-18T14:04:13.499+0200	INFO	provider/engine	engine/linksystem.go:45	Retrieved advertisement from datastore	{"cid": "baguqeeqqj2huagbnkzkxipqqi2j2vce7ya", "size": 958}
panic: Traversed Explore Recursive Edge Node With No Parent

goroutine 153112616 [running]:
github.com/ipld/go-ipld-prime/traversal/selector.ExploreRecursiveEdge.Decide(...)
	/root/code/pkg/mod/github.com/ipld/[email protected]/traversal/selector/exploreRecursiveEdge.go:31
github.com/ipld/go-ipld-prime/traversal/selector.ExploreRecursive.Decide(...)
	/root/code/pkg/mod/github.com/ipld/[email protected]/traversal/selector/exploreRecursive.go:165
github.com/ipld/go-ipld-prime/traversal.Progress.walkAdv(0xc038df92c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc00cfc0310, ...)
	/root/code/pkg/mod/github.com/ipld/[email protected]/traversal/walk.go:159 +0x83
github.com/ipld/go-ipld-prime/traversal.Progress.WalkAdv(0xc038df92c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc00cfc0310, ...)
	/root/code/pkg/mod/github.com/ipld/[email protected]/traversal/walk.go:147 +0xd9
github.com/ipfs/go-graphsync/ipldutil.(*traverser).start.func1(0xc011069900)
	/root/code/pkg/mod/github.com/ipfs/[email protected]/ipldutil/traverser.go:230 +0x3d1
created by github.com/ipfs/go-graphsync/ipldutil.(*traverser).start
	/root/code/pkg/mod/github.com/ipfs/[email protected]/ipldutil/traverser.go:198 +0x55

tracking of new cars

Currently the MVP runs as a daemon.

We should add a code interface path for noticing a new CAR file, and then implement one or both of:

  • P1: a client command pointing to a CAR, e.g. `reference-provider add my.carv2`
  • P2: a filesystem inotify watcher that notices a new CAR added to a folder (see the sketch below)
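
A minimal sketch of the P2 option, using the third-party github.com/fsnotify/fsnotify package (which wraps inotify on Linux); importCar and the .carv2 extension check are placeholders for whatever hook the daemon ends up exposing:

package carwatch

import (
	"log"
	"path/filepath"

	"github.com/fsnotify/fsnotify"
)

// watchCarDir watches dir for newly created CAR files and hands each one to
// the hypothetical importCar callback.
func watchCarDir(dir string, importCar func(path string) error) error {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	defer watcher.Close()

	if err := watcher.Add(dir); err != nil {
		return err
	}

	for {
		select {
		case event, ok := <-watcher.Events:
			if !ok {
				return nil
			}
			// Only react to newly created files with the expected extension.
			if event.Op&fsnotify.Create != 0 && filepath.Ext(event.Name) == ".carv2" {
				if err := importCar(event.Name); err != nil {
					log.Printf("failed to import %s: %v", event.Name, err)
				}
			}
		case err, ok := <-watcher.Errors:
			if !ok {
				return nil
			}
			log.Printf("watch error: %v", err)
		}
	}
}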

Specify "No Longer Published" content in advertisement

We need a way to specify that some content is "no longer published" by a provider, to prevent indexers from having to fetch the full chain of advertisements to sync. It may be that fetching the few latest updates is already enough to fully sync to the latest state.

This is linked to the idea, discussed off-band, of having a "snapshot scheme" (or something similar) to allow providers to "prune their chain" when it is getting too long, signalling that syncing up to N=some index is enough to reach the latest state (as the rest is no longer provided).

Related: ipni/storetheindex#53

ReferenceProvider MVP

Draft task list for the MVP of the indexer reference provider. It will be extended as new tasks are identified.
Feel free to take any of the tasks from the list, create an individual issue for it, and assign it to yourself.

  • Indexing

    • Advertisement generation from Indexed data.
      • List of CIDs
      • CARs
      • Hook/trigger for new advertisement generation.
    • Local index/datastore with the chain of advertisements linked to the corresponding provided data. This will be used by the ingestion protocol (according to the specific ingestion protocol used, we may want to wrap go-indexer-core in a datastore interface).
  • Ingestion

    • Advertisement publication
      • Publish to pubsub channel
      • PublishLocal to make an advertisement available through an endpoint for indexers to proactively request it. Publish always assumes a subsequent PublishLocal (see the sketch after this list).
      • /advertisement/latest and /advertisement/<id> endpoints for indexers to be able to request new/missed advertisements for a provider.
        • HTTP
        • Libp2p
    • IPLD-aware ingestion
      • go-legs subscriber (or similar) to serve go-data-transfers of indexed data through advertisement selectors.
      • Proactive sync by indexers
    • Push single update. Provider pushes update directly to indexer.
    • IPLD-unaware ingestion (out of scope for MVP?)
      • Request response protocol for paginated sync of indexed data (see design)
  • Process

    • Config
      • Identity (required to sign advertisements)
    • Start from CLI
    • Docker deployment
      • deferred due to K8S deployment
    • Embedded instance.
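
A rough sketch, not the actual index-provider API, of what the publication surface described in the Ingestion list above might look like; Advertisement here is just a placeholder type:

package sketch

import (
	"context"

	"github.com/ipfs/go-cid"
)

// Advertisement stands in for the signed advertisement record produced by
// the indexing tasks above.
type Advertisement struct{}

// Publisher captures the two publication paths from the task list.
type Publisher interface {
	// PublishLocal stores the advertisement so indexers can later request it
	// via the /advertisement/latest and /advertisement/<id> endpoints.
	PublishLocal(ctx context.Context, adv Advertisement) (cid.Cid, error)

	// Publish announces the advertisement on the pubsub channel; it always
	// assumes a corresponding PublishLocal has happened.
	Publish(ctx context.Context, adv Advertisement) (cid.Cid, error)
}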

Graphsync / data-transfer metadata fields

We expect that metadata for a storage provider needs at least:

format string  // "filecoin/1"
pieceCID CID
free bool
fastRetrieval bool

Consider putting this in a new repo/module.
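
A hedged sketch of those fields as a Go struct; names and types are illustrative rather than a finalized wire format:

package metadata

import "github.com/ipfs/go-cid"

// GraphsyncFilecoinV1 sketches the fields listed above for the "filecoin/1" format.
type GraphsyncFilecoinV1 struct {
	Format        string  // e.g. "filecoin/1"
	PieceCID      cid.Cid // CID of the piece containing the data
	Free          bool    // retrievable without payment
	FastRetrieval bool    // an unsealed copy is kept for fast retrieval
}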

Import UX slow / needs improvement

$ ./index-provider import car -l http://localhost:3102 -i bafybeihh.carv2
Post "http://localhost:3102/admin/import/car": EOF
  • This is a CARv2, but the length of time for import seems to be the same as for a CARv1.
  • The 'EOF' does not seem like a useful response.

Intermittent failure to retrieve advertised content

Sometimes after publishing an advertisement, when the indexer asks the reference provider for the content of that advertisement, the reference provider cannot find the content. It seems like there is some delay before it is available from the linksystem cache.

This can be forced to work by re-importing the car file and making the reference provider re-publish its advertisement, which triggers the indexer to ask for content again… and it usually works after one or two retries. Of course, this is not acceptable.

This problem appears to be on the reference provider side, because it appears that the indexer is asking for exactly the same content, in exactly the same way/order for both the success and failure case.

Set defaults in config closest to where they are applied

When the engine is instantiated directly with an empty config, defaults don't get set. As a result the chunk size ends up being zero, which results in single-sized chunks.

Make sure that wherever constructors take a config, the config is checked for sanity (see the sketch below).
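
A minimal sketch of such a check, assuming illustrative default values and a trimmed-down Ingest config with only the two fields exercised by the test elsewhere on this page:

package engine

// Ingest mirrors the two sub-config fields the engine consumes here; the
// real type lives in the project's config package.
type Ingest struct {
	LinkedChunkSize int
	LinkCacheSize   int
}

// Illustrative defaults; the real values belong wherever the project defines them.
const (
	defaultLinkedChunkSize = 100
	defaultLinkCacheSize   = 1024
)

// sanitize fills in zero-valued fields right where the config is consumed,
// so an engine instantiated directly with an empty config no longer ends up
// with a zero chunk size.
func sanitize(cfg Ingest) Ingest {
	if cfg.LinkedChunkSize == 0 {
		cfg.LinkedChunkSize = defaultLinkedChunkSize
	}
	if cfg.LinkCacheSize == 0 {
		cfg.LinkCacheSize = defaultLinkCacheSize
	}
	return cfg
}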

Add cache manager for ingestion in LinkSystem

We currently generate the entries linked list used for ingestion synchronously, by triggering the callback in the linksystem if the entries are not available in the cache. The exchange can't proceed until the full linked list for the entries is generated and stored in the cache.
https://github.com/filecoin-project/indexer-reference-provider/blob/78768e2546a7f81fe7c6237644f5598a83ea5258/core/engine/linksystem.go#L68

The aim of this is to avoid having to keep redundant storage persisting the list of advertised entries. We only generate the linked list for a list of entries when they are requested. However, this process of "generating on-the-fly" is currently quite inefficient.
https://github.com/filecoin-project/indexer-reference-provider/blob/78768e2546a7f81fe7c6237644f5598a83ea5258/core/engine/linksystem.go#L76

My suggestions to make this faster:

  • Add a cacheManager that keeps track of what has been asked for ingestion, what linked lists need to be generated, and which ones are already in cache and ready to be served. This would also allow us to parallelize the generation of linked lists, so that as soon as we have the first few nodes of a linked list, we can start serving them through the link system.
  • The cacheManager should also include an offline algorithm to garbage-collect linked lists (we currently don't have any garbage collection strategy). As a first approach we could be naïve and keep only a small number of linked lists in cache at a time (assuming that after an advertisement publication, all indexers will be looking to ingest the same few sets of entries in order to sync).

Can't start market node after upgrade

Issue: Market node crashing when starting up. ERROR:

2022-02-15T10:28:56.826+0100    INFO    dagstore.migrator       dagstore/wrapper.go:354 deal not ready; skipping        {"deal_id": 0, "piece_cid": "baga6ea4seaqk54tcailxwsaqkfsmacu2h43e4z2ytn2xqwj762guubryalaaega"}
2022-02-15T10:28:56.826+0100    INFO    dagstore.migrator       dagstore/wrapper.go:354 deal not ready; skipping        {"deal_id": 0, "piece_cid": "baga6ea4seaqhmohnt4uukhxf3enrgoyn5yldhe6hk3ebltr7jan57j6kkytokfq"}
2022-02-15T10:28:56.826+0100    INFO    dagstore.migrator       dagstore/wrapper.go:358 registering deal in dagstore with lazy init     {"deal_id": 2985231, "piece_cid": "baga6ea4seaqgj2d6h7pzqlvld5qdc5kfnjuxsbxagn2qk253vwvhpxbwehrukmq"}
ERROR: creating node: starting node: failed to connect index provider host with the full node: failed to connect index provider host with the full node: failed to dial 12D3KooWNCcog7KWPsjWa1FmKyqeTeBxxc5cJf27X6vKBVqm1mW3:
  * [/ip4/127.0.0.1/tcp/10231] dial tcp4 127.0.0.1:10231: connect: connection refused
  * [/ip4/192.168.x.x/tcp/10231] failed to negotiate security protocol: read tcp4 192.168.x.x:35397->192.168.x.x:10231: read: connection reset by peer
  * [/ip4/x.x.x.x/tcp/10231] failed to negotiate security protocol: read tcp4 192.168.x.x:35397->x.x.x.x:10231: read: connection reset by peer
  * [/ip4/x.x.x.x/tcp/10231] dial tcp4 0.0.0.0:35397->x.x.x.x:10231: i/o timeout

OBS: My market node runs on a separate physical server and needs to connect to the daemon over the network.
Network connectivity is fine and the system was running perfectly on v1.14.0.
It is only allowed to connect on the internal network, so I believe the error is on these lines:

[/ip4/x.x.x.x/tcp/10231] failed to negotiate security protocol: read tcp4 192.168.x.x:35397->x.x.x.x:10231: read: connection reset by peer

The Lotus daemon and miner are running the same version as the market node: master-spx.idxprov.rc-1

$ lotus-miner version
Daemon:  1.15.0-dev+mainnet+git.1bf7e6a40+api1.3.0
Local: lotus-miner version 1.15.0-dev+mainnet+git.1bf7e6a40 

The lotus-miner also connects to the daemon externally, so I know for sure the daemon is reachable and currently connected to the lotus-miner, as it is running.

Add dealID (or alike) to Notify interface

Providers will need to include additional context information about the CIDs/metadata being notified (such as a dealID or any other unique identifier) so they can inform indexers of the specific entry an update/removal refers to.

This would mean changing the reference-provider interface to something like:

	NotifyPutCids(ctx context.Context, dealID cid.Cid, metadata []byte) (cid.Cid, error)

	NotifyRemoveCids(ctx context.Context, dealID cid.Cid) (cid.Cid, error)

	// Callback that goes from dealID to a carv2.Index used to fetch the list of CIDs to index.
	type CidCallback func(dealID cid.Cid) carv2.Index

	// Register the callback used to fetch CIDs when triggering `NotifyPut`.
	RegisterCidCallback(cb CidCallback)

If this ID ends up not being the dealID we should maybe choose some other name for it (ctxID?)

//cc @aarshkshah1992

Handle `remove` Advertisements in LinkSystem when list of entries aren't available

A provider may not have the list of multihashes available when receiving a request for ingestion of a remove advertisement. In this case, instead of generating the linked list of entries that need to be removed, the linksystem should check whether it is a remove advertisement and not follow the link for the list of entries (as they won't be available, and following it would make the exchange fail).
https://github.com/filecoin-project/indexer-reference-provider/blob/78768e2546a7f81fe7c6237644f5598a83ea5258/core/engine/linksystem.go#L100

For advertisements with isRm == true we should avoid triggering the callback and traversing the Entries link: https://github.com/filecoin-project/indexer-reference-provider/blob/78768e2546a7f81fe7c6237644f5598a83ea5258/core/engine/linksystem.go#L109
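
A minimal sketch of that guard; Advertisement, ErrNoEntriesForRemoval and the generate callback are illustrative stand-ins for the engine's real types:

package sketch

import (
	"errors"
	"io"
)

// Advertisement is a placeholder carrying only the flag that matters here.
type Advertisement struct {
	IsRm bool
}

// ErrNoEntriesForRemoval signals that the entries link must not be followed.
var ErrNoEntriesForRemoval = errors.New("no entries for removal advertisement")

// entriesFor is the proposed check: for removal advertisements, never trigger
// the CID callback or traverse the Entries link, since the multihashes may
// already be gone and the exchange would fail.
func entriesFor(ad Advertisement, generate func() (io.Reader, error)) (io.Reader, error) {
	if ad.IsRm {
		return nil, ErrNoEntriesForRemoval
	}
	return generate()
}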

This issue is related to ipni/storetheindex#78

Avoid using storetheindex specific config data structures for public clients

IngestClient expects an Identity config data structure as an argument. This means that we need to generate an ad-hoc storetheindex identity config in the reference-provider to make an IndexContent call from the provider. We should consider removing this dependency on storetheindex-specific config data structures.

https://github.com/filecoin-project/indexer-reference-provider/blob/78768e2546a7f81fe7c6237644f5598a83ea5258/core/engine/engine.go#L174

Expose the ability to construct sub config with default values

The only way to construct a config with defaults right now is to initialise a full one with an identity, take the sub-config needed and throw the rest away. Default values are not exported.

Expose the ability to construct an instance of a config/sub-config with default values set.
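
A sketch of what exposing this could look like; the default values and the trimmed-down Ingest type are placeholders:

package config

// Exported defaults so embedders can see and reuse them; the values shown
// here are placeholders.
const (
	DefaultLinkedChunkSize = 100
	DefaultLinkCacheSize   = 1024
)

// Ingest is a trimmed-down stand-in for the real sub-config.
type Ingest struct {
	LinkedChunkSize int
	LinkCacheSize   int
}

// NewIngest returns an Ingest sub-config with defaults set, without
// requiring a full config (or an identity) to be initialised first.
func NewIngest() Ingest {
	return Ingest{
		LinkedChunkSize: DefaultLinkedChunkSize,
		LinkCacheSize:   DefaultLinkCacheSize,
	}
}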

removal generates invalid advertisement

when removing by metadata context ID:

2022-01-03T08:52:29.283Z        INFO    provider/engine engine/engine.go:331    Generating removal list for advertisement
2022-01-03T08:52:29.287Z        ERROR   provider/engine engine/engine.go:380    Error generating new advertisement: storetheindex: invalid metadata: encountered protocol ID 0 on encode
2022-01-03T08:52:29.287Z        ERROR   adminserver     http/removecar_handler.go:47    failed to remove CAR: storetheindex: invalid metadata: encountered protocol ID 0 on encode      {"err": "storetheindex: invalid metadata: encountered protocol ID 0 on encode", "key": "NQe/ch+eG2aP3DE1xSMcbGF+mJGLH6WmiA8z4xB5QXg="}

Fix flaky tests

The goal of this issue is to track the list of unit tests that are flaky and need some extra care to make them work.
If you see any other flaky test that needs to be investigated, please add it to the list.

  • TestNotifyPublish from engine_test.go (fails on macOS, Go 1.17)
    • This one is probably due to the use of time.Sleep instead of an async approach in the test. It fails under go test -race, which runs slower than vanilla go test (a polling sketch follows below).
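
A sketch of the async approach, using testify's require.Eventually; receivedAd is a hypothetical stand-in for whatever condition the real test waits on:

package engine_test

import (
	"testing"
	"time"

	"github.com/stretchr/testify/require"
)

func TestNotifyPublish_polling(t *testing.T) {
	// receivedAd is a hypothetical stand-in for the condition the real test
	// currently waits for with time.Sleep.
	receivedAd := func() bool { return true }

	// Poll instead of sleeping a fixed duration, so the test stays stable
	// when `go test -race` runs slower than a vanilla `go test`.
	require.Eventually(t, receivedAd, 10*time.Second, 100*time.Millisecond,
		"advertisement was never received")
}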

Multiple protocol support in metadata

Just a note, but I think we can, in a protocol-compatible way, say that metadata is not simply <protocol ID><payload> but rather [<protocol id><protocol-specific payload>], so e.g.
uvarint{graphsync filecoin} || cbor{<cid>, true, true} || uvarint{bitswap} to indicate that both protocols are supported.
Since the end of the graphsync CBOR payload can already be determined, any trailing bytes can be interpreted as the next protocol.

Originally posted by @willscott in #187 (comment)
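
A sketch of parsing metadata laid out this way; the decoder map is illustrative, and each protocol-specific decoder is assumed to know where its own payload ends (as the graphsync CBOR record does):

package metadata

import (
	"encoding/binary"
	"fmt"
)

// decodePayload is a hypothetical per-protocol decoder; it reports how many
// bytes of buf its payload consumed (e.g. by decoding the self-delimiting
// graphsync CBOR record).
type decodePayload func(buf []byte) (consumed int, err error)

// parseMetadata walks [<uvarint protocol id><protocol-specific payload>]...
// entries: after one payload ends, any trailing bytes are interpreted as the
// next protocol entry. It returns the protocol IDs found.
func parseMetadata(buf []byte, decoders map[uint64]decodePayload) ([]uint64, error) {
	var protocols []uint64
	for len(buf) > 0 {
		id, n := binary.Uvarint(buf)
		if n <= 0 {
			return nil, fmt.Errorf("invalid protocol id varint")
		}
		buf = buf[n:]

		dec, ok := decoders[id]
		if !ok {
			return nil, fmt.Errorf("unknown protocol id %d", id)
		}
		used, err := dec(buf)
		if err != nil {
			return nil, err
		}
		protocols = append(protocols, id)
		buf = buf[used:]
	}
	return protocols, nil
}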

Entries chain cache grows indefinitely

During chunk generation, if a single linked list is larger than the cache's current capacity, the cache is resized to hold the entire linked list.

However, the final resize is always to the length of the cache itself, which means none of the entries are ever evicted and the cache grows indefinitely.

Test:

func Test_CacheIsEvicted(t *testing.T) {
	cfg := config.NewIngest()
	cfg.LinkedChunkSize = 10
	cfg.LinkCacheSize = 50
	engine := mkEngineWithConfig(t, cfg)

	mhCount := 100
	expectedChunkCount := mhCount / cfg.LinkedChunkSize

	for i := 0; i < 10; i++ {
		idx := index.NewMultihashSorted()
		cids, err := testutil.RandomCids(mhCount)
		require.NoError(t, err)
		var records []index.Record
		for i, c := range cids {
			records = append(records, index.Record{
				Cid:    c,
				Offset: uint64(i + 1),
			})
		}

		err = idx.Load(records)
		require.NoError(t, err)
		iterator, err := provider.CarMultihashIterator(idx)
		require.NoError(t, err)

		_, err = engine.generateChunks(iterator)
		require.NoError(t, err)
	}

	store, ok := engine.cache.(*lrustore.LRUStore)
	require.True(t, ok)
	require.Equal(t, expectedChunkCount, store.Cap())
}

remove by car doesn't work

provider:index-provider> ./index-provider rm car -l http://localhost:3102 -i /data/snip/deal-cars/bafybeihh.carv2
Not Found: no CAR file found for the given Key

provider:index-provider> ./index-provider import car -l http://localhost:3102 -i /data/snip/deal-cars/bafybeihh.carv2
Conflict: CAR already advertised

Importing the same car file with the same metadata should not generate a new advertisement

Each time a car file is imported, this results in a different advertisement CID. This is preventing the indexer from determining that it has already ingested the advertised indexes.

The reason that the CID is different is that each new advertisement links to the previous and the CID is calculated over the advertisement data and its link to the previous. What needs to happen is that the previous advertisement needs to be looked up by context ID, and if a previous advertisement exists for that context ID (coming from car supplier) and the previous advertisement metadata is equal to the current metadata, then the import request should be ignored.

Note: The context ID comes from the car supplier, which comes from the import request, which is a hash of the car file path. The metadata also originates from the import request.
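
A minimal sketch of that check, with a hypothetical previousMetadata lookup standing in for the engine's datastore:

package sketch

import "bytes"

// shouldAdvertise reports whether a new advertisement is needed for an
// import: if the previous advertisement stored for the same context ID
// carries identical metadata, the import request should be ignored.
func shouldAdvertise(ctxID, md []byte, previousMetadata func(ctxID []byte) ([]byte, bool)) bool {
	prev, found := previousMetadata(ctxID)
	return !found || !bytes.Equal(prev, md)
}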

Remove now redundant libp2p server from provider

Remove the libp2p server exposed by the provider: the server predates go-legs, and what it provides is now satisfiable via alternative, already-exposed endpoints:

  • GET_LATEST is now exposed via go-legs' head publisher over go stream
  • GET_AD is available via graphsync.

The server is used in end-to-end tests, which need to be refactored to use graphsync instead.

Once removed, the client library for it can also go away here.

Note that storetheindex uses the libp2p server to get the latest advertisement instead of the go-legs head publisher; it also needs to be updated, as captured in ipni/storetheindex#137.

Add endpoint to admin server to remove content by context ID

The admin server needs a new endpoint for removing car content. This new admin/remove/car endpoint will complement the existing admin/import/car endpoint, and will take a contextID or a file path as input. The file path is used to generate the contextID if contextID is not supplied.

Invoking this endpoint must result in a remove advertisement when previously imported content is removed from the provider.

Simplify HTTP server responses

The admin server delivers error responses using an errRes data structure that gets serialized by the respond() function. This was originally to have a common way of delivering errors over both libp2p and http, but is not needed here.

Here is an example of the current error response:

errRes := newErrorResponse("failed to supply CAR. %v", err)                                                                                                                             
respond(w, http.StatusInternalServerError, errRes)                                                                                                                                       

I would prefer to see:

http.Error(w, fmt.Sprintf("failed to supply CAR: %s", err), http.StatusInternalServerError)

Even that can be refined, so that if the file is not found it returns http.StatusBadRequest.

Restarted engine does not restore latest advertisement as head in legs publisher

When the provider engine is restarted, the advertisement head exposed via the legs head publisher is cid.Undef until the next advertisement is published. Internally, the provider engine maintains a mapping to the latest advertisement in the datastore.

The legs publisher API, however, does not allow the head to be set without also publishing it onto gossipsub or HTTP (depending on the publisher implementation).

  1. Expose an API in the go-legs publisher to allow the initial head CID to be set without publishing anything.
  2. On engine.Start, check whether the mapping to the latest advertisement is present and, if so, set the current head to it.
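
A sketch of the two steps; SetRoot is the proposed go-legs addition described in point 1, not an API that exists today:

package sketch

import (
	"context"

	"github.com/ipfs/go-cid"
)

// LegsPublisher captures step 1: the proposed go-legs API that would allow
// an initial head to be set without publishing anything.
type LegsPublisher interface {
	// SetRoot records c as the current head served to syncing indexers,
	// without announcing it over gossipsub or HTTP.
	SetRoot(ctx context.Context, c cid.Cid) error
}

// restoreHead is step 2: on engine.Start, if a latest advertisement CID was
// persisted in the datastore mapping, make it the publisher's head again.
func restoreHead(ctx context.Context, pub LegsPublisher, latestAd cid.Cid) error {
	if latestAd == cid.Undef {
		// Nothing was published before the restart; head stays cid.Undef.
		return nil
	}
	return pub.SetRoot(ctx, latestAd)
}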

Provider daemon shuts down ungracefully due to cancelled context

The engine shutdown call always returns a context-cancelled error when the daemon is stopped via a single SIGINT:

./provider daemon
2021-10-15T14:17:15.180+0100    INFO    command/reference-provider      provider/daemon.go:87   libp2p host initialized {"host_id": "12D3KooWK9NsyCndVed4QR71wNayaXHcLnRRTq11QPEe8n1w45Jy", "multiaddr": "/ip4/0.0.0.0/tcp/3103"}
2021-10-15T14:17:15.268+0100    INFO    dt-impl impl/impl.go:145        start data-transfer module
2021-10-15T14:17:15.268+0100    INFO    provider/engine engine/engine.go:64     Retrieval address not configured, using /ip4/192.168.68.105/tcp/3103
2021-10-15T14:17:15.268+0100    INFO    command/reference-provider      provider/daemon.go:134  libp2p servers initialized      {"host_id": "12D3KooWK9NsyCndVed4QR71wNayaXHcLnRRTq11QPEe8n1w45Jy"}
2021-10-15T14:17:15.268+0100    INFO    command/reference-provider      provider/daemon.go:154  admin server initialized        {"address": "127.0.0.1:3102"}
Starting admin server on 127.0.0.1:3102 ...2021-10-15T14:17:15.268+0100 INFO    adminserver     http/server.go:58       admin http server listening     {"addr": "127.0.0.1:3102"}
^CReceived interrupt signal, shutting down...
(Hit CTRL-C again to force-shutdown the daemon.)
2021-10-15T14:17:16.584+0100    INFO    pubsub  [email protected]/pubsub.go:608   pubsub processloop shutting down
2021-10-15T14:17:16.584+0100    INFO    command/reference-provider      provider/daemon.go:170  Shutting down daemon
2021-10-15T14:17:16.584+0100    ERROR   command/reference-provider      provider/daemon.go:184  Error closing provider core     {"err": "context canceled"}
2021-10-15T14:17:16.585+0100    INFO    adminserver     http/server.go:63       admin http server shutdown
2021-10-15T14:17:16.586+0100    INFO    command/reference-provider      provider/daemon.go:199  node stopped
daemon did not stop gracefully

I believe the cause is that the context used during shutdown is the parent context from the CLI, and the shutdown function does not use the passed context since the underlying go-legs closer does not accept one.

When shutdown is called, the CLI context is already cancelled (otherwise shutdown would not have been triggered), hence the error.
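
A sketch of one way to address this, assuming a hypothetical Shutdown(ctx) method on the engine: derive a fresh, deadline-bound context for shutdown instead of reusing the already-cancelled CLI context:

package sketch

import (
	"context"
	"time"
)

// engineCloser is a hypothetical view of the engine's shutdown method.
type engineCloser interface {
	Shutdown(ctx context.Context) error
}

// shutdownEngine gives the engine its own deadline-bound context, since the
// CLI context is by definition already cancelled once SIGINT triggers the
// shutdown path.
func shutdownEngine(e engineCloser) error {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	return e.Shutdown(ctx)
}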

Removal advertisements are not syncable

Syncing removal advertisements fails because the link system tries to look up an internal mapping of the advertisement context ID, which is removed as part of publishing the removal advertisement.

This makes all syncs for such advertisements fail.

Consider checking whether the advertisement is a removal ad and, if so, not attempting to look up its entries from the callback, since they may no longer be there.

For "remove all" advertisements the check is straightforward: check whether the entries link is NoEntries and, if so, do not attempt to look up entries.
For removal advertisements with an explicit list of multihashes, rework in the engine is needed to store that list permanently, since the engine must be able to serve the advertisements it publishes.

Implement a utility for verifying that expected multihashes are ingested by `storetheindex`

Write a utility that verifies multihashes are known by an indexer node, more specifically the storetheindex implementation of it.

In its first iteration the utility is to be used by miners for verification. It will accept as input:

  • miner peer ID
  • CAR index files
  • storetheindex endpoint

and, for each multihash found in the index, check that storetheindex has it associated with a provider that has the same peer ID.

The tool should also be able to get the list of advertisements from the provider's graphsync API as a source of multihashes to verify.
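
A sketch of the verification loop; Finder is a hypothetical stand-in for a storetheindex find client:

package verify

import (
	"context"
	"fmt"

	"github.com/libp2p/go-libp2p-core/peer"
	"github.com/multiformats/go-multihash"
)

// Finder is a hypothetical stand-in for a storetheindex find client.
type Finder interface {
	// Providers returns the peer IDs the indexer has associated with mh.
	Providers(ctx context.Context, mh multihash.Multihash) ([]peer.ID, error)
}

// verifyIngested checks that every multihash taken from the CAR index files
// is associated by the indexer with the miner's peer ID.
func verifyIngested(ctx context.Context, f Finder, miner peer.ID, mhs []multihash.Multihash) error {
	for _, mh := range mhs {
		provs, err := f.Providers(ctx, mh)
		if err != nil {
			return err
		}
		found := false
		for _, p := range provs {
			if p == miner {
				found = true
				break
			}
		}
		if !found {
			return fmt.Errorf("multihash %s is not indexed for provider %s", mh.B58String(), miner)
		}
	}
	return nil
}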

Code reorganization for readability and reusability

This issue records proposed changes to make the reference provider more easily consumable by parties wishing to implement their own provider based on the reference provider, and by reusing packages from the reference provider.

  • Remove provider subdirectory in cmd/provider
  • Move engine/linksystem into a separate package
  • Consider moving engine/engine.go to top-level directory.
  • Expose some of the packages in internal
  • Move engine/dscache.go into a separate package. Possibly move it into a package under go-datastore.

need `list` command

there's currently no way to query which files / items are being provided by the current provider daemon.

limit acceptable selectors

the legs handler should filter which selector requests are allowed, and limit them to:

  • selection rooted at an individual known car index
  • selection of metadata items (field recursive of known format)
