ipni / indexstar
:star:️ A load splitter for storetheindex :star:
License: Other
reframe is deprecated in favour of HTTP delegated routing. Remove support for it at cid.contact.
caskadht correctly implements the IPNI specification when responding to OPTIONS requests: it includes the X-IPNI-Allow-Cascade header with the labels it supports. However, indexstar does not propagate OPTIONS requests to the backends and therefore does not include the supported labels. Change the implementation so that the response to such requests includes the supported X-IPNI-Allow-Cascade header.
When a backend service is being redeployed it may be momentarily unreachable. Ideally, indexstar should implement a circuit breaker pattern to avoid blindly hitting hosts that are down for every incoming request.
When making a batch find request for multiple multihashes (using the POST request method), the batch query fails when indexstar is run with the --translateNonStreaming flag, and works without it.
Indexstar logs associated with the query:
{"level":"warn","ts":"2023-05-02T20:03:30.255Z","logger":"indexstar/mux","caller":"app/find_ndjson.go:202","msg":"Request processing was not successful","backend":"ber-indexer:3000","status":405,"body":"\n"}
{"level":"error","ts":"2023-05-02T20:03:30.255Z","logger":"indexstar/mux","caller":"app/scatter_gather.go:44","msg":"failed to scatter on target","target":"ber-indexer:3000","err":"status 405 response from backend ber-indexer:3000","maxWait":2}
{"level":"warn","ts":"2023-05-02T20:03:30.255Z","logger":"indexstar/mux","caller":"app/find_ndjson.go:202","msg":"Request processing was not successful","backend":"cali-indexer:3000","status":405,"body":"\n"}
{"level":"error","ts":"2023-05-02T20:03:30.255Z","logger":"indexstar/mux","caller":"app/scatter_gather.go:44","msg":"failed to scatter on target","target":"cali-indexer:3000","err":"status 405 response from backend cali-indexer:3000","maxWait":2}
{"level":"warn","ts":"2023-05-02T20:03:30.257Z","logger":"indexstar/mux","caller":"app/find_ndjson.go:202","msg":"Request processing was not successful","backend":"dhfind-cluster-ip:40080","status":400,"body":"input isn't valid multihash\n"}
{"level":"error","ts":"2023-05-02T20:03:30.257Z","logger":"indexstar/mux","caller":"app/scatter_gather.go:44","msg":"failed to scatter on target","target":"dhfind-cluster-ip:40080","err":"status 400 response from backend dhfind-cluster-ip:40080","maxWait":2}
{"level":"warn","ts":"2023-05-02T20:03:30.257Z","logger":"indexstar/mux","caller":"app/find_ndjson.go:202","msg":"Request processing was not successful","backend":"ago-indexer:3000","status":405,"body":"\n"}
{"level":"error","ts":"2023-05-02T20:03:30.258Z","logger":"indexstar/mux","caller":"app/scatter_gather.go:44","msg":"failed to scatter on target","target":"ago-indexer:3000","err":"status 405 response from backend ago-indexer:3000","maxWait":2}
Add an endpoint that produces a response with the same schema as storetheindex on GET /providers. Note that conflicting results may be returned from different indexer nodes. The majority of such conflicts can be resolved using the timestamp in the responses, where the combined result converges on the most recent.
The collection of providers returned by GET /routing/v1/providers/{CID} should be paginated, with a default limit of 100.
See IPIP-337 for the specification on pagination support.
It would be nice if indexstar also mimicked the admin API exposed by a regular indexer node, and upon receiving a request forwarded it to all configured backends. Example usage: triggering a manual sync from a specific provider. Without it, the admin sync command needs to be executed on each node individually.
Note the admin API should be exposed on a different port and be excluded from public ingress.
curl "https://cid.contact/cid/bafybeidbjeqjovk2zdwh2dngy7tckid7l7qab5wivw2v5es4gphqxvsqqu"
fails with a 404. Note that the x-ndjson variant succeeds:
curl -H "Accept: application/x-ndjson" "https://cid.contact/cid/bafybeidbjeqjovk2zdwh2dngy7tckid7l7qab5wivw2v5es4gphqxvsqqu"
Currently, cid.contact returns all HTTP nft.storage announcements like this (example):
{
"Protocol":"unknown",
"Schema":"unknown",
"ID":"QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp",
"Addrs":["/dns4/dag.w3s.link/tcp/443/https"]
}
The problem is that results with "Schema":"unknown" are ignored by the boxo client (confirmed in ipfs/boxo#422 (comment)), and we would like Kubo and other users of the boxo client library to be able to access the ID and Addrs information present there (as well as setting the standard for doing the same in JS in Helia).
@masih would it be possible to change to Schema: "peer"
(from IPIP-417) for these /routing/v1/providers responses?
All we need is:
{
- "Protocol":"unknown",
- "Schema":"unknown",
+ "Schema":"peer",
"ID":"QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp",
"Addrs":["/dns4/dag.w3s.link/tcp/443/https"]
}
Handle HEAD and OPTIONS on root path /
Add metrics that measure, of the results returned, what count carried which protocol ID, for GraphSync, Bitswap, and Other.
Allow runtime configuration change for the list of backends
If the first backend is dhfind, the web UI results in a 404 regardless of which endpoint is configured as the fallback.
Now that dhstore supports non-encrypted lookups (dhfind), there is no need to separate dhstore backends and not-dhstore backends, unless we intend to allow mixed dh and non-dh storage deployments.
The verify ingest tool uses this interface for validation.
Add metrics that measure the number of requests using PUT on /routing/v1/* in order to build an understanding of volume, even though PUT on that path is not implemented by indexstar.
dev.cid.contact: ipni/storetheindex#1125
cid.contact: ipni/storetheindex#1127
@gammazero just pushed a release tag: v0.0.7.
Please manually verify validity (using `gorelease`), and update `version.json` to reflect the manually released version, if necessary.
In the future, please use the automated process.
As backends have differing latencies, we should be able to return fast results we know about without waiting for all backends to return.
Reduce downtime for the clients by handling SIGINT gracefully; the service should:
Once this feature is added, the termination timeout should be adjusted in the K8S manifests to allow the container to wait long enough for in-flight requests to complete. Note the default timeout is 30 seconds, which in the case of indexstar is plenty considering the current HTTP timeout configurations.
The pattern used for finding encrypted metadata is "Find First" which is slightly different than scatter gather. There is room for optimisation here where we do not wait for all results to come back before responding to the client.
For now what's here is fine; we should capture an issue to later optimise this.
Originally posted by @masih in #63 (comment)
This route is currently just sent to the fallback backend.
A set of backend types was introduced in order to selectively route traffic like this. But a matcher mechanism already exists that can select the appropriate backends depending on the incoming request. The matcher mechanism is much more extensive, since it allows us to inspect HTTP requests fully. It also results in less verbose and more readable code for selecting backends based on incoming requests. An example of this is already present for cascading backends here.
Refactor the code to use matchers instead of backend types for choosing which route to take for a given request.
Streaming is implemented for all backends, but the hookup for the HTTP delegated routing endpoints is missing.
The indexstar logs are filled with an extremely high volume of messages about context Canceled and DeadlineExceeded. There are so many such messages that it makes it difficult to find anything more meaningful. Canceled context messages should be logged at debug level.
Add a health endpoint that can be used to configure readiness checks in K8S. This will allow zero-downtime rollout of indexstar deployments.
The IPNI reader privacy spec provides an additional endpoint to fetch encrypted metadata by a base58-encoded key. Expose this in indexstar.
Add metrics for:
Once metrics are implemented in go, manifests need to be updated to add prometheus pod monitor, and the deployment needs to be updated to export the debug and metrics server if it is separated from HTTP server.
Invalid requests are forwarded to the backends regardless, even though they are destined to fail. This causes spikes in upstream latency.
Check find requests for:
indexstar integration with dhstore does not find double hashed multihashes even though they exist and are found successfully via dhstore GET /multihash API.
Example multihash: 2wvkjD7mo3EfXptjbYX1tUGv6eouczbyYrP1k11RaKYB1sP
Integrate DHT lookup cascade with optional query parameter check which should result in search on both IPNI and the IPFS DHT.
The double hash multicodec code is not dedicated to encrypted records; therefore we cannot implicitly assume that it represents one. Separate the encrypted lookup path by adding an /encrypted prefix.
Context:
In cases where 404s can potentially be satisfied in a later time, e.g. max internal timeout was hit, or 5 minutes caching of 404s, add extra headers to aid http clients make better decisions on retries.
See:
It should return 400/404 depending on the validity of the peer ID; instead we get:
$ curl -qsv https://indexstar.prod.cid.contact/providers/fish
...
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 200
< date: Tue, 18 Oct 2022 12:17:43 GMT
< content-type: application/json; charset=utf-8
< content-length: 58
< strict-transport-security: max-age=15724800; includeSubDomains
< access-control-allow-origin: *
< access-control-allow-credentials: true
< access-control-allow-methods: GET, PUT, POST, DELETE, PATCH, OPTIONS
< access-control-allow-headers: DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization
< access-control-max-age: 1728000
<
* Connection #0 to host indexstar.prod.cid.contact left intact
{"AddrInfo":{"ID":"","Addrs":[]},"LastAdvertisement":null}
$ curl -s https://dev.cid.contact/routing/v1/providers/bafykbzacedzugqdzfjjbd4pgsvwrpkfdpk7rzfrsgwqj2kehxv2ffjy2tlny6 | jq -r '.Providers[].Protocol'
transport-bitswap
transport-graphsync-filecoinv1
The correct protocol id for bitswap is bitswap. (These are intentionally different from multicodec names so that they can be versioned without having to add a new multicodec code.)
Finding providers via /reframe when more than one backend is configured fails with: unexpected content after end of json object
2022-08-23T15:36:48.258+0100 ERROR service/client/delegatedrouting proto/proto_edelweiss.go:1233 client received error response (unexpected content after end of json object)
Results returned by the reframe aggregation logic filter out duplicates by provider ID, but the HTTP find and reframe translator do not do such filtering.
Ideally, the response should not include duplicate provider information.
Pebble exposes a key count estimation, which is surfaced via the Pebble store implementation API.
Indexstar already knows about all of the dhstore instances, configured via --backend.
Expose an endpoint in indexstar, /stats, to return the total number of unique Pebble keys. Make sure to cache the requests to Pebble, since getting the estimated count is expensive; indexstar should cache the response of /stats and refresh it with an expiry of 30 minutes or so.
To truly measure this, we would have to enable event logs on CloudFront, which would log every request, flushed to an S3 bucket on an hourly basis. Then write code to process them, filtering requests by the gateway IPs.
Add metrics to answer what fraction of queries are getting content only from the cascaded DHT source, as a way of understanding indexer DB’s fraction of total.