
dennis-tra commented on August 26, 2024

We discussed this issue during our colo yesterday, and here are a few questions and remarks that came up on our end:

  • We'd like to integrate IPNI measurements into our parsec code base. In our current experiments, one node publishes provider records for random content into the DHT, and all the others look up these provider records. Two questions:
    1. In the case of IPNI, I believe all lookup requests for a single CID (except the first one) would hit a cache, right? We could easily change the lookup logic to only look up each CID once, but it would be great to clarify what exactly we want to measure: cached performance, uncached performance, or both.
    2. We are just interested in the provider-record lookup performance, right? Our experiment is currently lacking code to actually retrieve the content.
  • Is publication performance also of interest in this IPNI measurement? The issue title suggests it's just the lookup performance. If we also wanted to measure the publication performance, I'd like to precisely define the stopwatch start/stop points (this probably depends on the next point).
  • What is the simplest way for us to publish CIDs to indexers? The index-provider seems to fit our purpose well.

A meta question:

  • Isn't the IPNI team already measuring and tracking the lookup performance? If so, what value would we add here? I'm not opposed to the idea, but just want to clearly articulate the goal here. Some things we could potentially offer in addition to what the IPNI team is already doing:
    • independent measurements
    • measurements from different vantage points around the world
    • tight integration into our measurement infrastructure, which means easy integration into our measurement hub probelab.io or general KPIs.

cc @BigLep, @masih, @TorfinnOlsen

willscott commented on August 26, 2024

for the meta question:

  • The IPNI team has some measurement infrastructure set up, mostly as a way to make sure our service is online and working properly. We aren't doing any real analysis - what exists is mostly reactive. Having someone think about what the right measurement system is here, and take a bit of time to draw out conclusions rather than just check correctness, can likely add a lot of value over what currently exists on the IPNI side.

masih commented on August 26, 2024

try to avoid testing custom code paths

The non-cached path exists already. The statement above makes sense πŸ‘

Do you have any other concerns about this approach? Would the second point be a prohibiting factor here?
Do you think this is within the realm of what's possible?

No concerns, all good.
Thanks for explaining, Dennis.

BigLep commented on August 26, 2024

Awesome writeup @dennis-tra! I love the thinking in general and the specificity. Yeah, I'd like to make sure great notes like this don't get lost in an issue.

I'm good to go with your definition of "uncached" and "cached". Given numberOfCloudfrontPops >> numberOfAwsRegions, it seems reasonable to assume that the first request from any region will hit a non-cached POP.

I think you'd be fine to take requests beyond request number 2 as being cached, but leaving what you have sounds good too.

Thanks again!

BigLep commented on August 26, 2024

Looking good @dennis-tra. Thanks! I think you're fine to resolve from my point of view.

yiannisbot commented on August 26, 2024

The ProbeLab team will discuss where this fits into our timelines and roadmap and will report back here.

BigLep commented on August 26, 2024

Thanks @yiannisbot for the response. A few thoughts related to the above:

  1. Let's treat IPNI like the other content routing system we're measuring (the public IPFS DHT). Having the same infra measuring it in a similar way enables more comparisons to be made about performance.
  2. L7 caching is a property of the IPNI system. We should let that be part of the measuring. Presumably when looking at high percentiles (e.g., p99, p100) we're going to see the cache miss case.
    • Bonus would be to have a special metric for the latency for the first call after publish so we can articulate precisely the cache miss latency. But again, I think this is bonus and not required if it's extra work.
  3. A big value that probelab brings here is to report on "client side" latency (what a user will actually see) rather than what is being observed purely from the server-side. Any public service should have a pulse on client-side latency. Client-side latency is usually an extra setup for teams though, and it looks like probelab fortunately has this infra that can be used here :)
  4. I don't think we need to focus on provide latency currently (or it at least is lower priority).
  5. In terms of software to use for providing, I don't know the latest, but you can look at what @aschmahmann recently did in https://github.com/aschmahmann/icarus

yiannisbot commented on August 26, 2024

Great, thanks for the feedback everyone! Focusing on the retrieval performance only makes it much easier for us. We're looking into it and will likely make some progress this week or next.

One extra note, just to make sure we set the right expectations: we currently have no real-time monitoring and alerting in place. Our workflow is: carry out experiments, gather results, and analyse and produce results/plots on a daily or weekly basis (depending on the experiment). Alerting on problematic performance is not something we're currently focusing on and falls outside the scope of our work - it might or might not become part of it in the future, but we're not there yet.

dennis-tra commented on August 26, 2024

Hi everyone, following up on our WG meeting #11 from yesterday.

The most important takeaway from my side is that ProbeLab can indeed offer a unique perspective into client-side latencies.

Other areas that came up which are out-of-scope of this issue (as I see it):

  • IPNI size - What is the overlap between the CIDs indexed by the DHT vs. by IPNIs?
  • CID popularity measurement
  • Magnitude of traffic (IPNI, DHT, Bitswap)

For these items, there were ideas floating around, which are documented in the linked Notion page above. @yiannisbot, what do you think of separate issues or even RFMs? I could go ahead and create them.


For what we want to measure in the scope of this issue, I would do the following (please point out flaws or a wrong direction):

I would configure our privileged node (the scheduler) to generate random data and publish an advertisement to cid.contact. After the data has been successfully indexed (see below - this is a challenge in itself), the scheduler would instruct our seven distributed probes to do an HTTP GET on https://cid.contact/cid/{cid} and measure the experienced latency. Pseudo-Go code showing how the latency would be measured:

   start := time.Now()
   resp, err := http.Get("https://cid.contact/cid/" + cid)
   // read and parse resp.Body (and check err) before stopping the stopwatch
   latency := time.Since(start)
   ...

AFAIU, the first successful request would place the response for that CID in an IPNI cache (please correct me if I'm wrong here). Therefore, the remaining six nodes would experience a lower latency. So instead of instructing all seven nodes simultaneously to request the content, I would configure the scheduler to do it sequentially and, for every new CID, start the sequence at a different node. This way, we can differentiate between cached and non-cached latencies from all our vantage points.
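
To make the rotation concrete, here's a minimal Go sketch (all names - lookupFrom, the probe list, the CIDs - are hypothetical placeholders, not our actual parsec code):

    package main

    import (
        "fmt"
        "time"
    )

    // lookupFrom stands in for instructing one probe region to fetch the CID
    // from the indexer and report back the latency it experienced.
    func lookupFrom(probe, cid string) time.Duration {
        start := time.Now()
        // ... probe performs: HTTP GET https://cid.contact/cid/{cid} ...
        return time.Since(start)
    }

    func main() {
        probes := []string{"us-east-1", "eu-central-1", "ap-southeast-1"} // our seven regions, abbreviated

        for i, cid := range []string{"bafy-cid-1", "bafy-cid-2"} {
            // For every new CID, start the sequence at a different probe so
            // each region periodically issues the first (uncached) request.
            for j := range probes {
                probe := probes[(i+j)%len(probes)]
                latency := lookupFrom(probe, cid) // sequential, not simultaneous
                fmt.Printf("cid=%s probe=%s request#%d latency=%s\n", cid, probe, j+1, latency)
            }
        }
    }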

Question to IPNI operators: this approach would spam the index with garbage data. If we went with this option, should we somehow mark the advertisements as synthetic or make an effort to clean up after ourselves?

@masih, you mentioned yesterday that there could be a huge delay between advertising CIDs to Indexers and them actually having the information available. If I remember correctly, this usually takes a few seconds but can sometimes take up to a few hours (!). Here are some ideas on how we could handle this:

  1. Depending on how often such long delays (a few hours) between publication and indexing happen, we could just live with it. We will probably aggregate latency data on a daily or weekly time frame, and even if there were such a long publishing delay once a day, we would still gather enough data to provide statistically significant metrics (I believe).

    When the scheduler instructs the first probe to request the CID and the probe cannot find it (although the scheduler thinks it has advertised it), the scheduler instructs the probe repeatedly to request the CID (with a backoff and a deadline of a few hours or so) until the probe receives a successful response (see the sketch after this list). Doing it this way, we could also roughly measure the time from advertisement to successfully indexed.

  2. Instead of crafting random CIDs, we could use a predefined list of CIDs which we know have been indexed and request them over and over. Optimally, the scheduler would be aware of cache eviction so that it instructs the probes to only request CIDs evicted from IPNI's cache, so that we can still track cached vs non-cached latencies.

  3. Is there some other kind of mechanism for how the scheduler could know that cid.contact has indexed its advertisement? Would, at any point in the process, cid.contact reach out to the scheduler, which we could use as an indicator?

  4. Another option?
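
To illustrate option 1, here's a minimal Go sketch of the repeated-request logic with backoff and deadline (the endpoint path, timings, and the waitIndexed helper are made up for illustration; the real scheduler/probe protocol is elided):

    package main

    import (
        "context"
        "fmt"
        "net/http"
        "time"
    )

    // waitIndexed polls the indexer for a CID until a record is returned or
    // the context deadline passes, backing off between attempts. It returns a
    // rough measure of the time from advertisement to "successfully indexed".
    func waitIndexed(ctx context.Context, indexer, cid string) (time.Duration, error) {
        start := time.Now()
        backoff := 30 * time.Second
        for {
            resp, err := http.Get(indexer + "/cid/" + cid)
            if err == nil {
                resp.Body.Close()
                if resp.StatusCode == http.StatusOK {
                    return time.Since(start), nil
                }
            }
            select {
            case <-ctx.Done():
                return 0, ctx.Err() // gave up after the deadline (a few hours)
            case <-time.After(backoff):
            }
            if backoff < 10*time.Minute {
                backoff *= 2 // exponential backoff, capped
            }
        }
    }

    func main() {
        ctx, cancel := context.WithTimeout(context.Background(), 3*time.Hour)
        defer cancel()
        if d, err := waitIndexed(ctx, "https://cid.contact", "bafy..."); err == nil {
            fmt.Println("indexed after", d)
        }
    }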

Regardless of which option we choose, I would like this measurement setup to be agnostic to the actual indexing service. Then we could also measure FILSwan etc.

Let me know your thoughts :)

masih commented on August 26, 2024

Hi Dennis, thank you for the writeup. That makes sense πŸ‘

1 and 2 both seem reasonable, though 2 seems to be a lot simpler.

Re 2, we can provide cache-backed and non-cached endpoints. That would simplify the experiment: the endpoint becomes a parameter to the experiment, rather than the experiment having to be made aware of "caching".

Re 3, one can check cid.contact/providers/<provider-id> to see the CID of the latest advertisement processed. That is probably the easiest way to check. To answer your question: yes, the indexers do reach out.
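
For illustration, a rough Go sketch of that check (the LastAdvertisement field name is an assumption based on cid.contact's JSON response; verify it against the live endpoint before relying on it):

    package main

    import (
        "encoding/json"
        "fmt"
        "net/http"
    )

    // providerInfo models just the field we need from the /providers/{peer-id}
    // response.
    type providerInfo struct {
        LastAdvertisement struct {
            Slash string `json:"/"` // CID of the latest processed advertisement
        } `json:"LastAdvertisement"`
    }

    // latestAdCID asks the indexer which advertisement it last processed for
    // the given provider peer ID.
    func latestAdCID(indexer, peerID string) (string, error) {
        resp, err := http.Get(indexer + "/providers/" + peerID)
        if err != nil {
            return "", err
        }
        defer resp.Body.Close()
        var info providerInfo
        if err := json.NewDecoder(resp.Body).Decode(&info); err != nil {
            return "", err
        }
        return info.LastAdvertisement.Slash, nil
    }

    func main() {
        adCID, err := latestAdCID("https://cid.contact", "12D3Koo...")
        fmt.Println(adCID, err)
    }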

Re 4, would it make sense to break this up into 2 scenarios, where:

  1. the experiment uses a pre-computed set of CIDs for lookup that have already been published to IPNI. It then measures the latency and correctness of lookups for two endpoints: cached and non-cached. This would help understand the lookup performance of IPNI as a routing system, i.e. the read path.

  2. the experiment makes publication of random CIDs part of the experiment itself, where it:

  • generates a set of random CIDs

  • announces them to IPNI

  • measures the time it took from announcement to the point at which the CIDs became discoverable by IPNI

    This would help understand the publication semantics of the IPNI routing system, i.e. the write path.
    Small footnote: there are two protocols over which the announcements can be made to IPNI, which most likely offer different performance characteristics. Both are implemented in the index-provider library, and switching between them would be a matter of a config change.

WDYT? Does that make sense?

dennis-tra commented on August 26, 2024

Hi @masih, thanks for the clarifications!

we can provide cache-backed and non-cached endpoints.

Does this already exist, or would it be a custom endpoint just for the experiment? If it's the latter, I'd be concerned that we won't be able to apply the measurement setup to other indexers (e.g., FilSwan, etc.). I'd be hesitant to assume that other operators will also expose this endpoint. I would also try to avoid testing custom code paths instead of the real ones (if possible, with reasonable effort, of course).

Re 3, one can check cid.contact/providers/<provider-id> to see the CID of the latest advertisement processed.

Nice, that's actually exactly what I was looking for!

Re 4, would it make sense to break this up into 2 scenarios

Given the current structure of our measurement infrastructure, I would only do 2. and then look up the same CIDs after the above endpoint reports that the latest advertisement was indexed. It's not too complex to add the option to request pre-configured CIDs, but I'd rather keep the experiment stateless. By stateless, I mean that the experiment is not dependent on some pre-configured indexer state. The state here would be a set of necessarily indexed CIDs.

My pros for doing your option 2 from "Re 4" + lookups:

  • we would test the real production endpoints
  • technique can be applied to other indexers
  • we could measure publication performance (gossiping and/or HTTP)
  • embeds nicely with our current infrastructure

My cons for doing your option 2 from "Re 4" + lookups:

  • measurement would be cache-aware in the sense that the order in which we request CIDs matters. However, this is also nice to know instead of just comparing cache vs non-cache. E.g., performance improvement from 1st -> 2nd request is X, performance improvement from 2nd -> 3rd request is Y.
  • the measurement sample size is dependent on the indexer's indexing speed

Do you have any other concerns about this approach? Would the second point be a prohibiting factor here?

For reference, we're writing ~450 provider records to the DHT per day for our DHT performance measurement, which we then look up from our six other regions. This gives us more than enough data to calculate our metrics confidently. I think if we come anywhere close to publishing 450 advertisements per day (~20/h) to indexers, we are fine. Do you think this is within the realm of what's possible?

BigLep commented on August 26, 2024

I like the approach you're taking here @dennis-tra to:

  1. Keep this well-scoped
  2. As much as possible, avoid doing cid.contact specifics

I would also vote for publishing random CIDs so we more closely match the DHT measurement process. We can set a timeout for how long we'll wait for them to become available (e.g., 5m). We can have a metric on how often we fail to see a CID published within that time. If this becomes an issue, we can devise alternative solutions.


Concerning caching, one thing I realized I didn't account for in my comment about caching and the first call is that there are actually multiple "caches" to warm up here, particularly for this globally distributed measuring. Assuming AWS Cloudfront is used, there are multiple POPs that will initially have a cache miss and need to hit the origin. We don't have control over which POPs our measurement instances are going to hit. Maybe we just make a few requests for each CID from each regional node. We can then look at the higher percentiles to infer what cache misses usually look like.
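
To make the "higher percentiles" part concrete, here's a quick Go sketch of nearest-rank percentiles over collected latency samples (nothing probelab-specific; purely illustrative):

    package main

    import (
        "fmt"
        "math"
        "sort"
        "time"
    )

    // percentile returns the p-th percentile (0 < p <= 100) of the samples
    // using the nearest-rank method - enough for eyeballing the cache-miss tail.
    func percentile(samples []time.Duration, p float64) time.Duration {
        if len(samples) == 0 {
            return 0
        }
        sorted := append([]time.Duration(nil), samples...)
        sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
        rank := int(math.Ceil(p/100*float64(len(sorted)))) - 1
        return sorted[rank]
    }

    func main() {
        // Mostly cache hits (~40ms) plus one miss (~300ms): the miss surfaces at p99.
        samples := []time.Duration{
            40 * time.Millisecond, 42 * time.Millisecond,
            45 * time.Millisecond, 300 * time.Millisecond,
        }
        fmt.Println("p50:", percentile(samples, 50), "p99:", percentile(samples, 99))
    }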


I do agree with my original comment that we should "treat IPNI like the other content routing system we're measuring (public IPFS DHT). Having the same infra measuring it in a similar way enables more comparisons to be made about performance." That said, I do want to call out that measuring the client-side latency of a public HTTP service endpoint globally is not a new problem, and there are undoubtedly other services that can be leveraged here (although I personally am not up on the latest). I think it makes sense for us to do what is being scoped here to get going, but it's also not a big lift for the IPNI operators to set up some of this themselves if their needs/asks go beyond what we can support amongst other priorities.


Thanks again all

yiannisbot commented on August 26, 2024

what do you think of separate issues or even RFMs? I could go ahead and create them.

@dennis-tra I'm definitely in favour of treating them as small studies, so where we expect that a short report with links to the tools we used will make sense to write, let's do RFMs, I'd say πŸ‘

BigLep commented on August 26, 2024

@yiannisbot @dennis-tra : I see there have been some PRs on this and that we have https://probelab.io/ipni/cid.contact/ - cool!

Some feedback since the PRs have been merged:

Meta: I think it would be great to be referencing the issue(s) that spawned the PR so that we get the automatic 2-way linking between issue/PR.

General: anytime we're showing percentiles, I want to find out the sample size so I can figure out how much confidence to put in the percentile values.

Description feedback:

  1. We don't describe that we publish a random CID before querying cid.contact. I think that's an important detail.
  2. "We start the stopwatch right before we do the https://{ipni}/cid/{cid} HTTP request to the indexer and stop it when we have received and parsed the full record". I assume we stop the stopwatch when we have received and parsed the first full record. Is that right?
  3. I don't think our breakdown of "cached" vs. "uncached" is fully accurate given the multiple Points Of Presence (POPs) at play here. (AWS Cloudfront has 400+ POPs currently.) We can reason that the first request from the first region is uncached. We'd expect that the following requests from that same region are cached. We don't know about the first requests from other regions, though. They may be hitting different POPs and thus not be cached. If we want a really clear delineation of what uncached vs. cached mean, I think we should follow something like:
    • uncached: first request from the first region
    • cached: request number X or higher from any region (where X could be something like 3 or 5?)
  4. (nit) In the graph legend, I would put "uncached" before "cached" since "uncached" responses will come before "cached" ones.
  5. "DHT Comparison": maybe get "p90" into the title or legend so someone has the relevant context if they snapshot the graph.
  6. The description at the end about "The network indexer’s lookup latencies are divided into two categories" seems redundant.

dennis-tra commented on August 26, 2024

@BigLep great, you had a look!

General: anytime we're showing percentiles, I want to find out the sample size so I can figure out how much confidence to put in the percentile values.

Noted. Ian and I also discussed this in the past and wanted to add it eventually πŸ‘

We don't describe that we publish a random CID before querying cid.contact. I think that's an important detail.

This is buried in our tools section but also not really clear there. Definitely something we should address πŸ‘

"We start the stopwatch right before we do the https://{ipni}/cid/{cid} HTTP request to the indexer and stop it when we have received and parsed the full record". I assume we stop the stopwatch when we have received and parsed the first full record. Is that right?

I don't think our breakdown of "cached" vs. "uncached" is fully accurate given the multiple Points Of Presence (POPs) at play here. (AWS Cloudfront has 400+ POPs currently.) We can reason that the first request from the first region is uncached. We'd expect that the following requests from that same region are cached. We don't know about the first requests from other regions, though. They may be hitting different POPs and thus not be cached. If we want a really clear delineation of what uncached vs. cached mean, I think we should follow something like:

  • uncached: first request from the first region
  • cached: request number X or higher from any region (where X could be something like 3 or 5?)

Let me describe the process and deployment in detail (I may copy this to the website), which hopefully addresses your points.

The scheduler node in our privileged us-east-1 region instructs one of our seven server nodes to publish a random CID to either the DHT or an IPNI. If the scheduler instructs a server to publish to an IPNI, the server goes ahead and announces its possession of the CID/content to that IPNI (instead of writing provider records to the DHT). Each server is configured to interact with a single IPNI, which is cid.contact for now [0].

Announcing the possession of a CID/content can be done via HTTP and/or GossipSub and involves generating an advertisement that has its own CID (AdCID). For now, we're only using the HTTP method and make a request to the /announce endpoint of cid.contact. In our case, the advertisement will actually include two CIDs: 1) the original one that was received from the scheduler and 2) another random "probe" CID, which will become relevant later.

At this point, the CID has not been indexed yet, so the server waits until cid.contact reaches out to it to sync the advertisement chain. To detect that moment, we start watching GraphSync events for the AdCID right when we call the /announce endpoint. After the IPNI has reached out to the server and we have received a GraphSync completion event for the AdCID, we can be relatively certain that the CID was indexed. However, there might still be some internal IPNI delay until it is fully indexed. Therefore, the server requests the "probe" CID to check whether the advertisement was indeed indexed [1] [2]. If the request for that CID returns a record, we know the advertisement was indexed, and the server gives control back to the scheduler, which then starts instructing the remaining servers to request the original CID.
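
Condensed into a Go sketch, that flow looks roughly like this (announce and awaitGraphSyncComplete are hypothetical stubs standing in for the index-provider/GraphSync wiring, which is elided):

    package main

    import (
        "context"
        "net/http"
        "time"
    )

    // Hypothetical stubs standing in for the real announce/GraphSync wiring.
    func announce(cid, probeCID string) (adCID string, err error)        { return "bagu-ad", nil }
    func awaitGraphSyncComplete(ctx context.Context, adCID string) error { return nil }

    // pollUntilFound asks the indexer for the probe CID until it returns a
    // record. Note footnote [1]: a 404 is negatively cached for ~5 minutes,
    // so polling faster than that gains nothing.
    func pollUntilFound(ctx context.Context, probeCID string) error {
        for {
            resp, err := http.Get("https://cid.contact/cid/" + probeCID)
            if err == nil {
                resp.Body.Close()
                if resp.StatusCode == http.StatusOK {
                    return nil // advertisement fully indexed
                }
            }
            select {
            case <-ctx.Done():
                return ctx.Err()
            case <-time.After(5 * time.Minute): // respect the negative-response cache TTL
            }
        }
    }

    // publishAndConfirm mirrors the flow described above: announce, wait for
    // the indexer to sync the advertisement chain, then confirm indexing via
    // the probe CID before handing control back to the scheduler.
    func publishAndConfirm(ctx context.Context, cid, probeCID string) error {
        adCID, err := announce(cid, probeCID)
        if err != nil {
            return err
        }
        if err := awaitGraphSyncComplete(ctx, adCID); err != nil {
            return err
        }
        return pollUntilFound(ctx, probeCID)
    }

    func main() {
        ctx, cancel := context.WithTimeout(context.Background(), time.Hour)
        defer cancel()
        _ = publishAndConfirm(ctx, "bafy...", "bafy-probe...")
    }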

The above might sound quite convoluted, and I could have taken shortcuts by, e.g., using special, uncached endpoints. However, I wanted to make the setup as agnostic to the specific IPNI as possible. Other instances might not expose uncached endpoints.

Now, the scheduler instructs the other servers to retrieve the CID from the indexer by calling the /cid/{cid} endpoint. Every time a retrieval request resolves, the scheduler instructs that server to retrieve the CID again (up to five times). Each server measures the latency of individual requests. The stopwatch starts right before the HTTP request starts and stops right after we received and parsed the response.

Each of the above request latencies becomes an individual data point in our database. Now, what do we classify as uncached/cached?

Uncached

As opposed to "first request from the first region" I'm currently classifying uncached as "first request from any region." My assumption here is that the request from one region doesn't affect the cache in another region.

I think your classification is definitely accurate. In practice (looking at the data), the current classification also seems accurate. I can easily adjust the plots to only consider the first request from the first region as "uncached" - this would only reduce the sample size. Definitely no strong opinion here; just wanted to shed light on what's going on.

Cached

As opposed to "request number X or higher from any region (where X could be something like 3 or 5?)" I'm currently classifying cached requests as only "request number 2 from any region". I'm basically discarding all data points >2 at the moment because I wasn't sure if the cache responses of requests >1 differ somehow. Therefore, I didn't want to mix them up. Also happy to change that.


[0] If we wanted to measure other IPNIs, we'd need to deploy seven additional servers. I couldn't configure a single libp2p host to interact with two "IPNI provider engines" because they'd share the same data transfer manager, which led to errors. I figured this was not a big problem because the servers can run with almost the smallest available resources (0.5CPU + 1GB Memory - perhaps even fewer resources would work). You could argue I could just spawn multiple libp2p hosts on a single server, but this would have involved a bit of AWS configuration to open up ports, update security rules, and code changes.

[1] If the request for the probe CID returns a 404 (aka the advertisement was not indexed at this point), the negative response is cached for 5 minutes, so even if the indexing was done just a second later, the server would only find out after 5 minutes. From my observations, this happens super rarely. An improvement here would be to generate multiple probe CIDs and loop through them with some delay in between.

[2] The server cannot just check if the latest advertisement for the provider as returned by /providers/{serverPeerID} matches the AdCID because the /providers/{serverPeerID} response is cached for one hour. So a subsequent round of publications won't see the updated record.

yiannisbot commented on August 26, 2024

Thanks for leading this effort @dennis-tra! We've now got: https://probelab.io/ipni/cid.contact/ - is there anything else needing to be done with this issue, or are we good to close it?

BigLep commented on August 26, 2024

Happy to discuss, but for closing this, I think we want:

  1. Reporting on "n" - the number of data points. Maybe we can graph that on the right y-axis of https://probelab.io/ipni/cid.contact/#ipni-trend-latencies-cidcontact-plot with a dotted line?
  2. More of the details on how the measurement is done in terms of publishing a random CID and the cached vs. uncached methodology. I'm fine with the shortcut for now of just linking to this issue, but let's give people a way to find this info.
  • (I don't think someone will be able to self-service their way from https://probelab.io/ipni/cid.contact/ to this issue unless we add a link. To make that more self-service in the future, I could imagine a footer at the bottom of each page saying "view this page's history on GitHub". Presumably from there one could find the commit history β†’ PRs β†’ linked issues.)

dennis-tra commented on August 26, 2024

I addressed both of your points @BigLep in probe-lab/website#72. I already merged it but feel free to drop comments on that PR and I'll address them in a follow-up change πŸ‘

yiannisbot commented on August 26, 2024

Closing issue - I think all comments were addressed. Feel free to re-open if needed, or start a new issue in this repo. Thanks everyone for contributing and @dennis-tra for the hard work! πŸ‘πŸΌ
