TheSkorm commented on July 18, 2024

It's pretty close to what I want and would certainly make a good first pass at the feature. The only enhancement I would consider is making it path-aware: under a basic wildcard approach, http://test/0123/blah and http://test/0123/foo/blah would both be classified as http://test/*/blah, when really they should be two entries.
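
To make the distinction concrete, here is a minimal sketch assuming a hypothetical template syntax in which * stands for exactly one path segment (naive_match and path_aware_match are illustrative names, not SDK functions). A plain glob such as Python's fnmatch lets * match across "/", so both URLs collapse into one entry; comparing the path segment by segment keeps them apart.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def naive_match(template: str, url: str) -> bool:
    # Glob semantics: '*' matches any run of characters, including '/'.
    return fnmatchcase(url, template)


def path_aware_match(template: str, url: str) -> bool:
    """Hypothetical matcher: '*' stands for exactly one path segment."""
    t, u = urlsplit(template), urlsplit(url)
    if (t.scheme, t.netloc) != (u.scheme, u.netloc):
        return False
    t_parts = t.path.split("/")
    u_parts = u.path.split("/")
    # A different number of segments can never match, so /0123/foo/blah != /*/blah.
    if len(t_parts) != len(u_parts):
        return False
    return all(tp == "*" or tp == up for tp, up in zip(t_parts, u_parts))


template = "http://test/*/blah"
for url in ("http://test/0123/blah", "http://test/0123/foo/blah"):
    print(url, naive_match(template, url), path_aware_match(template, url))
```

The naive glob reports True for both URLs, while the path-aware version matches only http://test/0123/blah, so the deeper URL would need its own template (or its own entry).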

from aws-xray-sdk-python.

metaskills commented on July 18, 2024

We've seen this issue in some of our Python work at Custom Ink. I've subscribed to the issue to see where we land on a solution. Thanks ahead of time!

In the meantime, I think we have used wrapt to freedom-patch (monkey-patch) record_subsegment, and that seems to have been working well.
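
A minimal sketch of that kind of patch, assuming the instrumented HTTP call passes the URL to record_subsegment as the name keyword argument. To keep it runnable without the SDK or wrapt installed, StandInRecorder and patch_method are hypothetical stand-ins; with wrapt available, patch_method would be replaced by wrapt.wrap_function_wrapper, which invokes the wrapper with the same (wrapped, instance, args, kwargs) arguments. Check the real record_subsegment signature in your SDK version before patching it.

```python
import functools
from urllib.parse import urlsplit


class StandInRecorder:
    """Hypothetical stand-in for the SDK recorder; only the 'name' keyword matters here."""

    def record_subsegment(self, *args, name, **kwargs):
        return name  # a real recorder would open and close a subsegment here


def rename_to_host(wrapped, instance, args, kwargs):
    # wrapt-style wrapper: replace a URL-shaped subsegment name with its hostname.
    name = kwargs.get("name")
    if name and "://" in name:
        kwargs["name"] = urlsplit(name).hostname or name
    return wrapped(*args, **kwargs)


def patch_method(cls, attr, wrapper):
    """Tiny substitute for wrapt.wrap_function_wrapper, for illustration only."""
    original = getattr(cls, attr)

    @functools.wraps(original)
    def patched(self, *args, **kwargs):
        bound = original.__get__(self, cls)  # bind the original method to this instance
        return wrapper(bound, self, args, kwargs)

    setattr(cls, attr, patched)


patch_method(StandInRecorder, "record_subsegment", rename_to_host)
print(StandInRecorder().record_subsegment(name="https://api.example.com/users/42"))
```

Names that are not URLs pass through unchanged, so only the outgoing-HTTP subsegments are affected.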

haotianw465 commented on July 18, 2024

Hi,

Thank you for taking the time to write this up. I agree with you. With path-based routing in the picture, the SDK doesn't have enough information to do the grouping correctly. In your example it is very likely that resource/* is served by a single service. But there are also counterexamples: data/1 is routed to the control-plane fleet while data/2 is routed to the data-plane fleet.

I think it really depends on who owns those services. If you own the services serving resource/* and they are also integrated with X-Ray, they will actively emit their own segments. The X-Ray service graph respects the segment emitted by the server side rather than the one from the caller. So if resource/1, resource/2, and so on are all served by a Django app that emits segments named my_service, the service graph will show only a single node called my_service.

If you don't own the service handling those requests, the question comes down to whether you want to aggregate them at all and, if so, up to which level of the URL scheme. Generally I would recommend not aggregating anything, because you don't know what is happening behind the scenes and it can be difficult to identify issues if all URLs are aggregated. But I would definitely like to know more about your use case. The SDK would certainly provide a better customer experience by offering more flexible user configuration.
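
The question of "up to which level" can be made concrete with a small helper. This is a hypothetical sketch, not an SDK option: depth sets how many leading path segments are kept in the node name (0 means scheme and host only), and the query string and fragment are always dropped.

```python
from urllib.parse import urlsplit, urlunsplit


def aggregate_url(url: str, depth: int) -> str:
    """Hypothetical: keep scheme, host and the first `depth` path segments only."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    path = "/" + "/".join(segments[:depth])  # depth 0 -> "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


for depth in (0, 1, 2):
    print(depth, aggregate_url("https://api.example.com/resource/1?verbose=true", depth))
```

With depth 1, resource/1 and resource/2 collapse into one node, but so do data/1 and data/2, even if they are routed to different fleets. That is the risk of aggregating a service whose routing you cannot see.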

Please let me know your thoughts.

pfreixes commented on July 18, 2024

Thanks for your answer, @haotianw465. Some comments on your points:

> I think it really depends on who owns those services. If you own those services that serving resource/* and they are also integrated with X-Ray, they will also actively emit segments. The X-Ray service graph will respect the segment emitted from the server side other than caller. So if all resource/1, resource2 are served with a Django app which emits segment named my_service, then in service graph you will only see a node called my_service

Well, in an organization that is just starting to adopt tracing, where only a few services are instrumented, this is exactly the scenario in which you can't rely on the caller to name the node with the proper service name. It's true that, for the services you own, all nodes will eventually be named properly once tracing is adopted, but that will take some time.

In any case, there is also the situation where you are calling an external service over which you have no control. There, the node name will always be the URL, with all of the issues I've described.

Going back to your example of two URLs with the same hostname that end up in different services (data/1 and data/2): this scenario is quite similar to one of the use cases in our infrastructure. There, an HTTP middleware implements common concerns such as auth and rate limiting, and routes traffic to downstream services using part of the URI to identify each one uniquely. In that case, I still think the best option is to use the hostname to identify the node. That gives a single node in the AWS X-Ray console for all calls to this intermediate layer, while the downstream services behind each URL path can still be identified in the console by their proper names.

And last but not least, keep in mind that the full URL is always recorded as a tag value.
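
A sketch of what that looks like in an X-Ray segment document, assuming the documented subsegment fields (name, id, start_time, end_time, namespace, http.request, http.response). remote_subsegment is a hypothetical helper, not an SDK function, and a subsegment sent on its own would additionally need trace and parent IDs. The point is that naming the node by hostname does not discard the path: the URL is still present under http.request.url.

```python
import json
import secrets
import time
from urllib.parse import urlsplit


def remote_subsegment(url: str, method: str, status: int, start: float, end: float) -> dict:
    """Hypothetical: build a partial subsegment document for an outgoing HTTP call."""
    return {
        "name": urlsplit(url).hostname,  # node name: hostname only
        "id": secrets.token_hex(8),  # 16 hex digits
        "start_time": start,
        "end_time": end,
        "namespace": "remote",
        "http": {
            "request": {"method": method, "url": url},  # full URL is kept here
            "response": {"status": status},
        },
    }


now = time.time()
doc = remote_subsegment("https://api.example.com/resource/42", "GET", 200, now, now + 0.12)
print(json.dumps(doc, indent=2))
```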

I still think that using the first part of the URL (the hostname) is the least bad option.

haotianw465 commented on July 18, 2024

Thank you for the explanation. I totally understand your concerns, and this use case is very reasonable and common. I'm open to discussing a way to configure custom URL name capture. Please let me know if you have any suggestions on how you would like to configure URL renaming conveniently.

We have a dynamic naming configuration that lets the middleware name segments based on the Host header of the incoming request. See https://docs.aws.amazon.com/xray/latest/devguide/xray-sdk-python-middleware.html#xray-sdk-python-middleware-naming. Please let me know if you would like to see something similar for naming subsegments of outgoing HTTP requests.
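
One way "something similar" could look for outgoing calls, sketched by analogy with the middleware's host-based naming (a hostname pattern plus a fallback name). OutgoingDynamicNaming is hypothetical and not part of the SDK; it uses fnmatch-style wildcards on the hostname of the outgoing URL.

```python
from fnmatch import fnmatchcase
from typing import Optional
from urllib.parse import urlsplit


class OutgoingDynamicNaming:
    """Hypothetical subsegment analogue of host-based dynamic naming.

    If the request's hostname matches `pattern`, the subsegment is named after
    that hostname; otherwise it gets the fixed `fallback` name.
    """

    def __init__(self, pattern: str, fallback: str) -> None:
        self.pattern = pattern
        self.fallback = fallback

    def get_name(self, url: str) -> str:
        host: Optional[str] = urlsplit(url).hostname
        if host and fnmatchcase(host, self.pattern):
            return host
        return self.fallback


naming = OutgoingDynamicNaming("*.example.com", fallback="external-api")
print(naming.get_name("https://orders.example.com/orders/123"))
print(naming.get_name("https://thirdparty.io/v1/items/9"))
```

Here calls to any example.com subdomain keep their own hostname as the node name, while every other destination is grouped under external-api.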

TheSkorm commented on July 18, 2024

[Screenshot from 2019-05-09 of the service graph]

I recently ran into this issue with a backend that uses URL parameters when calling an external service.

What I was hoping for was either a config where I could specify something like

http://hostname/path/*/blah/*

where each * is aggregated under a placeholder in the uploaded segments.

Or, alternatively, being able to specify a custom name for that segment before making the call with requests.

chanchiem commented on July 18, 2024

Hey, thanks for the feedback! We'll definitely take this into consideration when designing the UX for this feature request. Do you have any recommendations for how custom names for downstream requests should be specified in the SDK? Would something similar to what @haotianw465 mentioned above be a good experience for naming these traces? https://docs.aws.amazon.com/xray/latest/devguide/xray-sdk-python-middleware.html#xray-sdk-python-middleware-naming

sreid commented on July 18, 2024

I have also ended up with a graph much like the one @TheSkorm posted. As an interim solution, would it be possible to configure the SDK so that you have the option of using only the domain name (a boolean config rather than custom path matching)?

chanchiem commented on July 18, 2024

Can you elaborate on what you mean?
As an example, let's say we have a key AggregateDomainName that is a boolean flag.

Setting it to true would mean that all subsegments with the same domain name would be aggregated into a single node, because they would all be named with just the domain. This would mean that https://amazon.com/get/food and https://amazon.com/get/some/books would both aggregate as https://amazon.com/.

Is this an accurate depiction of what you're requesting?
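
The proposed flag can be sketched as a single naming function. This is hypothetical: aggregate_domain_name stands in for the AggregateDomainName key discussed above, and the behaviour when it is false (keep the path, drop the query string) is an assumption chosen for illustration.

```python
from urllib.parse import urlsplit


def subsegment_name(url: str, aggregate_domain_name: bool) -> str:
    """Hypothetical naming rule for an outgoing HTTP subsegment.

    True  -> scheme and host only, so every path on a host shares one node.
    False -> the URL without its query string (one node per unique path).
    """
    parts = urlsplit(url)
    if aggregate_domain_name:
        return f"{parts.scheme}://{parts.netloc}/"
    return url.partition("?")[0]


for url in ("https://amazon.com/get/food", "https://amazon.com/get/some/books"):
    print(subsegment_name(url, True), subsegment_name(url, False))
```

With the flag on, both example URLs map to https://amazon.com/, which is the single aggregated node described above.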

sreid commented on July 18, 2024

Yes, that's along the lines of what I was thinking. It's not as configurable as the wildcard-style matching mentioned previously, but perhaps it's easier to implement?

Right now requests tracing is not helpful in my particular situation, as there is no aggregation of response times/codes for HTTP requests (my graph looks similar to @TheSkorm's).

chanchiem commented on July 18, 2024

Thanks for the response. We will consider both cases and mark this accordingly as a feature request.

The first case would be to allow customers to enable wildcards in URL names and aggregate them based on the expression.
For example,
http://hostname/path/*/blah/* would aggregate the following into the same node:
http://hostname/path/somepath/blah/unknown
http://hostname/path/diffpath/blah/somestring
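
A minimal sketch of this first case, assuming that each * matches exactly one path segment and that the matching template itself becomes the node name; compile_template and aggregated_name are hypothetical helpers, not SDK functions. Returning None leaves the decision about unmatched URLs to the caller.

```python
import re
from typing import List, Optional


def compile_template(template: str) -> "re.Pattern[str]":
    # Each '*' matches one path segment (no '/'); everything else is literal.
    pieces = [re.escape(p) for p in template.split("*")]
    return re.compile("[^/]+".join(pieces))


def aggregated_name(url: str, templates: List[str]) -> Optional[str]:
    """Return the first template that fully matches url, for use as the node name."""
    for template in templates:
        if compile_template(template).fullmatch(url):
            return template
    return None


templates = ["http://hostname/path/*/blah/*"]
print(aggregated_name("http://hostname/path/somepath/blah/unknown", templates))
print(aggregated_name("http://hostname/path/diffpath/blah/somestring", templates))
```

Both example URLs resolve to the same template, and therefore the same node, while a URL with an extra segment such as http://hostname/path/a/b/blah/c does not match.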

The second case would be to aggregate whole nodes based on their host names:
This would mean that https://amazon.com/get/food and https://amazon.com/get/some/books would both aggregate as https://amazon.com/.

In either case, we would need a centralized system to keep track of downstream calls. It would probably be better to use the existing centralized sampling rules to aggregate known domain names together, and have the SDKs name the downstream calls accordingly.

chanchiem commented on July 18, 2024

It turns out that the subsegment name should contain only the hostname of the targeted endpoint; that has been the consistent behavior across all of our other SDKs except this one.

With PR #192, is the issue you were having fixed? All of these nodes were intended to be aggregated, not split into a different node for each unique path.

sreid commented on July 18, 2024

Can confirm that the 2.4.3 release that includes #192 fixes the issue enough for me to make requests tracing usable. Thanks!

srprash commented on July 18, 2024

Hi @sreid
That's great! Hope this fixes the issue for everyone. In case the issue still persists, feel free to reopen this or create a new one.
