Code Monkey home page Code Monkey logo

Comments (14)

danielmarbach avatar danielmarbach commented on July 20, 2024 1

@aarondandy you might be affected by

dotnet/SqlClient#1810

from azure-amqp.

aarondandy avatar aarondandy commented on July 20, 2024 1

Microsoft.Data.SqlClient 5.1.0-preview2.22314.2 brought the memory usage down from gigabytes to megabytes 🎉. Thanks for the tip @danielmarbach . I'm closing this as I'm pretty sure it's unrelated to this library now, @xinchen10 .

from azure-amqp.

jsquire avatar jsquire commented on July 20, 2024

//fyi: @xinchen10, @JoshLove-msft, @danielmarbach

from azure-amqp.

danielmarbach avatar danielmarbach commented on July 20, 2024

Probably from here?

return this.OpenAsync(new CancellationTokenSource(timeout).Token);

CancellationTokenSource with a timeout need to be disposed after usage because they are associated with a timer. Happy to send a PR

from azure-amqp.

danielmarbach avatar danielmarbach commented on July 20, 2024

#234 would fix at least that usage.

from azure-amqp.

aarondandy avatar aarondandy commented on July 20, 2024

Good find @danielmarbach . I'm not sure if that is the cause of my specific retention issue, but it might be 🤷‍♂️. It all depends on if the ASB client uses that method and I'm just too overburdened at the moment to investigate. It definitely looks like an issue though!

Regarding contributing a fix for the explicit timer dispose: I just don't have enough confidence I wouldn't do harm by touching those code paths. I can't wrap my head around the object lifetimes in that area and wouldn't know how to property test or validate any of it.

from azure-amqp.

xinchen10 avatar xinchen10 commented on July 20, 2024

@aarondandy how many ReceivingAmqpLink+ReceiveAsyncResult objects did you see? Also 10,000+? The receiving link always disposes the cancellation token registration when the async result is completed. If you saw roughly the same amount of ReceiveAsyncResult objects as CallbackNode, it would mean that the AMQP receiving link does not dispose the registration token in some cases. Otherwise it could be leaks from some other places where registration tokens are created, especially if you use the same cancellation token source for all the calls.

from azure-amqp.

aarondandy avatar aarondandy commented on July 20, 2024

The crazy amount of CallbackNode could be some EF, Polly, or other shenanigans we are up to. The core issue for us though in this context is just the TimerHolder that is hanging on to all of them. I don't have the dumps anymore, but there were maybe only 100-1000 TimerHolder instances in the finalization queue. Also, possibly related, I think there are some framework level optimizations with CallbackNodes which may be contributing to them growing so large but 🤷‍♂️. Even if there were fewer CallbackNodes I don't think our finalization queues would be processed fast enough to rely on finalization over object disposal.

Also, worth noting that I added a background thread to force the finalization queue to be processed and that has greatly reduced the symptoms of this issue for us.

from azure-amqp.

xinchen10 avatar xinchen10 commented on July 20, 2024

The timers are disposed in the receiving link. They show up in finalization queue because it has a finalizer method, but the method should be skipped after Dispose is called. Without a dump it is going to be difficult to find the root cause, unless we can reproduce it by simulating what your code does.

  • how do you use the amqp library, directly or via another SDK (e.g. Azure SB/EH)?
  • what is the timeout values when you call ReceiveAsync?
  • do most calls return messages or nothing?
  • do you create a new cancellation token source for every call, or reuse the same one?

from azure-amqp.

aarondandy avatar aarondandy commented on July 20, 2024

They show up in finalization queue because it has a finalizer method

Could be. I'm making the assumption that uses GC.SuppressFinalize as part of the dispose pattern and won't end up in the finalization queue as a result when explicitly disposed, but 🤷‍♂️.

Without a dump

I'm not comfortable sharing my dumps, but I can undo my finalization hack and let it run for a few days and we can hop on a call and pick through the pile together if you want. I haven't been able to make a minimum reproduction of this as I don't know of a good way to prevent the finalization queue from being processed timely on my local machine. I also just don't have time to dig into the internals of the library sadly.

how do you use the amqp library

The Azure service bus library

what is the timeout values when you call ReceiveAsync

I'm not really familiar with how that works but if this is related to the older Close issue I would guess they use 60 seconds

do most calls return messages or nothing

I think about each second we have at least one message, so it wouldn't be waiting too long. This is not a problem in our test environment where most ReceiveAsync calls would not retrieve a message, but that also means it has plenty of time to clean up the finalization queue.

do you create a new cancellation token

EF, Polly, and some of my code may create some linked CancellationTokenSource but all of them should lead back to the CancellationToken we get from ProcessMessageAsync in Azure Service Bus.

from azure-amqp.

aarondandy avatar aarondandy commented on July 20, 2024

I have a new memory dump with over 700mb retained by a Timer made by ReceivingAmqlLink+ReceiveAsyncResult. I can see that the TimerQueueTimer._canceled field is true. I can also see that the CancellationTokenSource+CallbackNode's CallbackState held by the cancellationTokenRegistration is no longer pointing to the ReceiveAsyncResult callback state, but a completely different object instance's state. I think this is enough to indicate that both timer.Dispose() and cancellationTokenRegistration.Dispose() have been called here as expected.

So, maybe even though the root is the Timer made by ReceiveAsyncResult the reason for this CallbackNode chain being so large is unrelated to the ASB or AMQP libraries like I thought. Looking at the remaining chain of CancellationTokenSource+Registrations, I wonder if this is actually due to Microsoft.Data.SqlClient failing to Dispose of a CancellationTokenRegristration so I guess I'll go poke around over there.

I'll hold onto this dump for a few days if anybody out there was curious enough to dig through it with me.

from azure-amqp.

aarondandy avatar aarondandy commented on July 20, 2024

@aarondandy you might be affected by

dotnet/SqlClient#1810

Oh I think this is exactly it! I feel like I came across this a month ago but wasn't smart enough to piece it together at the time. I'm not sure if this will work for me, but while I wait for that fix I wonder if wrapping the cancellationToken I get from ASB/AMQP in a new CancellationTokenSource would be enough to decouple the linked lists held by the ReceiveAsyncResult timer and the one used by SqlCommand, to let the GC take care of things quicker.

from azure-amqp.

aarondandy avatar aarondandy commented on July 20, 2024

Looking at the code behind CancellationTokenSource, I think I'm just doomed and going to need to try their preview release or go through constant service restarts 😬. I'll give the preview a try for a bit and if that resolves the issue then I think it definitely proves my issues are unrelated to ASB or AMQP.

from azure-amqp.

danielmarbach avatar danielmarbach commented on July 20, 2024

Damn sorry to hear. I wish I could be of more help, but I'm just a community sidekick occasionally sneaking into some repos making some noise.

from azure-amqp.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.