Code Monkey home page Code Monkey logo

Comments (16)

HeinrichApfelmus avatar HeinrichApfelmus commented on July 18, 2024 3

… which is weird, because the test execute accumE, issue #261: specifically tests a very similar network and doesn't detect any growth in network size.

Fixed it: The issue was an accumulation of finalizers, which was due to a short-cut that I took when implementing insertEdge. As of commit 0044039, the program mentioned in #261 (comment) now runs in constant space, as indicated by the heap profile obtained through cabal run space -- +RTS -hT -RTS && hp2pretty space.hp. 🥳

Unfortunately, the earlier program mentioned in #261 (comment) , involving observeE and switchE, does not run in constant space yet. 🤔

from reactive-banana.

ocharles avatar ocharles commented on July 18, 2024 2

Ok, I might have another fix:

 connectChild parent child = do
     w <- mkWeakNodeValue child child
     modify' parent $ update childrenP (w:)
+
+    -- Add a finalizer to remove the child from the parent when the child dies.
+    case child of
+      P c@(Ref r _) -> addFinalizer r $ removeParents c
+      _ -> return ()
+
     mkWeakNodeValue child (P parent)        -- child keeps parent alive

The idea is pretty trivial - when a Pulse is unreachable by the GC, then remove it from all parents. I think that the only thing we're actually leaking (in the examples here) is the Weak value for the child and the cons cell in the parents list of children. I tested this with the following (even simpler) repro:

{-# language BlockArguments #-}
module Main where

import Control.Monad
import Control.Monad.IO.Class
import Data.Functor
import Reactive.Banana
import Reactive.Banana.Frameworks
import System.Mem
import System.Mem.Weak
import Control.Concurrent (threadDelay, yield)

withGhcDebug = id

main :: IO ()
main = withGhcDebug do
  (ah1, fire1) <- newAddHandler

  actuate =<< compile do
    e <- fromAddHandler ah1

    e2 <- execute $ e $> do
      accumE () (id <$ e)

    reactimate $ return () <$ e2

  performGC
  putStrLn "Running"

  replicateM_ 10000 $ do
    fire1 ()
    performMajorGC
    -- yield so finalizers can run.
    yield

  putStrLn "Done"

Ran with -hi profile and analyzed in eventlog2html, we see

image

Some noise, but that clear blue line is the signal - a clear leak.

With the fix above, we get:

image

But I also have to run 10x the amount of iterations otherwise it terminates too quickly!

So I think I've got a good handle on at least one fix. I think the way to proceed from here is to add a finalizer when we call newLatch or newPulse though - connectChild is obviously the wrong place.

from reactive-banana.

HeinrichApfelmus avatar HeinrichApfelmus commented on July 18, 2024 2

I have fixed an additional space leak with switchP, where old pulses would not be disconnected from the network. The cause is somewhat embarrassing: do notation was picking up the reader monad instead of the IO monad, and the function clearPredecessors that disconnects the old pulses was just never called. 😳

But now, as of commit 29776bd , the heap profile for all programs mentioned in this thread is constant. 🥳 Here is the heap profile for @ocharles' initial example program:

cabal run space -- +RTS -hT -RTS && hp2pretty space.hp

space

In addition, several variants of the space leaks reported here are now part of the automated test suite — the tests will fail if the network grows unexpectedly.

I think that it's time to successfully close this issue. 💪 If you do find more space leaks, please don't hesitate to bring them to my attention!

from reactive-banana.

luke-clifton avatar luke-clifton commented on July 18, 2024 2

I'd watch it 🤷

from reactive-banana.

luke-clifton avatar luke-clifton commented on July 18, 2024 1

What a marathon effort! I've been reading these updates with bated breath, each one delivering a fraction but leaving me wanting more. Congratulations 🎉

from reactive-banana.

mitchellwrosen avatar mitchellwrosen commented on July 18, 2024

Nice, can you share how you made the pretty graph?

from reactive-banana.

ocharles avatar ocharles commented on July 18, 2024

Sure! I just used eventlog2html 😄

More specifically:

$ cabal run leak -- +RTS -l -hi

and build with -rtsopts -eventlog.

I often also build with -finfo-table-map -fdistinct-constructor-tables, as per https://well-typed.com/blog/2021/01/first-look-at-hi-profiling-mode/

from reactive-banana.

ocharles avatar ocharles commented on July 18, 2024

The same problem is present with just dynamic event switching - no need to bring Behaviors in:

{-# language BlockArguments #-}
module Main where

import Control.Monad
import Data.Functor
import Reactive.Banana
import Reactive.Banana.Frameworks
import System.Mem
import System.Mem.Weak

withGhcDebug = id

main :: IO ()
main = withGhcDebug do
  (ah1, fire1) <- newAddHandler

  actuate =<< compile do
    e <- fromAddHandler ah1

    let e2 = observeE $ e $> do
          accumE () (id <$ e)

    e3 <- switchE never e2

    reactimate $ return <$> e3

  performGC
  putStrLn "Running"

  replicateM_ 10000 $ do
    fire1 ()
    performGC

I'll try and solve this leak first.

from reactive-banana.

ocharles avatar ocharles commented on July 18, 2024

Ok, the fix for both of these leaks isn't hard - we can just modify doAddChild to cull any dead children:

doAddChild (P parent) (P child) = do
    level1 <- _levelP <$> readRef child
    level2 <- _levelP <$> readRef parent
    let level = level1 `max` (level2 + 1)
    w <- parent `connectChild` P child

    -- Remove any dead children. These three lines are new.
    let alive w = maybe False (const True) <$> deRefWeak w
    children' <- filterM alive . _childrenP =<< readRef parent
    modify' parent $ set childrenP children'

    modify' child $ set levelP level . update parentsP (w:)

But I'm not particularly happy with this solution. When the switchE fires I feel we should be able to propagate this information all the way up to e. I'll have a think about how to do this.

from reactive-banana.

HeinrichApfelmus avatar HeinrichApfelmus commented on July 18, 2024

When implementing this, I was hoping to use finalizers to remove dead children — i.e. when switchE switches to a new event, the old event may become garbage and the corresponding finalizer would remove it from the _childrenP field.

Hm. Finalizers are run concurrently, but to keep our sanity, changes to the network need to be sequential and scheduled (e.g. using a writer part of Build monad). Perhaps we should implement our own GC pass that is executed at the end of every step, and the finalizers simply tell our GC more specific information about which weak pointers it should remove? 🤔

(One issue that I didn't think deeply enough about is the question of how fast we can remove transitive dependencies. I.e. event e3 may depend on e2 which depends on e1. Now, if e3 is not used anymore, then both e2 and e1 can be garbage collected, but that should preferably happen in a single GC pass as opposed of two GC passes where we first discover that e2 is dead and only in the next pass that e1 is also dead because e2 is dead. The GHC GC does this alright, but any GC addendum that we implement might not.)

from reactive-banana.

ocharles avatar ocharles commented on July 18, 2024

@HeinrichApfelmus I've also thought about using finalizers, but the whole thing seems a lot more complex/action-at-a-distance than it needs to be. As far as I'm aware, we always have the entire graph right in front of us, through a Pulses parent/children lists. So if we dynamically switch away from something, we should - at that point - be able to find strongly connected components within this graph that are no longer reachable and nuke the whole lot.

I don't like finalizers partly because it's unclear when they will run, but more that it's unclear if they will run at all! I'd hate to be in a position where I accumulate just enough garbage to impact performance, but not enough to trigger the right generation GC to solve the problem.

from reactive-banana.

HeinrichApfelmus avatar HeinrichApfelmus commented on July 18, 2024

So if we dynamically switch away from something, we should - at that point - be able to find strongly connected components within this graph that are no longer reachable and nuke the whole lot.

Yes and no. The trouble is twofold:

  1. The program may reference a Pulse (e.g. through an Event) even though that Pulse is currently not an active part of the network — but it may become part of the network again later. For example, a switchE periodically switching between two events e1 and e2, has this property — both events need to be kept alive (especially if they involve state), but only one of them is in the transitive closure of the current list of reactimate. This implies that we do need help from the garbage collector.
  2. Conversely, the garbage collector may still think that a Pulse is alive during a switchE, even though that Pulse becomes dead through the switch. Hence, the garbage collector may have some delay, and tell us that a Pulse can be removed only some time after the moment of switching. This implies that we need to expect help from the garbage collector in an asynchronous manner.

I don't like finalizers partly because it's unclear when they will run, but more that it's unclear if they will run at all!

I do agree that the documentation on finalizers is rather pessimistic. However, I feel that we may not have a choice, and in practice, it does not seem too bad (well, if it is bad, then we can report this as a bug in GHC. 😄)

from reactive-banana.

ocharles avatar ocharles commented on July 18, 2024

Yea, I was thinking over 1 yesterday! Thanks for sharing. Something I also want to do is try modelling our graph in Alloy and to use a model checker to work out the complexities here!

from reactive-banana.

ocharles avatar ocharles commented on July 18, 2024

A note to myself as to why we can't just use Ref:

  • Assume we have some chain of Pulses p1, p2, ... pn, where each pulse is a child of the previous (so p1 is the parent of p2, etc).
  • We have some top-level pulse p which is the parent of p1.
  • Now, introduce pX, which is derived by dynamically switching between some other Pulse and pn.
  • If we clean up the graph the instant pX switches out of pn, then we'll end up detaching p1 from p.
  • However, if we switch back to pn, then we'll never get any events, because pn is disconnected from p!

This is why we think we need help from the GC. When pX switches out of pn then yes, pX should reparent. But we do still need to keep pushing p through pn because the dynamic event switch will keep pn alive. I agree that it's going to be very hard to do this without letting the GC inform us.

Note that if we used dynamic event switching and switched out of pn and don't have the possibility of switching back (e.g., something uses never or some other mechanism that makes it impossible), then we'd lose any strong pointer to pn allowing it to be GCed.

I need to think about promptly cleaning up a whole sequence of Pulses, but otherwise this is taking shape

from reactive-banana.

HeinrichApfelmus avatar HeinrichApfelmus commented on July 18, 2024

Thank you @ocharles !

I have revisited this problem and decided to redesign the low-level implementation entirely in order to separate concerns better. I have created a Graph data structure without garbage collection and a GraphGC data structure with garbage collection; this allows us to automatically test that garbage collection indeed behaves as desired. #268

The following program now runs in constant space:

{-# language BlockArguments #-}
module Main where

import Control.Monad
import Control.Monad.IO.Class
import Data.Functor
import Reactive.Banana
import Reactive.Banana.Frameworks
import System.Mem
import System.Mem.Weak
import Control.Concurrent (threadDelay, yield)

withGhcDebug = id

main :: IO ()
main = withGhcDebug do
  (ah1, fire1) <- newAddHandler

  actuate =<< compile do
    e <- fromAddHandler ah1

    e2 <- execute $ e $> do
      accumE () never -- previously: accumE () (id <$ e)

    reactimate $ return () <$ e2

  performGC
  putStrLn "Running"

  replicateM_ 10000 $ do
    fire1 ()
    performMajorGC
    -- yield so finalizers can run.
    yield

  putStrLn "Done"

… but the program with accumE () (id <$ e) instead of accumE () never does not. 🤔 … which is weird, because the test execute accumE, issue #261: specifically tests a very similar network and doesn't detect any growth in network size.

from reactive-banana.

HeinrichApfelmus avatar HeinrichApfelmus commented on July 18, 2024

What a marathon effort! I've been reading these updates with bated breath, each one delivering a fraction but leaving me wanting more. Congratulations 🎉

Thanks! 😊 What do you think — maybe I could pitch the story of these update to Netflix? 🤔

from reactive-banana.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.