Comments (3)

pvh commented on August 18, 2024

I'm going to reopen this because 2.5s is still quite long for a document this small. Thanks for the trace, though. The save() seems like about what I'd expect at 180ms (of which 40% is compression).

Looking at the rest, we've got several big processes running in this trace. In order:

  1. 500ms - receiveSyncMessage
  • 50% of which is calls to sha256 (!?) -- isn't this all in one change()? What's doing all that hashing?
  2. 1300ms - progressDocument
  • this is building the JSON object for the frontend
  • you mentioned there are a lot of fields; this may be hard to fix for now
  3. 200ms - saveDoc
  • we should have the whole doc in the sync message, so why don't we just save that?
  • this might require a little plumbing in automerge-repo and is not the lowest-hanging fruit, but it is at least a straightforward 10% win
  4. 500ms - automerge_diff
  • @orionz shouldn't this be cached as the last value used? Maybe we missed an edge case?
  5. 200ms - generateSyncMessage
  • hey, look: all compress again. Another gratuitous save() call

Looking at this trace, I think we could plausibly get ~1s of improvement with a little plumbing work (reuse the save() content, use the last diff cache). The progressDocument work feels like there should be room to improve (though I haven't looked closely enough to have a fix), and the receiveSyncMessage cost itself should be helped by Orion's current memory work (though we'll have to wait and see).
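The "reuse save() content / last diff cache" idea above amounts to memoizing an expensive computation on the heads it was computed for. Here is an illustrative sketch in plain JavaScript; cachedByHeads and its arguments are hypothetical names, not Automerge internals:

```javascript
// Illustrative sketch: cache an expensive per-document computation (like
// diff or save) keyed by the document heads it was computed for.
function cachedByHeads(compute) {
  let lastKey = null;
  let lastValue = null;
  return (heads, doc) => {
    const key = heads.join(",");
    if (key === lastKey) return lastValue; // same heads: skip recomputation
    lastKey = key;
    lastValue = compute(doc);
    return lastValue;
  };
}

let calls = 0;
const save = cachedByHeads((doc) => { calls++; return JSON.stringify(doc); });
save(["a1"], { x: 1 });
save(["a1"], { x: 1 }); // served from the cache
console.log(calls); // 1
```

A single-entry cache is enough here because the common case is repeated calls against an unchanged document between edits.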

@kid-icarus one useful reference would be a performance comparison to raw JSON via JSON.stringify/JSON.parse. That can be our performance baseline, since it's likely about as well optimized as browser vendors can make it.
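The baseline measurement suggested above can be sketched in a few lines of Node.js. The document shape here is made up; substitute a document comparable to the real one:

```javascript
// Hedged sketch: time JSON.stringify/JSON.parse on a synthetic document with
// many small string fields, as a lower bound for (de)serialization cost.
const doc = Object.fromEntries(
  Array.from({ length: 10_000 }, (_, i) => [`field${i}`, `value ${i}`])
);

let t = process.hrtime.bigint();
const bytes = JSON.stringify(doc);
const stringifyMs = Number(process.hrtime.bigint() - t) / 1e6;

t = process.hrtime.bigint();
const parsed = JSON.parse(bytes);
const parseMs = Number(process.hrtime.bigint() - t) / 1e6;

console.log({ size: bytes.length, stringifyMs, parseMs });
```

Note that this measures serialization only; it has no sync, hashing, or diffing cost, which is exactly why it makes a useful floor to compare against.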

pvh commented on August 18, 2024

Awesome, @kid-icarus. Note that there's a little sleight of hand in here for file size -- we're deflating the data at some point.

I see you're on 2.1.10 in the package.json. There was recently a performance fix for documents with lots of small objects. I think it went into 2.1.13 (which I think I shipped in the most recent automerge-repo a few days ago).

Can you see if/how much that helps?

That said, there's an eventual physical limit to how much we can optimize the message receipt for a very large message: you can imagine a 1GB text field, for example. Our long-term strategy here has a few components:

  • send a snapshot of the document first (this doesn't help for this synthetic case but large documents often have long, irrelevant histories)
  • stop sending all changes in a single round-trip message
  • allow automerge parsing to be interrupted -- we want to avoid blocking the render thread
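The third bullet can be sketched generically: instead of parsing everything in one blocking call, the parser yields control back to the caller between batches so the render thread can do other work. This is an illustrative sketch in plain JavaScript, not an Automerge API:

```javascript
// Illustrative sketch: interruptible parsing via a generator. The caller
// decides when to resume, so parsing never monopolizes the render thread.
// parseIncremental is a hypothetical name, not part of Automerge.
function* parseIncremental(changes, batchSize = 100) {
  const doc = [];
  for (let i = 0; i < changes.length; i += batchSize) {
    for (const change of changes.slice(i, i + batchSize)) {
      doc.push(change); // stand-in for applying one change
    }
    yield i + batchSize; // control returns to the caller between batches
  }
  return doc;
}

const changes = Array.from({ length: 1000 }, (_, i) => i);
const gen = parseIncremental(changes, 250);
let step = gen.next();
while (!step.done) {
  // in a browser, requestAnimationFrame or setTimeout would go here
  step = gen.next();
}
console.log(step.value.length); // 1000
```

In a real browser integration the resume points would be scheduled with requestAnimationFrame or an idle callback rather than a tight loop.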

Architecturally, there are two parts to Automerge: a CRDT that has to decide what the state of the world is/should be and a materialized view that represents the full document and is updated when changes occur. The latter has to live in the render thread (it's the state of your program, you need it there) but theoretically we could house the former elsewhere and communicate over some messaging channel.
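A minimal sketch of that split, with a plain array standing in for the messaging channel between threads (all names here are illustrative, not Automerge APIs):

```javascript
// Illustrative sketch: a CRDT core that decides state, and a materialized
// view that applies patch messages it receives over a channel.
class CrdtCore {
  constructor() { this.ops = []; }
  apply(op) {
    this.ops.push(op);           // core records the operation
    return { type: "patch", op }; // and emits a patch for the view
  }
}

class MaterializedView {
  constructor() { this.state = {}; }
  receive(msg) {
    if (msg.type === "patch") this.state[msg.op.key] = msg.op.value;
  }
}

const channel = []; // stands in for postMessage across a thread boundary
const core = new CrdtCore();
const view = new MaterializedView();

channel.push(core.apply({ key: "title", value: "hello" }));
while (channel.length) view.receive(channel.shift());
console.log(view.state.title); // "hello"
```

The marshalling cost mentioned below is exactly the serialization of those patch messages across the thread boundary, which is why "just be fast enough" in a single thread won out.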

Historically, Automerge worked this way (way back in the early, very slow days pre 1.0) but marshalling messages over thread boundaries is prone to performance and latency problems as well (not to mention architectural complexity for implementers), so in the 2.0 / Rust line we put an emphasis on "just be fast enough".

I think most of the work we've described above is "shovel ready" in the sense that we could move on it at any time if we had the development bandwidth to do so... but the project is not funded at a scale where we can do that. We are not a VC backed startup and our funding is earmarked to support our research goals.

So if you want us to do something, you can

  • wait (this is free but can be difficult),
  • fund us to do it,
  • do it yourself.

In the third case, I'll note that sending things upstream via the DIY approach is very welcome, but it does impose costs on us: we have to review code, write tests, cut releases, and otherwise maintain the software going forward. If you build a thing we don't want or like... we won't merge it (so if you want us to merge it, keep us in the loop during the design). That said, we have a bunch of folks in the community who have recently become committers on the project, and we'd love to have more.

At the moment our big projects are to rebuild the sync system to better support many documents (funded by NLNet), deliver rich text support (funded by sponsors), and work on memory use (needed for in-house work). I think, left to my own devices, my next big push will be around auth stuff. Maybe some version control APIs. Hard to say.

Anyway, every bit of research like this is extremely welcome and helpful! Working on these problems is very much "on the list" and we'd love to get to them just as soon as we're able. Thanks for reading my rant :)

kid-icarus commented on August 18, 2024

Thanks for the timely response @pvh, I really appreciate it 🙏🏻

So first of all, apologies for not using the latest automerge(-repo) versions. I'm seeing a drastic improvement with automerge 2.1.13 and automerge-repo 1.1.15.

> so in the 2.0 / Rust line we put an emphasis on "just be fast enough".

The 46-second wait is now roughly 3-4 seconds. I think this is certainly fast enough :) I'll go ahead and close this issue out. Without RawString on all the strings, this increases to 9 seconds, but roughly 90% of the strings don't need to be collaborative and really should be RawStrings.

With no other changes, and using RawString for all strings in the document, here is a CPU profile:
CPU-20240411T174355.cpuprofile

I'll take some time to respond to the rest of your comment, but in short: I would love to contribute back in a way that isn't a burden on everyone else. I understand the challenges of being resource-constrained, and I wouldn't open any PRs without gauging interest and keeping folks in the loop beforehand. Feel free to reach out to me in the #devs channel in Discord (or you can always DM me if any of these issues become frustrating). I know everyone is doing their best given project priorities and constraints.
