Comments (11)

bdkjones commented on June 20, 2024

I don't know much about how Realm's internals work. It seems like Realm is establishing a "history" of changes so that it can sync writes. If that's the case, it would be really great if asyncOpen() could at least load the latest "snapshot" of the Realm in a read-only state so that data can be displayed in the UI immediately while this restoration of changesets continues in the background. When the changesets are all integrated, then the Realm could be available for writes.

Unfortunately, I assume that this process works "from the bottom up", so that it's not possible to load the latest state of the Realm (even in a read-only state) until all 12,000+ changesets have been "integrated". That makes Realm Sync a really poor choice for any large, shared data store.

from realm-swift.

tgoyne commented on June 20, 2024

Each of the log messages is printed after committing a batch of changesets, and if you do a non-async open during bootstrap application, you can read the data written so far. We don't release the write lock while this is happening, so trying to write during bootstrap application will block until it's complete.

I think our assumption around query bootstrapping was that the download would be the slow part rather than the application, but that's clearly not the case here.

Jaycyn commented on June 20, 2024

Question: what is your use case for asyncOpen()? The code presented in the question doesn't show how or why it's used.

SwiftUI has an @AsyncOpen property wrapper that leverages Realm.asyncOpen() and, according to the docs, does provide a progress indicator.

We've got a fairly large Flexible Sync dataset with about 10,000 objects and have never really experienced any delays in accessing the data. We're trying to understand how you're using asyncOpen so we can avoid crazy long delays going forward.

bdkjones commented on June 20, 2024

Sure, happy to add context!

  1. This is a Mac app built with AppKit targeting macOS 13+.

  2. It's an enterprise application for a company in Hollywood.

  3. The closest analogy is iTunes: the app holds about 30 pieces of metadata (artist, album, etc.) for roughly 1 million audio files.

  4. That master list of audio files is then used to track when a particular file appears in a project (and for how long it appears) so that the studio can determine licensing fees to be paid. There are about 10,000 "projects" right now and each links to about 800-1000 audio files.

  5. There are about 20 employees who all use this app simultaneously, and it is not possible for me to narrow down what data is loaded: ALL of the audio files have to be loaded because, when a new project is added, we need to scan all of the existing files to see whether we already have a record for a given file used in the project. All projects must be loaded because we display them in a giant outlineView on the left side of the window.

  6. On iOS, where screens are small and we don't display much at a time, I gather we'd load only a subset of the data: grab only the projects to list until a user taps something, then slide over and load more data for whatever they tapped. Unfortunately, Mac UIs don't work that way, so there is less ability to "segment" the loads.

  7. A design requirement for this app was real-time sync. It's supposed to be like a giant version of iTunes where 20 people are editing metadata on tracks or adding new playlists, and all that work should show up live, in real time, for everyone, like a Google Doc.

  8. Once the database is loaded, Realm works great. It's just the initial bootstrapping that faces a long delay.

tgoyne commented on June 20, 2024

Progress notifications for flexible sync downloads are in progress. The reason we originally didn't have them is that the server is just streaming query results to the client, rather than buffering the full result set in memory on the server before sending them. The server has now implemented estimated progress information for this, and we're working on surfacing it in the client.

We're planning on eliminating the separate download and application steps for query bootstraps, at least in the async open case. This would make the whole thing quite a bit faster, and relevant to this issue, make it so that the download progress notifications cover everything you need. We probably need to keep the separate download and apply steps for changes to subscriptions while the app is running, so that won't completely remove the need for bootstrap application progress notifications.

bdkjones commented on June 20, 2024

@Jaycyn regarding asyncOpen(): I'm aware of the SwiftUI property wrapper (covered at https://www.mongodb.com/developer/products/realm/realm-asyncopen-autoopen/), but this is not a SwiftUI app. Moreover, I think that progress reporting applies only during the data download, not during bootstrapping?

The end goal is to disallow users from editing data in the app unless the model is up-to-date with the latest changes. This is a desktop app where the Mac is virtually guaranteed to be online (unless a server is down, etc.).

Suppose a project named "Top Gun" exists. If we open the Realm with empty data and it takes a few minutes to sync everything down in the background, we don't want the user to go, "Oh, the Top Gun project is missing. I'll just create it again." Then, once sync finishes, we have two "Top Gun" projects that we now have to merge/de-dupe. This is my use-case for asyncOpen(): I want the full local Realm populated before the user starts making changes.

Jaycyn commented on June 20, 2024

I see, and thanks for the explanation. We also develop for macOS without SwiftUI, so we understand; please bear with my questions.

I'm not really clear on the difference, speed-wise, between bootstrapping and downloading.

We've used await code (previously a closure) that shows a spinner with a 'please wait' message while data downloads. During that time the user can browse existing data but can't create anything new.

let realm = try await Realm(configuration: config, downloadBeforeOpen: .always)

Once that completes, the data is fully downloaded and the user can create objects. While not ideal for the user, it at least keeps them from creating duplicate objects.

I'm no longer finding references to or examples of asyncOpen in the documentation, other than the previously mentioned SwiftUI wrapper and a blurb in the API reference.

That raises the question: what about AsyncOpenTask? Its documentation states:
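Jaycyn's pattern (browse-only while the awaited open runs, editing enabled only once it finishes) can be factored into a small helper. This is a minimal sketch under stated assumptions: `openWithEditingLocked` and its parameter names are hypothetical, not RealmSwift API, and everything is marked `@MainActor` on the assumption that AppKit controls are toggled on the main thread.

```swift
/// Hypothetical helper (not part of RealmSwift): keeps the UI browse-only while
/// `performOpen` runs, and re-enables editing only if it succeeds.
@MainActor
func openWithEditingLocked<T>(
    setEditingEnabled: @MainActor (Bool) -> Void,
    performOpen: @MainActor () async throws -> T
) async throws -> T {
    setEditingEnabled(false)             // users can browse but not create objects
    let value = try await performOpen()  // e.g. an awaited Realm open
    setEditingEnabled(true)              // not reached if performOpen() throws
    return value
}
```

In the app, `performOpen` would wrap the `Realm(configuration: config, downloadBeforeOpen: .always)` call shown above, and `setEditingEnabled` would toggle whichever controls create or edit objects. If the open throws, editing stays disabled, which errs on the side of preventing duplicate objects.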

This task object can be used to observe the state of the download or to cancel it. This should be used instead of trying to observe the download via the sync session as the sync session itself is created asynchronously, and may not exist yet when Realm.asyncOpen() returns.

So, and this is more of a @tgoyne question: is AsyncOpenTask not functional? Will that be how asyncOpen is implemented in the future? Or is await the way to go?

I'm asking because, while the approach above handles the delays we currently see, those delays will grow as the dataset and app grow. So what's the best practice here: the workaround provided in the original post?

bdkjones commented on June 20, 2024

@Jaycyn yeah, in my case the download is pretty quick: 20 seconds or so. It's the bootstrapping that takes MINUTES to complete. I don't think the AsyncOpenTask progress notification tracks the bootstrapping process after the actual bytes have finished downloading from Atlas.

@tgoyne mentioned these two steps are going to be combined. At that point, the AsyncOpenTask progress notification might be useful.

bdkjones commented on June 20, 2024

@tgoyne In the meantime, what can I do to minimize the number of changesets that must be bootstrapped? I've lowered the client-reset window on Atlas to 5 days so the app isn't keeping 30 days of sync events. Do many small write transactions (as opposed to fewer, larger ones) produce more changesets? If there's something I can do to optimize or reduce these, I'm game.

tgoyne commented on June 20, 2024

The changesets in the query bootstrap phase are history synthesized from the backing MongoDB data, and nothing you do other than changing how much data you subscribe to will change how many of them there are. It's only after bootstrapping is complete that you start receiving changesets from other clients. The raw number of changesets is also not very meaningful, as we apply the changesets in batches to reduce the per-transaction overhead. The batch size is currently only 1 MB, which is probably too small, but increasing it to a much larger number would probably give something like a 10-20% speedup, not enough to make it stop being painfully slow.
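To make the batching point concrete, here is a toy, standalone illustration of greedy size-capped batching: consecutive changesets are packed into a batch until adding the next one would exceed the cap, and each batch would then be applied in a single write transaction. `Changeset`, `makeBatches`, and all byte sizes are hypothetical; this is a sketch of the general technique, not Realm Core's actual code.

```swift
/// Hypothetical model of a changeset: only its encoded size matters here.
struct Changeset {
    let id: Int
    let sizeInBytes: Int
}

/// Greedily groups changesets, in order, into batches whose total size stays
/// within `maxBatchBytes`. A changeset larger than the cap gets a batch of its own.
/// (Illustrative only; Realm Core's real batching logic may differ.)
func makeBatches(_ changesets: [Changeset], maxBatchBytes: Int) -> [[Changeset]] {
    var batches: [[Changeset]] = []
    var current: [Changeset] = []
    var currentBytes = 0
    for changeset in changesets {
        // Close the current batch if adding this changeset would exceed the cap.
        if !current.isEmpty && currentBytes + changeset.sizeInBytes > maxBatchBytes {
            batches.append(current)
            current = []
            currentBytes = 0
        }
        current.append(changeset)
        currentBytes += changeset.sizeInBytes
    }
    if !current.isEmpty { batches.append(current) }
    return batches
}
```

With ten hypothetical 300,000-byte changesets, a 1 MiB cap (1,048,576 bytes, standing in for the 1 MB batch size mentioned above) yields four batches, so four transactions, while a 4 MiB cap yields one. Fewer transactions means less per-transaction overhead, but per tgoyne's estimate a larger cap would likely give only a 10-20% speedup.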

nirinchev commented on June 20, 2024

Hey, so I'll close this as a duplicate of #8476 for the progress notifications case. It's something we've been working on for a while server-side and hope to expose support for it on the client in the near future.

For the actual changeset application speedup, you can subscribe to realm/realm-core#7285 for updates. This is more of a medium-term project with a lot of moving parts though, so I imagine it'll take a bit longer to land.
