Code Monkey home page Code Monkey logo

Comments (13)

ViktorQvarfordt avatar ViktorQvarfordt commented on May 26, 2024 1

The whole thing with sending a "revert operation" may be (unnecessarily) complex. It would suffice to simply terminate the client connection. The client should be informed of the reason for disconnect so the client can handle the error gracefully (perhaps clearing the local state and reconnecting). This approach guarantees that no invalid operations get transmitted to other clients.

Taking this line of reasoning further: Validating on each change may be very costly if changes come often and the document is large. Instead, we are considering validating just before we persist the state. This operations is reasonably debounced. If a validation happens here the entire ydoc is reverted to the last saved state. This approach allows invalid operations to be transmitted to clients for a short period of time (until the backend validation kicks in and resets the ydoc). This case should be thought of as an edge case that doesn't really need to be optimized further, since one can run the same validation code as the backend on every client 1) before sending data and 2) after receiving data. (Clients may run on different versions, so it is good practice to validate at every step.)

from hocuspocus.

georeith avatar georeith commented on May 26, 2024 1

@ViktorQvarfordt currently UndoManager is the biggest cause of issues for me: yjs/yjs#317

But also getting some corruption issues from latest versions of HocusPocus (I'm going to open a ticket about them shortly but they are quite hard to debug).

Sounds interesting would be interested to see it (your state management).

We actually moved from Redux to Mobx (because it works better with mutability but is also a lot faster which matters a lot to our application) and have a Proxy layer on-top of YJS that lets us edit it like a regular JS object (but shares the same memory as the YJS document so no increase in size) and then Mobx bindings that detect reads from the Proxy layer and writes using YJS observe functions, I intend to share that Proxy layer and Mobx binding once we've released too but its working very well for us and is most importantly very fast.

It's actually more of a hybrid approach as we're still using our redux reducers (they just mutate the store instead of doing immutable updates) and store (because I haven't wanted to switch them all over yet) just not using react-redux or any subscriptions on redux itself.

from hocuspocus.

hanspagel avatar hanspagel commented on May 26, 2024 1

@georeith A lot of exciting stuff! Feel free to create the data corruption issue, even if you don’t find time to debug it in detail. Happy to look into that! Got a probably related issue in my inbox already, maybe it’ll help to have more parts of the puzzle.

from hocuspocus.

ViktorQvarfordt avatar ViktorQvarfordt commented on May 26, 2024

I've been thinking about this as well, I was just about to open an issue but then I found this, so I add my notes here to continue the conversation.

The problem I am facing
I want to validate changes. For example, maintain valid reference ids or constrain numbers to certain values.

The solution I would like
Introduce onValidate hook that runs before onChange. If an error is thrown in that hook, the change is rejected and the client that sent the operation receives a revert operation.

This can be achieved by, before calling onValidate, creating a copy of the document, applying the operation and passing the new version of the copied document to onValidate. If the hook does not throw, then the operation is applied to the real document.

Alternatives I have considered
Dealing with invalid state in the application code. For example, normalizing the data such that invalid references are removed or numbers are clamped into valid ranges.

Additional context
The validation code used in onValidate should be used on the client before submitting an operation, to avoid that user temporarily being in an invalid state until hocuspocus sends back a revert operation (that is, if the operation yielded an invalid state). Still, the validation needs to be on the backend because one can never trust a client.

from hocuspocus.

georeith avatar georeith commented on May 26, 2024

@ViktorQvarfordt agreed around the revert operation I have actually implemented some of this since and have reached the same conclusion that it is not necessary.

I have a custom Redis adapter that wraps y-redis and disables the automatic persistence when the document updates. Instead I build up a queue of updates using async#cargo(), each time it finishes a write it takes the current queue, validates the document, if it passes it merges the updates together and persists them, if it fails it does not allow it into Redis.

Long term storage just validates the document and stores a snapshot of it on a larger interval.

On the client I am using the same validation function but not on each change which is far too expensive for my application (have a target of 60FPS alongside a constant stream of changes) but I do validate before loading out of indexeddb (and if its invalid I discard it and load from the server) or if the application crashes (to detect whether the application crashed because of a corrupt document) in which case I reload the application with the state from the server (after a small delay to let the server drop its connection).

Currently I am not forcefully closing the connections, but that is a next step, if one client errors they all error and the connection closes automatically so am relying on that in the short term.

Problems I'm still facing:

  • Validation is too expensive to do on every change, at least for me my documents can grow big, and even a 10ms validation can create a bottleneck (multiple users working in realtime). This presents the issue that it isn't possible to validate changes before sending them out to other users without lagging realtime.
  • Even if your validation is not too expensive to do on every update, there isn't a way of doing it currently before sending it out to other users. As far as I can tell this would require keeping the document twice in memory, loading it into a staging document first, validating that and if it passes loading the update into the main document.

Dealing with invalid state in the application code. For example, normalizing the data such that invalid references are removed or numbers are clamped into valid ranges.

Just a note on this but we have found most of our corruptions to be coming from bugs in Yjs or layers on top and not to do with the data the client itself set. I think even if you implement this you're still susceptible these corruptions.

from hocuspocus.

ViktorQvarfordt avatar ViktorQvarfordt commented on May 26, 2024

Nice, thanks!

Quite worrying to hear about the bugs in Yjs and the layers on top. What have you had the most problems with?

With regards to "validation by normalization", the use case I'm planning for is to have yDoc → toJSON() "raw js object" → normalize() "normalized js object", for the application to consume. And client updates happen similarly: js object → clever recursive patching of the yDoc. In this way we have part of our Redux state entirely real-time synced with very little additional code. Planning to share this code when we've stabilized a little bit, if you're curious to see more of it.

from hocuspocus.

mjamesderocher avatar mjamesderocher commented on May 26, 2024

@georeith I'd be very interested in the proxy layer when you release it!

from hocuspocus.

georeith avatar georeith commented on May 26, 2024

@mjamesderocher will let you know when it comes out. FYI we aren't using Y.Text only Y.Map and Y.Array but I don't think it would be too hard to extend it to Y.Text if you are using that, it just has a less obvious representation in a JSON like object.

from hocuspocus.

YousefED avatar YousefED commented on May 26, 2024

@mjamesderocher just stumbled on this thread. I released a similar proxy layer here: https://syncedstore.org/ (specifically in this package: https://github.com/YousefED/SyncedStore/tree/main/packages/yjs-reactive-bindings)

from hocuspocus.

kongweiying2 avatar kongweiying2 commented on May 26, 2024

@YousefED I'm getting this issue as well. This has happened quite often where I'm editing an Extension by say, changing it from a Mark to a Node. From what I understand, every time the editor is loaded, it will actually parse the content and modify it based off the state of the current extensions. Unfortunately, this seems to corrupt the document and there's no way to recover from this.

Have you managed to find a solution to invalid content like this?

from hocuspocus.

janthurau avatar janthurau commented on May 26, 2024

Hi all!

Just wanted to see if this is still current and what would be an expected behavior / feature of Hocuspocus. How exactly do documents get "corrupted"? Is this always connected with Tiptap, or do the yjs docs itself break?

If the issues are related to Tiptap, I'd assume that this has to be solved on the user-side (so if Tiptap<>Hocuspocus, by Tiptap (as the issue is the same if you just store the tiptap json a regular database), see also ueberdosis/tiptap#3437.

We could support on Hocuspocus side by adding a feature that requires a specific client version before allowing a connection, but this is also something that can be easily implemented already using an onConnect or onAuthenticate hook.

LMK what you think, if this should be solved on user-side, I'll close this.

from hocuspocus.

kongweiying2 avatar kongweiying2 commented on May 26, 2024

@janthurau

I don't think this should be solved user-side, it should be a function of the library (Hocuspocus and Collab). So for example, the two main problem areas are:

  1. Preventing the document from getting into a bad state by checking before invalid changes are merged. This happens when two editors with the same node extension type, but different implementations of that same type, sync with each other. What should happen is that changes should be reverted to the previous valid state.
  2. If the document has invalid data for nodes (because maybe the document is malformed, or the current extension for that node is incompatible with the data in the doc which was based off an older extension), then gracefully catching these errors and giving an interface to take that data and produce a new valid node. Currently, TipTap seems to eat up the old node irreversibly such that even if you revert to an older version of the extension, the node still doesn't show. So this is functionally the same as data corruption. Until there's a version history extension to be able to revert to older document states, your old data is essentially unaccessible (even though technically it's still there in the form of the binary deltas).

e.g.

data_t1 extension_t1 Everything works fine
data_t1 extension_t2 Extension to handle this node was updated, doesn't handle the data at t1
data_t2 extension_t2 Extension incorrectly parses data_t1 into data_t2 and data_t2 is unrevertable

Note that the node associated with data_t1 seems to be permanently inaccessible, even if we revert to the extension at t1. I originally thought that TipTap would just skip over the invalid data but it doesn't seem to do that. It seems like it attempts to parse the data but ends up corrupting it and excluding it from the document entirely.

I don't believe this is a y.js problem. Rather, the syncing is perfect, but the issue is that TipTap doesn't handle the evolution of existing nodes gracefully. When TipTap parses these nodes, it'll do something dodgy and then perfectly propagate the corrupted version of the file through Collab.

from hocuspocus.

timoisik avatar timoisik commented on May 26, 2024

@kongweiying2, thanks for your thoughts on this.

What you describe mainly concerns your very individual editor setup with your custom nodes. Hocuspocus does not know how your editor setup looks like and therefore cannot check if it is corrupt.

If, then hocuspocus will only offer a general solution, as described above, an onValidate hook, or as described by Jan, a version field.

But as Jan also described, this could be caught already in the onConnect hook. An individual editor schema has to be validated client-side. Especially since Hocuspocus is not a tiptap editor-only collaboration backend. :)

Open for more suggestions.

from hocuspocus.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.