
Comments (13)

jasonsparc commented on July 28, 2024

Hi @endel and @igor84

I'm not an expert on the algorithm, nor have I run benchmarks, but I think lines 83 to 87 in 264ea17#diff-993db0c7e13460f7683a59bed31a7144R83-R87
(of PR #4) will not actually provide a performance gain, since they add an extra check (the if statement at line 83) inside what is supposedly the hot loop. If that condition rarely triggers, it's just wasted CPU cycles. That's just my naive observation though, and I might well be wrong. If so, please forgive my naiveté and criticise freely (but please don't murder me 😅).

However, if you're thinking along the same lines, or think the optimization might not be worth it at all, then I would strongly vote for reverting lines 83 to 87 of PR #4.

from fossildelta.

israellot commented on July 28, 2024

That would need a proper benchmark.
In my usual runs, performance stays the same, as you noticed, but in the use cases that affected me it sped up delta generation quite a lot.
But hey, revert if you will, it's a good implementation either way. Cheers


igor84 commented on July 28, 2024

Good to hear @jasonsparc :). It is done just for the Delta.Apply function, not for Delta.Create, but we found that useful because we apply patches in the app while we only create them at build time, so hopefully others will find it useful as well. It is already implemented, and I'll submit a pull request as soon as this one is merged.


israellot commented on July 28, 2024

Thanks for the feedback @jasonsparc !
You made a fair observation; my understanding is that it all depends on the data.
When dealing with large data and small changes, that check skips a lot of unnecessary hashing. That's because once I know I'm inside the region already evaluated as the best one, I can simply skip every landmark within it, since they would only lead to the exact same region being found again.
Reversing the landmarks leads to bigger matches being found first, which opened the possibility for this second optimization. The real hot path for this code isn't really the loop per se but the hashing, so less hashing means less CPU spent.
Now, that may only pay off if the diff between your original and target data is small enough to produce large matching regions. This algorithm serves the purpose of creating deltas, which inherently means small changes over previous data, so I believe that's a fair compromise.
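A minimal sketch of that skip, with hypothetical names (`extend` stands in for the hash-and-grow step; this is an illustration of the idea, not the library's actual code):

```python
def scan_candidates(candidates, extend, best=(0, 0)):
    """Return the longest match found among candidate landmark positions.

    candidates: iterable of landmark positions in the source.
    extend(pos): returns (start, length) of the match grown around pos
                 (the expensive hashing/comparison step).
    best: (start, length) of the best match found so far.
    """
    best_start, best_len = best
    for pos in candidates:
        # The optimization under discussion: a landmark lying inside the
        # current best region can only rediscover that same region, so
        # the expensive extend step is skipped entirely.
        if best_start <= pos < best_start + best_len:
            continue
        start, length = extend(pos)
        if length > best_len:
            best_start, best_len = start, length
    return best_start, best_len
```

Candidates that fall inside an already-found best region never reach the hashing step, which is where the savings come from on large inputs with few changes.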

Regarding the unit test, the deltas are compatible, as the format hasn't changed, or at least shouldn't have. I believe @igor84 was pointing out that the baked data used for comparison differs from the delta generated after the PR. It might be worth checking why that is and what exactly differs between the two outputs. I do remember doing these cross-checks, though, applying deltas from the previous version on the new one and vice versa. Maybe something slipped.


endel commented on July 28, 2024

Hi @igor84, thanks for diving deep on this. I wasn't aware that #4 had introduced this breaking change. Maybe it makes more sense to revert it, since the deltas should be compatible with the encoding/decoding of the other implementations (JavaScript, Lua, etc.)


igor84 commented on July 28, 2024

Hi @endel, there is one more option: we could add a bool parameter to Delta.Create called useNonStandardOptimizations or something like that, and only apply the new optimization when it is true. It is up to you, so let me know if you would like that or if you still prefer the revert, and I'll make the pull request.
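Sketched in Python with hypothetical helper names (the real API is C#'s Delta.Create), the opt-in shape would be roughly:

```python
def _create_reference(source, target):
    # Placeholder for the canonical delta search, byte-compatible
    # with the other fossil-delta implementations.
    return ("reference", len(source), len(target))

def _create_optimized(source, target):
    # Placeholder for the search that skips landmarks inside the
    # current best region (may emit commands in a different order).
    return ("optimized", len(source), len(target))

def create(source, target, use_nonstandard_optimizations=False):
    """Hypothetical opt-in signature mirroring the proposed bool flag."""
    if use_nonstandard_optimizations:
        return _create_optimized(source, target)
    return _create_reference(source, target)
```

Defaulting the flag to false would keep existing callers producing deltas identical to those of the other implementations.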


endel commented on July 28, 2024

@israellot thoughts? 👀


israellot commented on July 28, 2024

Just to add: I believe the deltas generated by the previous version can differ from those generated by the new version, but that doesn't mean they aren't compatible. It only means one has a different set of commands relative to the other, or the commands in a different order, since the search order differs between the two versions.


igor84 commented on July 28, 2024

@israellot you're right, the deltas are different but they still work.

I tested with and without these changes, using files from 60KB to 21MB (with generated deltas ranging from 2KB to 20MB), and the difference in speed is positive in some cases and negative in others, but either way it is negligible (below 1%), so I will submit a pull request reverting all the changes. That way the deltas will remain the same as in other tools.


igor84 commented on July 28, 2024

Sorry, I messed up the pull request. I wanted to submit another one and it got merged into the existing pull request. How can I fix that?


jasonsparc commented on July 28, 2024

@igor84 You could run git rebase -i <to-prev-commit-hash>, then force-push, and GitHub should recognize the change. I know because I used to do something like that. But whether force pushes are allowed depends on the repo settings.

I believe that, by default, force pushes are disabled only for the master branch on GitHub. So you could perhaps disable that protection first for the master branch of your fork, then force-push.
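The rewrite-then-force-push flow can be sketched like this (run against a throwaway repo; `GIT_SEQUENCE_EDITOR=:` accepts the rebase todo list unchanged so the demo runs unattended, and the branch name in the push is a placeholder):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "You"
printf 'a\n' >  f.txt; git add f.txt; git commit -qm "first"
printf 'b\n' >> f.txt; git commit -qam "second"
printf 'c\n' >> f.txt; git commit -qam "third"
# Rewrite the last two commits; in real use you would edit the todo
# list (squash, reword, drop) instead of accepting it unchanged.
GIT_SEQUENCE_EDITOR=: git rebase -i HEAD~2
git log --oneline
# Then update the pull-request branch (only if force pushes are allowed):
#   git push --force origin <your-branch>
```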


igor84 commented on July 28, 2024

Thanks @jasonsparc. It seems that fixed it.


jasonsparc commented on July 28, 2024

@igor84 btw, I'm excited about the Stream support in that commit I saw just before the rebase removed it, and I can't wait for it to be implemented.

I hope to see it land some time soon. 😄


