arrbee commented on August 22, 2024

Good question. I suppose no and yes, but probably more no than yes.

When I started this, my goal was to get it to a point where it could be a libxdiff replacement for the main project I work on (i.e., libgit2), but I never really got it to that point and it has become a lower priority on my todo list.

That being said, I'm still interested and if there was someone else also interested, I'd be happy to work together to add functionality and get it in a better state.

It looks like there is one fork where some further work has been done (https://github.com/ggreer/diff-match-patch-c) and I'd be happy to merge pull requests if @ggreer wants to submit them!

from diff-match-patch-c.

sebfischer83 commented on August 22, 2024

Hi,

thanks for your quick reply.
I'm interested in helping a bit, but I have to say my C knowledge is not very extensive; I come mostly from the .NET and Objective-C world.
But if I can help, I'd be willing to try some things.

ggreer commented on August 22, 2024

Instead of using this library, I ended up embedding Lua into my C program and using the Lua version of diff-match-patch.

My changes to diff-match-patch-c were just minor things like fixing the build on Ubuntu 12.04. I didn't add any features or fix any tricky bugs.

sebfischer83 commented on August 22, 2024

@ggreer Thanks for your fast answer. The problem is that I need a plain C solution, so Lua isn't an option for me.

Varriount commented on August 22, 2024

I wouldn't mind helping complete this library. Though my experience is mainly centered around Python and Java, I do know how to read C, and I figure that working on a project such as this would be a good learning experience.

I mainly need this library for integration into a Python/PyPy application.
Currently I'm making heavy use of the pure Python version of diff-match-patch for a text collaboration server, and the library's overhead leaves something to be desired: with CPython it is too slow, and with PyPy it is too memory-intensive.

I would greatly appreciate continued work on this project.
If nothing else, could you please comment/explain:

  • Which parts of DMP are currently implemented, and which parts aren't.
  • What currently does and doesn't work.
  • What dmp_pool and the other secondary structures are/do.

Edit:
Oh, and how Unicode-compatible is this library currently?

arrbee commented on August 22, 2024

@Varriount Sure thing! I'm glad you're interested.

Of the diff-match-patch code, so far I've only implemented the basic string-to-string diff and even for that, I've omitted little bits here and there. The core Myers diff is implemented as are many of the optimizations, but things like deadlines and such are not. None of the match and patch code is implemented at all.
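For readers less familiar with the core Myers diff mentioned above, here is a minimal sketch of the greedy Myers O(ND) forward search that computes only the length D of the shortest edit script between two byte strings. It is not code from this library: the function name `myers_edit_distance` is hypothetical, it works on raw bytes, and it leaves out the middle-snake bisection, deadline handling, and edit-script reconstruction that upstream diff-match-patch performs.

```c
#include <stdlib.h>

/* Returns the number D of single-byte insertions + deletions needed to turn
 * a[0..n) into b[0..m), using Myers' greedy O(ND) forward search.
 * Returns -1 if the work array cannot be allocated. */
static int myers_edit_distance(const char *a, int n, const char *b, int m)
{
    int max = n + m;
    int offset = max;  /* diagonal k (= x - y) is stored at v[k + offset] */
    int *v = calloc((size_t)(2 * max + 2), sizeof(int));
    if (!v)
        return -1;

    /* calloc leaves v[1 + offset] == 0, the virtual start before (0, 0). */
    for (int d = 0; d <= max; d++) {
        for (int k = -d; k <= d; k += 2) {
            int x;
            /* Extend from whichever neighbouring diagonal reached further. */
            if (k == -d || (k != d && v[k - 1 + offset] < v[k + 1 + offset]))
                x = v[k + 1 + offset];     /* step down: insert from b */
            else
                x = v[k - 1 + offset] + 1; /* step right: delete from a */
            int y = x - k;

            /* Follow the "snake": matching bytes cost nothing. */
            while (x < n && y < m && a[x] == b[y]) {
                x++;
                y++;
            }
            v[k + offset] = x;

            if (x >= n && y >= m) {
                free(v);
                return d;
            }
        }
    }
    free(v);
    return max; /* not reached: d == max always covers both inputs */
}
```

D is related to the longest common subsequence by D = n + m - 2 * LCS; for the classic pair `ABCABBA` / `CBABAC` the LCS has length 4, so D = 5.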

Based on the very limited tests that I've written, I think the basic diff works. Nothing else is implemented so that's pretty simple. The dmp_options struct is largely copied from the upstream code base and almost none of the actual option values are hooked up.

The core diff code generates a dmp_diff object. Internally, that stores a singly linked list of dmp_node structures which represent the spans in the data (i.e., a range of shared bytes, a range of inserted bytes, or a range of deleted bytes). The nodes are a little funky compared to a traditional linked list implementation: instead of storing a next pointer, each node stores the index of the next element. This is possible because the nodes are allocated as one big block of data instead of individually; that big block is the dmp_pool. It is mainly there to provide efficient allocation of the small dmp_node structures.
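The index-linked, pool-allocated layout described above can be sketched roughly as follows. This is not the actual dmp_node/dmp_pool code: the names `span_node`, `span_pool`, `pool_init`, `pool_alloc` and `pool_free`, the field layout, and the convention of reserving index 0 as the end-of-list marker are all assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical span kinds, loosely modelled on diff-match-patch's
 * DELETE / EQUAL / INSERT operations. */
typedef enum { SPAN_DELETE = -1, SPAN_EQUAL = 0, SPAN_INSERT = 1 } span_op;

/* One span of the diff. Instead of a `next` pointer, each node stores the
 * index of the next node in the pool; index 0 means "end of list". */
typedef struct {
    const char *text; /* start of the byte range (points into the input) */
    size_t len;       /* length of the range in bytes */
    span_op op;       /* kind of span */
    uint32_t next;    /* index of the next node, 0 = none */
} span_node;

/* All nodes live in one contiguous, growable block. */
typedef struct {
    span_node *nodes;
    uint32_t used;    /* slots handed out, including reserved slot 0 */
    uint32_t cap;     /* slots allocated */
} span_pool;

static int pool_init(span_pool *p, uint32_t initial)
{
    if (initial < 2)
        initial = 2;
    p->nodes = calloc(initial, sizeof(span_node));
    if (!p->nodes)
        return -1;
    p->used = 1; /* slot 0 is reserved so that 0 can act as a null index */
    p->cap = initial;
    return 0;
}

/* Hands out a fresh node and returns its index, or 0 on allocation failure.
 * Growing the block with realloc() may move it in memory, but existing
 * links stay valid because they are indices, not pointers. */
static uint32_t pool_alloc(span_pool *p, const char *text, size_t len,
                           span_op op)
{
    if (p->used == p->cap) {
        uint32_t ncap = p->cap * 2;
        span_node *grown = realloc(p->nodes, (size_t)ncap * sizeof(span_node));
        if (!grown)
            return 0;
        p->nodes = grown;
        p->cap = ncap;
    }
    uint32_t idx = p->used++;
    p->nodes[idx] = (span_node){ text, len, op, 0 };
    return idx;
}

static void pool_free(span_pool *p)
{
    free(p->nodes); /* a single free() releases every node at once */
    p->nodes = NULL;
    p->used = p->cap = 0;
}
```

A caller links nodes by index (for example `p.nodes[a].next = b;`) and walks the list with `for (uint32_t i = head; i != 0; i = p.nodes[i].next)`. One plausible benefit of indices over pointers is visible in `pool_alloc`: the block can be reallocated as it grows without having to patch any links.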

In retrospect, the dmp_pool stuff is probably not written the way I would implement it if I were starting now, but at this point, cleaning that up is not really the highest priority issue in this code. It does serve to keep the number of actual allocations quite low and the memory usage fairly efficient.

Regarding Unicode compatibility, the current version of this code diffs based on byte ranges. None of the upstream diff cleanups are implemented (such as converting the byte-range diffs to line-oriented diffs). I believe that Unicode should be handled as a cleanup pass after the byte diff, realigning diff span boundaries to match Unicode character boundaries. That being said, I haven't really looked at that deeply, nor am I much of an expert on Unicode.
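For UTF-8 input, the boundary realignment proposed above might look something like the following. This is a minimal sketch under stated assumptions, not code from the library: the helper names `utf8_align_back` and `common_prefix_bytes` are hypothetical, the input is assumed to be valid UTF-8, and it aligns to code points only, not to grapheme clusters (such as a letter followed by a combining accent).

```c
#include <stddef.h>

/* True for UTF-8 continuation bytes (bit pattern 10xxxxxx). */
static int utf8_is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Moves a span boundary (the offset of the first byte *after* a span) back
 * so that it does not fall inside a multi-byte UTF-8 sequence.
 * Assumes s[0..len) is valid UTF-8. */
static size_t utf8_align_back(const char *s, size_t len, size_t off)
{
    if (off >= len)
        return len; /* the end of the buffer is always a valid boundary */
    while (off > 0 && utf8_is_continuation((unsigned char)s[off]))
        off--;
    return off;
}

/* Byte-wise common prefix length, as a byte-range diff would compute it. */
static size_t common_prefix_bytes(const char *a, size_t alen,
                                  const char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    size_t i = 0;
    while (i < n && a[i] == b[i])
        i++;
    return i;
}
```

For example, the byte-wise common prefix of `"caf\xC3\xA9"` ("café") and `"caf\xC3\xA8"` ("cafè") is 4 bytes, which splits the shared lead byte 0xC3 from its continuation byte; `utf8_align_back` moves the boundary back to 3, so the whole character lands in the deleted and inserted spans. Because the prefix bytes are identical in both inputs, aligning the offset in either buffer gives the same result.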

By the way, I may have a reason to pick this code back up again and move it forward some more. It certainly helps to know that there is still some interest.

sebfischer83 commented on August 22, 2024

@Varriount I'm still interested in this project too; maybe there is a chance to move it forward.

Varriount commented on August 22, 2024

If it helps, I have a 64-bit Windows 8 dev machine with both Visual Studio and Mingw64 installed.

I'm currently trying to get Visual Studio to compile the source into a DLL (surprisingly, Mingw64's gcc compiled it without a hitch; usually it's the other way around).

arrbee commented on August 22, 2024

I did a little bit of tweaking to the project organization - hopefully it won't mess you up too much.

Right now, the coding conventions are very similar to those of libgit2 because that's what I spend much of my time working on and it was just easy to stick with those. If you want to help, you may want to read the conventions for that project.
