
Comments (8)

GoogleCodeExporter avatar GoogleCodeExporter commented on May 23, 2024
Word-by-word diffing is not a native function of this library.  However, it is
easy to do.  You need to break your text into words (how you define a word is a
more interesting problem than you might think), create a lookup table of Unicode
characters to words, build two strings made up of the Unicode characters
associated with each word, diff those two strings, then convert the diff back
into the text.

Sounds complicated, but it's not -- because the code has already been written
for you.  Just look at the diff_linesToChars and diff_charsToLines functions.
Copy them and make them split on words instead of lines.  Then your code will
just be:

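  // diff_wordsToChars and diff_charsToWords are your word-based copies of
  // diff_linesToChars and diff_charsToLines.
  Object b[] = diff_wordsToChars(text1, text2);
  String wordText1 = (String) b[0];
  String wordText2 = (String) b[1];
  ArrayList<String> wordarray = (ArrayList<String>) b[2];
  // Diff the encoded strings, where each character stands for one word.
  LinkedList<Diff> diffs = diff_main(wordText1, wordText2, false);
  // Swap each encoded character back for the word it represents.
  diff_charsToWords(diffs, wordarray);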

Have fun defining what a "word" is.  Been there, done that on another project.  :)
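
For anyone who wants to see the encoding step on its own, here is a minimal sketch that does not depend on the library. Everything in it is an assumption made for illustration: the class WordDiffSketch, the WordsToCharsResult holder, the method names, and the word definition (a run of whitespace or a run of non-whitespace). It is not the library's implementation, only the lookup-table idea described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of diff-match-patch. */
final class WordDiffSketch {
  /** Assumed word definition: a run of whitespace or a run of non-whitespace. */
  static final Pattern TOKEN = Pattern.compile("\\s+|\\S+");

  /** Hypothetical result holder: two encoded strings plus the lookup table. */
  static final class WordsToCharsResult {
    final String chars1;
    final String chars2;
    final List<String> wordArray;

    WordsToCharsResult(String chars1, String chars2, List<String> wordArray) {
      this.chars1 = chars1;
      this.chars2 = chars2;
      this.wordArray = wordArray;
    }
  }

  /** Encodes both texts so that each distinct word becomes one character. */
  static WordsToCharsResult wordsToChars(String text1, String text2) {
    List<String> wordArray = new ArrayList<>();
    Map<String, Integer> wordHash = new HashMap<>();
    // Reserve index 0 so that no word is encoded as the character '\0'.
    wordArray.add("");
    String chars1 = encode(text1, wordArray, wordHash);
    String chars2 = encode(text2, wordArray, wordHash);
    return new WordsToCharsResult(chars1, chars2, wordArray);
  }

  /** Appends unseen words to the shared table and returns the encoded text. */
  static String encode(String text, List<String> wordArray,
                       Map<String, Integer> wordHash) {
    StringBuilder chars = new StringBuilder();
    Matcher m = TOKEN.matcher(text);
    while (m.find()) {
      String word = m.group();
      Integer index = wordHash.get(word);
      if (index == null) {
        // A Java char is 16 bits, so the table cannot grow past 65535.
        if (wordArray.size() > Character.MAX_VALUE) {
          throw new IllegalStateException("Too many distinct words");
        }
        index = wordArray.size();
        wordArray.add(word);
        wordHash.put(word, index);
      }
      chars.append((char) index.intValue());
    }
    return chars.toString();
  }

  /** Decodes an encoded string back into the words it stands for. */
  static String charsToWords(String chars, List<String> wordArray) {
    StringBuilder text = new StringBuilder();
    for (int i = 0; i < chars.length(); i++) {
      text.append(wordArray.get(chars.charAt(i)));
    }
    return text.toString();
  }
}
```

With the library available, the idea would be to pass chars1 and chars2 to diff_main and then run charsToWords over the text of each resulting Diff. How you wire that in depends on the library version you have, so treat that step as an assumption to check against your copy of the source.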

Original comment by [email protected] on 4 Jul 2009 at 2:24

  • Changed state: WontFix
  • Added labels: Type-Enhancement
  • Removed labels: Type-Defect

from google-diff-match-patch.

GoogleCodeExporter avatar GoogleCodeExporter commented on May 23, 2024
Thanks for your input Neil. I will try that.

Original comment by [email protected] on 6 Jul 2009 at 5:24


GoogleCodeExporter avatar GoogleCodeExporter commented on May 23, 2024
I need this enhancement too:)

Original comment by [email protected] on 17 Apr 2010 at 12:31


GoogleCodeExporter avatar GoogleCodeExporter commented on May 23, 2024
I implemented the word-by-word diff (yes, it was easy), and it does a pretty
good job just tokenizing on spaces and newlines.
Something like:
  wordEndSpace = text.indexOf(' ', wordStart);
  wordEndNewline = text.indexOf('\n', wordStart);
  // indexOf returns -1 when nothing is found, so skip a missing delimiter.
  if (wordEndSpace == -1) wordEnd = wordEndNewline;
  else if (wordEndNewline == -1) wordEnd = wordEndSpace;
  else wordEnd = Math.min(wordEndSpace, wordEndNewline);
That does the trick. You could of course do a nicer array version if you have
more delimiters to match (punctuation etc.), or perhaps use a regexp.
I guess the reason the simple version works well for me is that the text is
preprocessed (from HTML) in a rather cool way, so whitespace in the text
matches the HTML rendering quite closely.
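
A rough sketch of the array version mentioned above, in case it helps someone. The class name DelimiterSplitSketch, both method names, and the choice to keep each delimiter attached to the end of its word (mirroring how the line-based code keeps the newline) are my assumptions, not anything taken from the library.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of diff-match-patch. */
final class DelimiterSplitSketch {
  /**
   * Returns the index of the nearest delimiter at or after wordStart,
   * or the last index of the text when no delimiter remains.
   */
  static int nextWordEnd(String text, int wordStart, char[] delimiters) {
    int wordEnd = -1;
    for (char delimiter : delimiters) {
      int hit = text.indexOf(delimiter, wordStart);
      // Ignore delimiters that do not occur (indexOf returned -1).
      if (hit != -1 && (wordEnd == -1 || hit < wordEnd)) {
        wordEnd = hit;
      }
    }
    return wordEnd == -1 ? text.length() - 1 : wordEnd;
  }

  /** Splits text into words, each ending with its trailing delimiter. */
  static List<String> splitWords(String text, char[] delimiters) {
    List<String> words = new ArrayList<>();
    int wordStart = 0;
    while (wordStart < text.length()) {
      int wordEnd = nextWordEnd(text, wordStart, delimiters);
      words.add(text.substring(wordStart, wordEnd + 1));
      wordStart = wordEnd + 1;
    }
    return words;
  }
}
```

Since every character ends up in exactly one word, joining the pieces gives back the original text, which is what the decoding step relies on.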

Thanks for a great little piece of code, Neil.

Regards,
Mads Buus Westmark

Original comment by [email protected] on 20 Apr 2010 at 11:45


GoogleCodeExporter avatar GoogleCodeExporter commented on May 23, 2024
I need this feature also.

Original comment by [email protected] on 25 Jun 2011 at 11:26


GoogleCodeExporter avatar GoogleCodeExporter commented on May 23, 2024
[deleted comment]


GoogleCodeExporter avatar GoogleCodeExporter commented on May 23, 2024
[deleted comment]


GoogleCodeExporter avatar GoogleCodeExporter commented on May 23, 2024
Hi,
The diff_linesToChars function has LinesToCharsResult as its return type.
Are any changes required for diff_wordsToChars()?

  Object b[] = diff_wordsToChars(text1, text2);
  String wordText1 = (String) b[0];
  String wordText2 = (String) b[1];
  wordarray = (ArrayList<String>) b[2];
  LinkedList<Diff> diffs = diff_main(wordText1, wordText2, false);
  diff_charsToWords(diffs, wordarray);

Using the diff-match-patch class, we get a character comparison, not a word
comparison. What changes are required for a word comparison?

Thanks in advance

Original comment by [email protected] on 17 Nov 2011 at 7:06

