Comments (8)
Word-by-word diffing is not a native function of this library. However, it is
easy to do. You need to break your text into words (how you define a word is a
more interesting problem than you might think), create a lookup table of Unicode
characters to words, build two strings made up of the Unicode characters
associated with each word, diff those two strings, then convert the diff back
into the text.
Sounds complicated, but it's not, because the code has already been written for
you. Just look at the diff_linesToChars and diff_charsToLines functions. Copy
them and make them split on words instead of lines. Then your code will just be:
Object b[] = diff_wordsToChars(text1, text2);
String wordText1 = (String) b[0];
String wordText2 = (String) b[1];
ArrayList<String> wordarray = (ArrayList<String>) b[2];
LinkedList<Diff> diffs = diff_main(wordText1, wordText2, false);
diff_charsToWords(diffs, wordarray);
Have fun defining what a "word" is. Been there, done that on another project. :)
Original comment by [email protected]
on 4 Jul 2009 at 2:24
- Changed state: WontFix
- Added labels: Type-Enhancement
- Removed labels: Type-Defect
from google-diff-match-patch.
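A minimal sketch of the word-to-character mapping described in the comment above. It is not the library's code. The names WordEncodingSketch, WordsToCharsResult, wordsToChars and charsToWords are hypothetical, and the sketch assumes a "word" is a run of characters ending at (and including) the next space or newline. It works on plain strings only. To convert real diffs back, you would apply the same lookup to the text of each Diff, in the way diff_charsToLines does for lines.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WordEncodingSketch {

    /** Hypothetical typed result, used here in place of an Object[]. */
    static final class WordsToCharsResult {
        final String chars1;
        final String chars2;
        final List<String> wordArray;

        WordsToCharsResult(String chars1, String chars2, List<String> wordArray) {
            this.chars1 = chars1;
            this.chars2 = chars2;
            this.wordArray = wordArray;
        }
    }

    /** Encodes both texts so that each distinct word becomes one char. */
    static WordsToCharsResult wordsToChars(String text1, String text2) {
        List<String> wordArray = new ArrayList<>();
        Map<String, Integer> wordHash = new HashMap<>();
        // Reserve index 0 so that no word is encoded as '\u0000'.
        wordArray.add("");
        String chars1 = encode(text1, wordArray, wordHash);
        String chars2 = encode(text2, wordArray, wordHash);
        return new WordsToCharsResult(chars1, chars2, wordArray);
    }

    private static String encode(String text, List<String> wordArray,
                                 Map<String, Integer> wordHash) {
        StringBuilder chars = new StringBuilder();
        int wordStart = 0;
        while (wordStart < text.length()) {
            // Advance to the next space or newline (assumed word boundary).
            int wordEnd = wordStart;
            while (wordEnd < text.length()
                    && text.charAt(wordEnd) != ' '
                    && text.charAt(wordEnd) != '\n') {
                wordEnd++;
            }
            // Keep the delimiter with the word so decoding is lossless.
            if (wordEnd < text.length()) {
                wordEnd++;
            }
            String word = text.substring(wordStart, wordEnd);
            Integer index = wordHash.get(word);
            if (index == null) {
                wordArray.add(word);
                index = wordArray.size() - 1;
                wordHash.put(word, index);
            }
            // A char holds 16 bits, so this breaks past 65535 distinct words.
            chars.append((char) index.intValue());
            wordStart = wordEnd;
        }
        return chars.toString();
    }

    /** Expands an encoded string back into the original words. */
    static String charsToWords(String chars, List<String> wordArray) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < chars.length(); i++) {
            text.append(wordArray.get(chars.charAt(i)));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        WordsToCharsResult r = wordsToChars("the cat sat", "the dog sat");
        System.out.println(charsToWords(r.chars1, r.wordArray));
        System.out.println(charsToWords(r.chars2, r.wordArray));
    }
}
```

Because both texts share one lookup table, a word that appears in both is encoded as the same character, which is what lets a character-level diff of the encoded strings act as a word-level diff.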
Thanks for your input Neil. I will try that.
Original comment by [email protected]
on 6 Jul 2009 at 5:24
I need this enhancement too:)
Original comment by [email protected]
on 17 Apr 2010 at 12:31
I implemented the word-by-word diff (yes, it was easy), and it does a pretty
good job just tokenizing on spaces and newlines.
Something like:
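int wordEndSpace = text.indexOf(' ', wordStart);
int wordEndNewline = text.indexOf('\n', wordStart);
// indexOf returns -1 when there is no match; fall back to the last index
// so that Math.min does not pick -1.
if (wordEndSpace == -1) wordEndSpace = text.length() - 1;
if (wordEndNewline == -1) wordEndNewline = text.length() - 1;
int wordEnd = Math.min(wordEndSpace, wordEndNewline);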
That will do the trick effectively. You could of course do a nicer array
version if you have more delimiters to match (punctuation etc.), or perhaps use
a regexp.
I guess the reason the simple version works well for me is that the text is
preprocessed (from HTML) in a rather cool way, so whitespace in the text matches
the HTML rendering quite closely.
Thanks for a great little piece of code, Neil.
Regards,
Mads Buus Westmark
Original comment by [email protected]
on 20 Apr 2010 at 11:45
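A minimal sketch of the regexp variant mentioned in the comment above, offered as one possible approach rather than anything the library provides. The class WordTokenizerSketch and both patterns are hypothetical. The first pattern keeps trailing whitespace attached to each word. The second also splits punctuation into separate tokens. Both are written so that concatenating the tokens reproduces the input exactly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordTokenizerSketch {

    // A run of non-whitespace plus any whitespace after it,
    // or a run of whitespace on its own (e.g. at the start of the text).
    static final Pattern WHITESPACE_DELIMITED =
            Pattern.compile("\\S+\\s*|\\s+");

    // A run of word characters, a run of whitespace, or one other character.
    // Note: \w is ASCII-only by default in Java, so non-ASCII letters
    // come out as single-character tokens with this pattern.
    static final Pattern WITH_PUNCTUATION =
            Pattern.compile("\\w+|\\s+|[^\\w\\s]");

    /** Splits text into tokens; every input character lands in one token. */
    static List<String> tokenize(String text, Pattern pattern) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello,  world\nbye", WHITESPACE_DELIMITED));
        System.out.println(tokenize("Hi, you.", WITH_PUNCTUATION));
    }
}
```

Which pattern fits depends on the text: keeping whitespace attached gives fewer tokens, while splitting punctuation avoids treating "word" and "word," as different words.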
I need this feature also.
Original comment by [email protected]
on 25 Jun 2011 at 11:26
Hi,
The diff_linesToChars function has LinesToCharsResult as its return type.
Are any changes required for diff_wordsToChars()?
Object b[] = diff_wordsToChars(text1, text2);
String wordText1 = (String) b[0];
String wordText2 = (String) b[1];
wordarray = (ArrayList<String>) b[2];
LinkedList<Diff> diffs = diff_main(wordText1, wordText2, false);
diff_charsToWords(diffs, wordarray);
Using the diff-match-patch class, we get a character comparison, not a word
comparison. What changes are required for a word comparison?
Thanks in advance
Original comment by [email protected]
on 17 Nov 2011 at 7:06