google-diff-match-patch's Issues

Copyright year is 2006

What steps will reproduce the problem?
1. Look at the source code

What is the expected output? What do you see instead?
Given the datestamp in the filename, I would assume that the copyright
notice would be dated 2008 instead of 2006.
Note that the JavaScript file hasn't got any copyright notice at all.

What version of the product are you using? On what operating system?
diff_match_patch_20080426.zip

Please provide any additional information below.
This is obviously a purely cosmetic issue.

Original issue reported on code.google.com by [email protected] on 29 Apr 2008 at 7:56

Python implementation cannot be imported as a module

Please provide any additional information below.

First of all, thank you for your great work on this library.

I would like to be able to import the Python implementation as a module so
that I can keep a checkout of the python folder in our common directory
(which is on every system's Python path). If it functioned as a module,
then I wouldn't have to modify all Python paths and treat diff_match_patch
as an edge-case (it could just be checked out directly in the common, or
any folder on the path, and it will Just Work). This will also work for
anyone else who has a common directory already on their path and wants to
use diff_match_patch without modification of their path or creation of
symlinks.

Fix:

Create __init__.py in the python directory, with this inside:

from .diff_match_patch import diff_match_patch, patch_obj

As I understand it (via http://www.python.org/dev/peps/pep-0328/), this
relative import syntax should be compatible with 2.4 or greater. Thanks!

Original issue reported on code.google.com by [email protected] on 28 Aug 2009 at 4:29

where is Diff class for java?

What steps will reproduce the problem?
1. download diff_match_patch.java
2. compile it

What is the expected output? What do you see instead?
expected: compiled
I see: no Diff class



Original issue reported on code.google.com by [email protected] on 13 Jun 2007 at 3:34

Would you consider creating a C# version?

 Awesome piece of implementation.

I would love to create the C# version myself but being ignorant about the
algorithms I could make silly mistakes.

Would you consider creating a C# version?

Original issue reported on code.google.com by [email protected] on 6 Feb 2008 at 4:49

Bad space/time complexity on non-trivial files

What steps will reproduce the problem?
1. Download the attached sample files (two whitespace-simplified versions
of a generated source file in the Hadoop project).
2. Use the library to compute the diff. The diff timeout would have to be
increased significantly or set to 0.0f.

What is the expected output? What do you see instead?
GNU diff computes a diff with about 1260 edit steps in 0.125s on my
machine. diff-match-patch with the diff timeout removed fails to terminate
in both its C++ and Java versions, consuming all available system memory.

What version of the product are you using? On what operating system?
Latest svn trunk on GNU/Linux amd64.

Please provide any additional information below.
The attached files are just an extreme example; I have also found it
infeasible to compute a diff between two files of about 2000 lines each in
Java with a 2G heap. The timeout prevents all memory from being used, but
results in a trivial "delete A, insert B" diff, which is not useful.

Original issue reported on code.google.com by [email protected] on 20 Jan 2010 at 12:03

Python UnicodeDecodeError when using Cyrillic (not ascii?) chars

[code]
# -*- coding: utf-8 -*-

from diff_match_patch import diff_match_patch


dmp = diff_match_patch()

str1 = """Привет!"""
str2 = """Привет and Welcome!"""

patches = dmp.patch_make(str1, str2)
#print dmp.patch_toText(patches)

print dmp.patch_apply(patches, str1)[0]
[\code]

$ python dmp.py
Traceback (most recent call last):
  File "dmp.py", line 14, in <module>
    print dmp.patch_apply(patches, str1)[0]
  File "/data/Coding/Python/diff_match_patch.py", line 1401, in patch_apply
    text = nullPadding + text + nullPadding
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0:
ordinal not in range(128)

Original issue reported on code.google.com by [email protected] on 15 May 2008 at 6:44
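
The traceback above looks like Python 2's implicit ASCII decoding, which happens when a byte string holding non-ASCII bytes is concatenated with a unicode string. That reading is an assumption; the report does not confirm it. A minimal Python 3 sketch that makes the decode step explicit reproduces the same message and shows that decoding the input as UTF-8 first avoids it. The padding characters are hypothetical and only stand in for whatever the library prepends.

```python
# "Привет!" as UTF-8 bytes, which is what a Python 2 str literal holds
# in a source file declared as utf-8.
raw = "Привет!".encode("utf-8")


def ascii_decode_error(data):
    """Return the error message raised when data is decoded as ASCII, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


message = ascii_decode_error(raw)
print(message)

# Decoding explicitly as UTF-8 yields text that can be combined safely
# with other text.
text = raw.decode("utf-8")
padded = "\x01\x02" + text + "\x01\x02"  # hypothetical padding, illustration only
print(len(padded))
```

In Python 2 terms, the equivalent workaround would be to pass unicode objects (for example u"..." literals) to the library rather than byte strings.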

Pattern too long for this browser

What steps will reproduce the problem?
     var dmp = new diff_match_patch();
     var last = 'abcdefghij , h : 1 , t : 1 abcdefghij , h : 1 , t : 1 abcdefghij , h : 0 , t : 1';
     var current = 'abcdefghij , h : 0 , t : 1 abcdefghij , h : 0 , t : 1 abcdefghij , h : 0 , t : 1';
     var patches = dmp.patch_make(current, last);
     var mod_current = 'abcdefghij , h : 0 , t : 1 abcdefghij , h : 1 , t : 1 abcdefghij , h : 0 , t : 1';
     var res = dmp.patch_apply(patches, mod_current);

What is the expected output? What do you see instead?
Expect patch to succeed or fail.  Actually it throws an error "Pattern too
long for this browser".  Only affects JavaScript version (the bug is also
in the Python version but is never expressed).

Fathei Ali reported this error and I tracked it down to an indexing bug in
patch_splitMax.  Most of the time this would have no effect, but very
occasionally it fails to split a long patch.

A new version has been pushed which corrects this bug and unit tests have
been added in all languages for verification.

Original issue reported on code.google.com by [email protected] on 4 Dec 2008 at 5:50

diff_cleanupSemantic doesn't always cleanup

Usually diff_cleanupSemantic does an excellent job :-).
However, in this case it appears to have a problem.

What steps will reproduce the problem?
In JavaScript (with default settings):
var a='\tS += "</table><pre style=\'display:none\'>";\n'+
  '\tS += text.replace(/>/g, \'&gt;\');\n'+
  '\tS += "</pre></li></ul></div>\\n";';
var b='\n'+'\tt = lines.join(\'\\n\').replace(/>/g, \'&gt;\');\n'+
  '\tS += "</table><pre style=\'display:none\'>".concat(t, "</pre></li></ul></div>\\n");';
var DMP = new diff_match_patch;
var d=DMP.diff_main(a,b);
DMP.diff_cleanupSemantic(d);
for (var i in d) print('{',d[i][0],', "',d[i][1],'"}');

The output is as follows (which is basically the same as with no cleanup):
{ -1 , "    S "}
{ 1 , " 
    t "}
{ 0 , "   "}
{ -1 , " + "}
{ 0 , " =  "}
{ -1 , " "</tab "}
{ 0 , " l "}
{ 1 , " in "}
{ 0 , " e "}
{ -1 , " ><pre  "}
{ 0 , " s "}
{ -1 , " tyle='d "}
{ 1 , " .jo "}
{ 0 , " i "}
{ -1 , " splay: "}
{ 0 , " n "}
{ -1 , " o "}
{ 1 , " ('\ "}
{ 0 , " n "}
{ -1 , " e "}
{ 0 , " ' "}
{ -1 , " >";
    S += text "}
{ 1 , " ) "}
{ 0 , " .replace(/>/g, '&gt;');
    S +=  "}
{ 1 , " "</table><pre style='display:none'>".concat(t,  "}
{ 0 , " "</pre></li></ul></div>\n" "}
{ 1 , " ) "}
{ 0 , " ; "}

What version of the product are you using? On what operating system?
diff_match_patch_20080520.zip on Mac OS X 10.4.7

Please provide any additional information below.
Deleting the '\n' at the beginning of var b reduces the diff from 31 to 9 
elements. As follows:
{ 0 , "      "}
{ -1 , " S += "</table><pre style='display:none'>";
    S += text.replace(/>/g, '&gt;');
    S += "}
{ 1 , " t = lines.join('\n').replace(/>/g, '&gt;');
    S += "</table><pre style='display:none'>".concat(t, "}
{ 0 , "  "</pre></li></ul></div>\n" "}
{ 1 , " ) "}
{ 0 , " ; "}

Original issue reported on code.google.com by [email protected] on 25 May 2008 at 12:42

Add the CHANGED operation

Hello,
As far as I can see, the library is very good.

I need one more piece of functionality in diff: a CHANGED operation for changed lines.



Original issue reported on code.google.com by [email protected] on 21 Oct 2008 at 10:27

Invalid patch created

* What steps will reproduce the problem?

I have attached a demonstration of the issue. This creates a patch between
two strings. It then tries to apply the patch to the first string to get
the second string.

* What is the expected output? What do you see instead?

When applying the patch, an IllegalArgumentException is thrown. The patch
created has the following invalid header on the second chunk:

@@ --2,32 +9,36 @@


* What version of the product are you using? On what operating system?

Latest version (20090202) on OS/X, Java 6 - but also occurs on Linux.


* Please provide any additional information below.

None.

Original issue reported on code.google.com by [email protected] on 24 Mar 2009 at 6:58

dummy bug test for neil

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 28 Jun 2007 at 11:53

compare like knol

What steps will reproduce the problem?
1. Use the library to compare (diff) HTML that contains tags.

What is the expected output? What do you see instead?
A comparison like the one Knol provides.


What version of the product are you using? On what operating system?
diff_match_patch_20090202


Please provide any additional information below.
Any suggestions?


Original issue reported on code.google.com by [email protected] on 5 Mar 2009 at 5:10

Unchecked cast (Java)

Compiling project generates a warning for line 221:
 linearray = (ArrayList<String>) b[2];
results in:
 Type safety: Unchecked cast from Object to ArrayList<String>

This can be worked around by ignoring warnings of course, so I'm not clear
on whether this is important or not (a relative noob at Java). My apologies
if this is normal behaviour.

What version of the product are you using? On what operating system?
Version 20080624
Built in Eclipse with java 1.6.0.10 on Ubuntu 8.10

Original issue reported on code.google.com by [email protected] on 8 Nov 2008 at 6:23

Important: Ping me if you file a bug.

Google Code does not currently send any notification to me when a new issue
is added to this list.  Since there are virtually no issues with my code
(gloat), I don't visit this page often.

So if you don't want to be ignored, send me an email to let me know that
you've filed a new bug:
  http://neil.fraser.name

Thanks!

Original issue reported on code.google.com by [email protected] on 29 Jun 2007 at 1:32

Is it possible to do word by word comparison?

This is a great piece of code. I have one concern about the comparison. Is it
possible to change the logic so that it does a word-by-word
comparison?

I have written the following test code in Java:
========================================================================
String text1="Hello how are you. <br/> My name is Prathyusha. Shravanthi is my friend.";
String text2="Hello. <br/>My name is Shravanthi. Prathyusha was my friend.";
diff_match_patch diff = new diff_match_patch(); 
LinkedList<diff_match_patch.Diff> diffs = diff.diff_main(text1, text2);
diff.diff_cleanupSemantic(diffs);
String result = diff.diff_prettyHtml(diffs);
System.out.println(result);
=====================================================================

I see the following output
--------------------------------------------------------
Hello<DEL> how are you</DEL>. <br/><DEL> </DEL>My name is <DEL>Prathyusha. 
Shravanthi i</DEL><INS>Shravanthi. Prathyusha wa</INS>s my friend.
--------------------------------------------------------

As the output shows, the logic doesn't break the differences into logical
words; instead it compares chunks of the string. Word-by-word
comparison would help in getting a precise count of the newly added words
and the deleted words. Additionally, if we check the deleted
<DEL>Prathyusha. Shravanthi i</DEL> text and the inserted <INS>Shravanthi.
Prathyusha wa</INS> text, the middle chars ' ' and '.' are common. They
should not be considered deleted and inserted. This problem wouldn't
have arisen with a word-by-word comparison. The total word count is
increased by 4, considering ' ' and '.' as 2 words.
Is it possible to do a word-by-word comparison?

Regards,
Pratap

Original issue reported on code.google.com by [email protected] on 4 Jul 2009 at 1:51
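
One way to get the word-by-word behaviour asked for here is to diff sequences of tokens instead of characters. The sketch below is only an illustration: it swaps in Python's standard difflib.SequenceMatcher rather than diff_match_patch, and the names tokenize and word_diff are hypothetical. It reports operations as -1 (delete), 0 (equal) and 1 (insert).

```python
import difflib
import re


def tokenize(text):
    """Split text into words, whitespace runs and single punctuation characters."""
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def word_diff(text1, text2):
    """Return (op, text) pairs computed over tokens rather than characters."""
    a, b = tokenize(text1), tokenize(text2)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            diffs.append((0, "".join(a[i1:i2])))
            continue
        if i1 < i2:  # tokens present only in text1
            diffs.append((-1, "".join(a[i1:i2])))
        if j1 < j2:  # tokens present only in text2
            diffs.append((1, "".join(b[j1:j2])))
    return diffs


for op, chunk in word_diff("my name is A", "my name is B"):
    print(op, repr(chunk))
```

Because every edit starts and ends on a token boundary, a word is never split in the middle the way "Shravanthi i" is in the output above.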

UnicodeDecodeError in Python version when using Cyrillic characters.

I want to use Cyrillic characters with diff_match_patch (Python version,
release), but I get errors like:
"UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 0:
unexpected end of data"

Appending ".decode("utf-8").encode("utf-8")" to strings in some places
seems to solve the problem, but I guess not 100%.

See the attached patch (and, just in case, the new file).

Alexandr.

Original issue reported on code.google.com by [email protected] on 11 May 2008 at 1:12

can't get index, via java API: diff_main(String text1, String text2)

What steps will reproduce the problem?
1. use java API: diff_main(String text1, String text2).
    text1 = "this is a test";
    text2 = "this is a test A";   
2. LinkedList<Diff> diff = diff_main(String text1, String text2);
3. From the Diff objects in "LinkedList<Diff> diff", I can't get a valid "index".

What is the expected output? What do you see instead?
Expected output: index = 16
Actual output: index always equals -1

What version of the product are you using? On what operating system?
version:20071106, java 
OS: windows XP

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 29 Dec 2007 at 2:55

The diffing algorithm is not accurate

While it *does* report differences, it tends to pick the largest diff that
it can (rather than more, smaller, diffs).  

Choose a large (10MB or so) text file with lots of carriage returns -- an
XML file formatted for readability will do -- and permute it with the
following algorithm (this is Ruby code; $. is the current line in the
input, and gsub! alters the current string):

  for x in STDIN
    if $. % 1000 == 0
      deleted = x 
    elsif $. % 100 == 0          
      puts deleted if deleted
    elsif $. % 10 == 0             
      x.gsub!(/>.*</, ">XY<")
      puts x
    else
      puts x
    end
  end                              

This deletes every 1000'th line, inserts the previously deleted line every
100'th line, and alters every 10'th line.   

diff_match_patch reported the following changes:

  Operation     String size
  EQUAL         339
  DELETE        13934676
  INSERT        13860601
  EQUAL         92

So, basically, it reported that the whole file had changed.  The same two
files run through GNU diff resulted in 36766 differences.  A corresponding
patch made from diff_match_patch would have resulted in a patch file almost
as big as the original file (nearly 14MB); the patch file from GNU diff was
around 3MB, or 1/4 the size. This means that the algorithm used by
diff_match_patch produces extremely inefficient patches.

Incidentally, setting the "checklines" flag to false actually results in a
*faster*, not slower, diff, although the resulting reported differences
don't vary greatly.

This is with 20090501 with Java 1.6.0 on Ubuntu Intrepid.

Original issue reported on code.google.com by seanerussell on 7 Jun 2009 at 12:21

  • Merged into: #26
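
For anyone who wants to regenerate the test input without Ruby, here is a rough Python port of the permutation loop from the report. It is a sketch that assumes the input is available as a list of lines; the function name permute is hypothetical. Like the Ruby code, it does not emit the original text of a 100th line: that line is replaced by the remembered 1000th line, or dropped when nothing has been remembered yet.

```python
import re


def permute(lines):
    """Apply the report's edits; line numbers are 1-based like Ruby's `$.`."""
    deleted = None
    out = []
    for number, line in enumerate(lines, start=1):
        if number % 1000 == 0:
            deleted = line  # drop this line and remember it
        elif number % 100 == 0:
            if deleted is not None:
                out.append(deleted)  # emitted in place of the current line
        elif number % 10 == 0:
            out.append(re.sub(r">.*<", ">XY<", line))  # alter element content
        else:
            out.append(line)
    return out


sample = ["<v>%d</v>\n" % i for i in range(1, 21)]
print("".join(permute(sample)))
```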

Minimizing the number of lines

Consider the left text :

AAA
BBB EEE

and the right text

AAA
BBB DDD
BBB EEE

To the human eye, it's obvious that the middle was added.
However, gdmp (and other character level diff algorithms I tried) see it
differently, spanning the added string ("DDD\nBBB") over two lines.
Of course it makes sense from a computational standpoint, but that should at
least be cleaned up by the cleanup functions, which should try to minimize
the number of lines.
What do you think?



Original issue reported on code.google.com by [email protected] on 20 Jun 2008 at 9:35
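
For comparison, a diff computed over whole lines reports the middle line as a single insertion. The sketch below uses Python's standard difflib instead of diff_match_patch, so it only illustrates the result the report asks for; line_diff is a hypothetical name, and the trailing newlines on the sample texts are an assumption.

```python
import difflib


def line_diff(left, right):
    """Return (op, text) pairs computed over whole lines (-1, 0, 1 as in the library)."""
    a = left.splitlines(keepends=True)
    b = right.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            diffs.append((0, "".join(a[i1:i2])))
            continue
        if i1 < i2:  # lines present only on the left
            diffs.append((-1, "".join(a[i1:i2])))
        if j1 < j2:  # lines present only on the right
            diffs.append((1, "".join(b[j1:j2])))
    return diffs


left = "AAA\nBBB EEE\n"
right = "AAA\nBBB DDD\nBBB EEE\n"
print(line_diff(left, right))
# [(0, 'AAA\n'), (1, 'BBB DDD\n'), (0, 'BBB EEE\n')]
```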

SyntaxWarning in Python2.6 (MacOSX Leopard)

diff_match_patch.py:430: SyntaxWarning: assertion is always true, perhaps 
remove parentheses?
  assert (text1[x] == text2[y],
diff_match_patch.py:475: SyntaxWarning: assertion is always true, perhaps 
remove parentheses?
  assert (text1[-x - 1] == text2[-y - 1],
diff_match_patch.py:1158: SyntaxWarning: assertion is always true, perhaps 
remove parentheses?
  assert (self.Match_MaxBits == 0 or len(pattern) <= self.Match_MaxBits,
 SyntaxWarning: assertion is always true, perhaps remove parentheses?
  assert (False, "Unknown call format to patch_make.")

Original issue reported on code.google.com by [email protected] on 2 May 2009 at 12:54
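
The warning appears because parentheses around the condition and the message turn them into a two-element tuple, and a non-empty tuple is always truthy. A short sketch of the difference, using a hypothetical check function:

```python
# A parenthesised (condition, message) pair is a tuple. Any non-empty
# tuple is truthy, so an assert on it can never fail.
always_true = bool((1 == 2, "values differ"))
print(always_true)  # True


def check(a, b):
    """Assert equality using the correct form: no enclosing parentheses."""
    assert a == b, "values differ"
    return True


print(check(1, 1))  # True
```

Dropping the outer parentheses, as the warning text suggests, makes the assertion test the condition itself.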

Java warning in diff_compute

Problem:
Unchecked cast from Object to ArrayList<String> in diff_match_patch.java on
line 229 in function diff_compute.

What is the expected output? What do you see instead?
No warnings should be produced

What version of the product are you using?
diff_match_patch_20090615.zip

Solution:
Create a new class diff_linesToChars_result with the following public
fields:

public String chars1;
public String chars2;
public List<String> lineArray;

See attachment.

Original issue reported on code.google.com by [email protected] on 17 Jul 2009 at 7:12

QT Library?

Why is it required to have the QT Library for C++? That's a big chunk of
code just to use QList and QString (Maybe there is more I'm not seeing).

Why not use the C++ Standard Library?

Original issue reported on code.google.com by bradleelandis on 8 Sep 2009 at 8:42

java code bug

When I compare the following two texts on the site,
"ZHEJIANG JIANGLONG TEXTILE PRINTING" and
"ZHEJIANG JIANGLIMEI KNITTING CLOTH",
with Match balance = 0.6 and Match threshold = 0.4,
I get two different results:
1. Using the Match demo on the site (JavaScript), the result is no match.
2. Using the Java code in my application, the result is a match (found).
What is the difference?



Original issue reported on code.google.com by [email protected] on 9 Sep 2008 at 12:12

Pattern too long for this browser

What steps will reproduce the problem?

 I attached a JS file which reproduces the problem.

What is the expected output? What do you see instead?

 Error: Pattern too long for this browser. 

What version of the product are you using? On what operating system?
 javascript running on Rhino, java 1.5

Please provide any additional information below.
 the patch causing that problem is created by patch_fromText() and a
subsequent reverting of the patch (changing DELETES into INSERTS - that's
why the first status field says "-0").

Original issue reported on code.google.com by [email protected] on 7 Sep 2009 at 11:26
