Comments (19)
Hi Carsten,
I've applied the patch but cannot see any improvements over the pre-patching
output. Perhaps I had the wrong expectations about the outcome can you please
advice? I've attached the two sources i'm using for comparison and the
resulting output.
Original comment by [email protected]
on 19 Nov 2010 at 10:02
Attachments:
from daisydiff.
Just to clarify the problem with the output on the previous comment. The diff
for the Manufacturer and Description sections are scrambled together.
Original comment by [email protected]
on 19 Nov 2010 at 10:14
from daisydiff.
Hi Michael,
sorry for coming back to you so late. I just checked your example and you are
right, this scenario is not covered by my patch. Daisydiff doesn't know or care
about sections or chapters like e.g. "Manufacturers" from your snippets.
Daisydiff only sees two long list of text elements and tries to find and report
as few differences in them as possible.
Daisydiff would need to learn about those chapters (h1..h6 for a start) and
perform the differencing separately for the text nodes of those chapters.
I.e. DomTreeBuilder should not return a flat list of elements but some kind of
tree of chapters, each with their own list of text nodes.
Ideally you would then create one TextNodeComparator per chapter and perform
the diffing chapter-wise. But this needs some work, because this would lead to
multiple outputs (you want to have just one single result of course).
Cheers,
Carsten
Original comment by [email protected]
on 14 Mar 2011 at 4:14
from daisydiff.
[deleted comment]
from daisydiff.
Carsten,
Can you post an example or two, where one can see clearly what has changed
before and after applying this patch? This should help people understand the
changes and decided if the patch should be accepted or not.
Original comment by [email protected]
on 23 May 2011 at 2:40
from daisydiff.
I applied the patch (which was tricky because it clashed with the performance
fixes in 1.2) and I note that it fixes
http://code.google.com/p/daisydiff/issues/detail?id=26 for me, without breaking
anything else I could think of. Awesome work Carsten.
Original comment by [email protected]
on 15 Jun 2011 at 5:29
from daisydiff.
Can you create then a new patch (that is based on 1.2) and attach it here?
Then I will apply this to trunk if you think that it should be part of
DaisyDiff.
Original comment by [email protected]
on 15 Jun 2011 at 6:41
from daisydiff.
Yes. Will do. I have a little more testing to do first.
Original comment by [email protected]
on 15 Jun 2011 at 6:54
from daisydiff.
Here's a simple testcase that my patch fixes. Note that the second table cell
in the old.html file ("1XXX") must start with the same character with which the
previous (first) cell ends ("1").
In two-way diff mode, daisydiff creates a new bogus column in the table.
In three-way diff mode it even puts the red removal marker into the first
column, AFAIR.
Original comment by [email protected]
on 16 Jun 2011 at 10:49
Attachments:
from daisydiff.
I will have no time for this in the coming weeks, so I took the time today,
rebased the patch to trunk and committed it. I'm using it internally for almost
a year and had no problems with it.
(There are more problems left, but unrelated to this issue).
Original comment by [email protected]
on 17 Jun 2011 at 3:57
from daisydiff.
The SeparatingNode is missing a hashCode method. Patch provided.
Original comment by [email protected]
on 24 Jun 2011 at 7:22
Attachments:
from daisydiff.
While the SeparatingNode seems to really improve the handling of tables, I'm
not convinced that the check for paretn position in TagNode.isEquals is a good
change. It leads to a lot of cells claiming to be moved when really they've
just had a row inserted before them.
Eg I find that removing that section improves things. I'm attaching a patch
that comments out the part I'm talking about so that people can discuss whether
they like it or not.
Original comment by [email protected]
on 24 Jun 2011 at 7:34
Attachments:
from daisydiff.
Thanks Don. I committed your patch until I find a better solution. The problem
(i.e. the reason for the position-check) manifests itself mostly in three-way
diff, as I found out.
Original comment by [email protected]
on 4 Jul 2011 at 12:52
from daisydiff.
Don,
after commenting out the "parent index" thing in TagNode, the testcases
"broke", as expected. When I had a look at fixing them, it became obvious that
it is currently too time consuming to validate them.
So here's my proposal to fix that:
- rename the file expected.xml to expected.html
- make them html markup instead of xml
- add an html header including a reference to the diff.css
That way, one can easily see the expected output (and also validate if the
expected output make sense at all).
I will then regenerate all expected.html files with the current version. If
you're fine with that, I will commit that tomorrow.
Original comment by [email protected]
on 4 Jul 2011 at 4:13
from daisydiff.
[deleted comment]
from daisydiff.
Hi Carsten,
switching to html files sounds great to me.
Original comment by [email protected]
on 5 Jul 2011 at 9:35
from daisydiff.
Committed in r155. Please re-open if you encounter any issues.
Original comment by [email protected]
on 13 Sep 2011 at 9:05
- Changed state: Fixed
from daisydiff.
Hi,
I've may have found a fix for Carsten's position checking (which brings too
many drawbacks with it). I simply added more separators by commenting out a
couple of lines of code - a simple patch and I'm attaching it to this issue. I
ran the new code against the file-based unit tests and while the code does
perform worse in a few cases it is a huge improvement overall. I believe it
also solves issue 44.
I went over all unit tests which have 'failed' after adding the new code and my
tally goes like this:
Big improvements in Lists 22, 23, 35; General 5, 6, 35
Small improvements in General 9
Big regression in General 38
Slight regressions in Lists 6, 19 (bad HTML!), 20 (bad HTML!), 31; Paragraphs
43; General 39 (though Lists 19 and 20 contain bad HTML, UL elements cannot be
nested in other UL elements unless they are wrapped in a LI first - if I fix
this the result with new code isn't bad)
Couldn't decide: Lists 4, 5, 17, 18; General 42
I would appreciate if somebody could take the time to review these tests. To do
this, apply the patch, run 'ant test' and go to the 'testdata' directory to
review the test results: failed tests will have the '_actual_result.html' file
present in their directory (not that failure here simply means a different
output, not a failure per se).
Original comment by [email protected]
on 3 Sep 2012 at 8:33
Attachments:
from daisydiff.
Hi Gregor,
thanks a bunch for your efforts. The improvements of your patch do indeed look
good to me. The regressions in lists however are a bad. It would be awesome if
we could find a way to get the improvements without the regressions :-)
Can you tell me how you fixed the badly nested ULs so that the results are OK?
Adding a <li> in front of the nested <ul> changes the behavior (you get another
bullet point). Even though it's not correct html to leave out the <li>, we have
to support that, because it is quite common. For example the Mozilla HTML
Commposer (the rich text editor in Thunderbird for example) creates such HTML.
Regarding issue 44 -- I couldn't reproduce that problem at all (i.e. plain
1.2). The result was the same (perfect indentation) with and without your patch.
Thanks,
Carsten
Original comment by [email protected]
on 19 Sep 2012 at 8:58
from daisydiff.
Related Issues (20)
- Add location of css and js directories as a commandline option HOT 1
- Indentation goes crazy when you compare two html pages having lot of numbered lists HOT 2
- Indentation goes crazy when you compare two html pages having lot of numbered lists HOT 4
- Alternative jQuery UI for Diff results
- Daisydiff fails to process certain invalid HTML files HOT 9
- DOM structure is modified in the daisydiff output HOT 5
- Unit test fails due to a missing newline character HOT 2
- Empty IMG tag throws NullPointerException
- when we comparing the two files have no change, daisy diff throwing Uncaught unknown destination..
- Does not Diff <Title> or <Meta> or <JavaScript> in HTML HOT 3
- An element that was moved out of a table can lead to broken table elements in the diff HOT 1
- Invalid tags are generated HOT 2
- Word changed but showing removed and added & change in image showing improper in Chrome. HOT 1
- error on line 6 at column 8: Opening and ending tag mismatch: link line 0 and head HOT 8
- [deleted issue]
- Compare result error for table
- [deleted issue]
- TextNode->IsSame() references not declared variable $html2 in php
- Is it possible to display the new created content? Not display the whole line.
- Xerces Impl included in daisydiff.jar has security vulnerabilities
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from daisydiff.