Comments (13)

AstroMen commented on August 15, 2024

Actually, it is not convenient to compare two dicts loaded from JSON. JSON does not support sets, so I have to convert lists to sets before I diff. It would be nice to have ignore_order, even though I could convert the lists to sets myself.

from dictdiffer.

jirikuncar commented on August 15, 2024

@inoks can you use set instead of list if you are not interested in the order of items?

from dictdiffer import diff

first = {'a': {1,2}}
second = {'a': {2,1}}

result = diff(first, second)
assert list(result) == []

slyapustin commented on August 15, 2024

@jirikuncar Actually, no. My dict came from xmltodict, which parsed pretty large XML documents.
I ended up creating a dict-sorting function that sorts all nested dicts and lists before calling diff().
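
A preprocessing step like the one described here might look roughly like the sketch below. It is not the commenter's actual function: `normalize` and `canonical_key` are hypothetical names, and ordering items by their JSON serialization is an assumption chosen so that lists of dicts can be sorted at all.

```python
import json


def canonical_key(value):
    """Serialize a value deterministically so that unlike items can be ordered."""
    # Assumption: values are JSON-like; anything else falls back to str().
    return json.dumps(value, sort_keys=True, default=str)


def normalize(value):
    """Return a copy of value in which every nested list is sorted."""
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        return sorted((normalize(item) for item in value), key=canonical_key)
    return value


first = {"root": {"item": [{"id": 2, "tags": ["b", "a"]}, {"id": 1, "tags": []}]}}
second = {"root": {"item": [{"id": 1, "tags": []}, {"id": 2, "tags": ["a", "b"]}]}}

# The raw structures differ only in list order...
assert first != second
# ...and compare equal once every nested list has been sorted.
assert normalize(first) == normalize(second)
```

The normalized structures can then be handed to diff() in place of the originals. Duplicates survive the sort, so a repeated item still shows up as a difference.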

slyapustin commented on August 15, 2024

@jirikuncar And I also want duplicate items in a list to be detected by diff().
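
The duplicate requirement is what rules out a plain set conversion. A small sketch of the difference, using the standard library's `collections.Counter` as a multiset (this is an illustration only, not something dictdiffer does, and it assumes the list items are hashable):

```python
from collections import Counter

first = ["a", "b", "b"]
second = ["b", "a"]

# Converting to sets ignores order but also hides the duplicated "b".
assert set(first) == set(second)

# A Counter ignores order while still counting how often each item occurs.
assert Counter(first) != Counter(second)

# Counter subtraction keeps only positive counts, so it reports the surplus.
extra_in_first = Counter(first) - Counter(second)
extra_in_second = Counter(second) - Counter(first)
```

Here `extra_in_first` holds one surplus "b" and `extra_in_second` is empty.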

jirikuncar commented on August 15, 2024

@inoks if you have a working solution for your domain specific problem, can you please close the issue?

slyapustin commented on August 15, 2024

@jirikuncar I'd prefer that the issue stay open, just in case someone decides to implement this in the future.

jirikuncar commented on August 15, 2024

To be honest, I don't think it should be part of a diffing library. It is the same as when you diff two files in UNIX.

$ diff first.txt second.txt

The lines are not sorted unless you explicitly pre-sort them.

$ sort first.txt > sorted_first.txt
$ sort second.txt > sorted_second.txt
$ diff sorted_first.txt sorted_second.txt

What if somebody wants only certain keys to be sorted, or wants to provide a custom sorting method? I'm worried that the API complexity would explode.

lnielsen commented on August 15, 2024

I agree with @jirikuncar. A Python list is an ordered collection, so order should be taken into account when diffing two lists. If order doesn't matter, then either a set should be used or the lists should be preprocessed (e.g. sorted) before being diffed. Adding this feature would make the algorithm overly complex and would likely introduce subtle errors.

I understand your issue, but perhaps another solution is to simply create a separate package with your preprocessor function that relies on diff to do the job? IMHO it's best to keep an algorithm strict and not add all sorts of exceptions to it.

slyapustin commented on August 15, 2024

I agree with both of you, but a tolerance option already exists, so it would be nice to have ignore_order as well.

jirikuncar commented on August 15, 2024

@inoks the tolerance is there because of the limitations of floating-point arithmetic. Two "identical" values can compare as unequal when they are calculated in different ways. This problem cannot be solved by pure preprocessing without rounding values.

Having said that, I would really like to keep the package minimalist [1] and let developers create their own pipelines.

[1] http://www.catb.org/esr/writings/taoup/html/ch01s06.html
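
The floating-point limitation behind the tolerance option can be seen with a few lines of standard-library Python. This sketch uses `math.isclose` as a stand-in for a tolerance-based comparison; it is not dictdiffer's own code, and the `rel_tol` value and the rounding precision are arbitrary choices made for illustration.

```python
import math

computed = 0.1 + 0.2   # result of arithmetic
literal = 0.3          # the "same" value written directly

# Exact comparison fails because of binary floating-point rounding.
assert computed != literal

# A tolerance-based comparison treats the two values as equal.
assert math.isclose(computed, literal, rel_tol=1e-9)

# Preprocessing can only achieve the same by rounding, which changes the data.
assert round(computed, 9) == round(literal, 9)
```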

unitysipu commented on August 15, 2024

I humbly disagree with the previous assessments.

I'm working with arbitrary sets of data and tens of thousands of dictionary records that I do not control, and currently the diff is triggered whenever some lists arrive in a different order. It's not always practical to manipulate these dictionaries each time before they're checked; I'd rather not touch them, as I may introduce variability in my code. It would be nice if the tool had a built-in feature that could simply ignore the order as long as the content is the same for a given field.

Another library, DeepDiff, has this feature, but it has issues with some fields of data, while this tool does a better job with them.

The set() workaround doesn't work when the objects in a list are not hashable; sorted(list) may work better in those cases. (DeepDiff is likely using this method internally, which is why I'm seeing issues diffing certain types of lists.)
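
A short sketch of the hashability problem with a list of dicts. One caveat worth noting: in Python 3, a bare sorted() call on dicts also fails, so an explicit key is needed. The JSON-based key below is an assumption made for illustration, not something taken from dictdiffer or DeepDiff.

```python
import json

records = [{"id": 2}, {"id": 1}]

# set() requires hashable items, and dicts are not hashable.
set_error = None
try:
    set(records)
except TypeError as error:
    set_error = str(error)

# In Python 3, dicts cannot be ordered with "<" either, so plain sorted() fails.
sort_error = None
try:
    sorted(records)
except TypeError as error:
    sort_error = str(error)

# Sorting works once each item is mapped to a comparable, deterministic key.
ordered = sorted(records, key=lambda item: json.dumps(item, sort_keys=True))
```

With the key in place, `ordered` is a stable ordering that no longer depends on how the records happened to arrive.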

Lastly, just because something is bad and inconvenient in Unix doesn't mean it should be inconvenient in this tool. The Unix example just sorts one list contained in a file, but these dictionaries are complex nested constructs with arbitrary objects.

jirikuncar commented on August 15, 2024

@unitysipu, do you want to add/implement sorting hooks for different types?

unitysipu commented on August 15, 2024

I wish I had the time, but I can see how this is not always a trivial problem.

It looks like DeepDiff recently introduced some kind of deephash function to try to solve some of these issues, but it's not (yet) clear how I can apply it. In any case, I hope my use case gives more insight into why this feature would be useful in some cases.

Currently, for this dataset, DeepDiff just reports that it ignores the unhashable fields but doesn't prevent diffing the rest. I may be able to live with that for now, but I will probably need to return to this problem when some field of true significance starts exhibiting issues.
