
compact-json's Issues

Sample from the README does not work

I've just copied this from the README:

README script
from compact_json import Formatter, EolStyle

formatter = Formatter()
formatter.indent_spaces = 2
formatter.max_inline_complexity = 10
formatter.json_eol_style = EolStyle.LF

# Format the JSON contents:
with open("input.json", "r") as f:
    obj = json.load(f)
    json_string = formatter.serialize(obj)
    print(json_string)

# Create a new JSON file with formatted JSON contents:
with open("input.json", "r") as f:
    obj = json.load(f)

formatter.dump(obj, output_file="output.json", newline_at_eof=True)

A minor detail is that the script needs to import json, but even when I add that, I get:

AttributeError: 'Formatter' object has no attribute 'dump'

Typo in help

"Windows-style CRLR" should be "Windows-style CRLF".

Thanks for compact-json, just what I needed!

A few feature requests

Hi authors, thank you for writing this awesome tool!

JSON readability is very important but I think it's overlooked by the standard library.

May I request the following features to make this library easier to use?

  1. Could you add a dump method to directly dump the re-formatted JSON string to a JSON file?
  2. Would it be possible to add a sanity checking method to check that the deserialized objects from the old string and the new string are identical?
  3. Could you add a command line argument --output-file and let users directly create a new JSON file with formatted contents?
  4. Could you register a command line entry point, so that users can use this tool in the command line with compact-json ... instead of _compact_json.py ...?
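Request No. 2 above (a sanity check) can be sketched with the standard library alone. This is only an illustration: `same_json_data` is a hypothetical helper name, not part of compact-json, and the "reformatted" string below is produced with `json.dumps` as a stand-in for real formatter output.

```python
import json


def same_json_data(original: str, reformatted: str) -> bool:
    """Return True if both strings deserialize to equal Python objects."""
    return json.loads(original) == json.loads(reformatted)


original = '{"a": [1, 2, 3], "b": {"c": null}}'
# Stand-in for a reformatted string: same data, different whitespace.
reformatted = json.dumps(json.loads(original), indent=2)

print(same_json_data(original, reformatted))  # True
print(same_json_data("[1]", "[2]"))           # False
```

A check like this would also have caught invalid or altered output, since `json.loads` raises on invalid JSON and the comparison fails on changed values.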

Additional notes for Request No. 4 above:

I see that this package uses poetry. I'm not so familiar with poetry so I asked ChatGPT how to register an entry point with poetry. Below is my Q&A with ChatGPT:

Question:

How do I register a command line entry point for a Python package when I'm using poetry?

Answer:

To register a command line entry point for a Python package when using Poetry, you can use the [tool.poetry.scripts] section in your pyproject.toml file.

Here's an example pyproject.toml file:

[tool.poetry]
name = "mypackage"
version = "0.1.0"
description = "My Python package"
authors = ["Your Name <[email protected]>"]

[tool.poetry.dependencies]
python = "^3.9"

[tool.poetry.scripts]
mypackage-cli = "mypackage.cli:main"

[build-system]
requires = ["poetry-core>=1.1.0"]
build-backend = "poetry.core.masonry.api"

In this example, we have specified a script called mypackage-cli which points to a function called main in a module called cli inside a package called mypackage.

To use the command line entry point, you can install the package using poetry install and then run the command mypackage-cli in your terminal.

poetry install
mypackage-cli

This will execute the main function defined in your cli module. You can replace mypackage-cli with any name you want for your command line entry point.
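For completeness, here is a minimal sketch of what the `main` function referenced by `mypackage.cli:main` could look like. The module, the `name` argument, and the greeting are all hypothetical placeholders that mirror the example names above; they are not taken from compact-json.

```python
# Hypothetical contents of mypackage/cli.py
import argparse


def main(argv=None) -> int:
    """Entry point; argv defaults to sys.argv[1:] when None."""
    parser = argparse.ArgumentParser(prog="mypackage-cli")
    parser.add_argument("name", nargs="?", default="world")
    args = parser.parse_args(argv)
    print(f"Hello, {args.name}!")
    return 0  # treated as the process exit status by the generated wrapper


# Calling it directly with an explicit argument list, as a test would:
exit_code = main(["Ada"])
```

Accepting an optional `argv` keeps the function easy to call from tests without touching `sys.argv`.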

JSON object names need to be strings

First off: thanks for the very useful tool! I am really happy to have found it, it fills a gap in the standard library that I think many people have experienced.

The issue I encountered: in a JSON object, names can only be strings (see 2.2 in the RFC). So if you do:

from compact_json import Formatter
Formatter().serialize({100: 'foo bar'})

you should get either {"100": "foo bar"} (this is what the standard json library does), or perhaps raise an exception (this is what the cjson library apparently does). But with compact-json, this gives {100: "foo bar"}, which is invalid JSON. Trying to decode it with

import json
json.loads('{100: "foo bar"}')

raises a json.decoder.JSONDecodeError: "Expecting property name enclosed in double quotes: line 1 column 2 (char 1)".

I think raising an exception is the clearest option here, since silently stringifying the key could lead to unexpected behavior. An optional stringify_names argument could be nice though.

Side note: I think the json library is actually not handling this case well either: json.dumps({100: "foo", "100": "bar"}) gives {"100": "foo", "100": "bar"}, which contains a duplicate name even though the RFC says names within an object should be unique.
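The standard-library behaviour described above can be reproduced as follows. The `check_names` function is a hypothetical sketch of the "raise an exception" option; it is not part of compact-json.

```python
import json

# The standard library silently stringifies non-string keys.
print(json.dumps({100: "foo bar"}))            # {"100": "foo bar"}
print(json.dumps({100: "foo", "100": "bar"}))  # {"100": "foo", "100": "bar"}

# Decoding a duplicate name keeps only the last value.
print(json.loads('{"100": "foo", "100": "bar"}'))  # {'100': 'bar'}


def check_names(obj):
    """Raise TypeError if any dict key, at any depth, is not a str."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if not isinstance(key, str):
                raise TypeError(f"JSON object name must be a string, got {key!r}")
            check_names(value)
    elif isinstance(obj, (list, tuple)):
        for value in obj:
            check_names(value)


check_names({"100": ["ok", {"nested": 1}]})  # passes silently
```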

CLI doesn't expose many format options

The compact-json CLI tool doesn't expose all of the available format options, which significantly limits its usefulness.
Would it be possible to expose all the format options in the CLI?

Alternatively, a general format parameter could be exposed, e.g. compact-json --format comma_padding=false in.json
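A general `--format key=value` parameter like the one suggested could be parsed roughly as follows. This is a sketch with argparse, not the actual compact-json CLI: the `parse_option` helper and the value coercion rules (booleans, then integers, then plain strings) are assumptions made for illustration.

```python
import argparse


def parse_option(text: str):
    """Split 'key=value' and coerce the value to bool or int where possible."""
    key, sep, raw = text.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError(f"expected key=value, got {text!r}")
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return key, lowered == "true"
    try:
        return key, int(raw)
    except ValueError:
        return key, raw


parser = argparse.ArgumentParser(prog="compact-json")
parser.add_argument("--format", action="append", default=[],
                    type=parse_option, metavar="KEY=VALUE")
parser.add_argument("file")

args = parser.parse_args(
    ["--format", "comma_padding=false", "--format", "indent_spaces=2", "in.json"]
)
options = dict(args.format)
print(options)  # {'comma_padding': False, 'indent_spaces': 2}
```

The resulting dictionary could then be applied to a formatter object attribute by attribute, provided each key is validated against the known option names first.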

Data corruption when padding

Given the input

[ "seq", [ "if", [ "=", "i", [ "prop", "length", "l" ] ], [ "return", "l" ] ] ]

compact-json produces:

[
    "seq", 
    [
        [                                        ], 
        [ "="     , "i", ["prop", "length", "l"] ], 
        [ "return", "l"                          ]
    ]
]

where the string "if" has been replaced with an empty list.

Feature request: allow using stdin as input, in the usual Unix way

That is, keep accepting a file as an argument, like:

$ compact-json tests/data/test-11.json --indent 2 --max-inline-length 88
{
  "ObjectColumnsObjectRows": {
    "Vera": {"street": "12 Madeup St.", "city": "Boston", "state": "MA", "zip": "02127"},
    "Chuck": {
      "street": "994 Fake Place",
      "unit": "102",
      "city": "Las Vegas",
      "state": "NV",
      "zip": "89102"
    },
    "Dave": {"street": "1967 Void Rd.", "unit": "B", "city": "Athens", "state": "GA"}
  },
  "ArrayColumnsObjectRows": [
    { "type": "turret", "hp": 400, "loc": {"x": 47, "y": -4}, "flags": ["stationary"] },
    { "type": "assassin", "hp": 80, "loc": {"x": 102, "y": 6}, "flags": ["stealth"] },
    { "type": "berserker", "hp": 150, "loc": {"x": 0, "y": 0} },
    {
      "type": "pittrap",
      "loc": {"x": 10, "y": -14},
      "flags": ["invulnerable", "stationary"]
    }
  ],
  "DissimilarObjectRows": {
    "lamp": {"x": 4, "y": 1002, "r": 75, "g": 0, "b": 130, "state": 1},
    "address": {"city": "San Diego", "state": "CA"},
    "font": {"r": 0, "g": 12, "b": 40, "size": 18, "style": "italic"}
  }
}
$ 

and also accept stdin, like this:

$ cat tests/data/test-11.json | compact-json - --indent 2 --max-inline-length 88
 ...
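The usual convention of treating `-` as "read from stdin" can be sketched as below. `load_json` is a hypothetical helper, not the tool's real code; the optional `stdin` parameter exists only so that the example can be run without a real pipe.

```python
import io
import json
import sys


def load_json(path: str, stdin=None):
    """Load JSON from a file path, or from stdin when path is '-'."""
    if path == "-":
        stream = stdin if stdin is not None else sys.stdin
        return json.load(stream)
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Simulate piped input with an in-memory stream.
fake_stdin = io.StringIO('{"a": [1, 2]}')
print(load_json("-", stdin=fake_stdin))  # {'a': [1, 2]}
```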

"list elements not inline" warning

I encountered this while trying to build a more compact formatter.
Here is a simple way to reproduce the warning:

from compact_json import Formatter

formatter = Formatter(max_compact_list_complexity=999)
data = [
    {f'key{i}':f'value{i}' for i in range(2)},
    {f'key{i}':f'value{i}' for i in range(5)},
    {f'key{i}':f'value{i}' for i in range(2)},
    {f'key{i}':f'value{i}' for i in range(2)},
]
print(formatter.serialize(data))

With unlimited complexity, the formatter tries to format the list as multi-line compact, and because one very long object in it is not inline, the warning is triggered:

[
    {"key0": "value0", "key1": "value1"},
    {
        "key0": "value0",
        "key1": "value1",
        "key2": "value2",
        "key3": "value3",
        "key4": "value4"
    },
    {"key0": "value0", "key1": "value1"},
    {"key0": "value0", "key1": "value1"}
]
C:\Users\ZeroRin\Documents\Projects\compact-json\src\compact_json\formatter.py:414: RuntimeWarning: list elements not inline (please report an issue)
  
C:\Users\ZeroRin\Documents\Projects\compact-json\src\compact_json\formatter.py:462: RuntimeWarning: list elements not inline (please report an issue)
  self.indent(buffer, item.depth + 1)

I'd suggest allowing a list to be multiline compact even when it contains multiline elements, so that a list with many small elements and a few large ones would not take up much space. Here is an example output with the same setup as above:

[
    {"key0": "value0", "key1": "value1"}, 
    {
        "key0": "value0",
        "key1": "value1",
        "key2": "value2",
        "key3": "value3",
        "key4": "value4"
    }, 
    {"key0": "value0", "key1": "value1"}, {"key0": "value0", "key1": "value1"}
]

This could be done by modifying the newline condition so that any multiline element starts on a new line and is followed by a new line:

            if (
                (
                    line_length_so_far + segment_length > self.max_inline_length
                    and line_length_so_far > 0
                )
                or (
                    child_index != 0
                    and (
                        item.children[child_index].format != Format.INLINE
                        or item.children[child_index - 1].format != Format.INLINE
                    )
                )
            ):

I'm trying to make that dict in the middle also multiline-compact so that hopefully the final version would look better than the current showcase

Problem with pytest: "fixture 'script_runner' not found"

I guess that something related to script_runner has not been committed?

(doc) luis@spinoza:~/proj/g/alien/compact-json$ pytest 
=========================================================================================== test session starts ============================================================================================
platform linux -- Python 3.10.4, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/luis/proj/g/alien/compact-json, configfile: pyproject.toml
plugins: cov-4.0.0
collected 10 items                                                                                                                                                                                         

tests/test_console_script.py EEEEEE                                                                                                                                                                  [ 60%]
tests/test_issues.py ...                                                                                                                                                                             [ 90%]
tests/test_json.py .     
_______________________________________________________________________________________ ERROR at setup of test_main ________________________________________________________________________________________
file /home/luis/proj/g/alien/compact-json/tests/test_console_script.py, line 111
  @pytest.mark.script_launch_mode("subprocess")
  def test_main(script_runner):
E       fixture 'script_runner' not found
>       available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, cov, doctest_namespace, monkeypatch, no_cover, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
>       use 'pytest --fixtures [testpath]' for help on them.
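The `script_runner` fixture and the `script_launch_mode` marker are provided by the pytest-console-scripts plugin, which is not listed in the `plugins:` line of the output above, so a likely cause is that the plugin is not installed in this environment rather than a missing commit. A minimal check, assuming the plugin's importable module is named `pytest_console_scripts`:

```python
import importlib.util


def has_console_scripts_plugin() -> bool:
    """Return True if the pytest-console-scripts module can be imported."""
    # Assumption: the plugin installs a top-level module with this name.
    return importlib.util.find_spec("pytest_console_scripts") is not None


print(has_console_scripts_plugin())
```

If this prints False, installing the plugin into the same environment that runs pytest should make the fixture available.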

[Bug] Trouble handling exponential notation

I encountered a bug while trying to use compact-json.

This is a minimum JSON file that can reproduce the issue:

{
    "key0": [
        [
            "subkey0",
            1e-05
        ],
        [
            "subkey1",
            2e-05
        ],
        [
            "subkey2",
            3e-05
        ]
    ]
}

I'm using the default options, just this: compact-json input.json

The call stack is:

  File "/Users/jsh9/github/compact-json/src/compact_json/_compact_json.py", line 166, in main
    json_string = formatter.serialize(obj)
  File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 279, in serialize
    return self.prefix_string + self.format_element(0, value).value
  File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 302, in format_element
    formatted_item = self.format_dict(depth, element)
  File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 369, in format_dict
    elem = self.format_element(depth + 1, v)
  File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 300, in format_element
    formatted_item = self.format_list(depth, element)
  File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 355, in format_list
    if self.format_table_list_list(item):
  File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 555, in format_table_list_list
    column_stats = self.get_list_stats(item)
  File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 980, in get_list_stats
    col_stats_list[index].update(child, index)
  File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 104, in update
    (whole, frac) = str(prop_node.value).split(".")
ValueError: not enough values to unpack (expected 2, got 1)
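The last frame of the traceback can be reproduced in isolation: Python renders small floats in exponential notation, so the string contains no "." to split on. The `whole_and_frac` function below is one possible workaround, sketched under the assumption that only finite numbers are passed in; it is not the fix adopted by the library.

```python
from decimal import Decimal

print(str(1e-05))             # 1e-05
print(str(1e-05).split("."))  # ['1e-05']  (one element, so unpacking into two names fails)


def whole_and_frac(value):
    """Return the digits before and after the decimal point as strings."""
    # Decimal's "f" format always uses positional notation, never an exponent.
    text = format(Decimal(str(value)), "f")
    whole, _, frac = text.partition(".")
    return whole, frac


print(whole_and_frac(1e-05))  # ('0', '00001')
print(whole_and_frac(12.5))   # ('12', '5')
print(whole_and_frac(3))      # ('3', '')
```

Using `partition` instead of `split` also covers integers, which have no fractional part at all.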

Feature request: add a more compact mode for dicts

Lists can be made compact with the multi-line inline list format, but dicts with many records are always formatted with one item per line.
While this makes the output clean and easy to edit, in some cases it makes the output very space-inefficient. I wonder if dicts could be formatted as multi-line inline as well in certain scenarios.

Examples of desired output (I manually formatted it to suit an inline width of 80):

{
  "duplicate_info": {
    "1": {
      "id": "439", "sangvis_hiringteam_id": "1001", "challenge_type": "1",
      "enemy_team_id": "940001",
      "gun_in_allys": "3001,3002,3003,3004,3005,3506,3510,3513,3517,3531,3539,3566,3578,3586,3601,3605,3611,3647,3649,3660,3664,3677,3678,3702,3705,3713,3723,3735,3742",
      "pass_time": "9", "award_item": "2:400001:10:10000", "item_cost": "603:3",
      "is_night": "0", "map_type": "4", "map_code": "forest",
      "default_team": "3001,3002,3003,3004,3005", "start_time": 1673193600,
      "end_time": 1673798399
    }
  },
  "req": {
    "spot_id": 1633, "if_enemy_die": true, "current_time": 1673625806,
    "boss_hp": 0, "mvp": 210119981, "last_battle_info": "",
    "use_skill_squads": [], "use_skill_ally_spots": [],
    "guns": [
      { "id": 210119981, "life": 97 }, { "id": 209408291, "life": 58 },
      { "id": 210084671, "life": 106 }, { "id": 209411557, "life": 45 },
      { "id": 207888310, "life": 132 }
    ],
    "user_rec": "{\\\"seed\\\":9082,\\\"record\\\":[\\\"0,0,2,1\\\"]}",
    "1000": {
      "10": 0, "11": 0, "12": 0, "13": 0, "15": 0, "16": 0, "17": 0, "33": 0,
      "40": 0, "18": 0, "19": 0, "20": 0, "21": 0, "22": 0, "23": 0, "24": 0,
      "25": 0, "26": 0, "27": 0, "34": 0, "35": 0, "41": 0, "42": 0, "43": 0,
      "44": 0
    }, 
    "1001": {}, "1002": {}, "1003": {}, "1005": {}, "1007": {}, "1008": {},
    "1009": {}, "battle_damage": {}
  }
}

The length computation of the last line in a multi-line is wrong

item_length + len(self.padded_comma_str) if not_last_item else 0

The current code results in a "tail" on the last line, because the length of the last element is set to 0:
image

I think this line should be changed to

item_length + len(self.padded_comma_str)

which takes the comma length into account regardless of whether a comma follows the item. This prevents such a tail from appearing when the max length is exactly 1 character less than the length of k elements with trailing commas. For example (I edited the bugged line to item_length + (len(self.padded_comma_str) if not_last_item else 0) before running the following code):
image
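The underlying cause is operator precedence: a conditional expression binds more loosely than `+`, so without parentheses the whole sum is replaced by 0 for the last item. A small standalone illustration, using made-up values for the item length and the comma string:

```python
item_length = 10          # illustrative value
padded_comma_str = ", "   # illustrative value


def length_as_written(not_last_item: bool) -> int:
    # Parsed as: (item_length + len(padded_comma_str)) if not_last_item else 0
    return item_length + len(padded_comma_str) if not_last_item else 0


def length_with_parentheses(not_last_item: bool) -> int:
    # Only the comma length depends on the condition.
    return item_length + (len(padded_comma_str) if not_last_item else 0)


print(length_as_written(True), length_as_written(False))              # 12 0
print(length_with_parentheses(True), length_with_parentheses(False))  # 12 10
```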

Table list shadowed by multiline compact list

from compact_json import Formatter

formatter = Formatter(max_compact_list_complexity=2)
def get(i):
    return [
        {f'key{i}':f'v{i}' for i in range(i)},
        {f'key{i}':f'va{i}' for i in range(i)},
        {f'key{i}':f'val{i}' for i in range(i)},
        {f'key{i}':f'valu{i}' for i in range(i)},
        {f'key{i}':f'value{i}' for i in range(i)},
    ]
print(formatter.serialize(get(2)))
print(formatter.serialize(get(3)))

results in

[
    {"key0": "v0", "key1": "v1"}, {"key0": "va0", "key1": "va1"}, 
    {"key0": "val0", "key1": "val1"}, {"key0": "valu0", "key1": "valu1"}, 
    {"key0": "value0", "key1": "value1"}
]
[
    {"key0": "v0", "key1": "v1", "key2": "v2"}, 
    {"key0": "va0", "key1": "va1", "key2": "va2"}, 
    {"key0": "val0", "key1": "val1", "key2": "val2"}, 
    {"key0": "valu0", "key1": "valu1", "key2": "valu2"}, 
    {"key0": "value0", "key1": "value1", "key2": "value2"}
]

Notice that the second list is not formatted as a table as expected. That's because it is actually formatted as a multiline-compact list (although it is not "compact" at all, as every element takes a whole line).
This cannot be fixed by simply reordering format_list_multiline_compact and format_table_list_dict, since that would turn the first list into a table as well (even though it can be more compact). To fix this, we must check whether a compact list is actually "compact".
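One way to express "actually compact" is to ask whether packing the items onto lines uses fewer lines than there are items. The sketch below is an assumption-laden illustration, not the library's algorithm: `pack` is a hypothetical greedy packer that ignores indentation, and the limit of 80 characters is chosen only for the example.

```python
import json


def pack(items, max_len=80, sep=", "):
    """Greedily pack already-formatted items into lines of at most max_len."""
    lines, current = [], ""
    for text in items:
        candidate = text if not current else current + sep + text
        if current and len(candidate) > max_len:
            lines.append(current)
            current = text
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def is_actually_compact(items, max_len=80):
    """A packed layout only saves space if at least two items share a line."""
    return len(pack(items, max_len)) < len(items)


def rows(n):
    """Inline renderings of the dicts from the example above."""
    values = ["v", "va", "val", "valu", "value"]
    return [json.dumps({f"key{i}": f"{v}{i}" for i in range(n)}) for v in values]


print(is_actually_compact(rows(2)))  # True
print(is_actually_compact(rows(3)))  # False
```

With a test like this, the second list would fall through to the table format while the first keeps the multiline-compact layout.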

Feature request: provide short option arguments, e.g. for --max-inline-length and --indent

So that this:

$ compact-json tests/data/test-11.json --indent 2 --max-inline-length 88
{
  "ObjectColumnsObjectRows": {
    "Vera": {"street": "12 Madeup St.", "city": "Boston", "state": "MA", "zip": "02127"},
    "Chuck": {
      "street": "994 Fake Place",
      "unit": "102",
      "city": "Las Vegas",
      "state": "NV",
      "zip": "89102"
    },
    "Dave": {"street": "1967 Void Rd.", "unit": "B", "city": "Athens", "state": "GA"}
  }, ...

could be replaced by:

$ compact-json tests/data/test-11.json -i 2 -l 88
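With argparse, short aliases are declared alongside the long names. The parser below is a standalone sketch rather than the tool's real argument parser; the `-i` and `-l` letters follow the request above, and the default values are placeholders.

```python
import argparse

parser = argparse.ArgumentParser(prog="compact-json")
parser.add_argument("file")
# Defaults here are placeholders, not necessarily the tool's real defaults.
parser.add_argument("-i", "--indent", type=int, default=4)
parser.add_argument("-l", "--max-inline-length", type=int, default=80)

short = parser.parse_args(["tests/data/test-11.json", "-i", "2", "-l", "88"])
long = parser.parse_args(
    ["tests/data/test-11.json", "--indent", "2", "--max-inline-length", "88"]
)
print(short == long)  # True
print(short.indent, short.max_inline_length)  # 2 88
```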

Weird type conversion when formatting table

Running the following code:

from compact_json import Formatter

data = [{'a': '0', 'b': '1', 'c':'0'}, {'a': '0', 'b': 1}]

formatter = Formatter(always_expand_depth=0)
print(formatter.serialize(data))
formatter.table_dict_minimum_similarity=100
print(formatter.serialize(data))

results in the following output:

$ python test.py
C:\Users\ZeroRin\Documents\Projects\compact-json\src\compact_json\formatter.py:68: RuntimeWarning: handled quantize error (please report an issue)
  warnings.warn(
[
    { "a": "0", "b": 0, "c": "0" },
    { "a": "0", "b": 1           }
]
[
    {"a": "0", "b": "1", "c": "0"},
    {"a": "0", "b": 1}
]

A warning is raised during the first formatting, and the output has somehow changed the value of data[0]['b'] from the string "1" to the number 0. During the second run, where the output is not formatted as a table, the output is correct.

Add __main__.py to support `python -m compact_json`?

python -m compact_json ... seems to me a more Pythonic way of using a module as a command-line tool. This could be done by adding a __main__.py containing from ._compact_json import main; main(), or by directly renaming _compact_json.py to __main__.py.

I'm not sure how this would affect building the executable, so I'm opening an issue here instead of submitting a PR directly.
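The `__main__.py` mechanism can be tried out on a throwaway package. The script below builds a hypothetical `demo_pkg` in a temporary directory (the package and module names are invented for the demonstration) and runs it with `python -m`:

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    # Stand-in for the module that defines main().
    (pkg / "_cli.py").write_text(
        "def main():\n"
        "    print('hello from demo_pkg')\n"
    )
    # __main__.py is what `python -m demo_pkg` executes.
    (pkg / "__main__.py").write_text(
        "from ._cli import main\n"
        "\n"
        "main()\n"
    )
    result = subprocess.run(
        [sys.executable, "-m", "demo_pkg"],
        cwd=tmp,
        env={**os.environ, "PYTHONPATH": tmp},
        capture_output=True,
        text=True,
        check=True,
    )

output = result.stdout.strip()
print(output)  # hello from demo_pkg
```

Keeping `main()` in its own module and importing it from `__main__.py` leaves any existing console-script entry point untouched.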

Str enums not encoded correctly

When string enums are used as keys, they are not encoded correctly.

from enum import Enum
from compact_json import Formatter

class MyEnum(str, Enum):
    abc = 'Abc'
    def_ = 'Def'  # "def" is a reserved word and cannot be used as a member name

formatter = Formatter()
formatter.serialize({ MyEnum.abc: 'abc' })
# Output:  {"MyEnum.abc": "abc"}
# Expected: {"Abc": "abc"}
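The reported output suggests that the key is converted to text in a way that yields the member's qualified name rather than its value, although the exact code path is not shown here. Since a str-based enum member is itself a `str` instance, one possible approach is to unwrap enum keys to their values before encoding. `key_to_name` is a hypothetical helper written for this sketch:

```python
from enum import Enum


class MyEnum(str, Enum):
    abc = "Abc"


def key_to_name(key):
    """Return the enum member's underlying value, or the key unchanged."""
    return key.value if isinstance(key, Enum) else key


print(isinstance(MyEnum.abc, str))  # True
print(MyEnum.abc.value)             # Abc
print(key_to_name(MyEnum.abc))      # Abc
print(key_to_name("plain"))         # plain
```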
