masaccio / compact-json

A JSON formatter that produces compact but human-readable

License: MIT License

Python 100.00%

compact-json's Issues

Sample from the README does not work

I've just copied this from the README:

README script

from compact_json import Formatter, EolStyle

formatter = Formatter()
formatter.indent_spaces = 2
formatter.max_inline_complexity = 10
formatter.json_eol_style = EolStyle.LF

# Format the JSON contents:
with open("input.json", "r") as f:
    obj = json.load(f)
    json_string = formatter.serialize(obj)
    print(json_string)

# Create a new JSON file with formatted JSON contents:
with open("input.json", "r") as f:
    obj = json.load(f)

formatter.dump(obj, output_file="output.json", newline_at_eof=True)

A minor detail is that it should import json, but even when I do that, I get:

AttributeError: 'Formatter' object has no attribute 'dump'

Typo in help

"Windows-style CRLR" should be "Windows-style CRLF".

Thanks for compact-json, just what I needed!

A few feature requests

Hi authors, thank you for writing this awesome tool!

JSON readability is very important but I think it's overlooked by the standard library.

May I request the following features to make this library easier to use?

1. Could you add a dump method to directly dump the re-formatted JSON string to a JSON file?
2. Would it be possible to add a sanity checking method to check that the deserialized objects from the old string and the new string are identical?
3. Could you add a command line argument --output-file and let users directly create a new JSON file with formatted contents?
4. Could you register a command line entry point, so that users can use this tool in the command line with compact-json ... instead of _compact_json.py ...?
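
The sanity check in the second request above could be sketched with the standard library alone. The helper name same_json_content is hypothetical and not part of compact-json; it only compares the objects obtained by parsing both strings.

```python
import json


def same_json_content(original: str, reformatted: str) -> bool:
    """Return True if both JSON strings deserialize to equal Python objects."""
    return json.loads(original) == json.loads(reformatted)


# Whitespace-only differences do not change the parsed value.
compact = '{"a": [1, 2, 3], "b": {"c": null}}'
expanded = json.dumps(json.loads(compact), indent=4)
print(same_json_content(compact, expanded))  # True
```

Note that Python dict equality ignores key order, so this check would not notice reordered object members, while reordered list elements are detected.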

Additional notes for Request No. 4 above:

I see that this package uses poetry. I'm not so familiar with poetry so I asked ChatGPT how to register an entry point with poetry. Below is my Q&A with ChatGPT:

Question:

How do I register a command line entry point for a Python package when I'm using poetry?

Answer:

To register a command line entry point for a Python package when using Poetry, you can use the [tool.poetry.scripts] section in your pyproject.toml file.

Here's an example pyproject.toml file:

[tool.poetry]
name = "mypackage"
version = "0.1.0"
description = "My Python package"
authors = ["Your Name <[email protected]>"]

[tool.poetry.dependencies]
python = "^3.9"

[tool.poetry.scripts]
mypackage-cli = "mypackage.cli:main"

[build-system]
requires = ["poetry-core>=1.1.0"]
build-backend = "poetry.core.masonry.api"

In this example, we have specified a script called mypackage-cli which points to a function called main in a module called cli inside a package called mypackage.

To use the command line entry point, you can install the package using poetry install and then run the command mypackage-cli in your terminal.

poetry install
mypackage-cli

This will execute the main function defined in your cli module. You can replace mypackage-cli with any name you want for your command line entry point.

Please implement `OmitTrailingWhitespace` in Python

I found this option in the wiki, but it doesn't appear to be implemented in the Python version.

JSON object names need to be strings

First off: thanks for the very useful tool! I am really happy to have found it; it fills a gap in the standard library that I think many people have experienced.

The issue I encountered: in a JSON object, names can only be strings (see 2.2 in the RFC). So if you do:

from compact_json import Formatter
Formatter().serialize({100: 'foo bar'})

the result should be either {"100": "foo bar"} (this is what the standard json library does), or perhaps an exception (this is what the cjson library apparently does). But with compact-json, this gives {100: "foo bar"}, which is invalid JSON. Trying to decode it with

import json
json.loads('{100: "foo bar"}')

raises a json.decoder.JSONDecodeError: "Expecting property name enclosed in double quotes: line 1 column 2 (char 1)".

I think raising an exception is the clearest option here, since silently stringifying the key could lead to unexpected behavior. An optional stringify_names argument could be nice though.

Side note: I think the json library is actually not handling this case well either: json.dumps({100: "foo", "100": "bar"}) gives {"100": "foo", "100": "bar"}, which contains a duplicate name even though the RFC says names within an object should be unique.
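
The behaviours described in this issue can be reproduced with the standard library alone. The check_names helper is a hypothetical sketch of the "raise an exception" option and is not part of compact-json.

```python
import json

# The standard library silently converts the integer key to a string.
print(json.dumps({100: "foo bar"}))  # {"100": "foo bar"}

# An unquoted name is rejected by the parser.
try:
    json.loads('{100: "foo bar"}')
except json.JSONDecodeError as exc:
    print("invalid JSON:", exc)


def check_names(obj):
    """Hypothetical strict check: raise TypeError on any non-string object name."""
    if isinstance(obj, dict):
        for name, value in obj.items():
            if not isinstance(name, str):
                raise TypeError(
                    f"JSON object name must be str, got {type(name).__name__}"
                )
            check_names(value)
    elif isinstance(obj, (list, tuple)):
        for value in obj:
            check_names(value)
```

Calling check_names on the object before serializing would surface the problem early instead of producing output that cannot be parsed back.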

CLI doesn't expose many format options

The compact-json CLI tool doesn't expose all of the available format options, which significantly limits its usefulness.
Is it possible to expose all the format options in the CLI?

Alternatively, a general format parameter can be exposed, e.g. compact-json --format comma_padding=false in.json
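
One way such a general parameter might be parsed is sketched below. The function name parse_format_options and the value-conversion rules are assumptions made for illustration; nothing here reflects how compact-json actually handles its arguments.

```python
def parse_format_options(pairs):
    """Turn ["comma_padding=false", "indent_spaces=2"] into a dict of typed values."""
    options = {}
    for pair in pairs:
        name, sep, raw = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"expected name=value, got {pair!r}")
        lowered = raw.lower()
        if lowered in ("true", "false"):
            options[name] = lowered == "true"   # booleans
        else:
            try:
                options[name] = int(raw)        # integers
            except ValueError:
                options[name] = raw             # anything else stays a string
    return options


print(parse_format_options(["comma_padding=false", "indent_spaces=2"]))
# {'comma_padding': False, 'indent_spaces': 2}
```

The resulting dict could then be applied to a formatter object with setattr, assuming the option names match its attributes.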

Type hints not compatible

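The type hints are not compatible with older Pythons: need to do from typing import Dict and use Dict[...] rather than the built-in dict[...], which can only be subscripted from Python 3.9 onwards.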
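
A minimal sketch of annotations written in the backwards-compatible style. The function count_types is made up for illustration and has nothing to do with the library's own code.

```python
from typing import Dict, List


def count_types(items: List[object]) -> Dict[str, int]:
    """Count values by type name, using annotations that older Pythons accept."""
    counts: Dict[str, int] = {}
    for item in items:
        name = type(item).__name__
        counts[name] = counts.get(name, 0) + 1
    return counts


print(count_types([1, "a", 2.0, 3]))  # {'int': 2, 'str': 1, 'float': 1}
```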

Data corruption when padding

Given the input

[ "seq", [ "if", [ "=", "i", [ "prop", "length", "l" ] ], [ "return", "l" ] ] ]

compact-json produces:

[
    "seq", 
    [
        [                                        ], 
        [ "="     , "i", ["prop", "length", "l"] ], 
        [ "return", "l"                          ]
    ]
]

where the string "if" has been replaced with an empty list.

Feature request: allow using stdin as input, in the Unix way

Currently the file is passed as an argument, like:

$ compact-json tests/data/test-11.json --indent 2 --max-inline-length 88
{
  "ObjectColumnsObjectRows": {
    "Vera": {"street": "12 Madeup St.", "city": "Boston", "state": "MA", "zip": "02127"},
    "Chuck": {
      "street": "994 Fake Place",
      "unit": "102",
      "city": "Las Vegas",
      "state": "NV",
      "zip": "89102"
    },
    "Dave": {"street": "1967 Void Rd.", "unit": "B", "city": "Athens", "state": "GA"}
  },
  "ArrayColumnsObjectRows": [
    { "type": "turret", "hp": 400, "loc": {"x": 47, "y": -4}, "flags": ["stationary"] },
    { "type": "assassin", "hp": 80, "loc": {"x": 102, "y": 6}, "flags": ["stealth"] },
    { "type": "berserker", "hp": 150, "loc": {"x": 0, "y": 0} },
    {
      "type": "pittrap",
      "loc": {"x": 10, "y": -14},
      "flags": ["invulnerable", "stationary"]
    }
  ],
  "DissimilarObjectRows": {
    "lamp": {"x": 4, "y": 1002, "r": 75, "g": 0, "b": 130, "state": 1},
    "address": {"city": "San Diego", "state": "CA"},
    "font": {"r": 0, "g": 12, "b": 40, "size": 18, "style": "italic"}
  }
}
$

and with stdin it would look like this:

$ cat tests/data/test-11.json | compact-json - --indent 2 --max-inline-length 88
 ...
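
The usual convention of treating "-" as standard input could be sketched as follows. The load_json helper and its stdin parameter are hypothetical; the parameter exists only so the example can be run without a real pipe.

```python
import io
import json
import sys


def load_json(path, stdin=None):
    """Load JSON from a file path, or from stdin when the path is "-"."""
    if path == "-":
        stream = stdin if stdin is not None else sys.stdin
        return json.load(stream)
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Simulate piped input with an in-memory stream.
print(load_json("-", stdin=io.StringIO('{"x": 47, "y": -4}')))  # {'x': 47, 'y': -4}
```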

"list elements not inline" warning

I encountered this while trying to build a more compact formatter.
Here is a simple way to reproduce the warning:

formatter = Formatter(max_compact_list_complexity=999)
data = [
    {f'key{i}':f'value{i}' for i in range(2)},
    {f'key{i}':f'value{i}' for i in range(5)},
    {f'key{i}':f'value{i}' for i in range(2)},
    {f'key{i}':f'value{i}' for i in range(2)},
]
print(formatter.serialize(data))

With unlimited complexity, it tries to format the dict as multi-line compact, and since one very long object is not inline, the warning is triggered.

[
    {"key0": "value0", "key1": "value1"},
    {
        "key0": "value0",
        "key1": "value1",
        "key2": "value2",
        "key3": "value3",
        "key4": "value4"
    },
    {"key0": "value0", "key1": "value1"},
    {"key0": "value0", "key1": "value1"}
]
C:\Users\ZeroRin\Documents\Projects\compact-json\src\compact_json\formatter.py:414: RuntimeWarning: list elements not inline (please report an issue)
  
C:\Users\ZeroRin\Documents\Projects\compact-json\src\compact_json\formatter.py:462: RuntimeWarning: list elements not inline (please report an issue)
  self.indent(buffer, item.depth + 1)

I'd suggest allowing a list to be multiline compact even with multiline elements in it, so that a list with many small elements but a few large ones would not take much space. Here is an example output with the same setup as above:

[
    {"key0": "value0", "key1": "value1"}, 
    {
        "key0": "value0",
        "key1": "value1",
        "key2": "value2",
        "key3": "value3",
        "key4": "value4"
    }, 
    {"key0": "value0", "key1": "value1"}, {"key0": "value0", "key1": "value1"}
]

This could be done by modifying the newline condition so that any multiline element starts on a new line and ends with a newline:

            if (
                line_length_so_far + segment_length > self.max_inline_length
                and line_length_so_far > 0
                or child_index != 0
                and (
                    item.children[child_index].format != Format.INLINE
                    or item.children[child_index - 1].format != Format.INLINE
                )
            ):

I'm trying to make that dict in the middle also multiline-compact so that hopefully the final version would look better than the current showcase

Problem with pytest: "fixture 'script_runner' not found"

I guess that something providing script_runner has not been committed?

(doc) luis@spinoza:~/proj/g/alien/compact-json$ pytest 
=========================================================================================== test session starts ============================================================================================
platform linux -- Python 3.10.4, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/luis/proj/g/alien/compact-json, configfile: pyproject.toml
plugins: cov-4.0.0
collected 10 items                                                                                                                                                                                         

tests/test_console_script.py EEEEEE                                                                                                                                                                  [ 60%]
tests/test_issues.py ...                                                                                                                                                                             [ 90%]
tests/test_json.py .

_______________________________________________________________________________________ ERROR at setup of test_main ________________________________________________________________________________________
file /home/luis/proj/g/alien/compact-json/tests/test_console_script.py, line 111
  @pytest.mark.script_launch_mode("subprocess")
  def test_main(script_runner):
E       fixture 'script_runner' not found
>       available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, cov, doctest_namespace, monkeypatch, no_cover, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
>       use 'pytest --fixtures [testpath]' for help on them.

[Bug] Trouble handling exponential notation

I encountered a bug while trying to use it.

This is a minimum JSON file that can reproduce the issue:

{
    "key0": [
        [
            "subkey0",
            1e-05
        ],
        [
            "subkey1",
            2e-05
        ],
        [
            "subkey2",
            3e-05
        ]
    ]
}

I'm using the default options, just this: compact-json input.json

The call stack is:

  File "/Users/jsh9/github/compact-json/src/compact_json/_compact_json.py", line 166, in main
    json_string = formatter.serialize(obj)
  File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 279, in serialize
    return self.prefix_string + self.format_element(0, value).value
  File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 302, in format_element
    formatted_item = self.format_dict(depth, element)
  File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 369, in format_dict
    elem = self.format_element(depth + 1, v)
  File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 300, in format_element
    formatted_item = self.format_list(depth, element)
  File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 355, in format_list
    if self.format_table_list_list(item):
  File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 555, in format_table_list_list
    column_stats = self.get_list_stats(item)
  File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 980, in get_list_stats
    col_stats_list[index].update(child, index)
  File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 104, in update
    (whole, frac) = str(prop_node.value).split(".")
ValueError: not enough values to unpack (expected 2, got 1)
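
The traceback ends in a split on "." that assumes every number has a decimal point. The sketch below shows why that fails for values Python prints in exponent form, plus one tolerant alternative. The split_number helper is hypothetical, and whether the exponent should count towards column widths is a decision for the library.

```python
# Python prints small floats in exponent form, with no "." to split on.
print(str(1e-05))  # 1e-05

try:
    (whole, frac) = str(1e-05).split(".")
except ValueError as exc:
    print("fails as in the report:", exc)


def split_number(value):
    """Split the text of a number into (whole, frac) without assuming a "." exists."""
    text = str(value)
    mantissa = text.lower().partition("e")[0]  # drop any exponent part
    whole, _, frac = mantissa.partition(".")
    return whole, frac


print(split_number(1e-05))  # ('1', '')
print(split_number(12.75))  # ('12', '75')
```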

Feature request: add a more compact mode for dicts

Lists can be made compact with multi-line inline list, but dicts with many records would always be formatted one item per line.
While this makes the output clean and easy to edit, in some cases it makes the output very space-inefficient. I wonder if we can ask dicts to be multi-line inline as well in certain scenarios.

Examples of desired output (I manually formatted it to suit an inline width of 80):

{
  "duplicate_info": {
    "1": {
      "id": "439", "sangvis_hiringteam_id": "1001", "challenge_type": "1",
      "enemy_team_id": "940001",
      "gun_in_allys": "3001,3002,3003,3004,3005,3506,3510,3513,3517,3531,3539,3566,3578,3586,3601,3605,3611,3647,3649,3660,3664,3677,3678,3702,3705,3713,3723,3735,3742",
      "pass_time": "9", "award_item": "2:400001:10:10000", "item_cost": "603:3",
      "is_night": "0", "map_type": "4", "map_code": "forest",
      "default_team": "3001,3002,3003,3004,3005", "start_time": 1673193600,
      "end_time": 1673798399
    }
  },
  "req": {
    "spot_id": 1633, "if_enemy_die": true, "current_time": 1673625806,
    "boss_hp": 0, "mvp": 210119981, "last_battle_info": "",
    "use_skill_squads": [], "use_skill_ally_spots": [],
    "guns": [
      { "id": 210119981, "life": 97 }, { "id": 209408291, "life": 58 },
      { "id": 210084671, "life": 106 }, { "id": 209411557, "life": 45 },
      { "id": 207888310, "life": 132 }
    ],
    "user_rec": "{\\\"seed\\\":9082,\\\"record\\\":[\\\"0,0,2,1\\\"]}",
    "1000": {
      "10": 0, "11": 0, "12": 0, "13": 0, "15": 0, "16": 0, "17": 0, "33": 0,
      "40": 0, "18": 0, "19": 0, "20": 0, "21": 0, "22": 0, "23": 0, "24": 0,
      "25": 0, "26": 0, "27": 0, "34": 0, "35": 0, "41": 0, "42": 0, "43": 0,
      "44": 0
    }, 
    "1001": {}, "1002": {}, "1003": {}, "1005": {}, "1007": {}, "1008": {},
    "1009": {}, "battle_damage": {}
  }
}

Cannot format non-ascii strings without unicode escape

compact json uses json.dumps(element) without any configuration

compact-json/src/compact_json/formatter.py

Line 270 in 3b96495

simple_node.value = json.dumps(element)

so there is no way to set ensure_ascii=False, and the output will always contain Unicode escapes, like "\u5f20\u4e09" instead of "张三".

I think there should be an option to control this behaviour to make the output more human-friendly.
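
The difference comes from the ensure_ascii parameter of the standard json.dumps, as this short example shows:

```python
import json

name = "张三"
escaped = json.dumps(name)                      # default: ensure_ascii=True
literal = json.dumps(name, ensure_ascii=False)  # keep non-ASCII characters

print(escaped)  # "\u5f20\u4e09"
print(literal)  # "张三"
```

Both strings decode to the same value with json.loads, so the choice only affects readability of the output.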

The length computation of the last line in a multi-line is wrong

compact-json/src/compact_json/formatter.py

Line 431 in 234356f

item_length + len(self.padded_comma_str) if not_last_item else 0

The current code results in a "tail" on the last line, as the length of the last element is set to 0:

I think this line should be changed to

item_length + len(self.padded_comma_str)

which takes the comma length into account regardless of whether there will be a comma after the item. This could prevent such a tail from appearing when the max length is exactly 1 character less than the length of k elements with trailing commas. For example (I edited the bugged line to item_length + (len(self.padded_comma_str) if not_last_item else 0) before running the following code):
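
As a standalone illustration of the operator precedence involved (this is not the example from the report, and the values are made up), a conditional expression binds more loosely than +, so the unparenthesized form yields 0 for the last item:

```python
item_length = 10     # made-up width of one formatted element
padded_comma = ", "  # stand-in for the formatter's padded comma string


def as_written(not_last_item):
    # Parsed as (item_length + len(padded_comma)) if not_last_item else 0
    return item_length + len(padded_comma) if not_last_item else 0


def with_parentheses(not_last_item):
    # Only the comma width is conditional.
    return item_length + (len(padded_comma) if not_last_item else 0)


print(as_written(True), as_written(False))              # 12 0
print(with_parentheses(True), with_parentheses(False))  # 12 10
```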

Table list shadowed by multiline compact list

from compact_json import Formatter

formatter = Formatter(max_compact_list_complexity=2)
def get(n):
    # n is the number of keys in each dict
    return [
        {f'key{i}':f'v{i}' for i in range(n)},
        {f'key{i}':f'va{i}' for i in range(n)},
        {f'key{i}':f'val{i}' for i in range(n)},
        {f'key{i}':f'valu{i}' for i in range(n)},
        {f'key{i}':f'value{i}' for i in range(n)},
    ]
print(formatter.serialize(get(2)))
print(formatter.serialize(get(3)))

results in

[
    {"key0": "v0", "key1": "v1"}, {"key0": "va0", "key1": "va1"}, 
    {"key0": "val0", "key1": "val1"}, {"key0": "valu0", "key1": "valu1"}, 
    {"key0": "value0", "key1": "value1"}
]
[
    {"key0": "v0", "key1": "v1", "key2": "v2"}, 
    {"key0": "va0", "key1": "va1", "key2": "va2"}, 
    {"key0": "val0", "key1": "val1", "key2": "val2"}, 
    {"key0": "valu0", "key1": "valu1", "key2": "valu2"}, 
    {"key0": "value0", "key1": "value1", "key2": "value2"}
]

Notice that the second list is not formatted as a table as expected. That's because it is actually treated as a multiline-compact list (although it is not "compact" at all, as every element takes a whole line).
This cannot be fixed by simply reordering format_list_multiline_compact and format_table_list_dict, since this would change the first list to a table as well (while it can be more compact). To fix this, we must check whether a compact list is actually "compact".

Feature request: provide short option arguments, e.g. for --max-inline-length and --indent

so this:

$ compact-json tests/data/test-11.json --indent 2 --max-inline-length 88
{
  "ObjectColumnsObjectRows": {
    "Vera": {"street": "12 Madeup St.", "city": "Boston", "state": "MA", "zip": "02127"},
    "Chuck": {
      "street": "994 Fake Place",
      "unit": "102",
      "city": "Las Vegas",
      "state": "NV",
      "zip": "89102"
    },
    "Dave": {"street": "1967 Void Rd.", "unit": "B", "city": "Athens", "state": "GA"}
  }, ...

could be replaced by:

$ compact-json tests/data/test-11.json -i 2 -l 88
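
With argparse, a short alias is just an extra name on the same argument. The sketch below is not the tool's actual parser: the positional name file and the absence of defaults are assumptions, and -i and -l are the letters proposed in this request.

```python
import argparse


def build_parser():
    """Build a toy parser where each long option also has a short alias."""
    parser = argparse.ArgumentParser(prog="compact-json")
    parser.add_argument("file", help="input JSON file (hypothetical positional name)")
    parser.add_argument("-i", "--indent", type=int)
    parser.add_argument("-l", "--max-inline-length", type=int)
    return parser


args = build_parser().parse_args(["tests/data/test-11.json", "-i", "2", "-l", "88"])
print(args.indent, args.max_inline_length)  # 2 88
```

The long spellings keep working unchanged, so existing command lines would not break.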

Weird type conversion when formatting table

Running the following code

from compact_json import Formatter

data = [{'a': '0', 'b': '1', 'c':'0'}, {'a': '0', 'b': 1}]

formatter = Formatter(always_expand_depth=0)
print(formatter.serialize(data))
formatter.table_dict_minimum_similarity=100
print(formatter.serialize(data))

results in the following output

$ python test.py
C:\Users\ZeroRin\Documents\Projects\compact-json\src\compact_json\formatter.py:68: RuntimeWarning: handled quantize error (please report an issue)
  warnings.warn(
[
    { "a": "0", "b": 0, "c": "0" },
    { "a": "0", "b": 1           }
]
[
    {"a": "0", "b": "1", "c": "0"},
    {"a": "0", "b": 1}
]

A warning is raised during the first formatting, and the output somehow changes the value of data[0]['b'] from the string "1" to the number 0. During the second run, where the output is not formatted as a table, the output is correct.

Add __main__.py to support `python -m compact_json`?

python -m compact_json ... seems to me a more Pythonic way of using a module as a command-line tool. This could be done by adding a __main__.py containing from ._compact_json import main; main(), or by directly renaming _compact_json.py to __main__.py.

I'm not sure how this would affect building the executable, so I'm leaving an issue here instead of opening a PR directly.

Str enums not encoded correctly

When string enums are used as keys, they are not encoded correctly.

from enum import Enum
from compact_json import Formatter

class MyEnum(str, Enum):
    abc = 'Abc'
    def_ = 'Def'  # "def" is a reserved word and cannot be a member name

formatter = Formatter()
formatter.serialize({ MyEnum.abc: 'abc' })
# Output:  {"MyEnum.abc": "abc"}
# Expected: {"Abc": "abc"}
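
The reported output matches what str() returns for such an enum member, which suggests the key is being converted with str() rather than through its value (an inference from the output, not confirmed from the library source). The name_for_key helper below is a hypothetical sketch of the expected behaviour.

```python
from enum import Enum


class MyEnum(str, Enum):
    abc = "Abc"


def name_for_key(key):
    """Hypothetical helper: use an enum member's value as the JSON object name."""
    if isinstance(key, Enum):
        key = key.value
    if not isinstance(key, str):
        raise TypeError(f"JSON object name must be str, got {type(key).__name__}")
    return key


print(str(MyEnum.abc))           # MyEnum.abc
print(name_for_key(MyEnum.abc))  # Abc
```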