masaccio / compact-json
A JSON formatter that produces compact but human-readable
License: MIT License
I've just copied this from the README:
from compact_json import Formatter, EolStyle
formatter = Formatter()
formatter.indent_spaces = 2
formatter.max_inline_complexity = 10
formatter.json_eol_style = EolStyle.LF
# Format the JSON contents:
with open("input.json", "r") as f:
    obj = json.load(f)
json_string = formatter.serialize(obj)
print(json_string)
# Create a new JSON file with formatted JSON contents:
with open("input.json", "r") as f:
    obj = json.load(f)
formatter.dump(obj, output_file="output.json", newline_at_eof=True)
A minor detail is that it should import json, but even when I do that, I get
AttributeError: 'Formatter' object has no attribute 'dump'
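Until a dump method is available, the same result can probably be had by writing the string returned by serialize to a file yourself. The following is a minimal sketch, assuming only that the formatter exposes serialize(obj) as in the snippet above; dump_formatted is a hypothetical helper name and is not part of compact-json:

```python
from pathlib import Path


def dump_formatted(formatter, obj, output_file, newline_at_eof=True):
    """Hypothetical stand-in for Formatter.dump(): serialize, then write the file."""
    text = formatter.serialize(obj)
    # Mirror the newline_at_eof flag used in the README example.
    if newline_at_eof and not text.endswith("\n"):
        text += "\n"
    Path(output_file).write_text(text, encoding="utf-8")
    return text
```

With the formatter and obj from the README snippet, this would be called as dump_formatted(formatter, obj, "output.json").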
"Windows-style CRLR" should be "Windows-style CRLF".
Thanks for compact-json, just what I needed!
Hi authors, thank you for writing this awesome tool!
JSON readability is very important but I think it's overlooked by the standard library.
May I request the following features to make this library easier to use?
dump method to directly dump the re-formatted JSON string to a JSON file?
compact-json ... instead of _compact_json.py ...?
Additional notes for Request No. 4 above:
I see that this package uses poetry. I'm not so familiar with poetry, so I asked ChatGPT how to register an entry point with poetry. Below is my Q&A with ChatGPT:
How do I register a command line entry point for a Python package when I'm using poetry?
To register a command line entry point for a Python package when using Poetry, you can use the [tool.poetry.scripts] section in your pyproject.toml file.
Here's an example pyproject.toml file:
[tool.poetry]
name = "mypackage"
version = "0.1.0"
description = "My Python package"
authors = ["Your Name <[email protected]>"]
[tool.poetry.dependencies]
python = "^3.9"
[tool.poetry.scripts]
mypackage-cli = "mypackage.cli:main"
[build-system]
requires = ["poetry-core>=1.1.0"]
build-backend = "poetry.core.masonry.api"
In this example, we have specified a script called mypackage-cli which points to a function called main in a module called cli inside a package called mypackage.
To use the command line entry point, you can install the package using poetry install and then run the command mypackage-cli in your terminal.
poetry install
mypackage-cli
This will execute the main function defined in your cli module. You can replace mypackage-cli with any name you want for your command line entry point.
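For completeness, here is a sketch of what the mypackage/cli.py module referenced by that entry point might contain. The package, module, and function names come from the example above; the name argument and the greeting are made up purely for illustration:

```python
import argparse


def main(argv=None):
    """Function that the mypackage-cli entry point would call."""
    parser = argparse.ArgumentParser(prog="mypackage-cli")
    # Illustrative argument only; a real tool would define its own options.
    parser.add_argument("name", nargs="?", default="world", help="who to greet")
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
    print(f"Hello, {args.name}!")
    return 0  # the generated launcher passes this to sys.exit()
```

Accepting an optional argv makes the function easy to call from tests without touching sys.argv.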
I found this option in the wiki, but it doesn't appear to be implemented in the Python version.
First off: thanks for the very useful tool! I am really happy to have found it, it fills a gap in the standard library that I think many people have experienced.
The issue I encountered: in a JSON object, names can only be strings (see 2.2 in the RFC). So if you do:
from compact_json import Formatter
Formatter().serialize({100: 'foo bar'})
you should get either {"100": "foo bar"} (this is what the standard json library does), or perhaps raise an exception (this is what the cjson library apparently does). But with compact-json, this gives {100: "foo bar"}, which is invalid JSON. Trying to decode it with
import json
json.loads('{100: "foo bar"}')
raises a json.decoder.JSONDecodeError: "Expecting property name enclosed in double quotes: line 1 column 2 (char 1)".
I think raising an exception is the clearest option here, since silently stringifying the key could lead to unexpected behavior. An optional stringify_names argument could be nice though.
Side note: I think the json library does not handle this case well either: json.dumps({100: "foo", "100": "bar"}) gives {"100": "foo", "100": "bar"}, which is problematic JSON since names should be unique.
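The two behaviours discussed above can be compared side by side. This sketch uses only the standard json module; require_string_keys is a hypothetical pre-check illustrating the "raise an exception" option and is not part of compact-json:

```python
import json


def require_string_keys(obj):
    """Hypothetical pre-check: raise TypeError on any non-string dict key."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if not isinstance(key, str):
                raise TypeError(f"JSON object names must be strings, got {key!r}")
            require_string_keys(value)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            require_string_keys(item)
    return obj


# The standard library silently converts the integer key to a string.
stringified = json.dumps({100: "foo bar"})
```

Running require_string_keys on the data before serializing would turn the silent conversion into an explicit error.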
The compact-json CLI tool doesn't support all of the available format options, which significantly limits its usefulness.
Is it possible to expose all the format options in the CLI?
Alternatively, a general format parameter could be exposed, e.g. compact-json --format comma_padding=false in.json
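A general name=value option like the one suggested above could be parsed with argparse roughly as follows. This is only a sketch of the proposal: the --format flag and parse_format_option are hypothetical and do not exist in the current CLI, and decoding values as JSON literals is just one possible convention:

```python
import argparse
import json


def parse_format_option(text):
    """Split 'name=value'; decode the value as a JSON literal when possible."""
    name, sep, raw = text.partition("=")
    if not sep or not name:
        raise argparse.ArgumentTypeError(f"expected name=value, got {text!r}")
    try:
        value = json.loads(raw)  # "false" -> False, "2" -> 2
    except json.JSONDecodeError:
        value = raw  # anything else is kept as a plain string
    return name, value


parser = argparse.ArgumentParser(prog="compact-json")
parser.add_argument("--format", action="append", type=parse_format_option,
                    default=[], metavar="NAME=VALUE")
parser.add_argument("files", nargs="*")

args = parser.parse_args(["--format", "comma_padding=false", "in.json"])
options = dict(args.format)
```

Each resulting pair could then be applied to the formatter with setattr, after checking that the attribute exists.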
Need to do from typing import Dict and use Dict[str] for older Pythons.
Given the input
[ "seq", [ "if", [ "=", "i", [ "prop", "length", "l" ] ], [ "return", "l" ] ] ]
compact-json produces:
[
    "seq",
    [
        [ ],
        [ "=" , "i", ["prop", "length", "l"] ],
        [ "return", "l" ]
    ]
]
where the string "if" has been replaced with an empty list.
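Silent data changes like this one can be caught with a round-trip check: parse the formatted text and compare it with the input. The sketch below takes any serializer callable, so Formatter().serialize could be passed in place of json.dumps; check_round_trip is a hypothetical helper name:

```python
import json


def check_round_trip(serialize, obj):
    """Return True if parsing serialize(obj) gives back an equal object."""
    try:
        return json.loads(serialize(obj)) == obj
    except json.JSONDecodeError:
        return False  # the output was not even valid JSON


data = ["seq", ["if", ["=", "i", ["prop", "length", "l"]], ["return", "l"]]]
ok = check_round_trip(json.dumps, data)
```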
It would be useful to accept a file as an argument, like:
$ compact-json tests/data/test-11.json --indent 2 --max-inline-length 88
{
  "ObjectColumnsObjectRows": {
    "Vera": {"street": "12 Madeup St.", "city": "Boston", "state": "MA", "zip": "02127"},
    "Chuck": {
      "street": "994 Fake Place",
      "unit": "102",
      "city": "Las Vegas",
      "state": "NV",
      "zip": "89102"
    },
    "Dave": {"street": "1967 Void Rd.", "unit": "B", "city": "Athens", "state": "GA"}
  },
  "ArrayColumnsObjectRows": [
    { "type": "turret", "hp": 400, "loc": {"x": 47, "y": -4}, "flags": ["stationary"] },
    { "type": "assassin", "hp": 80, "loc": {"x": 102, "y": 6}, "flags": ["stealth"] },
    { "type": "berserker", "hp": 150, "loc": {"x": 0, "y": 0} },
    {
      "type": "pittrap",
      "loc": {"x": 10, "y": -14},
      "flags": ["invulnerable", "stationary"]
    }
  ],
  "DissimilarObjectRows": {
    "lamp": {"x": 4, "y": 1002, "r": 75, "g": 0, "b": 130, "state": 1},
    "address": {"city": "San Diego", "state": "CA"},
    "font": {"r": 0, "g": 12, "b": 40, "size": 18, "style": "italic"}
  }
}
$
and stdin, like this:
$ cat tests/data/test-11.json | compact-json - --indent 2 --max-inline-length 88
...
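One common convention for this is to treat a path of "-" as "read from standard input". A minimal sketch of that convention follows; load_json_source is a hypothetical helper and not how compact-json is necessarily implemented, and the optional stdin parameter exists only to make the function testable:

```python
import json
import sys


def load_json_source(path, stdin=None):
    """Load JSON from a file path, or from stdin when path is "-"."""
    if path == "-":
        return json.load(stdin if stdin is not None else sys.stdin)
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```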
I encountered this while trying to build a more compact formatter.
Here is a simple way to reproduce the warning:
from compact_json import Formatter
formatter = Formatter(max_compact_list_complexity=999)
data = [
    {f'key{i}': f'value{i}' for i in range(2)},
    {f'key{i}': f'value{i}' for i in range(5)},
    {f'key{i}': f'value{i}' for i in range(2)},
    {f'key{i}': f'value{i}' for i in range(2)},
]
print(formatter.serialize(data))
With unlimited complexity it tries to format the list as multi-line compact, and because one long object cannot be made inline, the warning is triggered.
[
    {"key0": "value0", "key1": "value1"},
    {
        "key0": "value0",
        "key1": "value1",
        "key2": "value2",
        "key3": "value3",
        "key4": "value4"
    },
    {"key0": "value0", "key1": "value1"},
    {"key0": "value0", "key1": "value1"}
]
C:\Users\ZeroRin\Documents\Projects\compact-json\src\compact_json\formatter.py:414: RuntimeWarning: list elements not inline (please report an issue)
C:\Users\ZeroRin\Documents\Projects\compact-json\src\compact_json\formatter.py:462: RuntimeWarning: list elements not inline (please report an issue)
self.indent(buffer, item.depth + 1)
I'd suggest allowing a list to be multi-line compact even when it contains multi-line elements, so that a list with many small elements but a few large ones would not take much space. Here is an example output with the same setup as above:
[
    {"key0": "value0", "key1": "value1"},
    {
        "key0": "value0",
        "key1": "value1",
        "key2": "value2",
        "key3": "value3",
        "key4": "value4"
    },
    {"key0": "value0", "key1": "value1"}, {"key0": "value0", "key1": "value1"}
]
This could be done by modifying the newline condition so that every multi-line element starts on a new line and is followed by a newline:
if (
    line_length_so_far + segment_length > self.max_inline_length
    and line_length_so_far > 0
    or child_index != 0
    and (
        item.children[child_index].format != Format.INLINE
        or item.children[child_index - 1].format != Format.INLINE
    )
):
I'm trying to make that dict in the middle multi-line compact as well, so that hopefully the final version will look better than the current showcase.
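The effect of the proposed newline condition can be tried in isolation. The sketch below is a simplified stand-in rather than the library's code: pack_segments is a hypothetical function, each segment is a (text, is_inline) pair, and indentation is ignored. The parentheses make explicit how Python groups the and/or in the condition above:

```python
def pack_segments(segments, max_inline_length, separator=", "):
    """Greedily pack (text, is_inline) segments into lines.

    A non-inline segment always starts a new line, and so does the
    segment that follows it.
    """
    lines = []
    current = ""
    for index, (text, is_inline) in enumerate(segments):
        prev_inline = segments[index - 1][1] if index else True
        needs_break = (
            (len(current) + len(text) > max_inline_length and len(current) > 0)
            or (index != 0 and (not is_inline or not prev_inline))
        )
        if needs_break:
            lines.append(current.rstrip())  # drop the space after the comma
            current = ""
        current += text
        if index != len(segments) - 1:
            current += separator
    if current:
        lines.append(current)
    return lines


small = '{"key0": "value0", "key1": "value1"}'
large = '{\n    "key0": "value0",\n    "key4": "value4"\n}'  # stands in for the expanded dict
lines = pack_segments([(small, True), (large, False), (small, True), (small, True)], 80)
```

With these inputs the first small dict and the large dict each get their own line, and the last two small dicts share one line, as in the suggested output.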
I guess that something for script_runner has not been committed?
(doc) luis@spinoza:~/proj/g/alien/compact-json$ pytest
=========================================================================================== test session starts ============================================================================================
platform linux -- Python 3.10.4, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/luis/proj/g/alien/compact-json, configfile: pyproject.toml
plugins: cov-4.0.0
collected 10 items
tests/test_console_script.py EEEEEE [ 60%]
tests/test_issues.py ... [ 90%]
tests/test_json.py .
_______________________________________________________________________________________ ERROR at setup of test_main ________________________________________________________________________________________
file /home/luis/proj/g/alien/compact-json/tests/test_console_script.py, line 111
@pytest.mark.script_launch_mode("subprocess")
def test_main(script_runner):
E fixture 'script_runner' not found
> available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, cov, doctest_namespace, monkeypatch, no_cover, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
> use 'pytest --fixtures [testpath]' for help on them.
I encountered a bug while trying to use it.
This is a minimal JSON file that can reproduce the issue:
{
    "key0": [
        [
            "subkey0",
            1e-05
        ],
        [
            "subkey1",
            2e-05
        ],
        [
            "subkey2",
            3e-05
        ]
    ]
}
I'm using the default options, just this: compact-json input.json
The call stack is:
File "/Users/jsh9/github/compact-json/src/compact_json/_compact_json.py", line 166, in main
json_string = formatter.serialize(obj)
File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 279, in serialize
return self.prefix_string + self.format_element(0, value).value
File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 302, in format_element
formatted_item = self.format_dict(depth, element)
File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 369, in format_dict
elem = self.format_element(depth + 1, v)
File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 300, in format_element
formatted_item = self.format_list(depth, element)
File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 355, in format_list
if self.format_table_list_list(item):
File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 555, in format_table_list_list
column_stats = self.get_list_stats(item)
File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 980, in get_list_stats
col_stats_list[index].update(child, index)
File "/Users/jsh9/github/compact-json/src/compact_json/formatter.py", line 104, in update
(whole, frac) = str(prop_node.value).split(".")
ValueError: not enough values to unpack (expected 2, got 1)
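The traceback comes down to str(1e-05) being '1e-05': there is no "." in that string, so split(".") returns a single element and the two-name unpacking fails. One possible way to split a number into whole and fractional digits regardless of exponent notation is to go through decimal. This is a sketch for finite numbers only; split_number is a hypothetical helper and not necessarily the fix adopted by the library:

```python
from decimal import Decimal


def split_number(value):
    """Return (whole, frac) digit strings for a finite int or float."""
    # Decimal(str(value)) keeps the short repr; "f" formatting expands any exponent.
    text = format(Decimal(str(value)), "f")
    whole, _, frac = text.partition(".")  # frac is "" when there is no "."
    return whole, frac
```

Using partition instead of split means integers, which have no fractional part, are handled by the same code path.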
Lists can be made compact with multi-line inline list, but dicts with many records would always be formatted one item per line.
While this makes the output clean and easy to edit, in some cases it makes the output very space-inefficient. I wonder if we can ask dicts to be multi-line inline as well in certain scenarios.
Examples of desired outputs (I manually formatted it to suit an inline width of 80):
{
  "duplicate_info": {
    "1": {
      "id": "439", "sangvis_hiringteam_id": "1001", "challenge_type": "1",
      "enemy_team_id": "940001",
      "gun_in_allys": "3001,3002,3003,3004,3005,3506,3510,3513,3517,3531,3539,3566,3578,3586,3601,3605,3611,3647,3649,3660,3664,3677,3678,3702,3705,3713,3723,3735,3742",
      "pass_time": "9", "award_item": "2:400001:10:10000", "item_cost": "603:3",
      "is_night": "0", "map_type": "4", "map_code": "forest",
      "default_team": "3001,3002,3003,3004,3005", "start_time": 1673193600,
      "end_time": 1673798399
    }
  },
  "req": {
    "spot_id": 1633, "if_enemy_die": true, "current_time": 1673625806,
    "boss_hp": 0, "mvp": 210119981, "last_battle_info": "",
    "use_skill_squads": [], "use_skill_ally_spots": [],
    "guns": [
      { "id": 210119981, "life": 97 }, { "id": 209408291, "life": 58 },
      { "id": 210084671, "life": 106 }, { "id": 209411557, "life": 45 },
      { "id": 207888310, "life": 132 }
    ],
    "user_rec": "{\\\"seed\\\":9082,\\\"record\\\":[\\\"0,0,2,1\\\"]}",
    "1000": {
      "10": 0, "11": 0, "12": 0, "13": 0, "15": 0, "16": 0, "17": 0, "33": 0,
      "40": 0, "18": 0, "19": 0, "20": 0, "21": 0, "22": 0, "23": 0, "24": 0,
      "25": 0, "26": 0, "27": 0, "34": 0, "35": 0, "41": 0, "42": 0, "43": 0,
      "44": 0
    },
    "1001": {}, "1002": {}, "1003": {}, "1005": {}, "1007": {}, "1008": {},
    "1009": {}, "battle_damage": {}
  }
}
compact-json uses json.dumps(element) without any configuration such as ensure_ascii=False (compact-json/src/compact_json/formatter.py, line 270 in 3b96495), so the output will always contain Unicode-escaped characters, like "\u5f20\u4e09" instead of "张三".
I think there should be an option to control this behaviour to make the output more human friendly.
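The difference is easy to see with the standard library alone. This sketch only demonstrates the json.dumps behaviour; how such an option would be exposed on the Formatter is left open:

```python
import json

name = "张三"
escaped = json.dumps(name)                       # default: ensure_ascii=True
readable = json.dumps(name, ensure_ascii=False)  # non-ASCII characters kept as-is
```

Both strings decode to the same value, so the choice only affects readability and the encoding needed when writing the file.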
compact-json/src/compact_json/formatter.py
Line 431 in 234356f
I think this line should be changed to
item_length + len(self.padded_comma_str)
which takes the comma length into account whether or not there will be a comma after the item. This could prevent such a tail from happening when the max length is exactly 1 character less than the length of k elements with trailing commas. For example (I edited the bugged line to item_length + (len(self.padded_comma_str) if not_last_item else 0) before running the following code):
from compact_json import Formatter
formatter = Formatter(max_compact_list_complexity=2)
def get(n):
    # Build five dicts with n keys each; the values get one character longer per row.
    return [
        {f'key{i}': f'v{i}' for i in range(n)},
        {f'key{i}': f'va{i}' for i in range(n)},
        {f'key{i}': f'val{i}' for i in range(n)},
        {f'key{i}': f'valu{i}' for i in range(n)},
        {f'key{i}': f'value{i}' for i in range(n)},
    ]
print(formatter.serialize(get(2)))
print(formatter.serialize(get(3)))
results in
[
    {"key0": "v0", "key1": "v1"}, {"key0": "va0", "key1": "va1"},
    {"key0": "val0", "key1": "val1"}, {"key0": "valu0", "key1": "valu1"},
    {"key0": "value0", "key1": "value1"}
]
[
    {"key0": "v0", "key1": "v1", "key2": "v2"},
    {"key0": "va0", "key1": "va1", "key2": "va2"},
    {"key0": "val0", "key1": "val1", "key2": "val2"},
    {"key0": "valu0", "key1": "valu1", "key2": "valu2"},
    {"key0": "value0", "key1": "value1", "key2": "value2"}
]
Notice that the second list is not formatted as a table as expected. That's because it is actually a multi-line compact list (although it is not "compact" at all, as every element takes a whole line).
This cannot be fixed by simply reordering format_list_multiline_compact and format_table_list_dict, since that would turn the first list into a table as well (even though it can be more compact). To fix this we must check whether a compact list is actually "compact".
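One simple reading of "actually compact" is that at least one output line holds more than one element. A sketch of that test follows, assuming the layout is available as a list of rows, each row being the list of element strings placed on that line; is_actually_compact and this row representation are hypothetical, not the formatter's internal structures:

```python
def is_actually_compact(rows):
    """True if any line of the multi-line layout holds more than one element."""
    return any(len(row) > 1 for row in rows)


# Layouts of the two lists above: two per line vs. one per line.
first_layout = [["d0", "d1"], ["d2", "d3"], ["d4"]]
second_layout = [["d0"], ["d1"], ["d2"], ["d3"], ["d4"]]
```

Under this test the first layout would stay multi-line compact, while the second could fall through to table formatting.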
$ compact-json tests/data/test-11.json --indent 2 --max-inline-length 88
{
  "ObjectColumnsObjectRows": {
    "Vera": {"street": "12 Madeup St.", "city": "Boston", "state": "MA", "zip": "02127"},
    "Chuck": {
      "street": "994 Fake Place",
      "unit": "102",
      "city": "Las Vegas",
      "state": "NV",
      "zip": "89102"
    },
    "Dave": {"street": "1967 Void Rd.", "unit": "B", "city": "Athens", "state": "GA"}
  }, ...
$ compact-json tests/data/test-11.json -i 2 -l 88
Running the following code
from compact_json import Formatter
data = [{'a': '0', 'b': '1', 'c':'0'}, {'a': '0', 'b': 1}]
formatter = Formatter(always_expand_depth=0)
print(formatter.serialize(data))
formatter.table_dict_minimum_similarity=100
print(formatter.serialize(data))
results in the following output
$ python test.py
C:\Users\ZeroRin\Documents\Projects\compact-json\src\compact_json\formatter.py:68: RuntimeWarning: handled quantize error (please report an issue)
warnings.warn(
[
    { "a": "0", "b": 0, "c": "0" },
    { "a": "0", "b": 1 }
]
[
    {"a": "0", "b": "1", "c": "0"},
    {"a": "0", "b": 1}
]
A warning is raised during the first formatting, and the output somehow changes the value of data[0]['b'] from the string "1" to the number 0. During the second run, where the output is not formatted as a table, the output is correct.
python -m compact_json ... seems to me a more Pythonic way of using a module as a command-line tool. This could be done by adding a __main__.py with from ._compact_json import main; main() or by directly renaming _compact_json.py to __main__.py.
I'm not sure how this would affect building the executable, so I'm leaving an issue here instead of a direct PR.
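The __main__.py mechanism can be tried without touching the real package. The sketch below builds a throwaway package in a temporary directory and runs it with python -m; demo_pkg and _cli are made-up names standing in for compact_json and _compact_json:

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_demo_package():
    """Create a tiny package with a __main__.py and run it via `python -m`."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "_cli.py").write_text("def main():\n    print('hello from main')\n")
        # __main__.py is what `python -m demo_pkg` executes.
        (pkg / "__main__.py").write_text("from ._cli import main\n\nmain()\n")
        result = subprocess.run(
            [sys.executable, "-m", "demo_pkg"],
            cwd=tmp, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
```

Keeping the logic in a separate module and importing it from __main__.py means an existing console-script entry point can keep pointing at the same main function.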
When string enums are used as keys, they are not encoded correctly.
from enum import Enum
from compact_json import Formatter
class MyEnum(str, Enum):
    abc = 'Abc'
    def_ = 'Def'  # "def" is a reserved word and cannot be used as a member name
formatter = Formatter()
formatter.serialize({ MyEnum.abc: 'abc' })
# Output: {"MyEnum.abc": "abc"}
# Expected: {"Abc": "abc"}
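Until this is handled by the formatter, one workaround is to replace enum keys with their values before serializing. A minimal sketch, in which normalize_key and normalize_keys are hypothetical helpers and the second member is named xyz only to avoid the reserved word def:

```python
from enum import Enum


class MyEnum(str, Enum):
    abc = "Abc"
    xyz = "Def"


def normalize_key(key):
    """Replace an enum member by its value; reject keys that are not strings."""
    if isinstance(key, Enum):
        key = key.value
    if not isinstance(key, str):
        raise TypeError(f"JSON object names must be strings, got {key!r}")
    return key


def normalize_keys(obj):
    """Recursively apply normalize_key to every dict key."""
    if isinstance(obj, dict):
        return {normalize_key(k): normalize_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [normalize_keys(v) for v in obj]
    return obj


clean = normalize_keys({MyEnum.abc: "abc", "nested": [{MyEnum.xyz: 1}]})
```

The result contains only plain string keys and could then be passed to formatter.serialize.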