Is your feature request related to a problem? Please describe.
There is no problem behind this issue; it's only a codebase suggestion.
Additional context
Maintaining data parsing and validation code is painful and not much fun. Using a tool like Pydantic might save you some energy: for example, you wouldn't have to write a complete from_json method, only a way to parse the specific parts that follow unusual rules.
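As a rough sketch of what "parsing only the unusual parts" could look like with Pydantic V2, the model below declares its fields and adds a single field_validator for one special case. The AirportSketch name and the rule itself (treating an empty iata_code as missing) are hypothetical and only serve as an illustration; they are not taken from your codebase.

```python
from typing import Optional

from pydantic import BaseModel, field_validator


class AirportSketch(BaseModel):
    # Hypothetical, trimmed-down model used only for this illustration.
    id: str
    iata_code: Optional[str] = None

    @field_validator("iata_code", mode="before")
    @classmethod
    def empty_string_to_none(cls, value):
        # Hypothetical "unusual rule": treat an empty string as a missing code.
        if value == "":
            return None
        return value


airport = AirportSketch.model_validate({"id": "arp_swf_us", "iata_code": ""})
print(airport.iata_code)
```

Everything else (required keys, types, nested objects) is handled by the field declarations, so no from_json is needed for the regular fields.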
First step
A first step could simply be to use Pydantic's @dataclass instead of the default one, without using any of the data validation.
Pros/cons
Pros:
- simpler parsing code -> no code in most cases (easier to maintain);
- more complete (dynamic) type checking;
- more standard way of doing type validation (which again makes it easier to maintain);
- can easily generate an OpenAPI schema, which can help propagate types to the frontend;
- integrates nicely with your linter;
- benefits from the great work of others: Pydantic is quite fast (V2 was just released and promises great performance improvements).
Cons:
- you become dependent on Pydantic (but it's quite widely used, mainly thanks to FastAPI);
- the transition is not that easy (it requires strong tests to make sure nothing breaks, but regression testing based on output types should suffice).
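To make the regression-testing idea a bit more concrete, here is a minimal sketch that feeds the same payload to the existing parser and to a Pydantic model and compares the results. It uses a copy of the City class from the Airport example; the check_same_output helper and the sample payload are assumptions made for this illustration.

```python
from dataclasses import asdict, dataclass

from pydantic import BaseModel


@dataclass
class City:
    id: str
    name: str
    iata_code: str
    iata_country_code: str

    @classmethod
    def from_json(cls, json: dict):
        return cls(
            id=json["id"],
            name=json["name"],
            iata_code=json["iata_code"],
            iata_country_code=json["iata_country_code"],
        )


class PydanticCity(BaseModel):
    id: str
    name: str
    iata_code: str
    iata_country_code: str


def check_same_output(payload: dict) -> None:
    # Hypothetical helper: both parsers must produce the same field values.
    legacy = asdict(City.from_json(payload))
    candidate = PydanticCity.model_validate(payload).model_dump()
    assert legacy == candidate, f"mismatch: {legacy} != {candidate}"


city_json = {
    "id": "cit_nyc_us",
    "name": "New York",
    "iata_code": "NYC",
    "iata_country_code": "US",
}
check_same_output(city_json)
```

Running a check like this over recorded API responses would be one way to gain confidence during the transition.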
Airport example
As an example, take the Airport data class, which contains a nested City data class, and compare the Pydantic implementation with the current one.
Current implementation
Here is the code you have now (it works as is, which is why I kept the get_and_transform function in here):
from typing import Optional
from dataclasses import dataclass


@dataclass
class City:
    id: str
    name: str
    iata_code: str
    iata_country_code: str

    @classmethod
    def from_json(cls, json: dict):
        return cls(
            id=json["id"],
            name=json["name"],
            iata_code=json["iata_code"],
            iata_country_code=json["iata_country_code"],
        )


@dataclass
class Airport:
    id: str
    name: str
    iata_code: Optional[str]
    icao_code: Optional[str]
    iata_country_code: str
    latitude: float
    longitude: float
    time_zone: str
    city: Optional[City]

    @classmethod
    def from_json(cls, json: dict):
        return cls(
            id=json["id"],
            name=json["name"],
            iata_code=json.get("iata_code"),
            icao_code=json.get("icao_code"),
            iata_country_code=json["iata_country_code"],
            latitude=json["latitude"],
            longitude=json["longitude"],
            time_zone=json["time_zone"],
            city=get_and_transform(json, "city", City.from_json),
        )


def get_and_transform(dict: dict, key: str, fn, default=None):
    try:
        value = dict[key]
        if value is None:
            return value
        else:
            return fn(value)
    except KeyError:
        return default
And here is how it is called:
>>> Airport.from_json(airport_json)
Airport(id='arp_swf_us', name='New York Stewart International Airport', iata_code='SWF', icao_code='KSWF', iata_country_code='US', latitude=41.501292, longitude=-74.102724, time_zone='America/New_York', city=City(id='cit_nyc_us', name='New York', iata_code='NYC', iata_country_code='US'))
Pydantic version
from typing import Optional
from pydantic import BaseModel


class PydanticCity(BaseModel):
    id: str
    name: str
    iata_code: str
    iata_country_code: str


class PydanticAirport(BaseModel):
    id: str
    name: str
    iata_code: Optional[str]
    icao_code: Optional[str]
    iata_country_code: str
    latitude: float
    longitude: float
    time_zone: str
    city: Optional[PydanticCity]

And here is how it would be called (using the BaseModel.model_validate method):
>>> PydanticAirport.model_validate(airport_json)
PydanticAirport(id='arp_swf_us', name='New York Stewart International Airport', iata_code='SWF', icao_code='KSWF', iata_country_code='US', latitude=41.501292, longitude=-74.102724, time_zone='America/New_York', city=PydanticCity(id='cit_nyc_us', name='New York', iata_code='NYC', iata_country_code='US'))
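Regarding the schema generation mentioned in the pros, here is a small sketch of what a Pydantic V2 model gives you for free. model_json_schema returns a JSON Schema dictionary; as far as I understand, this is what OpenAPI tooling such as FastAPI builds on, but how it would be wired into your frontend is left open here.

```python
from pydantic import BaseModel


class PydanticCity(BaseModel):
    id: str
    name: str
    iata_code: str
    iata_country_code: str


# JSON Schema description of the model, as a plain dict.
schema = PydanticCity.model_json_schema()

print(schema["required"])
print(schema["properties"]["iata_code"])
```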
Stats for the geeks
The performance of the two approaches is as follows:
>>> %timeit Airport.from_json(airport_json)
861 ns ± 11.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> %timeit PydanticAirport.model_validate(airport_json)
1.79 µs ± 16.5 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Note that Pydantic performs a complete validation (each field is type checked), whereas your current code only parses the input data. The comparison is not meant to be fair; I just wanted to highlight that there isn't a huge performance difference between the two (Pydantic is roughly twice as slow as your current implementation).
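For anyone who wants to reproduce this kind of measurement outside IPython, here is a sketch using the standard timeit module. It is trimmed to the City class for brevity, so the absolute numbers will not match the Airport figures above, and results will vary by machine and Pydantic version.

```python
import timeit
from dataclasses import dataclass

from pydantic import BaseModel


@dataclass
class City:
    id: str
    name: str
    iata_code: str
    iata_country_code: str

    @classmethod
    def from_json(cls, json: dict):
        return cls(
            id=json["id"],
            name=json["name"],
            iata_code=json["iata_code"],
            iata_country_code=json["iata_country_code"],
        )


class PydanticCity(BaseModel):
    id: str
    name: str
    iata_code: str
    iata_country_code: str


city_json = {
    "id": "cit_nyc_us",
    "name": "New York",
    "iata_code": "NYC",
    "iata_country_code": "US",
}

runs = 10_000  # arbitrary; increase for more stable numbers
legacy_s = timeit.timeit(lambda: City.from_json(city_json), number=runs)
pydantic_s = timeit.timeit(lambda: PydanticCity.model_validate(city_json), number=runs)

print(f"from_json:      {legacy_s / runs * 1e9:.0f} ns per call")
print(f"model_validate: {pydantic_s / runs * 1e9:.0f} ns per call")
```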
Type errors
>>> airport_json["iata_country_code"] = 12
>>> Airport.from_json(airport_json)
# no error
>>> PydanticAirport.model_validate(airport_json)
ValidationError: 1 validation error for PydanticAirport
iata_country_code
  Input should be a valid string [type=string_type, input_value=12, input_type=int]
    For further information visit https://errors.pydantic.dev/2.0.3/v/string_type
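If calling code needs to react to such failures rather than just let them propagate, the error details are also available in structured form. A minimal sketch, again trimmed to a City model, with a payload invented for the illustration:

```python
from pydantic import BaseModel, ValidationError


class PydanticCity(BaseModel):
    id: str
    name: str
    iata_code: str
    iata_country_code: str


# Invented payload: iata_country_code is an int instead of a string.
bad_city_json = {
    "id": "cit_nyc_us",
    "name": "New York",
    "iata_code": "NYC",
    "iata_country_code": 12,
}

errors = []
try:
    PydanticCity.model_validate(bad_city_json)
except ValidationError as exc:
    # errors() returns one dict per failing field.
    errors = exc.errors()

for error in errors:
    print(error["loc"], error["type"], error["msg"])
```

Each entry carries the field location, an error type such as string_type, and a human-readable message, which makes it straightforward to log or surface the failure.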
Pydantic dataclasses
from typing import Optional
from pydantic.dataclasses import dataclass


@dataclass
class PydanticCity:
    id: str
    name: str
    iata_code: str
    iata_country_code: str


@dataclass
class PydanticAirport:
    id: str
    name: str
    iata_code: Optional[str]
    icao_code: Optional[str]
    iata_country_code: str
    latitude: float
    longitude: float
    time_zone: str
    city: Optional[PydanticCity]
From Pydantic's dataclasses documentation:
Keep in mind that pydantic.dataclasses.dataclass is not a replacement for pydantic.BaseModel. pydantic.dataclasses.dataclass provides a similar functionality to dataclasses.dataclass with the addition of Pydantic validation. There are cases where subclassing pydantic.BaseModel is the better choice.
For more information and discussion see pydantic/pydantic#710.
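Since Pydantic dataclasses have no model_validate method, here is one possible way to build them from a parsed JSON dict, using Pydantic V2's TypeAdapter. The airport_json payload is reconstructed from the output shown earlier; whether TypeAdapter or plain keyword construction fits your code better is an open choice.

```python
from typing import Optional

from pydantic import TypeAdapter
from pydantic.dataclasses import dataclass


@dataclass
class PydanticCity:
    id: str
    name: str
    iata_code: str
    iata_country_code: str


@dataclass
class PydanticAirport:
    id: str
    name: str
    iata_code: Optional[str]
    icao_code: Optional[str]
    iata_country_code: str
    latitude: float
    longitude: float
    time_zone: str
    city: Optional[PydanticCity]


# Payload reconstructed from the repr shown in the Airport example.
airport_json = {
    "id": "arp_swf_us",
    "name": "New York Stewart International Airport",
    "iata_code": "SWF",
    "icao_code": "KSWF",
    "iata_country_code": "US",
    "latitude": 41.501292,
    "longitude": -74.102724,
    "time_zone": "America/New_York",
    "city": {
        "id": "cit_nyc_us",
        "name": "New York",
        "iata_code": "NYC",
        "iata_country_code": "US",
    },
}

# The adapter validates the dict and builds the nested dataclasses.
airport = TypeAdapter(PydanticAirport).validate_python(airport_json)
print(airport.city)
```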
Disclaimer
I'm not a maintainer of Pydantic, nor am I involved in it in any way (I don't think I've ever even raised an issue there). I just like it.