Today we take a brief tour of how to write a custom JSON encoder in Python.
We use pydantic dataclasses to set up our objects. It’s very common to use them as a protocol for communicating with APIs (typically with FastAPI).
Let’s create our dataclasses first. We’ll include an enum and use one dataclass as a field of another dataclass.
import dataclasses
import enum
import json
from typing import Any, Dict, List, Optional

import pydantic


class UserType(enum.Enum):
    FREE = '1'
    PAID = '2'
    TRIAL = '3'


@pydantic.dataclasses.dataclass
class Base:
    """
    Use this for JSON encoder to recognise
    child classes for encoding.
    """
    pass


@pydantic.dataclasses.dataclass
class UserContext(Base):
    loc: str
    browser: str


@pydantic.dataclasses.dataclass
class UserRequest(Base):
    queries: List[str]
    product_category_id: Optional[str]
    user_id: str
    request_params: Dict[str, Any]
    user_context: UserContext
    user_type: UserType


req = UserRequest(
    queries=['q1', 'q2'],
    product_category_id=None,
    user_id='abc123',
    request_params={'x': 1, 'y': 2},
    user_context=UserContext(loc='US', browser='chrome'),
    user_type=UserType.FREE,
)
Alright, we have a dataclass; now let’s serialise it!
data = json.dumps(req, indent=4)
print(data)
> TypeError: Object of type UserRequest is not JSON serializable
Well, it turns out that the standard JSON encoder doesn’t understand dataclasses.
Not a problem: a solution is provided by the pydantic documentation. We just need to pass pydantic_encoder as the default argument, like below:
data = json.dumps(req, indent=4, default=pydantic.json.pydantic_encoder)
print(data)
{
    "queries": [
        "q1",
        "q2"
    ],
    "product_category_id": null,
    "user_id": "abc123",
    "request_params": {
        "x": 1,
        "y": 2
    },
    "user_context": {
        "loc": "US",
        "browser": "chrome"
    },
    "user_type": "1"
}
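For intuition, the default argument accepts any callable that takes the object the encoder could not handle and returns something it can. Here is a minimal sketch of that idea using standard-library dataclasses only. The Point class and the encode_unknown function are hypothetical names made up for this illustration, and this is not how pydantic_encoder is implemented.

```python
import dataclasses
import json


@dataclasses.dataclass
class Point:  # hypothetical example class, not part of the article's code
    x: int
    y: int


def encode_unknown(obj):
    """Hypothetical fallback: json calls this only for objects it cannot handle."""
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        # Convert a dataclass instance into a plain dict.
        return dataclasses.asdict(obj)
    # Mirror the standard behaviour for anything else.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


encoded = json.dumps(Point(x=1, y=2), default=encode_unknown)
print(encoded)
```

Built-in types such as dict, list, str and int never reach the callable; it is only consulted as a last resort.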
Great! Our problems are solved!
Except… notice that the enum class encoding uses the value by default. What if you wanted to use the name? Or make other custom changes?
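As a quick refresher, every enum member carries both a name and a value. The snippet below repeats the UserType enum from above so that it runs on its own; it only illustrates the difference and is not part of the encoder.

```python
import enum


class UserType(enum.Enum):
    FREE = '1'
    PAID = '2'
    TRIAL = '3'


member = UserType.FREE
print(member.value)  # the string assigned in the class body
print(member.name)   # the identifier used in the class body

# Either one can be used to look the member up again.
assert UserType('1') is member
assert UserType['FREE'] is member
```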
The answer lies in the documentation: subclass json.JSONEncoder and override its default method.
Let’s have a look at what that might look like.
class CustomEncoder(json.JSONEncoder):
    # Compact separators: no space after ',' or ':' in the output.
    item_separator = ","
    key_separator = ":"

    def default(self, obj: Any) -> Any:
        # [1] Custom method to encode dataclasses.
        if isinstance(obj, Base):
            return self._extract_fields_to_json(obj)
        # [2] Custom method to encode enums.
        elif isinstance(obj, enum.Enum):
            return obj.name
        # [3] Finally we default to the parent class method.
        else:
            return super().default(obj)

    def _field_name_extractor(self, field: dataclasses.Field) -> str:
        return field.name

    def _is_field_included(self, obj: Base, field_name: str) -> bool:
        # Skip fields whose value is None.
        return getattr(obj, field_name) is not None

    def _extract_fields_to_json(self, obj: Base) -> Dict[str, Any]:
        # Shallow conversion: nested dataclasses and enums are returned as-is
        # and passed back through default() by the encoder.
        return {
            self._field_name_extractor(field): getattr(obj, field.name)
            for field in dataclasses.fields(obj)
            if self._is_field_included(obj, field.name)
        }
At [1] we define a custom method to extract the fields from the dataclasses that are a subclass of Base.
Similarly, at [2] we encode an enum by its name instead of its value.
Otherwise, at [3], we fall back to the parent class’s method, which raises a TypeError for unsupported types.
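One detail worth spelling out: _extract_fields_to_json only converts the top-level object, yet the nested dataclass and the enum still end up encoded. That works because the encoder feeds whatever default returns back through the encoding process, calling default again for every value it still cannot handle. The sketch below makes this visible with standard-library dataclasses. Colour, Inner, Outer and TracingEncoder are hypothetical names invented for this illustration.

```python
import dataclasses
import enum
import json


class Colour(enum.Enum):  # hypothetical enum for illustration
    RED = 1


@dataclasses.dataclass
class Inner:  # hypothetical nested dataclass
    colour: Colour


@dataclasses.dataclass
class Outer:  # hypothetical top-level dataclass
    inner: Inner


class TracingEncoder(json.JSONEncoder):
    """Records the type name of every object passed to default()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = []

    def default(self, obj):
        self.seen.append(type(obj).__name__)
        if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
            # Shallow conversion: nested objects are left as-is on purpose.
            return {f.name: getattr(obj, f.name) for f in dataclasses.fields(obj)}
        if isinstance(obj, enum.Enum):
            return obj.name
        return super().default(obj)


encoder = TracingEncoder()
result = encoder.encode(Outer(inner=Inner(colour=Colour.RED)))
print(result)
print(encoder.seen)
```

The trace lists Outer, then Inner, then Colour, which shows default being called once per level of nesting.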
data = json.dumps(req, indent=4, cls=CustomEncoder)
print(data)
{
    "queries":[
        "q1",
        "q2"
    ],
    "user_id":"abc123",
    "request_params":{
        "x":1,
        "y":2
    },
    "user_context":{
        "loc":"US",
        "browser":"chrome"
    },
    "user_type":"FREE"
}
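Voilà! We now have FREE instead of 1. Note that product_category_id is missing from the output, because _is_field_included skips fields whose value is None.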
And that’s how we write a custom JSON encoder in Python!