Today we take a brief tour of how to write a custom JSON encoder in Python.

We use pydantic dataclasses to define our objects. They are commonly used as the request and response schema when communicating with APIs (typically with FastAPI).

Let’s create our dataclasses first. We’ll include an enum field and nest one dataclass inside another as a field.

import dataclasses
import enum
import json
import pydantic

from typing import Any, Dict, List, Optional

class UserType(enum.Enum):
    FREE = '1'
    PAID = '2'
    TRIAL = '3'

@pydantic.dataclasses.dataclass
class Base:
    """
    Use this for JSON encoder to recognise 
    child classes for encoding.
    """
    pass

@pydantic.dataclasses.dataclass
class UserContext(Base):
    loc: str
    browser: str

@pydantic.dataclasses.dataclass
class UserRequest(Base):
    queries: List[str]
    product_category_id: Optional[str]
    user_id: str
    request_params: Dict[str, Any]
    user_context: UserContext
    user_type: UserType

    
req = UserRequest(
    queries=['q1', 'q2'],
    product_category_id=None,
    user_id='abc123',
    request_params={'x': 1, 'y': 2},
    user_context=UserContext(loc='US', browser='chrome'),
    user_type=UserType.FREE,
)

Alright, we have a dataclass. Now let’s serialise it!

data = json.dumps(req, indent=4)
print(data)
> TypeError: Object of type UserRequest is not JSON serializable

Well, it turns out that the standard-library JSON encoder doesn’t know how to serialise dataclasses.

Not a problem: the pydantic documentation provides a solution. We just need to pass pydantic_encoder as the default argument, like below:

data = json.dumps(req, indent=4, default=pydantic.json.pydantic_encoder)
print(data)
{
    "queries": [
        "q1",
        "q2"
    ],
    "product_category_id": null,
    "user_id": "abc123",
    "request_params": {
        "x": 1,
        "y": 2
    },
    "user_context": {
        "loc": "US",
        "browser": "chrome"
    },
    "user_type": "1"
}
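
To see what the default argument does, here is a minimal sketch that uses only the standard library. It assumes plain dataclasses.dataclass classes rather than pydantic ones, and the names Point and fallback are hypothetical, chosen for illustration; it is not how pydantic_encoder itself is implemented. json.dumps calls the function passed as default only for objects it cannot serialise on its own, then encodes whatever that function returns.

```python
import dataclasses
import json
from typing import Any


@dataclasses.dataclass
class Point:  # hypothetical example class
    x: int
    y: int


def fallback(obj: Any) -> Any:
    """Called by json.dumps only for objects it cannot serialise itself."""
    # is_dataclass is also true for the class itself, so exclude types.
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return dataclasses.asdict(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


encoded = json.dumps(Point(x=1, y=2), default=fallback)
print(encoded)
```

Raising TypeError for anything the function does not recognise mirrors what the standard encoder does, so unsupported objects still fail loudly.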

Great! Our problems are solved!

Except… notice that the enum is encoded using its value by default. What if you wanted to use the name instead? Or make other custom changes?
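
As a quick reminder of the difference, every enum member carries both a name and a value. A small sketch with a hypothetical Plan enum (a stand-in for UserType above):

```python
import enum


class Plan(enum.Enum):  # hypothetical stand-in for UserType
    FREE = "1"
    PAID = "2"


member = Plan.FREE
print(member.value)  # the right-hand side of the assignment
print(member.name)   # the identifier on the left-hand side

# Either one can be used to look the member up again.
by_value = Plan("1")
by_name = Plan["FREE"]
```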

The answer lies in the json module documentation: subclass json.JSONEncoder and override its default method, which the encoder calls for any object it cannot serialise on its own.

Let’s have a look at what that might look like.

class CustomEncoder(json.JSONEncoder):
    # Compact separators: no space after "," or ":" in the output.
    item_separator = ","
    key_separator = ":"

    def default(self, obj: Any) -> Any:
        # [1] Custom method to encode dataclasses.
        if isinstance(obj, Base):
            return self._extract_fields_to_json(obj)
        # [2] Custom method to encode enums.
        elif isinstance(obj, enum.Enum):
            return obj.name
        # [3] Finally we default to the parent class method.
        else:
            return super().default(obj)

    def _field_name_extractor(self, field: dataclasses.Field) -> str:
        return field.name

    def _is_field_included(self, obj: Base, field_name: str) -> bool:
        # Skip fields whose value is None.
        return getattr(obj, field_name) is not None

    def _extract_fields_to_json(self, obj: Base) -> Dict[str, Any]:
        # Nested dataclasses and enums are returned as-is; the encoder
        # calls default() again for each of them.
        return {
            self._field_name_extractor(field): getattr(obj, field.name)
            for field in dataclasses.fields(obj)
            if self._is_field_included(obj, field.name)
        }

At [1] we define a custom method to extract the fields from any dataclass that subclasses Base.

Similarly, at [2] we define the custom method to extract the enum name instead of the value.

Otherwise, at [3], we fall back to the parent class’s default method, which raises a TypeError for unsupported types.
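
The fall-through to the parent class can be seen in isolation with a minimal sketch. Colour and NameEncoder are hypothetical names used only for this illustration, and the encoder handles enums alone:

```python
import enum
import json
from typing import Any


class Colour(enum.Enum):  # hypothetical example enum
    RED = 1


class NameEncoder(json.JSONEncoder):  # hypothetical minimal encoder
    def default(self, obj: Any) -> Any:
        if isinstance(obj, enum.Enum):
            return obj.name
        # Anything else is handed to the parent, which raises TypeError.
        return super().default(obj)


handled = json.dumps({"colour": Colour.RED}, cls=NameEncoder)

try:
    json.dumps({"tags": {"a", "b"}}, cls=NameEncoder)  # sets are not supported
    error = None
except TypeError as exc:
    error = str(exc)

print(handled)
print(error)
```

Keeping the call to super().default means that types the encoder does not know about still fail with the usual error instead of being silently dropped.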

data = json.dumps(req, indent=4, cls=CustomEncoder)
print(data)
{
    "queries":[
        "q1",
        "q2"
    ],
    "user_id":"abc123",
    "request_params":{
        "x":1,
        "y":2
    },
    "user_context":{
        "loc":"US",
        "browser":"chrome"
    },
    "user_type":"FREE"
}

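Voilà! We now have FREE instead of 1. Note that product_category_id is also missing from the output, because our encoder skips fields whose value is None.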
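
The output above also has no space after each colon, because CustomEncoder overrides the key_separator class attribute. A small sketch of that effect on a plain dict, using a hypothetical CompactKeys encoder:

```python
import json


class CompactKeys(json.JSONEncoder):  # hypothetical name, for illustration only
    key_separator = ":"  # the JSONEncoder class default is ": "


payload = {"a": 1}

default_output = json.dumps(payload, indent=4)
compact_output = json.dumps(payload, indent=4, cls=CompactKeys)

print(default_output)
print(compact_output)
```

If you would rather keep the standard spacing, simply leave the separator attributes out of the subclass.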

And that’s how we write a custom JSON encoder in Python!