The double asterisk (**) is a very useful piece of Python syntax.


Introduction

The first time you encountered this concept was probably when you were learning about Pythonic syntax:

def func(*args, **kwargs) -> None:
    pass

What you might not realise is that ** also works at the call site: it unpacks a dictionary into keyword arguments, so we can pass in arguments as dictionaries.
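As a minimal sketch of the call-site behaviour, the snippet below unpacks a dictionary into keyword arguments. The `greet` function and its parameters are made up purely for illustration.

```python
# Hypothetical function, used only to illustrate ** at the call site.
def greet(name: str, greeting: str = "Hello") -> str:
    return f"{greeting}, {name}!"


params = {"name": "Ada", "greeting": "Hi"}

# Equivalent to greet(name="Ada", greeting="Hi").
message = greet(**params)
print(message)
```

Each dictionary key has to match a parameter name, and any parameter missing from the dictionary falls back to its default if it has one.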

This makes for some very interesting use cases.

Prototyping large functions

When I was in the early stages of writing model training pipelines, I noticed that I was often changing the parameters that I was specifying because of changing requirements and increasing complexity.

Changing my function signature each time was extremely annoying, so I decided to just use **kwargs and specify my parameters as a dictionary.

from typing import Dict

# Keep this flexible.
def data_pipeline(**kwargs) -> None:
    pass

if __name__ == "__main__":
    args: Dict = {
        'name': 'identity-map',
        'step': 'train',
        'features': ['user', 'age', 'gender'],
        'save_path': './data'
    }

    data_pipeline(**args)

Only once I am fairly certain of the final structure of my code do I spell out the parameters explicitly. This saved me a lot of grief when I was prototyping.

def data_pipeline(name, step, features, save_path):
    pass
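One nice side effect, sketched below, is that the call site does not have to change when the signature becomes explicit: the same dictionary still unpacks into it. The function body and the extra `batch_size` key are assumptions added for illustration only.

```python
from typing import Any, Dict, List


def data_pipeline(name: str, step: str, features: List[str], save_path: str) -> str:
    # Placeholder body (an assumption): just summarise what was passed in.
    return f"{name}:{step}:{len(features)}:{save_path}"


args: Dict[str, Any] = {
    "name": "identity-map",
    "step": "train",
    "features": ["user", "age", "gender"],
    "save_path": "./data",
}

# The dictionary from the prototyping stage still works unchanged.
summary = data_pipeline(**args)

# With an explicit signature, an unknown key now fails loudly
# instead of being silently swallowed by **kwargs.
try:
    data_pipeline(**args, batch_size=32)  # batch_size is a hypothetical extra key
except TypeError as exc:
    error = str(exc)
    print(error)
```

That `TypeError` is arguably the main benefit of locking the signature down: typos and stale keys surface immediately.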

Defining Configs

When you write code for production, you will most likely have a dev and a prod environment. Some environment variables are shared between the two, while others differ.

You can follow OOP design patterns and use a Factory to create your config dictionaries, but if you want portable configs, you’ll need to write a separate Factory function for each language.

I’ve found that the easiest way to do this is:

from typing import Any, Dict

# Parameter names must match the merged config keys exactly.
def pipeline(name: str, user: str, aws_key: str, redshift_key: str) -> None:
    pass

configs: Dict[str, Any] = {
    "common": {
        "name": "linear regression model",
        "user": "bob",
    },
    "dev": {
        "aws_key": "dev key",
        "redshift_key": "dev key",
    },
    "prod": {
        "aws_key": "prod key",
        "redshift_key": "prod key",
    },
}

if __name__ == "__main__":
    # Now you can change your configs
    # based on the flavour provided.
    flavour = 'prod'
    pipeline(**{
        **configs['common'],
        **configs[flavour]
    })

This way you don’t have to duplicate your “common” variables, and switching environments only means changing the flavour.
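One detail worth knowing when merging like this: if the same key appears in more than one dictionary, the one unpacked last wins. The sketch below assumes a hypothetical `log_level` setting to show an environment overriding a common default.

```python
# log_level is a hypothetical setting, added only to show the override.
common = {"name": "linear regression model", "user": "bob", "log_level": "INFO"}
dev = {"aws_key": "dev key", "log_level": "DEBUG"}

# Later unpacking overrides earlier keys; the inputs are left untouched.
merged = {**common, **dev}
print(merged["log_level"])
```

Putting the "common" dictionary first therefore lets each environment override shared defaults without editing them.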

You can also write the configs as JSON and make them portable.
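Here is a rough sketch of the JSON variant. The JSON is inlined as a string so the snippet is self-contained; in practice it would probably live in a file. The `load_config` helper is a name I made up for illustration.

```python
import json
from typing import Any, Dict

# Inlined for the example; normally this would be read from a .json file.
CONFIG_JSON = """
{
  "common": {"name": "linear regression model", "user": "bob"},
  "dev": {"aws_key": "dev key", "redshift_key": "dev key"},
  "prod": {"aws_key": "prod key", "redshift_key": "prod key"}
}
"""


def load_config(raw: str, flavour: str) -> Dict[str, Any]:
    """Hypothetical helper: parse the JSON and merge common + flavour settings."""
    configs = json.loads(raw)
    return {**configs["common"], **configs[flavour]}


settings = load_config(CONFIG_JSON, "prod")
print(settings)
```

The resulting dictionary can then be unpacked into a function with `**settings`, as long as its keys match the function's parameter names.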

That’s it! These two things have served me well.