The double asterisk (**) is a very useful piece of Python syntax.
Introduction
You probably first encountered this concept when you were learning how Python functions accept arbitrary keyword arguments:
def func(*args, **kwargs) -> None:
    pass
What you might not realise is that ** also works at the call site: it unpacks a dictionary into keyword arguments.
This makes for some very interesting use cases.
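As a minimal sketch of what that unpacking does, the example below uses a hypothetical greet function (not part of the original post) and calls it once with ** and once with explicit keyword arguments. Both calls should produce the same result.

```python
def greet(name: str, greeting: str = "Hello") -> str:
    # Hypothetical function, used only to illustrate unpacking.
    return f"{greeting}, {name}!"


params = {"name": "Ada", "greeting": "Hi"}

# ** unpacks the dictionary so each key becomes a keyword argument.
unpacked = greet(**params)

# The equivalent call with the keyword arguments written out by hand.
explicit = greet(name="Ada", greeting="Hi")
```

The dictionary keys must be strings that match the function's parameter names.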
Prototyping large functions
When I was in the early stages of writing model training pipelines, I kept changing the parameters I passed in as requirements changed and complexity grew.
Changing my function signature each time was extremely annoying, so I decided to just use **kwargs and specify my parameters as a dictionary.
from typing import Dict

# Keep this flexible.
def data_pipeline(**kwargs) -> None:
    pass

if __name__ == "__main__":
    args: Dict = {
        'name': 'identity-map',
        'step': 'train',
        'features': ['user', 'age', 'gender'],
        'save_path': './data'
    }
    data_pipeline(**args)
Only once I am fairly certain of the final structure of my code do I spell out the parameters explicitly. This has saved me a lot of grief when prototyping.
def data_pipeline(name, step, features, save_path):
    pass
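One consequence of this workflow is worth sketching: the same dictionary call keeps working once the signature is explicit, and a key that does not match a parameter now raises a TypeError instead of being silently accepted. The function body and the misspelled savepath key below are illustrative assumptions, not part of the original pipeline.

```python
def data_pipeline(name, step, features, save_path):
    # Placeholder body: return the arguments so we can inspect what arrived.
    return {
        "name": name,
        "step": step,
        "features": features,
        "save_path": save_path,
    }


args = {
    "name": "identity-map",
    "step": "train",
    "features": ["user", "age", "gender"],
    "save_path": "./data",
}

# The call site is unchanged from the **kwargs prototype.
result = data_pipeline(**args)

# A misspelled key (hypothetical) no longer slips through unnoticed.
bad_args = {**args, "savepath": "./tmp"}
try:
    data_pipeline(**bad_args)
    error = None
except TypeError as exc:
    error = str(exc)
```

So the explicit signature doubles as a cheap validation step for the dictionary you were already passing around.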
Defining Configs
When you write code for production, you will most likely have a dev and a prod environment. These environments share some variables and differ in others.
You can follow OOP design patterns and use a Factory to create your config dictionaries, but if you want portable configs you’ll need to write a separate Factory function for each language.
I’ve found that the easiest way to do this is:
from typing import Any, Dict

# Parameter names must match the config keys exactly,
# otherwise the unpacked call raises a TypeError.
def pipeline(name: str, user: str, aws_key: str, redshift_key: str) -> None:
    pass

configs: Dict[str, Any] = {
    "common": {
        "name": "linear regression model",
        "user": "bob",
    },
    "dev": {
        "aws_key": "dev key",
        "redshift_key": "dev key",
    },
    "prod": {
        "aws_key": "prod key",
        "redshift_key": "prod key",
    },
}

if __name__ == "__main__":
    # Now you can change your configs
    # based on the flavour provided.
    flavour = 'prod'
    pipeline(**{
        **configs['common'],
        **configs[flavour]
    })
This way you don’t have to duplicate your “common” variables, and you can switch environments by changing a single value.
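One detail of this merge is worth knowing: when the same key appears in more than one unpacked dictionary, the later one wins. The sketch below assumes a hypothetical dev config that overrides the common user value, to show that environment-specific settings can replace shared defaults.

```python
common = {"name": "linear regression model", "user": "bob"}

# Hypothetical dev config that overrides the shared "user" value.
dev = {"user": "alice", "aws_key": "dev key"}

# Keys are filled in left to right, so values from dev replace
# values from common when both define the same key.
merged = {**common, **dev}
```

Putting the common block first therefore makes it act as a set of defaults.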
You can also write the configs as JSON and make them portable.
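A minimal sketch of the JSON variant, assuming the configs are held in a string for the sake of a self-contained example (in practice they would more likely live in a file). The load_settings helper is a hypothetical name introduced here for illustration.

```python
import json

# Stand-in for the contents of a JSON config file.
CONFIG_JSON = """
{
  "common": {"name": "linear regression model", "user": "bob"},
  "dev": {"aws_key": "dev key", "redshift_key": "dev key"},
  "prod": {"aws_key": "prod key", "redshift_key": "prod key"}
}
"""


def load_settings(raw: str, flavour: str) -> dict:
    # Hypothetical helper: parse the JSON, then merge the shared
    # values with the values for the requested environment.
    configs = json.loads(raw)
    return {**configs["common"], **configs[flavour]}


settings = load_settings(CONFIG_JSON, "prod")
```

The resulting dictionary can be unpacked into a function call with ** exactly like the hand-written one.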
That’s it! These two things have served me well.