Data Classes in Python (52/100 Days of Python)

Martin Mirakyan
4 min read · Feb 22, 2023

Day 52 of the “100 Days of Python” blog post series covering dataclasses in Python

Dataclasses in Python provide a simple and concise way to define classes whose main job is to hold data: you declare the fields, and Python generates common methods such as __init__, __repr__, and __eq__ for you. In this tutorial, we will cover what dataclasses are, how to define and use them in Python, and some of the benefits they provide.

What are Dataclasses in Python?

Dataclasses provide a way to create classes that are primarily used to store data. They are similar to traditional Python classes but are designed to be simpler and more concise.

In a traditional Python class, you need to write the __init__ method yourself to initialize the instance attributes. If you want a readable printed representation or value-based equality, you also need to write __repr__ and __eq__ by hand.

In a dataclass, you don’t need to write __init__, __repr__, or __eq__. Instead, you declare the fields as annotated class-level variables, and the decorator generates these methods for you. This makes it easier to define and use classes that are primarily used to store data.
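To make the comparison concrete, here is a minimal sketch of the same data holder written both ways. The class names PointManual and PointData are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


class PointManual:
    """Hand-written class: every piece of boilerplate is spelled out."""

    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def __repr__(self) -> str:
        return f'PointManual(x={self.x!r}, y={self.y!r})'

    def __eq__(self, other) -> bool:
        if not isinstance(other, PointManual):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


@dataclass
class PointData:
    """Dataclass: __init__, __repr__ and __eq__ are generated from the fields."""
    x: int
    y: int


print(PointManual(1, 2))                   # PointManual(x=1, y=2)
print(PointData(1, 2))                     # PointData(x=1, y=2)
print(PointData(1, 2) == PointData(1, 2))  # True
```

Both classes behave the same way here, but the dataclass version needs only the field declarations.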

How to Define a Dataclass in Python

Defining a dataclass in Python is simple. You just need to use the dataclass decorator and specify the class attributes:

from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    email: str

In this example, we define a Person class with three attributes: name, age, and email. The attributes are defined using type annotations, which specify the type of the attribute.

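The dataclass decorator automatically generates the __init__ method, which takes the fields as arguments (in the order they are declared) and assigns them to the instance. By default, it also generates __repr__, which gives a readable string representation, and __eq__, which compares two instances field by field.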

Here’s how you can create an instance of the Person class:

person = Person('John Doe', 30, 'john.doe@gmail.com')

This creates an instance of the Person class with the specified attributes.
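As a short sketch of what the generated methods give you, the snippet below redefines the same Person class so that it can run on its own, then prints an instance and compares two of them. The variable names john and same_john are only illustrative.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    email: str


# Keyword arguments work too: the __init__ parameters are named after the fields
john = Person(name='John Doe', age=30, email='john.doe@gmail.com')
same_john = Person('John Doe', 30, 'john.doe@gmail.com')

print(john)               # Person(name='John Doe', age=30, email='john.doe@gmail.com')
print(john == same_john)  # True, because all fields are equal
```

Note that the type annotations are not enforced at runtime: they declare the fields, but Python will not reject a value of a different type.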

Using Dataclass Attributes

You can access and modify the attributes of a dataclass instance just like you would with any other Python object:

print(person.name)  # John Doe
person.age = 31 # Modify the age attribute
print(person.age) # 31

In this example, we access the name and age attributes of the person instance and modify the age attribute.

The __post_init__ function

The __post_init__ method is a special method that you can define in a dataclass to perform additional processing after the object has been initialized. The generated __init__ method calls it as its last step, and by default it takes no arguments other than self:

from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    email: str

    def __post_init__(self):
        self.full_name = f'{self.name} ({self.email})'

So, instead of writing a full __init__ method by hand, we can keep the generated one and add a simple __post_init__ method.

In this example, we define a Person dataclass with three attributes: name, age, and email. We also define a __post_init__ method that sets the full_name attribute based on the name and email attributes.

The __post_init__ method is useful for performing additional processing after the object has been initialized. For example, you can use it to validate the object's attributes or to set additional attributes that are derived from the original attributes:

person = Person('John Doe', 30, 'john.doe@gmail.com')
print(person.full_name) # John Doe (john.doe@gmail.com)

In this example, we create a Person instance with the name, age, and email attributes. The __post_init__ method is called automatically at the end of the generated __init__ method and sets the full_name attribute.
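Validation, the other use mentioned above, could look like the following minimal sketch. The specific rules (a non-negative age and an email containing '@') are assumptions made up for illustration, not something dataclasses require.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    email: str

    def __post_init__(self):
        # Hypothetical validation rules, chosen only for illustration
        if self.age < 0:
            raise ValueError(f'age must be non-negative, got {self.age}')
        if '@' not in self.email:
            raise ValueError(f'invalid email address: {self.email!r}')


try:
    Person('John Doe', -5, 'john.doe@gmail.com')
except ValueError as error:
    print(error)  # age must be non-negative, got -5
```

Because the check runs at the end of __init__, the exception is raised before the caller ever receives the invalid object.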

What If I Only Need A Variable When Initializing the Class but Not Later?

InitVar is a special marker type from Python's dataclasses module that lets you declare init-only parameters: values that are passed to __init__ and forwarded to __post_init__, but are not stored as part of the instance's state.

When you define a dataclass, every annotated field normally becomes an instance attribute. However, there are some cases where you need a value during initialization that shouldn't be stored on the instance afterwards. For example, you might pass in a value that is only needed to compute one of the stored attributes, or an additional argument that controls how the object is set up.

Here’s a simple example of using InitVar in a dataclass:

from dataclasses import dataclass, InitVar


@dataclass
class Rectangle:
    width: InitVar[int]   # We don't want to store width in the object
    height: InitVar[int]  # We don't want to store height in the object
    color: str

    def __post_init__(self, width: int, height: int):
        # Create a new attribute called area and store it in the object
        self.area: int = width * height

    def draw(self) -> str:
        return f'Draw a {self.color} rectangle with area {self.area}'


rect = Rectangle(width=10, height=20, color='red')
print(rect.draw())  # Draw a red rectangle with area 200
print(rect.width)   # AttributeError: 'Rectangle' object has no attribute 'width'
print(rect.height)  # AttributeError: 'Rectangle' object has no attribute 'height'

In this example, width and height are declared as InitVar, which means they are accepted by __init__ but are not stored on the object. Instead, they are only used during initialization to calculate the area attribute.

The __post_init__ method receives width and height as parameters, in the order they are declared, and calculates the area by multiplying them. The area is then stored as an ordinary instance attribute. Because it is not declared as a field, it does not appear in the generated __repr__ and is not used by __eq__.
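If you would like area to be a proper dataclass field, one option is to declare it with field(init=False). The following is a sketch of that variation on the Rectangle example, not a change the original example requires.

```python
from dataclasses import dataclass, field, InitVar


@dataclass
class Rectangle:
    width: InitVar[int]
    height: InitVar[int]
    color: str
    area: int = field(init=False)  # A real field, but not an __init__ parameter

    def __post_init__(self, width: int, height: int):
        self.area = width * height


rect = Rectangle(width=10, height=20, color='red')
print(rect)  # Rectangle(color='red', area=200)
```

With this declaration, area takes part in the generated __repr__ and __eq__, while width and height still exist only during initialization.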

Benefits of Using Dataclasses

Dataclasses provide several benefits over traditional Python classes:

  1. Concise syntax: Dataclasses provide a concise syntax for defining classes that are primarily used to store data. You don’t need to write __init__, __repr__, or __eq__ by hand.
  2. Default values: You can specify default values for fields (for example, age: int = 0), which makes it easier to create instances of the class. Fields with defaults must come after fields without defaults.
  3. Comparison methods: Dataclasses automatically generate __eq__, which compares instances field by field. Passing order=True to the decorator also generates the ordering methods (__lt__, __le__, __gt__, and __ge__).
  4. Immutable dataclasses: You can make instances immutable by passing frozen=True to the decorator. Assigning to a field then raises FrozenInstanceError, which can help prevent bugs.
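The default-value, comparison, and immutability features from the list can be combined in one small sketch. The Version class and its fields are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(order=True, frozen=True)
class Version:
    major: int
    minor: int = 0  # Default value, so Version(2) means 2.0


print(Version(2) == Version(2, 0))  # True: fields are compared one by one
print(Version(1) < Version(1, 2))   # True: order=True compares fields in declaration order

try:
    Version(1).minor = 5  # frozen=True makes fields read-only
except FrozenInstanceError as error:
    print(type(error).__name__)  # FrozenInstanceError
```

Here order=True compares instances as if they were tuples of their fields, so the declaration order of the fields determines the sort order.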

Written by Martin Mirakyan

Software Engineer | Machine Learning | Founder of Profound Academy (https://profound.academy)