Generators in Python (61/100 Days of Python)
Python’s generators are a powerful and often underutilized feature that can greatly simplify code, reduce memory usage, and improve performance. In this tutorial, we will explore what generators are, how they work, and some real-world examples of when they might be useful.
What are Generators in Python?
In Python, generators are a type of iterable, similar to lists or tuples. However, unlike lists and tuples, generators do not store all of their values in memory at once. Instead, they generate values on the fly as they are requested, which makes them very memory-efficient.
Generators are defined using the yield keyword instead of return. When a function containing yield is called, it returns a generator object that can be used to iterate over the generated values. Each time the next() function is called on the generator object, the function continues executing from where it left off, yielding the next value in the sequence:
def countdown(n):
    while n > 0:
        yield n
        n -= 1
This generator function will count down from a given number to 1, yielding each value as it goes:
for i in countdown(5):
    print(i)
# 5
# 4
# 3
# 2
# 1
In this example, countdown(5) returns a generator object, which we then use in a for loop to print each value as it is generated. Notice how the countdown() function does not generate all the values at once; instead, it generates them one at a time as they are requested by the loop.
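We can also drive the generator by hand. The short sketch below calls next() directly on the same countdown() generator to show that values are produced lazily and that a StopIteration exception is raised once the countdown is exhausted:
gen = countdown(3)
print(next(gen))  # 3
print(next(gen))  # 2
print(next(gen))  # 1
# Calling next(gen) again would raise StopIteration, signalling the generator is exhausted.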
Processing Large Datasets
Generators are particularly useful when working with large datasets that cannot fit into memory all at once. In such cases, a generator can be used to generate each item in the dataset one at a time, allowing us to process the data without loading it all into memory at once.
For example, let’s say we have a large CSV file containing a million rows of data. We want to read each row of data, process it, and output some results. Instead of loading the entire file into memory at once, we can use a generator to read each row one at a time:
import csv

def process_data(filename):
    with open(filename) as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            # process the row here; as a placeholder we just strip whitespace from each field
            processed_row = [field.strip() for field in row]
            yield processed_row
In this example, we use the csv module to read each row of the CSV file one at a time. We then process each row and yield the result. This allows us to process the entire file without loading it all into memory at once.
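As a quick usage sketch (data.csv is a hypothetical file name), we can consume the generator in a plain for loop, so only one processed row is held in memory at a time:
for processed_row in process_data('data.csv'):  # 'data.csv' is a hypothetical example file
    print(processed_row)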
Generating Infinite Sequences
Generators can also be used to generate infinite sequences of values. Because generators only generate values on the fly as they are requested, we can generate sequences that are too large to store in memory all at once.
For example, let’s say we want to generate an infinite sequence of Fibonacci numbers:
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
In this example, the fibonacci() generator produces an infinite sequence of Fibonacci numbers by continuously yielding the next number in the sequence.
Now we can pull as many Fibonacci numbers as we need without ever storing the whole sequence in memory:
fib = fibonacci()
print(next(fib))  # 0
print(next(fib))  # 1
print(next(fib))  # 1
print(next(fib))  # 2
print(next(fib))  # 3
print(next(fib))  # 5
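Because the sequence is infinite, we cannot pass the generator straight to list(). One common pattern, shown here as a small sketch, is to take a finite slice with itertools.islice:
from itertools import islice

first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]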
Stream Processing
Generators can also be used for stream processing, where we process a continuous stream of data in real time. For example, let’s say we want to process a continuous stream of sensor data from a temperature sensor:
import random

def temperature_sensor():
    """
    This function generates a sequence of random numbers.
    In real life the readings would come from a sensor.
    """
    while True:
        yield random.uniform(0, 100)
In this example, the temperature_sensor() generator produces an infinite stream of temperature readings. We can then process these readings in real time using another generator:
def temperature_monitor(sensor):
    for temperature in sensor:
        if temperature > 90:
            yield 'WARNING: Temperature is too high!'
        else:
            yield 'Temperature is normal.'
In this example, the temperature_monitor() generator processes the readings coming from the temperature_sensor() generator and yields a warning message whenever the temperature is too high.
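To wire the two together, we can feed the sensor generator into the monitor and pull a few messages from the resulting pipeline. This is a minimal sketch; a real application would typically keep the loop running until the program shuts down:
monitor = temperature_monitor(temperature_sensor())
for _ in range(5):  # take just five readings for this demonstration
    print(next(monitor))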
Lazily Loading Data
Generators can also be used to lazily load data, where we load data on the fly as it is requested instead of loading it all into memory at once. This can be useful when working with large datasets or when we only need to load a portion of the data.
Let’s say we have a large text file and we want to find the lines that contain a specific keyword:
def search_line(filename, keyword):
    with open(filename) as file:
        for line in file:
            if keyword in line:
                yield line
In this example, the search_line() generator reads each line of the text file one at a time and yields the line if it contains the given keyword. This allows us to search the file without loading it entirely into memory.
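As a usage sketch (the file name and keyword here are hypothetical), we might grab only the first match, which stops reading the file as soon as a matching line is found:
# 'server.log' and 'ERROR' are hypothetical examples
matches = search_line('server.log', 'ERROR')
print(next(matches, 'No match found'))  # stops reading the file at the first hit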
How to Chain Generators
As we saw earlier, we can feed the output of one generator into another. We can also chain generators so that all the values of the first are yielded before moving on to the second. To chain generators, we call one generator from within another using the yield from statement, which delegates to the inner generator and passes its values back to the calling generator:
def first_generator():
    yield 1
    yield 2
    yield 3

def second_generator():
    yield 4
    yield 5
    yield 6

def chained_generator():
    yield from first_generator()
    yield from second_generator()

for value in chained_generator():
    print(value)
In this example, we define two generators, first_generator() and second_generator(), each of which yields three values. We then define a new generator, chained_generator(), which calls yield from on both first_generator() and second_generator(). When we iterate over chained_generator() in a for loop, it yields all the values from first_generator() followed by all the values from second_generator(). So the program outputs:
1
2
3
4
5
6
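Note that yield from works with any iterable, not just generator functions, so the same chaining pattern also applies to lists and tuples. The sketch below is an equivalent variant of the example above:
def chained_from_lists():
    # yield from accepts any iterable, not only generator objects
    yield from [1, 2, 3]
    yield from (4, 5, 6)

print(list(chained_from_lists()))  # [1, 2, 3, 4, 5, 6]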
Filtering Data
We can chain generators together to filter data as it is generated. For example, let’s say we have a generator that yields a sequence of numbers, and we want to keep only the even numbers:
def number_generator():
    for i in range(1, 11):
        yield i

def even_filter(generator):
    for value in generator:
        if value % 2 == 0:
            yield value

for value in even_filter(number_generator()):
    print(value)
In this example, we define a generator number_generator() that yields the numbers from 1 to 10. We then define a new generator even_filter() that keeps only the even numbers by checking whether each value is divisible by 2. When we call even_filter(number_generator()) in a for loop, it yields only the even numbers from number_generator().
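For a simple filter like this, a generator expression gives an equivalent lazy one-liner. The sketch below is just an alternative way to write the same filter, not a replacement for chaining generator functions:
# Equivalent lazy filter written as a generator expression
evens = (value for value in number_generator() if value % 2 == 0)
for value in evens:
    print(value)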
What’s next?
- If you found this story valuable, please consider clapping multiple times (this really helps a lot!)
- Hands-on Practice: Free Python Course
- Full series: 100 Days of Python
- Previous topic: Iterators
- Next topic: Iterables