Python Memory Usage
The Python language has a feature called generator expressions, which were
introduced with PEP 289. You can think of them as a better way of doing certain
operations involving lists. This post is mostly interested in the memory benefits
the feature provides. We will first introduce the memory_profiler tool, which
can be used to measure the memory usage of a Python program. We will then
compare two pieces of code (one with and one without generator expressions)
that perform the same operation, and explain why one is better than the other.
Finally, we will run a few experiments to back up our assertion.
When we talk about measuring the memory usage of Python code, we are usually
interested in determining memory expense on a line-by-line basis. The
memory_profiler tool (ref: https://github.com/pythonprofilers/memory_profiler)
is ideal for this purpose. Here is example code of how it can be used:
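A minimal sketch of such a script, modelled on the example in the memory_profiler README, might look like the following. The function name my_func and the list sizes are illustrative assumptions, and the try/except fallback is only there so that the script still runs on a machine where memory_profiler is not installed.

```python
# Sketch: line-by-line memory profiling with memory_profiler's @profile decorator.
try:
    from memory_profiler import profile
except ImportError:
    # Assumption: fall back to a no-op decorator when the package is missing,
    # so the script runs (without a report) instead of crashing.
    def profile(func):
        return func


@profile
def my_func():
    a = [1] * (10 ** 6)   # a smaller list
    b = [2] * (10 ** 7)   # a larger list, so this line costs more memory
    del b                 # dropping the only reference lets the memory be reclaimed
    return a


if __name__ == "__main__":
    my_func()
```

With memory_profiler installed, running the script with the regular interpreter should print a per-line table for my_func each time it is called.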
Executing the above will return the following output:
The above clearly shows that one assignment statement is more expensive than the other, and that the delete operation actually gives some memory back.
What exactly is a Python generator expression? According to PEP 289, they are a high performance, memory efficient generalization of list comprehensions. That definition is a bit much for my tastes, so let’s just jump into an example:
Pay attention to the assignments to PP and NN. We calculate the sum of
squares of all numbers up to a limit for both of them, but the implementation
is a bit different. In the latter case, a temporary list is created which holds
all the squares we need, and the sum is calculated over this list. In the
former case, no such temporary list is created: the sum is incremented as each
square is produced, one iteration at a time. It feels very intuitive that one
method will use more memory than the other. Executing the code confirms our
hypothesis:
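The same comparison can be sketched in a self-contained way with the standard library's tracemalloc module instead of memory_profiler. This is a swapped-in measurement technique, not the one used for the results in this post, and the function names sum_squares_genexpr, sum_squares_listcomp and peak_bytes are hypothetical names chosen for illustration.

```python
import tracemalloc


def sum_squares_genexpr(limit):
    # Generator expression: squares are produced one at a time and summed.
    return sum(i * i for i in range(limit))


def sum_squares_listcomp(limit):
    # List comprehension: a temporary list of all squares is built first.
    return sum([i * i for i in range(limit)])


def peak_bytes(func, limit):
    """Return the peak number of bytes allocated by Python while func runs."""
    tracemalloc.start()
    try:
        func(limit)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    limit = 100_000
    print("generator expression peak:", peak_bytes(sum_squares_genexpr, limit))
    print("list comprehension peak:  ", peak_bytes(sum_squares_listcomp, limit))
```

Both functions return the same total; only the peak allocation differs, because the list comprehension has to hold every square in memory at once.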
A most interesting thing happens, however, if we increase our limit by a factor of 10:
Very strangely, the assignment to b recovers memory from the system! This
troubled me a lot - it didn’t make sense that a random assignment statement
should recover memory from our running application. Initially, I began to
suspect that memory_profiler was flawed - that investigation led me down a
very deep rabbit hole which I may write about another time. But, for the
purposes of this post, I do have an explanation for the above behaviour - the
Python Garbage Collector! With a limit of 100000, the temporary list kept
triggering garbage collection and memory_profiler dutifully reports the
system state as such.
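One way to probe this explanation is to count how often the garbage collector actually runs while a piece of code executes. The sketch below uses the standard gc.callbacks hook for that; the helper name count_collections is hypothetical, and the counts it reports will depend on the interpreter version and on whatever else is allocated at the time, so treat it as a starting point for experiments rather than a proof.

```python
import gc


def count_collections(func, *args):
    """Run func(*args) and return (result, number of GC runs started meanwhile)."""
    runs = 0

    def on_gc(phase, info):
        # The collector calls this hook with phase "start" and later "stop".
        nonlocal runs
        if phase == "start":
            runs += 1

    gc.callbacks.append(on_gc)
    try:
        result = func(*args)
    finally:
        gc.callbacks.remove(on_gc)
    return result, runs


if __name__ == "__main__":
    limit = 1_000_000
    _, list_runs = count_collections(lambda n: sum([i * i for i in range(n)]), limit)
    _, gen_runs = count_collections(lambda n: sum(i * i for i in range(n)), limit)
    print("GC runs with temporary list:      ", list_runs)
    print("GC runs with generator expression:", gen_runs)
```

Repeating the measurement after calling gc.disable() would be another way to see whether the collector is responsible for the memory being returned.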
All in all, I’m satisfied with how this analysis turned out: generator expressions do save memory, and it’s possible to demonstrate it!