I am working on Ptera Software, an open-source aerodynamics solver. This is the first package I have distributed, and I'm having some issues related to memory management.

Specifically, importing my package takes up an absurd amount of memory. The last time I checked, it took around 136 MB of RAM. PyPI lists the package size as 118 MB, which also seems crazy high. For reference, NumPy is only 87 MB.

At first, I thought that maybe I had accidentally included some huge file in the package. So I downloaded the tar.gz archive of every version from PyPI and extracted them; none was over 1 MB unzipped.
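To double-check where the installed bulk actually lives (as opposed to the sdists), here is a minimal sketch, assuming the dependencies are installed in the current environment and Python 3.8+ for importlib.metadata:

import os
from importlib import metadata

# Sum the on-disk size of every file belonging to each installed
# distribution, to see which dependency accounts for the installed bulk.
for name in ("matplotlib", "numpy", "pyvista", "scipy", "numba"):
    dist = metadata.distribution(name)
    paths = [dist.locate_file(f) for f in (dist.files or [])]
    total = sum(os.path.getsize(p) for p in paths if os.path.isfile(p))
    print(f"{name}: {total / 1e6:.1f} MB on disk")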

This leads me to believe that there's something wrong with how I am importing my requirements. My REQUIREMENTS.txt file looks like this:

matplotlib >= 3.2.2, < 4.0.0
numpy >= 1.18.5, < 2.0.0
pyvista >= 0.29.0, < 1.0.0
scipy >= 1.5, < 2.0
numba >= 0.53, < 1.0

It could also be that I messed up my __init__.py file. It looks like this:

from pterasoftware import aerodynamics
from pterasoftware import airfoils
from pterasoftware import geometry
from pterasoftware import meshing
from pterasoftware import movement
from pterasoftware import operating_point
from pterasoftware import output
from pterasoftware import problems
from pterasoftware import steady_horseshoe_vortex_lattice_method
from pterasoftware import steady_ring_vortex_lattice_method
from pterasoftware import unsteady_ring_vortex_lattice_method

The directory structure is as follows:

├───pterasoftware
│   ├───airfoils
│   │   └───naca0012.dat
│   ├───__init__.py
│   ├───aerodynamics.py
│   ├───geometry.py
│   ├───meshing.py
│   ├───movement.py
│   ├───operating_point.py
│   ├───output.py
│   ├───problems.py
│   ├───steady_horseshoe_vortex_lattice_method.py
│   ├───steady_ring_vortex_lattice_method.py
│   └───unsteady_ring_vortex_lattice_method.py

I know that importing large packages like numpy, matplotlib, and scipy can be memory heavy. However, plenty of packages depend on these same libraries and don't take anywhere near 136 MB to import. What am I missing here?

Here's the code I use to test the memory allocated while importing the package:

from memory_profiler import profile


@profile
def find_import_memory_usage():
    # The import happens inside the profiled function, so memory_profiler
    # reports the memory increment attributable to the import itself.
    import pterasoftware as ps


if __name__ == "__main__":
    find_import_memory_usage()
  • I really don't see an issue here. Don't forget that numpy, numba, and many other Python packages have a lot of overhead, as they are not native Python but rather written in C. Could you share some comparative data? – Can H. Tartanoglu Apr 21 '21 at 21:07
  • PyPI shows your package is less than 70 KB, which is normal. The dependencies (all the transitive dependencies included) can easily add up to 100-200 MB in a venv when installed, which is also fine. For example, I have to maintain a 300 KB project that pulls in the `torch` dependency, which is nearly 1 GB large. – hoefling Apr 22 '21 at 17:08
  • Looking at the code, the overhead probably comes from numba compiling a lot of stuff into memory (I might be wrong though). – hoefling Apr 22 '21 at 17:23

1 Answer

See Importing a python module takes too much memory. Importing your module requires memory to store its compiled bytecode (what gets cached on disk as .pyc files) as well as the objects created when its top-level code runs.
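As a small standard-library illustration (not specific to pterasoftware): importing a module executes its top-level code and keeps the resulting module, its functions, and their compiled code objects alive in memory, and dis can display the bytecode that was compiled for one of those functions.

import dis
import json

# After "import json", the module object, its functions, and their compiled
# code objects are all held in memory; dis shows the bytecode of one of them.
dis.dis(json.dumps)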

So for what, exactly, is all that memory being allocated?

We can check whether the memory is being allocated for your package or for your dependencies by running your memory profiler. We'll import your package's dependencies first to see how much memory they take up.

Python caches imported modules in sys.modules, so no additional memory will be allocated the next time(s) those libraries are imported. When we then import your package, we will see only the memory usage of the package itself and not of its dependencies.
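You can verify this caching behavior yourself with nothing but the standard library; here is a minimal illustration using the stdlib json module:

import sys

import json

cached = sys.modules["json"]

# A second import statement is just a lookup in sys.modules: no new module
# object is created, so no additional memory is allocated for it.
import json

assert sys.modules["json"] is cached

With that in mind, here is the profiling script, with the dependencies imported first: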

from memory_profiler import profile


@profile
def find_import_memory_usage():
    # Import the dependencies first, so their cost is charged to these
    # lines; the final import then shows only the package itself.
    import matplotlib
    import numpy
    import pyvista
    import scipy
    import numba
    import copy
    import pterasoftware


if __name__ == "__main__":
    find_import_memory_usage()

Running this gives me (using stock Python 3.7.6 on Windows 10):

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    10     18.3 MiB     18.3 MiB           1   @profile
    11                                         def find_import_memory_usage():
    12     34.3 MiB     16.0 MiB           1       import matplotlib
    13     34.3 MiB      0.0 MiB           1       import numpy
    14     96.6 MiB     62.2 MiB           1       import pyvista
    15    101.1 MiB      4.6 MiB           1       import scipy
    16    137.3 MiB     36.2 MiB           1       import numba
    17    137.3 MiB      0.0 MiB           1       import copy
    18    174.6 MiB     37.3 MiB           1       import pterasoftware

Your package itself only uses 37.3 MiB, which is much more reasonable. Your dependencies use about 119 MiB, with pyvista being a particularly expensive import. (numpy shows a 0.0 MiB increment only because matplotlib had already imported it.) And when numpy is actually put to work, it will require even more memory.

There are ways to reduce the memory requirements of your own package: some come at the cost of readability, some are just good practice, and some may not help at all when using a just-in-time compiler like Numba. One example is deferring heavy imports until they are actually needed, as sketched below. If you want to reduce the memory taken up by your dependencies, though, you may just need to choose different ones, or, in the worst case, modify their code to keep only the components your project needs, if their other components amount to a lot of unused overhead.
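For example, one common pattern is to defer a heavy import into the function that needs it, so that importing the package itself stays cheap. This is a minimal sketch, not pterasoftware's actual code, and plot_results is a hypothetical function:

def plot_results(results):
    # Hypothetical example: matplotlib is imported lazily, so it only costs
    # memory if the caller actually asks for a plot.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(results)
    plt.show()

The trade-off is that the first call pays the import cost and any import error surfaces later, but for optional features like plotting it keeps the top-level import light.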

– abeta201