For geospatial analysis, I created a clump function that identifies patches of deforestation in a .tif file. Deforestation events are labelled 1; everything else is background (0).

Here is the function:

import rasterio as rio
from rasterio.warp import calculate_default_transform
from scipy import ndimage as ndi

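def clump(src_f, dst_f):
    """Label 8-connected patches of 1s in src_f and write the labels to dst_f."""

    with rio.open(src_f) as f:
        raster = f.read(1)
        # georeferencing for the output file; the pixel data itself is not warped
        dst_crs = 'EPSG:4326'
        transform, _, _ = calculate_default_transform(
            f.crs,
            dst_crs,
            f.width,
            f.height,
            *f.bounds
        )

    # 3x3 structuring element: diagonal neighbours belong to the same patch
    struct = [
        [1,1,1],
        [1,1,1],
        [1,1,1]
    ]
    # ndi.label returns (labeled_array, num_features); keep only the array
    raster_labeled = ndi.label(raster, structure=struct)[0]

    # store the labels in the smallest dtype that can hold them
    dtype = rio.dtypes.get_minimum_dtype(raster_labeled)
    height = raster_labeled.shape[0]
    width = raster_labeled.shape[1]
    raster_labeled = raster_labeled.astype(dtype)

    with rio.open(dst_f, 'w', driver='GTiff', height=height, width=width, count=1, dtype=dtype, crs=dst_crs, transform=transform) as dst:
        dst.write(raster_labeled, 1)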

One of my end users needs to run it on the whole surface of Congo, which is a (69940, 70935) int64 ndarray.

As expected, I get the following error:

MemoryError: Unable to allocate 37.0 GiB for an array with shape (69940, 70935) and data type int64
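
For reference, the figure in the error message can be reproduced with a quick calculation. This is only a sketch of the arithmetic; the helper name `array_size_gib` is made up for illustration, and the smaller dtypes are listed just to show how the footprint scales, not because I have confirmed they are usable here.

```python
# Shape taken from the error message above.
ROWS, COLS = 69940, 70935


def array_size_gib(itemsize_bytes, rows=ROWS, cols=COLS):
    """Size in GiB of a dense rows x cols array with the given item size."""
    return rows * cols * itemsize_bytes / 2**30


# int64 is what the error reports; int32 and uint8 are shown for comparison.
for name, itemsize in [("int64", 8), ("int32", 4), ("uint8", 1)]:
    print(f"{name}: {array_size_gib(itemsize):.1f} GiB")
```

The int64 line matches the 37.0 GiB reported above, and each halving of the item size halves the allocation.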

Is there a way to perform this analysis while reducing the amount of memory required?

My search led me to numpy.memmap, but I'm not sure whether it's relevant or doable.
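
To make the numpy.memmap idea concrete, here is a minimal sketch of what I have in mind, tried only on a toy array. It relies on the `output` argument of `scipy.ndimage.label`, which fills a caller-provided array in place and then returns only the number of features. The function name `label_to_memmap`, the file name `labels.dat` and the int32 label dtype are assumptions for this example. I have not checked whether this is enough at the scale of the full raster, since the input array still has to be readable and `ndi.label` may need working memory of its own.

```python
import os
import tempfile

import numpy as np
from scipy import ndimage as ndi


def label_to_memmap(raster, path, dtype=np.int32):
    """Label 8-connected patches, storing the labels in a disk-backed array.

    Hypothetical helper for this sketch; int32 is an assumed label dtype.
    """
    struct = np.ones((3, 3), dtype=int)  # same 8-connectivity as in clump()
    # Disk-backed array with the same shape as the input raster.
    labels = np.memmap(path, dtype=dtype, mode="w+", shape=raster.shape)
    # With an array passed as `output`, ndi.label writes into it and
    # returns only the number of features found.
    n_features = ndi.label(raster, structure=struct, output=labels)
    labels.flush()
    return labels, n_features


# Toy raster with two patches (1 = deforestation, 0 = background).
toy = np.array(
    [[1, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 1]],
    dtype=np.uint8,
)
tmp_dir = tempfile.mkdtemp()
labels, n_features = label_to_memmap(toy, os.path.join(tmp_dir, "labels.dat"))
print(n_features)
```

The labels end up in a file on disk rather than in RAM, which is the part I am unsure scales to the Congo raster.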

Pierrick Rambaud