For geospatial analysis, I created a clump function that identifies patches of deforestation in a .tif file. Deforestation events are labelled 1; everything else is background (0).
Here is the function:
import rasterio as rio
from rasterio.warp import calculate_default_transform
from scipy import ndimage as ndi

def clump(src_f, dst_f):
    with rio.open(src_f) as f:
        raster = f.read(1)
        dst_crs = 'EPSG:4326'
        # transform for the output CRS; the suggested width and height are discarded
        transform, _, _ = calculate_default_transform(
            f.crs,
            dst_crs,
            f.width,
            f.height,
            *f.bounds
        )

    # 8-connectivity: diagonal neighbours belong to the same patch
    struct = [
        [1,1,1],
        [1,1,1],
        [1,1,1]
    ]
    # ndi.label returns (labeled_array, num_features); keep only the array
    raster_labeled = ndi.label(raster, structure = struct)[0]
    # smallest dtype able to hold every label value
    dtype = rio.dtypes.get_minimum_dtype(raster_labeled)
    height = raster_labeled.shape[0]
    width = raster_labeled.shape[1]
    raster_labeled = raster_labeled.astype(dtype)

    with rio.open(dst_f, 'w', driver='GTiff', height=height, width=width, count=1, dtype=dtype, crs=dst_crs, transform=transform) as dst:
        dst.write(raster_labeled, 1)

    return
One of my end users needs to run it on the whole of Congo, which is a (69940, 70935) int64 ndarray.
Unsurprisingly, I get the following error:
MemoryError: Unable to allocate 37.0 GiB for an array with shape (69940, 70935) and data type int64
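
For reference, the size in the error message follows directly from the shape and the item size of the dtype. The snippet below is only a back-of-the-envelope check; the helper name array_gib is mine, not part of any library. It also shows what the same shape would cost with narrower dtypes.

```python
import numpy as np

# Shape taken from the MemoryError above.
shape = (69940, 70935)


def array_gib(shape, dtype):
    """Size in GiB of a dense 2-D array with the given shape and dtype."""
    n_items = shape[0] * shape[1]
    return n_items * np.dtype(dtype).itemsize / 2**30


for dtype in (np.int64, np.int32, np.uint8):
    print(f"{np.dtype(dtype).name}: {array_gib(shape, dtype):.1f} GiB")
```

The int64 line reproduces the 37.0 GiB from the error, so the allocation that fails is a single full-size label array.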
Is there a way to perform this analysis with less memory?
My search led me to numpy.memmap, but I'm not sure whether it is relevant or doable.
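
To make the numpy.memmap idea concrete, here is a minimal sketch of what I have in mind. It relies on the output argument of ndi.label, which accepts an existing array and then returns only the number of features. The function name label_to_memmap, the file name labels.dat and the tiny demo array are hypothetical, and I have not verified that this actually keeps peak memory low on the full raster: the input band still has to be read, and scipy may allocate working memory of its own.

```python
import os
import tempfile

import numpy as np
from scipy import ndimage as ndi


def label_to_memmap(raster, memmap_path, dtype=np.int64):
    """Label 8-connected patches, writing the labels to a disk-backed array.

    Hypothetical helper: int64 is an assumption chosen to match the dtype
    in the MemoryError; the file on disk is as large as the in-memory
    array would have been.
    """
    struct = np.ones((3, 3), dtype=bool)  # same 8-connectivity as clump()
    labels = np.memmap(memmap_path, dtype=dtype, mode="w+", shape=raster.shape)
    # With an ndarray as output, ndi.label fills it in place and
    # returns only the number of patches found.
    n_features = ndi.label(raster, structure=struct, output=labels)
    labels.flush()  # push pending changes to the file
    return labels, n_features


# Tiny stand-in for the real raster: 1 = deforestation, 0 = background.
demo = np.array(
    [[1, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 1]],
    dtype=np.uint8,
)
tmp_path = os.path.join(tempfile.mkdtemp(), "labels.dat")
labels, n_features = label_to_memmap(demo, tmp_path)
print(n_features)
print(np.asarray(labels))
```

If this is a reasonable direction, the labels would presumably still have to be written to the GeoTIFF in pieces rather than through a single astype and write call, since astype creates another full copy.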