
I have been struggling to interpolate the data for "empty" pixels in my 2D matrix. Basically, I understand (but not deeply) interpolation techniques such as Inverse Distance Weighting, Kriging, and Bicubic interpolation, but I don't know the exact starting point (either in the statement of the problem or in the Python case).

The problem definition: I have an M×N matrix (a regular grid) in which each pixel represents a certain measurement value (the figure below; the data used in this figure is here). I want to interpolate the data for the "question mark space" (the white space, which also consists of the same-sized but empty pixels) using the existing data I have, shown as blue pixels.

Water evaporation in space

My question:

1) How can I interpolate this data? Could anyone give me a simple example (e.g. a 3×3 matrix) to understand it clearly?

2) Could anyone guide me through the steps towards a solution in a Python environment?

3) How can I compare the interpolation techniques in terms of accuracy using Python?

4) Do you think it is a good idea to use different interpolation techniques depending on the density of the data?

I would appreciate your answers and suggestions.

Spider
  • Here is one way to do it: http://stackoverflow.com/questions/17115030/want-to-smooth-a-contour-from-a-masked-array/17125125#17125125 though your holes are too big IMHO – theta Jul 27 '14 at 05:09
  • For the small holes in the middle you may be able to do something sensible, but the data you do have varies on scales smaller than the large areas of missing data on the outskirts. If you're just going for visual similarity I would fill it in randomly and smooth it. – Gabriel Jul 27 '14 at 05:26
  • @theta Thanks for the link, but it is not what I want. If you read my questions, I meant something with small examples (to really understand and implement it for this case) from someone who is more experienced in this issue. For this reason I asked; otherwise I would google it... – Spider Jul 27 '14 at 14:11
  • My suggestion would be to *not* fill in the missing data. Just eyeballing it, you seem to have data for about half the grid. Making up the other half will make a misleading picture no matter how you do it. – Roland Smith Jul 27 '14 at 18:48
  • @RolandSmith I agree with you, but there is no way except interpolating these points based on their neighbour values. My plan is to make the measurement area as large and accurate as possible. It is okay if I lose some information in the boundary areas, which is normal. Would you have some ideas on that? – Spider Jul 27 '14 at 19:04
  • @Spider You could probably fill in the small gaps inside. But for the big gaps on the outside I'd say there is probably not enough data to make a sensible approximation. – Roland Smith Jul 27 '14 at 19:59
  • This question might be better answered on: https://stats.stackexchange.com/. – jb. Jul 27 '14 at 20:30
  • For a principled method that computes all the missing values together, see [Laplace interpolation](http://apps.nrbook.com/empanel/index.html?pg=150) in Numerical Recipes, pages 150-153. – denis Jan 26 '16 at 16:24
  • @Spider Do you still have access to "measurements.txt"? The Dropbox link has gone dead, which makes reproducing these examples difficult – Robbi Bishop-Taylor Mar 07 '18 at 02:03

1 Answer


What is a sensible solution largely depends on what questions you're trying to answer with the interpolated pixels -- caveat emptor: extrapolating over missing data can lead to very misleading answers!

Radial Basis Function Interpolation / Kernel Smoothing

In terms of practical solutions available in Python, one way to fill those pixels in would be to use Scipy's implementation of Radial Basis Function interpolation (`scipy.interpolate.Rbf`; see here), which is intended for the smoothing/interpolation of scattered data.

Given your matrix M and underlying 1D coordinate arrays r and c (such that M.shape == (r.size, c.size)), where missing entries of M are set to nan, this seems to work fairly well with a linear RBF kernel as follows:

import numpy as np
import scipy.interpolate as interpolate

with open('measurement.txt') as fh:
    M = np.vstack([[float(x) for x in line.split()] for line in fh])
r = np.linspace(0, 1, M.shape[0])
c = np.linspace(0, 1, M.shape[1])

# indexing='ij' keeps rr, cc aligned with M's (row, column) layout
rr, cc = np.meshgrid(r, c, indexing='ij')
vals = ~np.isnan(M)
f = interpolate.Rbf(rr[vals], cc[vals], M[vals], function='linear')
interpolated = f(rr, cc)

This yields the following interpolation of the data you've linked to above, which although reasonable looking, does highlight how unfavourable the ratio of missing samples to real data is:

RBF Interpolation
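To make the mechanics concrete (and to answer the 3×3 case from question 1), here is the same `Rbf` call on a tiny matrix with a single missing pixel; the values are made up for illustration:

```python
import numpy as np
from scipy.interpolate import Rbf

# toy 3x3 matrix with M[i, j] = 1 + i + j and the centre pixel missing
M = np.array([[1.0, 2.0, 3.0],
              [2.0, np.nan, 4.0],
              [3.0, 4.0, 5.0]])

rr, cc = np.meshgrid(np.arange(3), np.arange(3), indexing='ij')
known = ~np.isnan(M)

# fit the RBF on the 8 known pixels, then evaluate it at the missing one
f = Rbf(rr[known], cc[known], M[known], function='linear')
M_filled = M.copy()
M_filled[~known] = f(rr[~known], cc[~known])
```

The eight surrounding values average to 3.0, and by symmetry the linear-kernel interpolant should land close to that for the centre pixel.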

Gaussian Process Regression / Kriging

Kriging interpolation is available via the Gaussian Process Regression implementation (which is itself based on the DACE Kriging toolbox for Matlab) in the scikit-learn library. It can be invoked as follows:

# note: in scikit-learn >= 0.18 this estimator was replaced by
# sklearn.gaussian_process.GaussianProcessRegressor, which takes
# different (kernel-based) parameters
from sklearn.gaussian_process import GaussianProcess

gp = GaussianProcess(theta0=0.1, thetaL=0.001, thetaU=1., nugget=0.01)
gp.fit(X=np.column_stack([rr[vals], cc[vals]]), y=M[vals])
rr_cc_as_cols = np.column_stack([rr.flatten(), cc.flatten()])
interpolated = gp.predict(rr_cc_as_cols).reshape(M.shape)

This yields a very similar interpolation to the Radial Basis Function example above. In both cases there are a lot of parameters to explore; the choice of these largely hinges on the assumptions you can make about the data. (One advantage of the linear kernel used in the RBF example above is that it has no free parameters.)

Kriging/Gaussian Process Regression
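On question 3 (comparing techniques for accuracy): one common approach is to deliberately hide a random subset of the pixels you *do* know, interpolate from the rest, and score each method against the held-out truth. A hedged sketch of that idea, using a hypothetical `holdout_rmse` helper (not a library function) and synthetic stand-in data:

```python
import numpy as np
from scipy.interpolate import Rbf

def holdout_rmse(M, make_interpolator, frac=0.1, seed=0):
    """Hide a random fraction of the known pixels, interpolate from
    the rest, and return the RMSE on the hidden pixels."""
    rng = np.random.default_rng(seed)
    rr, cc = np.meshgrid(np.arange(M.shape[0]), np.arange(M.shape[1]),
                         indexing='ij')
    known_idx = np.flatnonzero(~np.isnan(M).ravel())
    test_idx = rng.choice(known_idx, size=max(1, int(frac * known_idx.size)),
                          replace=False)
    M_train = M.ravel().copy()
    M_train[test_idx] = np.nan
    M_train = M_train.reshape(M.shape)
    vals = ~np.isnan(M_train)
    f = make_interpolator(rr[vals], cc[vals], M_train[vals])
    pred = f(rr.ravel()[test_idx], cc.ravel()[test_idx])
    return np.sqrt(np.mean((pred - M.ravel()[test_idx]) ** 2))

# compare two RBF kernels on the same held-out pixels (synthetic data
# here; substitute the M loaded from your measurement file)
M_demo = (np.arange(8)[:, None] + np.arange(8)[None, :]).astype(float)
M_demo[2:4, 5] = np.nan  # pretend some pixels are missing
rmse_lin = holdout_rmse(M_demo, lambda r, c, v: Rbf(r, c, v, function='linear'))
rmse_mq = holdout_rmse(M_demo, lambda r, c, v: Rbf(r, c, v, function='multiquadric'))
```

Because both methods are scored on the same hidden pixels, the lower RMSE directly indicates the better-fitting technique for your data; repeating with different seeds gives a more robust comparison.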

Inpainting

As a final aside, an entirely visually motivated solution would be to use OpenCV's inpainting functionality, although this assumes 8-bit arrays (0–255) and does not have a straightforward mathematical interpretation.

rroowwllaanndd