15

Does anyone know if there is a clean implementation of the Turlach rolling median algorithm in C? I'm having trouble porting the R version to a clean C version. See here for more details on the algorithm.

EDIT: As darkcminor pointed out, matlab has a function medfilt2 which calls ordf which is a c implementation of a rolling order statistic algorithm. I believe the algorithm is faster than O(n^2), but it is not open source and I do not want to purchase the image processing toolbox.

Community
  • 1
  • 1
Rich C
  • 3,076
  • 5
  • 23
  • 34
  • check this maybe matlab http://www.mathworks.com/matlabcentral/newsreader/view_thread/270067 – edgarmtze Apr 03 '11 at 04:26
  • 3
    See this question: http://stackoverflow.com/questions/1309263/rolling-median-algorithm-in-c – Robert Gamble Apr 04 '11 at 18:23
  • 2
    There's also the constant time median filtering algorithm. There's an implementation for 2D in scikits.image, with an octagonal filter area. –  Apr 15 '11 at 09:50

2 Answers2

18

I've implemented a rolling median calculator in C here (Gist). It uses a max-median-min heap structure: The median is at heap[0] (which is at the center of a K-item array). There is a minheap starting at heap[ 1], and a maxheap (using negative indexing) at heap[-1].
It's not exactly the same as the Turlach implementation from the R source: This one supports values being inserted on-the-fly, while the R version acts on a whole buffer at once. But I believe the time complexity is the same. And it could easily be used to implement a whole buffer version (possibly with with the addition of some code to handle R's "endrules").

Interface:

//Customize for your data Item type
typedef int Item;
#define ItemLess(a,b)  ((a)<(b))
#define ItemMean(a,b)  (((a)+(b))/2)

typedef struct Mediator_t Mediator;

//creates new Mediator: to calculate `nItems` running median. 
//mallocs single block of memory, caller must free.
Mediator* MediatorNew(int nItems);

//returns median item (or average of 2 when item count is even)
Item MediatorMedian(Mediator* m);

//Inserts item, maintains median in O(lg nItems)
void MediatorInsert(Mediator* m, Item v)
{
   int isNew = (m->ct < m->N);
   int p = m->pos[m->idx];
   Item old = m->data[m->idx];
   m->data[m->idx] = v;
   m->idx = (m->idx+1) % m->N;
   m->ct += isNew;
   if (p > 0)         //new item is in minHeap
   {  if (!isNew && ItemLess(old, v)) { minSortDown(m, p*2);  }
      else if (minSortUp(m, p)) { maxSortDown(m,-1); }
   }
   else if (p < 0)   //new item is in maxheap
   {  if (!isNew && ItemLess(v, old)) { maxSortDown(m, p*2); }
      else if (maxSortUp(m, p)) { minSortDown(m, 1); }
   }
   else            //new item is at median
   {  if (maxCt(m)) { maxSortDown(m,-1); }
      if (minCt(m)) { minSortDown(m, 1); }
   }
}
AShelly
  • 32,752
  • 12
  • 84
  • 142
  • 1
    I can confirm this works and it is fast. It would be nice to have the ability to pop elements w/o inserting (to accomodate missing values) and to specify an arbitrary percentile. These are probably easy tweaks though. Good work! – Rich C May 15 '11 at 18:14
  • Implementing PopOldest() would be easy: The position of the oldest item in the heap is `p=pos[(idx-ct+N)%N]`. If it is in the minheap, swap it to the end, then do a sortdown to ensure the swapped item is in the right place: `if (p>0) {exchange(p,minCt); m->ct--; minSortDown(p*2);`. Otherwise do the same with the maxheap - except to handle the special case of p==0, you need to do a `maxSortDown( p*2||-1)`. – AShelly May 16 '11 at 06:57
  • Implementing for "KthPercentile" would be a bit trickier but not too bad. For a K between 0.0 and 1.0, `heap` would point at element K*N. `maxCt` would be ct*k, `minCt` would be ct-1-maxCt. The tricky part would be initializing the pos array so that the initial elements are distributed correctly. It would be something like: for each i: point pos[i] to the next element on the maxheap until it contains more than K percent of the items so far, then shift to the minheap. – AShelly May 16 '11 at 07:43
  • 1
    Here are some benchmarks: https://github.com/suomela/median-filter — in brief, this approach seems to work very well in general, but for some data distributions it is possible to do better with a sorting-based algorithm. – Jukka Suomela Apr 21 '14 at 11:24
  • Nice benchmark. It appears that it's more accurate to say that for 'most' data sorting is better. – AShelly Apr 21 '14 at 19:02
  • @AShelly, I've tried to implement PopOldest as you describe and it seems that it is a bit more complex, because it needs to also maintain the balance between the min/max heaps (eg it could just so happen that you have 3 pops from the max heap in a row). The initial window has the positions alternating between heaps to maintain the balance implicitly - any advice on how to manually fix the balance? – Charles Sep 05 '16 at 03:06
  • I'd need to look closer, but I think you could use the `ct` member as an indicator of which heap should be emptied. If the element to be popped is in the wrong heap, you'd have to do one more pass to shift the median into that heap, then fixup the other. – AShelly Sep 05 '16 at 22:51
  • Thanks for the info. I did end up managing to get it to work with missing values the other day - I ended up adding an conditional balance step to the end of PopOldest. I also needed to change Insert to choose the target heap on the fly when isNew=1, as the position rolling in would no longer necessarily be the right one. – Charles Sep 07 '16 at 03:00
  • Overall it is a very efficient, neat & compact implementation. Out of interest, are you of aware of any way to address the worst-case performance that you get on sorted (ascending or descending) inputs, or is that unavoidable? – Charles Sep 07 '16 at 03:15
  • I don't see a simple way, unless you know in advance the inputs will be sorted, in which case you can directly calculate the answer by taking the midpoint. But I'm an engineer, not a computer scientist, so maybe there is a trick somewhere. – AShelly Sep 07 '16 at 04:57
  • Possibly you could use randomness somehow, like they do with quicksort, which choses a random pivot to avoid worst case behavior. But I'm not sure that really helps - with sorted data, every new input needs to go in the same heap, shifting the median to the other. And you don't really have a choice where the new element starts - you need to put it where the oldest was. – AShelly Sep 07 '16 at 05:20
  • 1
    Heads up for anybody interested, this code can also be found in `movstat/medacc.c` of the GNU Scientific Library (GSL; https://www.gnu.org/software/gsl/) and is accessible via the `gsl_movstat_median()` interface. – sircolinton Jul 05 '19 at 15:57
  • [Just mentioning to avoid confusion, AShelly is the original author of the aforementioned GSL implementation.] – sircolinton Jul 05 '19 at 17:10
2

OpenCV has a medianBlur function that seems to do what you want. I know it's a rolling median. I can't say if it's the "Turlach rolling median" specifically. It's pretty fast though and it supports multi-threading when available.

Carl F.
  • 5,771
  • 3
  • 25
  • 41