
src is a char buffer of length data_len that is not null-terminated. I want to start from the end of this buffer and find the first occurrence of the HTML </body> tag (that is, the last occurrence in the buffer).
find_pos should hold the position of the </body> tag within src.

Does the code below look correct to you?

char *strrcasestr_len(const char *hay, size_t haylen, const char *ndl,size_t ndllen)
{    
   char *ret = NULL;
   int i;
   for (i = haylen - ndllen; i >= 0; i--) {
       if (!strncasecmp(&hay[i], ndl, ndllen)) {
            break;
       }
   }
   if (i == -1)
       return ret;
   else
       return (char *)&hay[i];
}
ann
vsc

1 Answer


This should do it, and very fast, since it examines only every seventh byte of the buffer:

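#include <cstddef>
#include <strings.h>   // strncasecmp (POSIX); MSVC spells it _strnicmp

char const* find_body_closing_tag( char const* const src, size_t const data_len )
{
    // table[c] is the 1-based position of c within "</body>", or 0 if c is not part of the tag.
    static unsigned char table[256];
    static bool inited;   // note: this lazy initialization is not thread-safe
    if (!inited) {
         table['<'] = 1;
         table['/'] = 2;
         table['b'] = table['B'] = 3;
         table['o'] = table['O'] = 4;
         table['d'] = table['D'] = 5;
         table['y'] = table['Y'] = 6;
         table['>'] = 7;
         inited = true;
    }

    if (data_len < 7) return 0;   // too short to contain the tag

    // Every occurrence of the 7-byte tag covers exactly one sampled index,
    // so it is enough to look at every 7th byte, walking back from the end.
    for( size_t i = data_len - 7; ; i -= 7 ) {
        // Index the table through unsigned char: plain char may be negative.
        if (size_t const offset = table[(unsigned char)src[i]]) {
            // The candidate starts offset-1 bytes before the sample; skip it if that is before src.
            if (i + 1 >= offset && 0 == strncasecmp(src + i - (offset-1), "</body>", 7)) return src + i - (offset-1);
        }
        if (i < 7) break;   // the next step would move before the start of the buffer
    }
    return 0;
}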
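For cross-checking the table-driven function on test inputs, a plain backward scan makes a handy baseline. The sketch below is a variant of the question's strrcasestr_len, not a drop-in replacement: the name strrcasestr_len_checked is made up for illustration, and it assumes POSIX strncasecmp is available. It also guards one case the question's loop does not handle: when ndllen is larger than haylen, haylen - ndllen wraps around because size_t is unsigned, so the i == -1 test is no longer reliable.

```cpp
#include <cassert>
#include <cstddef>
#include <strings.h>   // strncasecmp (POSIX)

// Hypothetical name: a bounds-checked variant of the question's strrcasestr_len.
// Returns a pointer to the last case-insensitive occurrence of ndl in hay,
// or nullptr if there is none. Neither buffer needs to be null-terminated.
const char* strrcasestr_len_checked(const char* hay, size_t haylen,
                                    const char* ndl, size_t ndllen)
{
    assert(hay != nullptr && ndl != nullptr);
    if (ndllen > haylen) return nullptr;   // avoids wrapping haylen - ndllen

    // i is one past the candidate start, so the unsigned index never drops below zero.
    for (size_t i = haylen - ndllen + 1; i > 0; --i) {
        if (0 == strncasecmp(hay + i - 1, ndl, ndllen)) return hay + i - 1;
    }
    return nullptr;
}
```

Calling strrcasestr_len_checked(src, data_len, "</body>", 7) should return the same pointer as the table-driven function on any input, which makes it easy to compare the two on random buffers.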

Another very fast approach would be using SIMD to test 16 consecutive characters against '>' at once (and this is what strrchr or memrchr ought to be doing).
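A rough sketch of that idea, under stated assumptions: it uses the SSE2 intrinsics from <emmintrin.h> when the compiler defines __SSE2__ and falls back to a byte-by-byte scan otherwise, and it relies on POSIX strncasecmp. The names last_gt and find_body_closing_tag_simd are invented for this example, and the code is untuned, so treat it as an illustration rather than a benchmark-ready implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <strings.h>     // strncasecmp (POSIX)
#if defined(__SSE2__)
#include <emmintrin.h>   // SSE2 intrinsics
#endif

// Hypothetical helper: index of the last '>' in src[0, len), or len if there is none.
static size_t last_gt(const char* src, size_t len)
{
    size_t end = len;
#if defined(__SSE2__)
    const __m128i needle = _mm_set1_epi8('>');
    while (end >= 16) {
        // Compare 16 bytes against '>' at once; bit k of mask is set if byte k matched.
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + end - 16));
        const unsigned mask =
            static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle)));
        if (mask != 0) {
            unsigned k = 15;                     // highest set bit = last '>' in the chunk
            while (!(mask & (1u << k))) --k;
            return end - 16 + k;
        }
        end -= 16;
    }
#endif
    // Scalar scan for the remaining bytes (and the whole buffer without SSE2).
    while (end > 0) {
        if (src[end - 1] == '>') return end - 1;
        --end;
    }
    return len;
}

// Hypothetical name: finds the last "</body>" (any case) in src[0, data_len).
const char* find_body_closing_tag_simd(const char* src, size_t data_len)
{
    assert(src != nullptr || data_len == 0);
    size_t end = data_len;
    for (;;) {
        const size_t gt = last_gt(src, end);
        if (gt == end) return nullptr;           // no '>' left to examine
        // A '>' at index gt can only close a tag that starts at gt - 6.
        if (gt >= 6 && 0 == strncasecmp(src + gt - 6, "</body>", 7)) return src + gt - 6;
        end = gt;                                // keep searching before this '>'
    }
}
```

Only bytes equal to '>' trigger the case-insensitive comparison, so in typical HTML most of the buffer is handled by the 16-byte compare.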

Ben Voigt