Case insensitive std::string.find()

Question

I am using std::string's find() method to test if a string is a substring of another. Now I need a case-insensitive version of the same thing. For string comparison I can always turn to stricmp(), but there doesn't seem to be a stristr().

I have found various answers, and most suggest using Boost, which is not an option in my case. Additionally, I need to support std::wstring/wchar_t. Any ideas?

There's a GotW about this very subject: http://www.gotw.ca/gotw/029.htm — Alexandre C., Jun 30 '10 at 18:45
stristr is not there, but "char *strcasestr(const char *haystack, const char *needle);" is there. Isn't this OK? — Nasir, Nov 27 '15 at 05:59

score 78 · Accepted Answer · edited Sep 13 '14 at 00:16


You could use std::search with a custom predicate.

#include <algorithm>
#include <iostream>
#include <locale>
#include <string>

// templated version of my_equal so it could work with both char and wchar_t
template<typename charT>
struct my_equal {
    explicit my_equal( const std::locale& loc ) : loc_(loc) {}
    bool operator()(charT ch1, charT ch2) const {
        return std::toupper(ch1, loc_) == std::toupper(ch2, loc_);
    }
private:
    std::locale loc_; // held by value (copying a std::locale is cheap), so it can never dangle
};

// find substring (case insensitive); returns the offset of the first match, or -1
template<typename T>
int ci_find_substr( const T& str1, const T& str2, const std::locale& loc = std::locale() )
{
    typename T::const_iterator it = std::search( str1.begin(), str1.end(),
        str2.begin(), str2.end(), my_equal<typename T::value_type>(loc) );
    if ( it != str1.end() ) return static_cast<int>( it - str1.begin() );
    else return -1; // not found
}

int main()
{
    // string test
    std::string str1 = "FIRST HELLO";
    std::string str2 = "hello";
    int f1 = ci_find_substr( str1, str2 );

    // wstring test; whether Cyrillic letters are case-folded depends on the locale passed in
    std::wstring wstr1 = L"ОПЯТЬ ПРИВЕТ";
    std::wstring wstr2 = L"привет";
    int f2 = ci_find_substr( wstr1, wstr2 );

    std::cout << f1 << ' ' << f2 << '\n';
    return 0;
}

edited Sep 13 '14 at 00:16

Drew Dormann


answered Jun 30 '10 at 18:35

Kirill V. Lyadvinsky


Why are you using templates here? – rstackhouse Jun 19 '14 at 15:03
@rstackhouse, the template is there to support different character types (`char` & `wchar_t`). – Kirill V. Lyadvinsky Jun 20 '14 at 07:35

Thanks, Kirill. For those as clueless as I am: to start the search from an offset, advance the first iterator passed to `std::search` (e.g. pass `str1.begin() + offset` instead of `str1.begin()`); advancing `it` after the search only shifts the result. – Lara Aug 05 '14 at 02:17
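For reference, here is a minimal sketch of an offset-aware search modeled on `std::string::find(str, pos)`. The name `ci_find_from` is hypothetical, and case folding uses the single-byte `std::toupper` from `<cctype>`, so it only handles ASCII-style letters.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: case-insensitive counterpart of std::string::find(needle, pos).
// Returns the offset of the first match at or after pos, or std::string::npos.
std::string::size_type ci_find_from(const std::string& haystack,
                                    const std::string& needle,
                                    std::string::size_type pos = 0)
{
    if (pos > haystack.size())
        return std::string::npos;
    // Start the search at the offset instead of at haystack.begin().
    auto first = haystack.begin() + static_cast<std::string::difference_type>(pos);
    auto it = std::search(first, haystack.end(), needle.begin(), needle.end(),
        [](unsigned char a, unsigned char b) { return std::toupper(a) == std::toupper(b); });
    // An empty needle matches at pos, like std::string::find.
    if (it == haystack.end() && !needle.empty())
        return std::string::npos;
    return static_cast<std::string::size_type>(it - haystack.begin());
}
```

The returned offset is relative to the start of the whole string, not to `pos`, which matches how `std::string::find` reports positions.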
For those (like me) who are not familiar with templates, can you also post a standard version without templates, without locales? Just for `wstring` for example @KirillV.Lyadvinsky? – Basj Jul 11 '17 at 11:02

Does the call to `std::toupper` actually work for wide characters? Wouldn't you need to call `std::towupper`? – MiloDC Sep 03 '20 at 18:22
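Regarding the two comments above (a non-template `std::wstring` version, and `std::towupper`): a minimal sketch, assuming folding via `std::towupper` from `<cwctype>` is acceptable. The name `ci_find_wsubstr` is hypothetical. For what it's worth, the two-argument `std::toupper(ch, loc)` in the answer is the `<locale>` template and does accept `wchar_t`; the difference is only which locale decides the mapping.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Hypothetical non-template variant for std::wstring only, with no std::locale.
// std::towupper follows the current C locale (see std::setlocale), so whether
// non-ASCII letters such as Cyrillic are folded depends on that locale.
int ci_find_wsubstr(const std::wstring& haystack, const std::wstring& needle)
{
    auto it = std::search(haystack.begin(), haystack.end(),
                          needle.begin(), needle.end(),
                          [](wchar_t a, wchar_t b) {
                              return std::towupper(static_cast<std::wint_t>(a)) ==
                                     std::towupper(static_cast<std::wint_t>(b));
                          });
    if (it == haystack.end())
        return -1;  // not found
    return static_cast<int>(it - haystack.begin());
}
```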

CC. · Answer 2 · score 62 · 2018-01-28T16:06:50.257

The new C++11 style:

#include <algorithm>
#include <string>
#include <cctype>

/// Try to find in the Haystack the Needle - ignore case
bool findStringIC(const std::string & strHaystack, const std::string & strNeedle)
{
  auto it = std::search(
    strHaystack.begin(), strHaystack.end(),
    strNeedle.begin(),   strNeedle.end(),
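    // unsigned char parameters: passing a negative char value to std::toupper is undefined behavior
    [](unsigned char ch1, unsigned char ch2) { return std::toupper(ch1) == std::toupper(ch2); }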
  );
  return (it != strHaystack.end() );
}

An explanation of std::search can be found on cplusplus.com.

edited Jan 28 '18 at 16:06

answered Nov 07 '13 at 15:08

CC.


What if I want to find a char `c` in a string `str` using the same function? Calling it as `findStringIC(str, (string)c)` doesn't work. – Enissay Mar 12 '15 at 17:02
This kind of char-to-string cast does not work; you have to actually create the string object, like `std::string(1, 'x')`. See http://coliru.stacked-crooked.com/a/af4051dd1d15972e If you do this a lot, it might be worth creating a specific function that does not require creating a new object every time. – CC. Mar 17 '15 at 17:16
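Following the suggestion above, a sketch of a dedicated single-character variant that avoids constructing a temporary `std::string`. The name `findCharIC` is hypothetical and, like `findStringIC`, it only folds single-byte (ASCII-style) case.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: case-insensitive search for one character,
// without building a std::string for the needle.
bool findCharIC(const std::string& strHaystack, char ch)
{
    const int target = std::toupper(static_cast<unsigned char>(ch));
    return std::find_if(strHaystack.begin(), strHaystack.end(),
                        [target](unsigned char c) { return std::toupper(c) == target; })
           != strHaystack.end();
}
```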

In most cases, it is preferable to use `tolower()` when doing a case insensitive search. Even Ada changed it to lowercase! There are reasons that Unicode.org probably explains somewhere but I do not know exactly why. – Alexis Wilke Aug 07 '16 at 05:37
Upper case is better https://msdn.microsoft.com/en-us/library/bb386042.aspx but of course not perfect. If you need Turkish, that's going to be hard http://stackoverflow.com/questions/234591/upper-vs-lower-case and http://haacked.com/archive/2012/07/05/turkish-i-problem-and-why-you-should-care.aspx/ – CC. Oct 03 '16 at 22:28
... did they do away with templates in C++11? I must have missed the memo :) – Orwellophile Jan 26 '18 at 07:33

No template needed in this case. For C++17 you might want to take a look at string_view instead of std::string https://skebanga.github.io/string-view/ – CC. Jan 28 '18 at 16:12
That was a great read on `string_view`! Something new and shiny, and _fast_! :) – kayleeFrye_onDeck Jun 09 '18 at 01:23
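A sketch of the `string_view` idea for C++17, under the hypothetical name `findStringViewIC`. It accepts string literals, `std::string`, and other views without copying; the comparison is the same `std::search` call as in the answer, with `unsigned char` parameters so `std::toupper` never receives a negative value.

```cpp
#include <algorithm>
#include <cctype>
#include <string_view>

// C++17 sketch: case-insensitive substring test on std::string_view.
// Only single-byte (ASCII-style) letters are folded.
bool findStringViewIC(std::string_view haystack, std::string_view needle)
{
    auto it = std::search(haystack.begin(), haystack.end(),
                          needle.begin(), needle.end(),
                          [](unsigned char a, unsigned char b) {
                              return std::toupper(a) == std::toupper(b);
                          });
    return it != haystack.end();
}
```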

gast128 · Answer 3 · score 18 · 2017-08-06T21:26:22.523

Why not use Boost.StringAlgo?

#include <boost/algorithm/string/find.hpp>
#include <string>

bool Foo()
{
   // case-insensitive find

   std::string str("Hello");

   boost::iterator_range<std::string::const_iterator> rng;

   rng = boost::ifind_first(str, std::string("EL"));

   return !rng.empty(); // an empty range means "EL" was not found
}

edited Aug 06 '17 at 21:26

answered Nov 04 '14 at 11:22

gast128



Typically, unless a C++ question is tagged for Boost, it's assumed Boost isn't an option. – kayleeFrye_onDeck May 02 '17 at 20:00

DavidS · Answer 4 · score 17 · 2016-04-12T17:38:37.253

Why not just convert both strings to lowercase (e.g. with tolower) before you call find()?

Notice:

Inefficient for long strings.
Beware of internationalization issues.
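A minimal sketch of this lowercase-then-find approach for `std::string`; the helper names `to_lower_copy` and `find_ci_lower` are hypothetical. Because `std::tolower` works per byte, only single-byte letters are converted, which is one of the internationalization issues mentioned.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: lower-case copy of a std::string (per byte).
std::string to_lower_copy(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Lower-case both strings, then use the ordinary std::string::find.
// Two temporary copies are made on every call, which is where the cost comes from.
std::string::size_type find_ci_lower(const std::string& haystack, const std::string& needle)
{
    return to_lower_copy(haystack).find(to_lower_copy(needle));
}
```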

edited Apr 12 '16 at 17:38

answered Jun 30 '10 at 18:34

DavidS



Because it is very inefficient for larger strings. – bkausbk Aug 31 '12 at 11:37

This is also not really a good idea if your software ever needs to be localized. See Turkey test: http://haacked.com/archive/2012/07/05/turkish-i-problem-and-why-you-should-care.aspx/ – Bart Apr 12 '16 at 07:50
The arguments you'll uncover for doing basic upcase and downcase operations in C++ on anything not encoded as ANSI will overwhelm you xD Simply put, it's not trivial for the standard library to handle as of C++17. – kayleeFrye_onDeck Jun 09 '18 at 00:49

stinky472 · Answer 5 · 2010-06-30T18:48:41.073

Since you're doing substring searches (std::string) and not element (character) searches, there's unfortunately no existing solution I'm aware of that's immediately accessible in the standard library to do this.

Nevertheless, it's easy enough to do: simply convert both strings to upper case (or both to lower case - I chose upper in this example).

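#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

std::string upper_string(const std::string& str)
{
    std::string upper;
    upper.reserve(str.size());
    // a lambda avoids the ambiguous overloaded name toupper and the UB of negative chars
    std::transform(str.begin(), str.end(), std::back_inserter(upper),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return upper;
}

std::string::size_type find_str_ci(const std::string& str, const std::string& substr)
{
    return upper_string(str).find(upper_string(substr));
}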

This is not a fast solution (bordering into pessimization territory) but it's the only one I know of off-hand. It's also not that hard to implement your own case-insensitive substring finder if you are worried about efficiency.

Additionally, I need to support std::wstring/wchar_t. Any ideas?

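tolower/toupper in <locale> will work on wide strings as well, so the solution above should be just as applicable: change std::string to std::wstring and use the locale overload, std::toupper(ch, loc) (the single-argument <cctype> toupper only handles single-byte characters).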
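A sketch of that generalization, assuming the `<locale>` overload of `std::toupper` is what you want; the names `upper_copy` and `find_basic_str_ci` are hypothetical. Which letters get mapped depends on the locale argument; the default `std::locale()` is a copy of the global locale, which is the classic "C" locale unless the program changes it.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: upper-case copy of any std::basic_string
// (std::string or std::wstring) using std::toupper(ch, loc) from <locale>.
template <typename CharT>
std::basic_string<CharT> upper_copy(std::basic_string<CharT> s,
                                    const std::locale& loc = std::locale())
{
    for (CharT& c : s)
        c = std::toupper(c, loc);
    return s;
}

// Hypothetical generic counterpart of find_str_ci: returns npos if not found.
template <typename CharT>
typename std::basic_string<CharT>::size_type
find_basic_str_ci(const std::basic_string<CharT>& str,
                  const std::basic_string<CharT>& substr,
                  const std::locale& loc = std::locale())
{
    return upper_copy(str, loc).find(upper_copy(substr, loc));
}
```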

[Edit] An alternative, as pointed out, is to define your own case-insensitive string type from basic_string by specifying your own character traits. This works if you can accept all searches, comparisons, etc. on that string type being case-insensitive.
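A minimal sketch of the character-traits alternative, in the spirit of GotW #29 (linked in the question comments). The names `ci_char_traits` and `ci_string` are illustrative; case folding is single-byte via `<cctype>`, and only the members that affect comparison and searching are overridden.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Traits whose character comparisons ignore (single-byte) case.
// basic_string's find() and compare() are specified in terms of these members.
struct ci_char_traits : std::char_traits<char> {
    static unsigned char up(char c) {
        return static_cast<unsigned char>(std::toupper(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return up(a) == up(b); }
    static bool lt(char a, char b) { return up(a) < up(b); }
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char a) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], a)) return s + i;
        return nullptr;
    }
};

// A string type whose find(), compare(), == and so on ignore case.
using ci_string = std::basic_string<char, ci_char_traits>;
```

The trade-off is the one described above: every operation on `ci_string` ignores case, and it does not mix directly with `std::string` (you convert via `.c_str()` or iterators).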

Boris Ivanov · Answer 6 · 2013-12-31T14:11:14.923

It also makes sense to provide a Boost version. Note that this modifies the original strings:

#include <boost/algorithm/string.hpp>
#include <string>

std::string str1 = "hello world!!!";
std::string str2 = "HELLO";
boost::algorithm::to_lower(str1);
boost::algorithm::to_lower(str2);

if (str1.find(str2) != std::string::npos)
{
    // str1 contains str2
}

Or, using the Boost.Xpressive library:

#include <boost/xpressive/xpressive.hpp>
#include <iostream>
#include <string>
using namespace boost::xpressive;
....
std::string long_string( "very LonG string" );
std::string word("long");
smatch what;
sregex re = sregex::compile(word, boost::xpressive::icase);
// regex_search finds the word anywhere; regex_match would require the whole string to match
if( regex_search( long_string, what, re ) )
{
    std::cout << word << " found!" << std::endl;
}

In this example, make sure your search word doesn't contain any regex special characters; otherwise they need to be escaped.
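Since Boost was ruled out in the question, here is a sketch of the same idea with the standard C++11 `<regex>` instead of Boost.Xpressive (a swapped-in library, not this answer's code). The hypothetical `contains_word_ci` escapes regex special characters so the word is matched literally, and uses `regex_search` with `icase`.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: case-insensitive literal search using std::regex.
// Characters such as '.', '+' or '(' in the word are escaped first.
bool contains_word_ci(const std::string& text, const std::string& word)
{
    static const std::regex specials(R"([.^$|()\[\]{}*+?\\])");
    // "$&" in the replacement is the matched character, so each one gets a '\' prefix.
    const std::string escaped = std::regex_replace(word, specials, R"(\$&)");
    const std::regex re(escaped, std::regex_constants::ECMAScript | std::regex_constants::icase);
    return std::regex_search(text, re);  // search anywhere, not a whole-string match
}
```

Compiling a regex per call is slow compared to `std::search`; this is only worth it if you need regex features anyway.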

*"... I have found various answers and most suggest using Boost which is not an option in my case"*. — jww, Sep 13 '14 at 00:54

score 2 · Answer 7 · answered Jun 30 '10 at 18:58


If you want “real” comparison according to Unicode and locale rules, use ICU’s Collator class.


Philipp


score 0 · Answer 8 · answered Aug 06 '15 at 10:49

#include <cctype>
#include <iostream>
#include <string>
using namespace std;

template <typename charT>
struct ichar {
    operator charT() const { return toupper(x); }
    charT x;
};
template <typename charT>
static basic_string<ichar<charT> > *istring(basic_string<charT> &s) { return (basic_string<ichar<charT> > *)&s; }
template <typename charT>
static ichar<charT> *istring(const charT *s) { return (ichar<charT> *)s; }

int main()
{
    string s = "The STRING";
    wstring ws = L"The WSTRING";
    cout << istring(s)->find(istring("str")) << " " << istring(ws)->find(istring(L"wstr"))  << endl;
}

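A little bit dirty (strictly speaking it is undefined behavior: it reinterprets a basic_string<char> as a basic_string<ichar<char>>, and std::char_traits is not guaranteed to be provided for a user type like ichar), but short & fast.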

kayleeFrye_onDeck · Answer 9 · 2018-06-28T01:59:43.080

I love the answers from Kirill V. Lyadvinsky and CC., but my problem was a little more specific than just case-insensitivity: I needed a lazy, Unicode-supporting command-line argument parser that could eliminate false positives/negatives in alphanumeric string searches, where the base string could contain special characters used to delimit the alphanumeric keywords I was searching for. For example, Wolfjäger shouldn't match jäger, but <jäger> should.

It's basically just Kirill's/CC's answer with extra handling for alphanumeric exact-length matches.

#include <algorithm>
#include <cwctype>
#include <string>

/* Undefined behavior when a non-alpha-num substring parameter is used. */
bool find_alphanum_string_CI(const std::wstring& baseString, const std::wstring& subString)
{
    /* Fail fast if the base string is smaller than what we're looking for */
    if (subString.length() > baseString.length()) 
        return false;

    /* Compare as wchar_t: narrowing to char would mangle non-ASCII characters such as 'ä'. */
    auto it = std::search(
        baseString.begin(), baseString.end(), subString.begin(), subString.end(),
        [](wchar_t ch1, wchar_t ch2)
        {
            return std::towupper(ch1) == std::towupper(ch2);
        }
    );

    if(it == baseString.end())
        return false;

    size_t match_start_offset = it - baseString.begin();

    std::wstring match_start = baseString.substr(match_start_offset, std::wstring::npos);

    /* Typical special characters and whitespace to split the substring up. */
    size_t match_end_pos = match_start.find_first_of(L" ,<.>;:/?\'\"[{]}=+-_)(*&^%$#@!~`");

    /* Pass fast if the remainder of the base string where
       the match started is the same length as the substring. */
    if (match_end_pos == std::wstring::npos && match_start.length() == subString.length()) 
        return true;

    std::wstring extracted_match = match_start.substr(0, match_end_pos);

    return (extracted_match.length() == subString.length());
}
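If I read the function above correctly, it only checks the character right after the first match, so "Wolfjäger" would still match "jäger". A sketch that requires a delimiter (or the string start/end) on both sides and keeps searching past rejected matches follows; the name `find_delimited_CI` is hypothetical, and it reuses the author's delimiter set.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Hypothetical variant: a case-insensitive match only counts if it is bounded on
// both sides by the string start/end or one of the delimiter characters.
bool find_delimited_CI(const std::wstring& base, const std::wstring& sub)
{
    static const std::wstring delims = L" ,<.>;:/?\'\"[{]}=+-_)(*&^%$#@!~`";
    auto is_delim = [](wchar_t c) { return delims.find(c) != std::wstring::npos; };
    auto eq = [](wchar_t a, wchar_t b) { return std::towupper(a) == std::towupper(b); };

    if (sub.empty())
        return false;
    // Try every occurrence, not just the first one.
    for (auto it = base.begin();
         (it = std::search(it, base.end(), sub.begin(), sub.end(), eq)) != base.end();
         ++it)
    {
        auto end = it + static_cast<std::wstring::difference_type>(sub.size());
        bool left_ok  = (it == base.begin()) || is_delim(*(it - 1));
        bool right_ok = (end == base.end()) || is_delim(*end);
        if (left_ok && right_ok)
            return true;
    }
    return false;
}
```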

The last 3 lines of code should be return (extracted_match.length() == subString.length()); — SJHowe, Jun 28 '18 at 01:57
"should" might be a bit strong for wording, but I agree that it's an improvement! :) Ty & updated ^_^ — kayleeFrye_onDeck, Jun 28 '18 at 02:00

score -2 · Answer 10 · answered Dec 24 '19 at 07:06

wxWidgets has a very rich string API, wxString, with which this can be done using the case-conversion approach:

int Contains(const wxString& SpecProgramName, const wxString& str)
{
  wxString SpecProgramName_ = SpecProgramName.Upper();
  wxString str_ = str.Upper();
  int found = SpecProgramName_.Find(str_);
  if (wxNOT_FOUND == found)
  {
    return 0;
  }
  return 1;
}
