84

Is the indexOf(String) method case sensitive? If so, is there a case insensitive version of it?

Brian
  • 3
    Not that I'm a big performance guy or anything (I actually consider performance tuning kind of evil), but the .toUpperCase copies your string each time you call it so if you do this in a loop, try to move the .toUpperCase out of the loop if possible. – Bill K Jul 14 '09 at 16:14
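The hoisting described in the comment above might look like the following sketch. The class and method names (`HoistedSearch`, `countMatches`) are hypothetical and exist only for illustration. The search term is converted once, before the loop, although every line is still copied by its own `toUpperCase` call.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class HoistedSearch {
    // Counts the lines that contain `term`, ignoring case (with the usual
    // caveats of conversion-based matching).
    static int countMatches(List<String> lines, String term) {
        // Convert the search term once, outside the loop.
        final String upperTerm = term.toUpperCase(Locale.ROOT);
        int count = 0;
        for (String line : lines) {
            // Each line still has to be converted, but the term does not.
            if (line.toUpperCase(Locale.ROOT).indexOf(upperTerm) != -1) {
                count++;
            }
        }
        return count;
    }
}
```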

19 Answers

77

The indexOf() methods are all case-sensitive. You can make them (roughly, in a broken way, but working for plenty of cases) case-insensitive by converting your strings to upper/lower case beforehand:

s1 = s1.toLowerCase(Locale.US);
s2 = s2.toLowerCase(Locale.US);
s1.indexOf(s2);
Joey
  • 5
    Beware of internationalization issues (i.e. the Turkish İ) when using toUpperCase. A more proper solution is to use str.toUpperCase(Locale.US).indexOf(...); – James Van Huis Jul 14 '09 at 15:53
  • 2
    I'm quite sure that case-converting and then comparing is not entirely correct according to Unicode comparison rules. It works for some things (namely case folding, which is generally used only in syntax parsing contexts) but for natural language there can be special cases where two strings that should compare equal don't, under either both uppercase or both lowercase. I can't come up with any examples off the bat however. – nielsm May 04 '10 at 09:47
  • 7
    Won't work. Some weird, international characters are converted to multiple characters when converted to lower-/upper-case. For example: `"ß".toUpperCase().equals("SS")` – Simon Apr 05 '13 at 22:33
  • ß is hardly a weird character and it's hardly international either, being used only in Germany and Austria. But yes, this is just as good as it gets but not *actually* a case-insensitive comparison, as nielsm already pointed out three years ago. – Joey Apr 06 '13 at 17:29
  • Does not work for Turkish unicode, that comes straight from somebody's email. – Alexander Pogrebnyak Feb 04 '14 at 19:07
  • Due to the discrepancy between some characters' upper and lower case equivalents, one could perform both tests - one upper and one lower case match - and if either pass, the match is successful. – David May 08 '19 at 08:30
44

Is the indexOf(String) method case sensitive?

Yes, it is case sensitive:

@Test
public void indexOfIsCaseSensitive() {
    assertTrue("Hello World!".indexOf("Hello") != -1);
    assertTrue("Hello World!".indexOf("hello") == -1);
}

If so, is there a case insensitive version of it?

No, there isn't. You can convert both strings to lower case before calling indexOf:

@Test
public void caseInsensitiveIndexOf() {
    assertTrue("Hello World!".toLowerCase().indexOf("Hello".toLowerCase()) != -1);
    assertTrue("Hello World!".toLowerCase().indexOf("hello".toLowerCase()) != -1);
}
dfa
  • 8
    oh please please please don't forget to use culture invariant conversion with Locale.US, we had enough problems with java applications running under Turkish locale. – idursun Jul 14 '09 at 15:49
  • @idursun - forcing to US locale doesn't solve the problem, because it still doesn't work for strings that actually contain the characters that are problematic to start with (for instance `"ı".toLowerCase(Locale.US).indexOf("I".toLowerCase(Locale.US))` should return 0 because the first string is a Turkish lower case `"I"`, and therefore should compare as equal to the upper-case `"I"` in the second, but returns -1 because the latter is converted to `"i"` instead). – Jules Apr 20 '18 at 11:12
20

There is an ignore-case method in the StringUtils class of the Apache Commons Lang library:

indexOfIgnoreCase(CharSequence str, CharSequence searchStr)

deepika
  • This should be an accepted answer, as the current one does not work for certain non-ascii strings that contain unicode control characters. For example, this works for text written in Turkish. Behind the scene Apache uses regionMatches, and that does work. – Alexander Pogrebnyak Feb 04 '14 at 19:10
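The difference mentioned in the comment above can be reproduced with the standard library alone. The sketch below is not the Apache Commons code; it only contrasts a conversion-based check with `String.regionMatches(true, ...)` on the Turkish dotless i. The class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical demo class, for illustration only.
final class DotlessIDemo {
    // Conversion-based check: lower-case both sides, then search.
    static int viaToLowerCase(String haystack, String needle) {
        return haystack.toLowerCase(Locale.US).indexOf(needle.toLowerCase(Locale.US));
    }

    // regionMatches-based check at a fixed offset; it compares char by char
    // without building converted copies of the strings.
    static boolean viaRegionMatches(String haystack, int offset, String needle) {
        return haystack.regionMatches(true, offset, needle, 0, needle.length());
    }

    public static void main(String[] args) {
        String dotlessI = "\u0131"; // Turkish lower-case dotless i
        System.out.println(viaToLowerCase(dotlessI, "I"));      // -1
        System.out.println(viaRegionMatches(dotlessI, 0, "I")); // true
    }
}
```

`regionMatches` compares the upper-case forms of each pair of chars before falling back to their lower-case forms, which is why the dotless i matches `"I"` here.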
17

Yes, indexOf is case sensitive.

The best way I have found to do a case-insensitive search is:

String original;
int idx = original.toLowerCase().indexOf(someStr.toLowerCase());

That will do a case insensitive indexOf().

jjnguy
  • 2
    No. Don't ever do that. The reason is that, `original.toLowerCase().length()` not always equals to `original.length()`. The result `idx` is not able to map back correctly to `original`. – Cheok Yan Cheng Jan 15 '19 at 04:21
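The length mismatch described in the comment above can be seen with a string that starts with "İ" (U+0130). Outside Turkish and Azeri locales, lower-casing it produces two chars, so every later index shifts by one. This is a minimal sketch; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical demo class, for illustration only.
final class LengthDriftDemo {
    // The conversion-based search from the answer, with an explicit locale.
    static int lowerCaseIndexOf(String original, String search) {
        return original.toLowerCase(Locale.ROOT).indexOf(search.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String original = "\u0130stanbul, Ankara"; // starts with İ (U+0130)
        System.out.println(original.length());                          // 16
        System.out.println(original.toLowerCase(Locale.ROOT).length()); // 17

        int idx = lowerCaseIndexOf(original, "ankara");
        System.out.println(idx);                     // 11, but "Ankara" starts at 10
        System.out.println(original.substring(idx)); // nkara
    }
}
```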
14

Here is my solution, which does not allocate any heap memory and should therefore be significantly faster than most of the other implementations mentioned here.

public static int indexOfIgnoreCase(final String haystack,
                                    final String needle) {
    if (needle.isEmpty() || haystack.isEmpty()) {
        // Fallback to legacy behavior.
        return haystack.indexOf(needle);
    }

    for (int i = 0; i < haystack.length(); ++i) {
        // Early out, if possible.
        if (i + needle.length() > haystack.length()) {
            return -1;
        }

        // Attempt to match substring starting at position i of haystack.
        int j = 0;
        int ii = i;
        while (ii < haystack.length() && j < needle.length()) {
            char c = Character.toLowerCase(haystack.charAt(ii));
            char c2 = Character.toLowerCase(needle.charAt(j));
            if (c != c2) {
                break;
            }
            j++;
            ii++;
        }
        // Walked all the way to the end of the needle, return the start
        // position that this was found.
        if (j == needle.length()) {
            return i;
        }
    }

    return -1;
}

And here are the unit tests that verify correct behavior.

@Test
public void testIndexOfIgnoreCase() {
    assertThat(StringUtils.indexOfIgnoreCase("A", "A"), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("a", "A"), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("A", "a"), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("a", "a"), is(0));

    assertThat(StringUtils.indexOfIgnoreCase("a", "ba"), is(-1));
    assertThat(StringUtils.indexOfIgnoreCase("ba", "a"), is(1));

    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", " Royal Blue"), is(-1));
    assertThat(StringUtils.indexOfIgnoreCase(" Royal Blue", "Royal Blue"), is(1));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "royal"), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "oyal"), is(1));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "al"), is(3));
    assertThat(StringUtils.indexOfIgnoreCase("", "royal"), is(-1));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", ""), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "BLUE"), is(6));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "BIGLONGSTRING"), is(-1));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "Royal Blue LONGSTRING"), is(-1));  
}
Zach Vorhies
  • How does this answer the question?? – Quality Catalyst Apr 22 '15 at 22:07
  • 7
    The answer is "no, there are no case insensitive versions of indexOf". However, I added the solution here because people are going to find this page looking for solutions. I made my solution available with test cases so that the next person coming through can use my code to solve the exact same problem. That's why stack overflow is useful right? I have a decade of experience writing high performance code, half of that at google. I just gave a well tested solution away for free to help the community. – Zach Vorhies Apr 24 '15 at 01:34
  • 3
    This is exactly what I was interested in. I found this to be about 10-15% faster than the Apache Commons version. If I could upvote it many more times I would. Thanks! – Jeff Williams Dec 07 '15 at 20:14
  • Thanks Jeff, I'm glad it gave you a lot of value. There are others that are recommending that this post which provides a solution goes toward the top. If someone else likes my code then I humbly ask that you upvote this solution. – Zach Vorhies Dec 15 '15 at 00:20
  • 2
    Here's a missing test case: `assertThat(StringUtils.indexOfIgnoreCase("ı" /* Turkish lower-case I, U+0131 */, "I"), is(0));` – Jules Apr 20 '18 at 11:17
  • This works great. – skanga Apr 04 '21 at 16:35
11

Yes, it is case-sensitive. You can do a case-insensitive indexOf by converting your String and the String parameter both to upper-case before searching.

String str = "Hello world";
String search = "hello";
str.toUpperCase().indexOf(search.toUpperCase());

Note that toUpperCase may not work in some circumstances. For instance this:

String str = "Feldbergstraße 23, Mainz";
String find = "mainz";
int idxU = str.toUpperCase().indexOf (find.toUpperCase ());
int idxL = str.toLowerCase().indexOf (find.toLowerCase ());

idxU will be 20, which is wrong! idxL will be 19, which is correct. The problem is that toUpperCase() converts the "ß" character into TWO characters, "SS", and this throws the index off.

Consequently, always stick with toLowerCase()

Community
Nick Lewis
  • 1
    Sticking to lower case doesn't help: if you change `find` to `"STRASSE"`, it doesn't find it at all in the lower case variant, but does correctly find it in the upper case version. – Jules Apr 20 '18 at 11:26
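Both failure modes, the shifted index from the answer and the missed match from the comment above, can be checked side by side. The sketch below uses `Locale.ROOT` so the result does not depend on the machine's default locale; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical demo class, for illustration only.
final class EszettDemo {
    static int viaLower(String s, String find) {
        return s.toLowerCase(Locale.ROOT).indexOf(find.toLowerCase(Locale.ROOT));
    }

    static int viaUpper(String s, String find) {
        return s.toUpperCase(Locale.ROOT).indexOf(find.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String str = "Feldbergstra\u00DFe 23, Mainz"; // contains ß (U+00DF)

        // Lower case keeps the index right but cannot see "STRASSE".
        System.out.println(viaLower(str, "mainz"));   // 19
        System.out.println(viaLower(str, "STRASSE")); // -1

        // Upper case finds "STRASSE" but shifts every index after the ß.
        System.out.println(viaUpper(str, "STRASSE")); // 8
        System.out.println(viaUpper(str, "mainz"));   // 20
    }
}
```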
3

What are you doing with the index value once returned?

If you are using it to manipulate your string, then could you not use a regular expression instead?

import static org.junit.Assert.assertEquals;    
import org.junit.Test;

public class StringIndexOfRegexpTest {

    @Test
    public void testNastyIndexOfBasedReplace() {
        final String source = "Hello World";
        final int index = source.toLowerCase().indexOf("hello".toLowerCase());
        final String target = "Hi".concat(source.substring(index
                + "hello".length(), source.length()));
        assertEquals("Hi World", target);
    }

    @Test
    public void testSimpleRegexpBasedReplace() {
        final String source = "Hello World";
        final String target = source.replaceFirst("(?i)hello", "Hi");
        assertEquals("Hi World", target);
    }
}
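If the index itself is still needed, the regular-expression approach above can also report it. The sketch below is one possible way, assuming the search term should be treated as literal text; `RegexIndexOf` is a hypothetical name. It relies on the documented behaviour that `Pattern.CASE_INSENSITIVE` and `Pattern.UNICODE_CASE` keep their effect when combined with `Pattern.LITERAL`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class RegexIndexOf {
    // Returns the index of the first case-insensitive occurrence of `needle`
    // in `haystack`, or -1 if there is none.
    static int indexOfIgnoreCase(String haystack, String needle) {
        // LITERAL stops characters such as '.' or '(' in the needle from
        // being interpreted as regex syntax.
        Pattern pattern = Pattern.compile(needle,
                Pattern.LITERAL | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(haystack);
        return matcher.find() ? matcher.start() : -1;
    }
}
```

Because the matcher runs over the unmodified haystack, the returned index refers to the original string rather than to a converted copy.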
toolkit
  • Surprised by the lack of upvotes here. In a page dominated by incorrect answers, this is one of the only three that actually works correctly. – Jules Apr 20 '18 at 11:43
2
@Test
public void testIndexofCaseSensitive() {
    TestCase.assertEquals(-1, "abcDef".indexOf("d") );
}
dfa
Paul McKenzie
  • This doesn't even answer the full question..it doesn't even say if the test passes.... – jjnguy Jul 14 '09 at 15:43
  • 2
    You're right I didn't, I was kinda hoping that it would prompt the original questioner to run the test him/herself, and maybe get into the habit – Paul McKenzie Jul 14 '09 at 15:51
  • 2
    Well, that is fine...but I would argue that it would be better to vote for a question that actually gives an answer than a test. StackOverflow is trying to be a code Q and A repository. Thus full answers would be best. – jjnguy Jul 14 '09 at 15:54
  • 1
    @jjnguy: I was always under the impression that people who posted tests, posted tests that pass. @dfa kind of did a similar thing. (But @dfa's answer is more complete). – Tom Jul 14 '09 at 16:03
  • But he also posted some words(description)...Those are usually helpful. – jjnguy Jul 14 '09 at 16:06
2

Yes, I am fairly sure it is. One method of working around that using the standard library would be:

int index = str.toUpperCase().indexOf("FOO"); 
Yacoby
2

I've just looked at the source. It compares chars so it is case sensitive.

John Topley
2

I had the same problem. I tried a regular expression and the Apache StringUtils.indexOfIgnoreCase method, but both were pretty slow, so I wrote a short method myself:

public static int indexOfIgnoreCase(final String chkstr, final String searchStr, int i) {
    if (chkstr != null && searchStr != null && i > -1) {
        int serchStrLength = searchStr.length();
        char[] searchCharLc = new char[serchStrLength];
        char[] searchCharUc = new char[serchStrLength];
        searchStr.toUpperCase().getChars(0, serchStrLength, searchCharUc, 0);
        searchStr.toLowerCase().getChars(0, serchStrLength, searchCharLc, 0);
        int j = 0;
        for (int checkStrLength = chkstr.length(); i < checkStrLength; i++) {
            char charAt = chkstr.charAt(i);
            if (charAt == searchCharLc[j] || charAt == searchCharUc[j]) {
                if (++j == serchStrLength) {
                    return i - j + 1;
                }
            } else { // faster than: else if (j != 0) {
                i = i - j;
                j = 0;
            }
        }
    }
    return -1;
}

According to my tests it's much faster (at least if your search string is rather short). If you find any bugs or have suggestions for improvement, it would be nice to let me know (since I use this code in an application ;-)

phil
  • This is actually very clever, as the searchstring will be significantly shorter than the text to search in, and it only creates an upper- and lowercase version of the searchstring. Thank you for that! – fiffy Oct 02 '15 at 09:50
  • This is significantly slower than StringUtils version in my testing. However, Zach's answer is like 10-15% faster. – Jeff Williams Dec 07 '15 at 20:13
  • This solution is about 10% faster than the one given by Zach Vorhies. Thank you for this solution. – gogognome May 18 '16 at 21:16
  • This solution doesn't produce a correct answer in presence of strings that change length on conversion to upper case (e.g. if you search for "ß" it will find it in any string that contains a single capital "S") or for text that uses alternative capitalizations (e.g. `indexOfIgnoreCase("İ","i")` should return 0 because `İ` is the correct capitalization of `i` for Turkish text, but instead returns -1 because `i` is capitalized to the more common `I`). – Jules Apr 20 '18 at 11:33
1

Just to sum it up, 3 solutions:

  • using toLowerCase() or toUpperCase()
  • using StringUtils from Apache Commons Lang
  • using a regex

Now, what I was wondering was which one is the fastest? I'm guessing on average the first one.

max
1

The first question has already been answered many times. Yes, the String.indexOf() methods are all case-sensitive.

If you need a locale-sensitive indexOf() you could use the Collator. Depending on the strength value you set you can get case insensitive comparison, and also treat accented letters as the same as the non-accented ones, etc. Here is an example of how to do this:

private int indexOf(String original, String search) {
    Collator collator = Collator.getInstance();
    collator.setStrength(Collator.PRIMARY);
    for (int i = 0; i <= original.length() - search.length(); i++) {
        if (collator.equals(search, original.substring(i, i + search.length()))) {
            return i;
        }
    }
    return -1;
}
Bernd S
  • Surprised by the lack of upvotes here. In a page dominated by incorrect answers, this is one of the only three that actually works correctly. – Jules Apr 20 '18 at 11:46
0

But it's not hard to write one:

public class CaseInsensitiveIndexOfTest extends TestCase {
    public void testOne() throws Exception {
        assertEquals(2, caseInsensitiveIndexOf("ABC", "xxabcdef"));
    }

    public static int caseInsensitiveIndexOf(String substring, String string) {
        return string.toLowerCase().indexOf(substring.toLowerCase());
    }
}
Carl Manaster
  • As commented above, this fails to correctly identify that `"ı"` is a lower-case variant (just not the default one in most languages) of `"I"`. Or alternatively, if run on a machine set to a locale where `"ı"` *is* the default, it will fail to notice that `"i"` is also a lower-case variant of `"I"`. – Jules Apr 20 '18 at 11:42
0

Converting both strings to lower case is usually not a big deal, but it is slow if either string is long, and doing it in a loop is worse still. For this reason, I would recommend indexOfIgnoreCase.

Jakub Vrána
0
 static string Search(string factMessage, string b)
        {

            int index = factMessage.IndexOf(b, StringComparison.CurrentCultureIgnoreCase);
            string line = null;
            int i = index;
            if (i == -1)
            { return "not matched"; }
            else
            {
                while (i < factMessage.Length && factMessage[i] != ' ')
                {
                    line = line + factMessage[i];
                    i++;
                }

                return line;
            }

        }
0

Here's a version closely resembling Apache's StringUtils version:

public int indexOfIgnoreCase(String str, String searchStr) {
    return indexOfIgnoreCase(str, searchStr, 0);
}

public int indexOfIgnoreCase(String str, String searchStr, int fromIndex) {
    // https://stackoverflow.com/questions/14018478/string-contains-ignore-case/14018511
    if(str == null || searchStr == null) return -1;
    if (searchStr.length() == 0) return fromIndex;  // empty string found; use same behavior as Apache StringUtils
    final int endLimit = str.length() - searchStr.length() + 1;
    for (int i = fromIndex; i < endLimit; i++) {
        if (str.regionMatches(true, i, searchStr, 0, searchStr.length())) return i;
    }
    return -1;
}
Ernie Thomason
0

I would like to lay claim to the ONE and only solution posted so far that actually works. :-)

Three classes of problems have to be dealt with:

  1. Non-transitive matching rules for lower and uppercase. The Turkish I problem has been mentioned frequently in other replies. According to comments in the Android source for String.regionMatches, the Georgian comparison rules require an additional conversion to lower case when comparing for case-insensitive equality.

  2. Cases where upper- and lower-case forms have a different number of letters. Pretty much all of the solutions posted so far fail in these cases. Example: German STRASSE and Straße are equal when case is ignored, but have different lengths.

  3. Binding strengths of accented characters. Locale AND context affect whether accents match or not. In French, the uppercase form of 'é' is 'E', although there is a movement toward using uppercase accents. In Canadian French, the upper-case form of 'é' is 'É', without exception. Users in both countries would expect "e" to match "é" when searching. Whether accented and unaccented characters match is locale-specific. Now consider: does "E" equal "É"? Yes. It does. In French locales, anyway.

I am currently using android.icu.text.StringSearch to correctly re-implement my earlier case-insensitive indexOf operations.

Non-Android users can access the same functionality through the ICU4J package, using the com.ibm.icu.text.StringSearch class.

Be careful to reference classes in the correct icu package (android.icu.text or com.ibm.icu.text) as Android and the JRE both have classes with the same name in other namespaces (e.g. Collator).

    this.collator = (RuleBasedCollator)Collator.getInstance(locale);
    this.collator.setStrength(Collator.PRIMARY);

    ....

    StringSearch search = new StringSearch(
         pattern,
         new StringCharacterIterator(targetText),
         collator);
    int index = search.first();
    if (index != StringSearch.DONE)
    {
        // remember that the match length may NOT equal the pattern length.
        length = search.getMatchLength();
        .... 
    }

Test Cases (Locale, pattern, target text, expectedResult):

    testMatch(Locale.US,"AbCde","aBcDe",true);
    testMatch(Locale.US,"éèê","EEE",true);

    testMatch(Locale.GERMAN,"STRASSE","Straße",true);
    testMatch(Locale.FRENCH,"éèê","EEE",true);
    testMatch(Locale.FRENCH,"EEE","éèê",true);
    testMatch(Locale.FRENCH,"éèê","ÉÈÊ",true);

    testMatch(new Locale("tr", "TR"),"TITLE","tıtle",true);  // Turkish dotless I/i
    testMatch(new Locale("tr", "TR"),"TİTLE","title",true);  // Turkish dotted I/i
    testMatch(new Locale("tr", "TR"),"TITLE","title",false);  // Dotless-I != dotted i.

PS: As best as I can determine, the PRIMARY binding strength should do the right thing when locale-specific rules differentiate between accented and non-accented characters according to dictionary rules, but I don't know which locale to use to test this premise. Donated test cases would be gratefully appreciated.

--

Copyright notice: because StackOverflow's CC-BY-SA copyrights as applied to code fragments are unworkable for professional developers, these fragments are dual licensed under more appropriate licenses here: https://pastebin.com/1YhFWmnU

Robin Davies
  • 1
    If you want to dual-license your code, please do so via some other platform, and include a link there. A massive blob of legalese appended to the end of each answer adds an egregious amount of clutter to Stack Overflow. – meager Mar 15 '20 at 00:47
  • Then perhaps you should find a more efficient way to address the problem of CC-BY-SA applied to code fragments. – Robin Davies Mar 16 '20 at 09:12
  • It also seems inappropriate for you to remove license grants that I provided to code fragments to which I hold copyright. – Robin Davies Mar 16 '20 at 09:24
-2

indexOf is case sensitive. This is because it uses the equals method to compare the elements in the list. The same thing goes for contains and remove.

Robbie
  • The original question is about String's indexOf method. – John Topley Jul 14 '09 at 16:13
  • I didn't know that's what he was talking about. I didn't realize it until other people had said something. The principle is still the same though. – Robbie Jul 14 '09 at 18:44
  • 2
    No it isn't. The internals of String's indexOf method compares chars not objects, so it doesn't use the equals method. – John Topley Jul 14 '09 at 18:54