21

I am new in Android and I'm working on a query in SQLite. My problem is that when I use accent in strings e.g.

  • ÁÁÁ
  • ááá
  • ÀÀÀ
  • ààà
  • aaa
  • AAA

If I do:

SELECT * FROM TB_MOVIE WHERE MOVIE_NAME LIKE '%a%' ORDER BY MOVIE_NAME;

It's return:

  • AAA
  • aaa (It's ignoring the others)

But if I do:

SELECT * FROM TB_MOVIE WHERE MOVIE_NAME LIKE '%à%' ORDER BY MOVIE_NAME;

It's return:

  • ààà (ignoring the title "ÀÀÀ")

I want to select strings in a SQLite DB without caring for the accents and the case. Please help.

Daniele D.
  • 2,416
  • 3
  • 34
  • 42
andrehsouza
  • 439
  • 1
  • 4
  • 14
  • possible duplicate of [Accented Search in sqlite (android)](http://stackoverflow.com/questions/14083446/accented-search-in-sqlite-android) – CL. Apr 29 '13 at 16:07
  • 2
    Not duplicate to the above - it is about correctly collating accented characters (upper/lowercase matching), not ignoring them. – laalto Apr 29 '13 at 17:22

4 Answers4

38

Generally, string comparisons in SQL are controlled by column or expression COLLATE rules. In Android, only three collation sequences are pre-defined: BINARY (default), LOCALIZED and UNICODE. None of them is ideal for your use case, and the C API for installing new collation functions is unfortunately not exposed in the Java API.

To work around this:

  1. Add another column to your table, for example MOVIE_NAME_ASCII
  2. Store values into this column with the accent marks removed. You can remove accents by normalizing your strings to Unicode Normal Form D (NFD) and removing non-ASCII code points since NFD represents accented characters roughly as plain ASCII + combining accent markers:

    String asciiName = Normalizer.normalize(unicodeName, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "");
    
  3. Do your text searches on this ASCII-normalized column but display data from the original unicode column.

laalto
  • 137,703
  • 64
  • 254
  • 280
5

In Android sqlite, LIKE and GLOB ignore both COLLATE LOCALIZED and COLLATE UNICODE (they only work for ORDER BY). However, there is a solution without having to add extra columns to your table. As @asat explains in this answer, you can use GLOB with a pattern that will replace each letter with all the available alternatives of that letter. In Java:

public static String addTildeOptions(String searchText) {
    return searchText.toLowerCase()
                     .replaceAll("[aáàäâã]", "\\[aáàäâã\\]")
                     .replaceAll("[eéèëê]", "\\[eéèëê\\]")
                     .replaceAll("[iíìî]", "\\[iíìî\\]")
                     .replaceAll("[oóòöôõ]", "\\[oóòöôõ\\]")
                     .replaceAll("[uúùüû]", "\\[uúùüû\\]")
                     .replace("*", "[*]")
                     .replace("?", "[?]");
}

And then (not literally like this, of course):

SELECT * from table WHERE lower(column) GLOB "*addTildeOptions(searchText)*"

This way, for example in Spanish, a user searching for either mas or más will get the search converted into m[aáàäâã]s, returning both results.

It is important to notice that GLOB ignores COLLATE NOCASE, that's why I converted everything to lower case both in the function and in the query. Notice also that the lower() function in sqlite doesn't work on non-ASCII characters - but again those are probably the ones that you are already replacing!

The function also replaces both GLOB wildcards, * and ?, with "escaped" versions.

descatalogado
  • 101
  • 1
  • 5
  • I had to fix your regex to get it working, because it is inverted. Where is `.replaceAll("[aáàäâã]", "\\[aáàäâã\\]")` I had to use `.replaceAll("\\.*[aáàäâã]\\.*", "[aáàäâã]")` – Cícero Moura Feb 05 '20 at 19:35
0

You can use Android NDK to recompile the SQLite source including the desired ICU (International Components for Unicode). Explained in russian here: http://habrahabr.ru/post/122408/

The process of compiling the SQLilte with source with ICU explained here:

How to compile sqlite with ICU?

Unfortunately you will end up with different APKs for different CPUs.

Community
  • 1
  • 1
BCS Software
  • 1,346
  • 1
  • 9
  • 8
-1

You need to look at these, not as accented characters, but as entirely different characters. You might as well be looking for a, b, or c. That being said, I would try using a regex for it. It would look something like:

SELECT * from TB_MOVIE WHERE MOVIE_NAME REGEXP '.*[aAàÀ].*' ORDER BY MOVIE_NAME;
Dave
  • 282
  • 1
  • 7