1

I am trying to avoid using a regexp function when I need to match strings regardless to their accent or case. For instance, when I search for "VOILA", I want to match "voilà".

I noted that it is possible to create a custom collation to achieve that but I don't know where to start and what to expect in terms of performance:

  • I assume it will be faster than my REGEXP function defined in my VB.net code (see code below). Is that correct? EDIT: this approach doesn't work. I guess the regex is false. Since I am looking for another solution, I'll wait for answers instead of fixing it.

    _ Class MyRegEx Inherits SQLiteFunction Public Overrides Function Invoke(args As Object()) As Object Return System.Text.RegularExpressions.Regex.IsMatch(Convert.ToString(args(1)), Convert.ToString(args(0))) End Function End Class

The regex build:

Dim testString As String = word.Replace("A", "[=a=]")
testString = testString.Replace("E", "[=e=]")
testString = testString.Replace("I", "[=i=]")
testString = testString.Replace("O", "[=o=]")
testString = testString.Replace("U", "[=u=]")
testString = testString.Replace("Œ", "œ")
testString = "(?i)(\W|^)" & testString & "(\W|$)"
  • I have read the SQLite documentation on the creation of a custom collation but I don't really get it. Should I define this in my .net application? Can I embed my custom collation in the database to avoid to create it each time?
  • I guess that it's a common issue considering the number of related questions. However, I haven't seen a single written example achieving this kind of accent/case insensitive search. Can someone share this?

The best lead I have so far would be to create a custom column without accents as described here. This approach seems to be satisfying, but I wanted to get the elegant solution (i.e. custom collate) instead, if it is possible in my case.

Thanks!

Community
  • 1
  • 1
Billybobbonnet
  • 2,876
  • 3
  • 17
  • 41
  • See [Case-Insensitive Sort of UTF8 Data Using System.Data.SQLite](http://www.codeproject.com/Articles/44349/Case-Insensitive-Sort-of-UTF-Data-Using-System-Da), and adjust the comparison approrpiately. – CL. Sep 17 '14 at 11:43
  • Nice example. I tried this but it is slower than the additional column approach. Thanks anyway – Billybobbonnet Sep 17 '14 at 14:46
  • I depend on regexp all the time... but just saw a presentation that suggested, in general, it might not be a good strategy given the complexity and "proof"-ness – Richard May 01 '21 at 13:48

0 Answers0