
I'm using Google App Engine (Java) with JDO. How can I do the JDO equivalent of

select * from table where field like '%foo%'

The only recommendation I have seen so far is to use Lucene. I'm kind of surprised that something this basic is not possible on out-of-the-box GAE.

George Armhold

2 Answers


You can't do substring searches of that sort on App Engine. The App Engine datastore is built to be scalable, and it refuses to execute any query it can't satisfy with an index. Indexing a query like this is next to impossible, because a match can start at any position inside each record's 'field' property. A relational database will typically execute this query by doing a full table scan and checking each record individually - unscalable, to say the least.

The solution, as you've already found out, is to use full-text indexing, such as Lucene. There are libraries for running Lucene on App Engine, such as GAELucene. This also gives you the power of proper full-text search, rather than naive substring matching.
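To illustrate the difference between a word index and a substring scan, here is a minimal in-memory sketch. It is not how Lucene or GAELucene is implemented; the class `InvertedIndexSketch` and its methods are hypothetical names made up for this example, and it only splits on ASCII non-word characters.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; not part of App Engine, JDO, or Lucene.
class InvertedIndexSketch {
    // Maps each lowercased word to the IDs of the documents that contain it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index a document by splitting its text into words.
    void add(String docId, String text) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // A whole-word lookup is one map access, not a scan of every document.
    Set<String> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("r1", "Tomato soup with basil");
        index.add("r2", "Basil pesto");
        System.out.println(index.lookup("basil")); // both recipes contain the word
        System.out.println(index.lookup("bas"));   // partial words are not indexed
    }
}
```

The trade-off is visible in the second lookup: only whole words are indexed, so an arbitrary '%foo%' match is still not answerable from the index.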

Nick Johnson
  • Nick, thanks for the reply. Would it be possible for the datastore to create an index on individual words, rather than %foo% ? I mean, Google is obviously able to do keyword searches, if not regexp-like searches. What I'm really trying to accomplish is to scan a set of recipes for keywords, so perhaps I formulated my question poorly. Thanks. – George Armhold Sep 30 '09 at 15:41
  • Yes - and what you're referring to is called an "inverted index" - and it's what libraries like Lucene use. For Python, there's SearchableModel, which implements this pattern. You could do the same in Java, if you wanted, but you're probably better off just using Lucene. – Nick Johnson Sep 30 '09 at 16:54

tl;dr: Manage your own multi-valued search property and perform equals queries against it.

Details: For those looking for something simple and DIY, you can do the following:

  1. On your entity, create a multi-valued searchTerms property. This will contain the entity's searchable items.

  2. Split your entity's searchable text into words and store them in searchTerms. These words will be the entity's only searchable parts. You could start by splitting on whitespace, or you could add some basic stemming. Normalize case the same way you normalize the search input (the example below lowercases it). E.g. when dealing with email addresses you might want to store the user and domain parts separately so that both can be searched. Whenever the entity is updated you'll need to rebuild this property.

  3. To perform a search, split the search input into words (performing basic stemming if required), and add each as a filter using the equals operator against the searchTerms property.

    (The = operator on a multi-valued property asks if any value equals the filter.)

    E.g. (using Objectify):

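    // One equality filter per word; an entity must satisfy every filter.
    Query<Recipe> query = dao.ofy().query(Recipe.class);
    for (String term : search.trim().toLowerCase().split("\\s+")) {
      if (term.isEmpty()) continue;  // blank search input yields one empty token
      query = query.filter("searchTerms =", term);
    }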
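Steps 2 and 3 can be sketched in plain Java, without the datastore, to show what ends up in searchTerms and which searches will match. This is only an illustration under stated assumptions: `SearchTermsSketch`, `toSearchTerms`, and `matches` are hypothetical names, splitting is on whitespace with no stemming, and `matches` simulates in memory what the chained equality filters ask the datastore to do.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper; class and method names are illustrative only.
class SearchTermsSketch {
    // Step 2: lowercase the text and split it into distinct words.
    static Set<String> toSearchTerms(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (!word.isEmpty()) {
                terms.add(word);
            }
        }
        return terms;
    }

    // Step 3, simulated in memory: every search word must equal some stored
    // term, mirroring one equality filter per word on the multi-valued property.
    static boolean matches(Set<String> searchTerms, String search) {
        return searchTerms.containsAll(toSearchTerms(search));
    }

    public static void main(String[] args) {
        Set<String> terms = toSearchTerms("Roast  Chicken with Lemon");
        System.out.println(terms);
        System.out.println(matches(terms, "lemon CHICKEN")); // whole words, any order
        System.out.println(matches(terms, "lem"));           // partial word
    }
}
```

As with any word-based index, a partial word such as "lem" does not match; only whole stored terms do.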
    
Tom Clift