2

I just started learning about Lucene and I get how it can be very useful for searching documents and such, but does anyone use it for searching database data?

For example,

I need to make a search that will search for keywords based on two fields ("description", "message").

To do this I would need to write an sql query to return all the description rows, and all the message rows (could be tens of thousands), then from that create an index, and search the index.

Is my understanding correct? I would first have to get all the data from the database which could be huge.

chobo
  • 29,453
  • 35
  • 118
  • 182
  • Why not use the full text search in your sql database for those fields and Lucene for you documents, that is the intended use. Otherwise you'd need to dump the data. – Erik Kettenburg Mar 13 '12 at 23:25

2 Answers2

5

If you can (I mean if you are open to use anything), then try Solr. It has a built in support for indexing Database content using Data Import Handler. I am using it to index 1Billion+ rows of full text data residing in databases.

Yavar
  • 11,773
  • 4
  • 28
  • 60
  • DO you have a website link for this? The site name doesn't seem to be related to their product – chobo Mar 14 '12 at 16:08
1

Your understanding is correct.

You would have to create an index for Lucene based on a set as you indicated. I would recommend Luke for viewing the indexes you created.

I have a warning though as to why Full Text Search is ultimately better -- Every time a value is updated, you have to update your Lucene index which is external to your database. This is overhead that I didn't want to deal with and ultimately ended up abandoning this very approach.

FTS gives me just as good results, at least for my needs, yours may differ.

Matt
  • 24,106
  • 61
  • 180
  • 291
  • Ya, this is was a concern for me too. It's not searching the data it's just getting the data that maybe an issue. I'm just trying to figure out if data retrieval is gonna wipe out any benefits of using Lucene in this case. I don't have much data at the moment, but would be nice to know for the future. – chobo Mar 14 '12 at 16:04
  • After looking into it a bit more I'll probably use the built in full text search capabilities of mssql 2008. It comes with a stoplist and phrase searching, and is supposed to be pretty fast. I think for this application Lucene is probably overkill. If I would go with Lucene I would probably rebuild the index every couple of hours of days. – chobo Mar 14 '12 at 18:51
  • just a quick note, you dont need to rebuild the index every time, you can just update it. Lucene also has near real time features for this. But if you have a small amount of data and simple queries, SqlServer FTS might be more appropriate for your needs – Jf Beaulac Mar 14 '12 at 20:16