Can anyone provide a simple comparative analysis of Lucene and mg4j? What advantages does each framework have over the other?

BTW, I've seen the following basic reasons given for choosing mg4j in several academic papers:

  • combining indices over the same collection
  • multi-index queries
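
To make the second point concrete, here is a minimal sketch of what a multi-index query means conceptually. It is plain Python, not the mg4j or Lucene API; the toy collection, `build_index`, and `multi_index_and` are hypothetical names used only for illustration. The same documents are indexed twice (once by title, once by body), and a single query combines both indices.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


# Two fields of the same (hypothetical) collection, indexed separately.
titles = {
    1: "Lucene in action",
    2: "Managing gigabytes",
    3: "Search engines",
}
bodies = {
    1: "indexing and search with java",
    2: "compressing and indexing documents",
    3: "information retrieval in practice",
}

title_index = build_index(titles)
body_index = build_index(bodies)


def multi_index_and(title_term, body_term):
    """Ids of documents whose title has title_term AND whose body has body_term."""
    in_title = title_index.get(title_term, set())
    in_body = body_index.get(body_term, set())
    return sorted(in_title & in_body)


result = multi_index_and("lucene", "indexing")
```

A real engine would also handle tokenization, ranking, and index alignment; this sketch only shows the idea of one query spanning several indices over the same documents.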

Update:

These slides (from mir2ed.org) give a more recent overview of open source search engines, including Lucene and mg4j, with benchmarks covering memory and CPU usage, index size, search performance, search quality, etc.

Nikita Zhiltsov

1 Answer

Jeff Dalton reviewed many open source search engines including Lucene and mg4j in 2007, and updated the comparison in 2009.

I have not used mg4j. I have used Lucene, though. The number one feature of Lucene IMO is its wide adoption and wonderful community of users/developers/committers. This means that there is a fair chance that somebody worked on a use case similar to yours using Lucene. Current weak points of Lucene are its scoring model and its ability to scale to large collections of text. The Lucene developers are working on these issues.

I believe that the choice of a search library is very dependent on your (academic or industrial) setting, the other parts of your application and your use case.

T J
Yuval F
  • Thanks. What about [SOLR](http://lucene.apache.org/solr/features.html)? Does it solve these issues of Lucene? – Nikita Zhiltsov Feb 17 '11 at 18:05
  • Solr is a search engine that adds functionality to Lucene. It adds some scaling abilities to Lucene and is much easier to get started with. Solr Cloud (http://wiki.apache.org/solr/SolrCloud) is an effort to make Solr much more robust and scalable. The scoring in Solr is identical to Lucene's. – Yuval F Feb 17 '11 at 19:49
  • [elasticsearch](https://www.elastic.co/) is another search engine built on top of Lucene. – Bax Jan 05 '16 at 23:17