Questions tagged [lucene]

The term "Lucene" refers to the open-source Java full-text search engine library, and also to the entire ecosystem that grew around it, including lucene.net, solr, elasticsearch and zend-search-lucene. "Lucene" may also be used to refer to top-level projects such as Nutch and Tika, which were once sub-projects of Lucene.

Use the "Lucene" tag if any of the following applies:

  • The question is about the Java library.
  • The question is about a port of the library, but would make sense to people who know the Java library (many Lucene.NET questions meet this criterion).
  • The question is so general that it doesn't apply to a specific implementation (example).

References:

Basic Demo:

A basic "getting started" demo showing how to build and query an index is provided as part of the official documentation:

Basic Demo documentation (this link is for Lucene v8.7.0; newer versions may be available)

Links to the demo's source files are provided in the above documentation.

The source code can also be found here on GitHub.
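The demo's two steps, building an index and then querying it, rest on an inverted index: a map from each term to the documents that contain it. The following is a deliberately tiny, stdlib-only Java sketch of that idea under stated assumptions; it is not Lucene's API and not the demo's code. The class and method names (ToyIndex, add, search, tokenize) are hypothetical, the tokenizer is a crude lowercase split standing in for an analyzer such as Lucene's StandardAnalyzer, and multi-term queries use simple AND semantics with no relevance scoring.

```java
import java.util.*;

/** Hypothetical toy inverted index (term -> sorted doc ids). NOT Lucene's API. */
public class ToyIndex {
    private final List<String> docs = new ArrayList<>();
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Crude stand-in for an analyzer: lowercase, split on runs of non-letters/non-digits. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t); // skip the empty leading piece split() can produce
        }
        return tokens;
    }

    /** "Build" step: keep the text and record its id under every term it contains. */
    public int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** "Query" step: AND semantics, i.e. intersect the postings of every query term. */
    public List<String> search(String query) {
        List<String> terms = tokenize(query);
        if (terms.isEmpty()) return Collections.emptyList();
        SortedSet<Integer> hits = null;
        for (String term : terms) {
            SortedSet<Integer> p = postings.getOrDefault(term, Collections.emptySortedSet());
            if (hits == null) hits = new TreeSet<>(p);
            else hits.retainAll(p);
        }
        List<String> result = new ArrayList<>();
        for (int id : hits) result.add(docs.get(id));
        return result;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("Lucene is a full-text search library");
        index.add("Solr and Elasticsearch are built on Lucene");
        index.add("Postgres is a relational database");
        System.out.println(index.search("lucene"));      // both documents that mention Lucene
        System.out.println(index.search("lucene solr")); // only the Solr/Elasticsearch document
    }
}
```

Real Lucene replaces each piece of this with far more machinery (configurable analyzers, on-disk segments, relevance scoring), so treat the sketch only as a mental model before reading the demo source.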

Luke - a Lucene GUI Client:

Luke is a GUI client application which can be used to explore your Lucene indexes. Recent versions of Luke are now provided as part of each binary release, which can be downloaded from here.

After downloading the binary release, unzip it, and go to the luke directory. Launch the client using the provided luke.bat or luke.sh scripts.

11633 questions
67 votes, 2 answers

How to specify two Fields in Lucene QueryParser?

I read How to incorporate multiple fields in QueryParser? but I didn't get it. At the moment I have a very strange construction like: parser = New QueryParser("bodytext", analyzer) parser2 = New QueryParser("title", analyzer) query =…
Tyzak
66 votes, 5 answers

Why is Solr so much faster than Postgres?

I recently switched from Postgres to Solr and saw a ~50x speedup in our queries. The queries we run involve multiple ranges, and our data is vehicle listings. For example: "Find all vehicles with mileage < 50,000, $5,000 < price < $10,000,…
cberner
65 votes, 6 answers

Why are document stores like Lucene / Solr not included in NoSQL conversations?

All of us have come across the hype around NoSQL solutions lately. MongoDB, CouchDB, BigTable, Cassandra, and others have been listed as NoSQL options. Here's an…
Jon Davis
62 votes, 5 answers

How would one use Lucene.NET to help implement search on a site like Stack Overflow?

I've asked a similar question on Meta Stack Overflow, but that deals specifically with whether or not Lucene.NET is used on Stack Overflow. The purpose of the question here is more of a hypothetical, as to what approaches one would take if they were…
casperOne
62 votes, 4 answers

Is there a good indexing / search engine for Node.js?

I'm looking for a good open source (with LGPL or a permissive license) indexing engine for a node.js application, something like Lucene. I'm looking for in-process indexing and search and am not interested in indexing servers like Sphinx or Solr. I…
Venemo
59 votes, 5 answers

Retrieving specific fields in a Solr query?

I am running a Solr instance on Jetty and when I search using the Solr admin panel, it returns the entire document. What should I do to get only specified fields from each Solr document returned by the search?
Mohit Ranka
57 votes, 4 answers

Is using a load balancer with ElasticSearch unnecessary?

I have a cluster of 3 ElasticSearch nodes running on AWS EC2. These nodes are set up using OpsWorks/Chef. My intent is to design this cluster to be very resilient and elastic (nodes can come in and out when needed). From everything I've read about…
user2719100
55 votes, 1 answer

Understanding Segments in Elasticsearch

I was under the assumption that each shard in Elasticsearch is an index. But I read somewhere that each segment is a Lucene index. What exactly is a segment? How does it affect search performance? I have indices that reach around 450GB in size…
Abhijeet Rastogi
52 votes, 3 answers

Elasticsearch always returning "mapping type is missing"

I'm following the advice given here in order to find partial words with elasticsearch: ElasticSearch n-gram tokenfilter not finding partial words I've created a simple bash script that attempts to run a version of this: curl -XDELETE…
Travis
52 votes, 7 answers

Solr Collection vs Cores

I struggle with understanding the difference between collections and cores. If I understand it correctly, cores are multiple indexes. A collection consists of cores, so essentially they share the same logic in separation, i.e. separate cores and…
NeatNerd
51 votes, 5 answers

How to do query auto-completion/suggestions in Lucene?

I'm looking for a way to do query auto-completion/suggestions in Lucene. I've Googled around a bit and played around a bit, but all of the examples I've seen seem to be setting up filters in Solr. We don't use Solr and aren't planning to move to…
Mat Mannion
45 votes, 6 answers

How to evaluate hosted full text search solutions?

What are the options when it comes to SaaS/hosted full text search? How should I evaluate the different options available? I'm looking for something that uses Lucene, solr, or sphinx on the backend, and provides a REST API for submitting documents…
James Cooper
45 votes, 3 answers

Best practices for searchable archive of thousands of documents (pdf and/or xml)

Revisiting a stalled project and looking for advice in modernizing thousands of "old" documents and making them available via web. Documents exist in various formats, some obsolete: (.doc, PageMaker, hardcopy (OCR), PDF, etc.). Funds are available…
Meltemi
44 votes, 4 answers

Entity Extraction/Recognition with free tools while feeding Lucene Index

I'm currently investigating the options to extract person names, locations, tech words and categories from text (a lot of articles from the web), which will then be fed into a Lucene/ElasticSearch index. The additional information is then added as…
Karussell
44 votes, 3 answers

Lucene indexing: Store and indexing modes explained

I think I still don't understand the Lucene indexing options. The options are Store.Yes and Store.No, and Index.Tokenized, Index.Un_Tokenized, Index.No and Index.No_Norms. I don't really understand the store option. Why would you ever want to…
Boris Callens