Questions tagged [lucene]

The term "Lucene" refers to the open source Java full-text search engine library, and also to the entire ecosystem that grew around it, including lucene.net, solr, elasticsearch and zend-search-lucene. "Lucene" may also refer to top-level projects such as Nutch and Tika, which were once sub-projects of Lucene.

Use the "Lucene" tag if any of the following applies:

  • The question is about the Java library.
  • The question is about a port of the library, but would make sense to people who know the Java library (many Lucene.NET questions match this criterion).
  • The question is so general that it doesn't apply to a specific implementation.

References:

Basic Demo:

A basic "getting started" demo showing how to build and query an index is provided as part of the official documentation:

Basic Demo documentation (this link is for Lucene v8.7.0; newer versions may be available)

Links to the demo's source files are provided in the above documentation.

The source code is also available on GitHub.
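To make the idea of "building and querying an index" concrete without needing the Lucene jars, here is a minimal sketch of a toy inverted index in plain Java. It is not Lucene's API and not the official demo: the class `ToyIndex` and its methods `add` and `search` are hypothetical names chosen for illustration, and the tokenizer (lowercase, split on non-alphanumerics) is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index: NOT Lucene's API, only an illustration of the idea.
class ToyIndex {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // "Indexing": tokenize the text and record the document id under each term.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // "Querying": return the ids of documents that contain every query term (AND).
    List<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : tokenize(query)) {
            TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);   // first term: start from its postings
            } else {
                result.retainAll(docs);         // later terms: intersect
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    // Assumed tokenizer: lowercase, then split on anything that is not a-z or 0-9.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(0, "Lucene is a full-text search library");
        index.add(1, "Solr is built on Lucene");
        index.add(2, "Luke is a GUI client");
        System.out.println(index.search("lucene"));          // prints [0, 1]
        System.out.println(index.search("lucene library"));  // prints [0]
    }
}
```

The real library adds analysis, scoring, segments and on-disk storage on top of this basic term-to-documents mapping; the official demo remains the reference for actual Lucene usage.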

Luke - a Lucene GUI Client:

Luke is a GUI client application for exploring Lucene indexes. Recent versions of Luke are bundled with each Lucene binary release.

After downloading the binary release, unzip it, and go to the luke directory. Launch the client using the provided luke.bat or luke.sh scripts.

11,633 questions
44 votes, 3 answers

Lucene Score results

In Lucene, if you had multiple indexes that each covered only one partition, why does the same search on different indexes return results with different scores? The results from different servers match exactly, i.e. if I searched for: Name - John…
Stephen Hendry
43 votes, 8 answers

Is there a pure Python Lucene?

The Ruby folks have Ferret. Does anyone know of a similar initiative for Python? We're currently using PyLucene, but I'd like to investigate moving to pure Python searching.
PEZ
43 votes, 5 answers

Kibana query exact match

I would like to know how to query a field to exactly match a string. I'm currently trying to query like this: url : "http://www.domain_name.com", which returns all strings starting with http://www.domain_name.com.
smace
42 votes, 11 answers

What is best and most active open source .Net search technology?

I'm trying to decide on an open source search/indexing technology for a .Net project. It seems like the standard out there for Java projects is Lucene, but as far as .Net is concerned, the Lucene.Net project seems to be pretty inactive. Is this…
jamesaharvey
41 votes, 3 answers

Solr/Solrj: How can I determine the total number of documents in an index?

How can I determine the total number of documents in a Solr index using Solrj? After hours of searching on my own, I actually have an answer (given below); I'm only posting this question so others can find the solution more easily.
George Armhold
41 votes, 4 answers

Search engine Lucene vs Database search

I am using a MySQL database and have been using database-driven search. What are the advantages and disadvantages of database engines versus the Lucene search engine? I would like suggestions about when and where to use each.
Santosh Linkha
41 votes, 3 answers

What are segments in Lucene?

What are segments in Lucene? What are the benefits of segments?
Mahdi Amrollahi
40 votes, 5 answers

SQL Server 2008 Full Text Search (FTS) versus Lucene.NET

I know there have been questions in the past about SQL 2005 versus Lucene.NET, but since 2008 came out with a lot of changes to full-text search, I was wondering if anyone can give me pros/cons (or link to an article).
ajma
40 votes, 2 answers

How can I search on a list of values using Solr/Lucene?

Given the following query: (field:value1 OR field:value2 OR field:value3 OR ... OR field:value50) Can this be broken down into something less verbose? Basically I have hundreds of category IDs, and I need to search for items under large groups of…
Michael Moussa
39 votes, 4 answers

How to use a Lucene Analyzer to tokenize a String?

Is there a simple way I could use any subclass of Lucene's Analyzer to parse/tokenize a String? Something like: String to_be_parsed = "car window seven"; Analyzer analyzer = new StandardAnalyzer(...); List tokenized_string =…
Felipe Hummel
39 votes, 3 answers

TFIDF for Large Dataset

I have a corpus of around 8 million news articles, and I need to get their TFIDF representation as a sparse matrix. I have been able to do that using scikit-learn for a relatively small number of samples, but I believe it can't be used for…
apurva.nandan
38 votes, 1 answer

ElasticSearch - Searching For Human Names

I have a large database of names, primarily from Scotland. We're currently producing a prototype to replace an existing piece of software which carries out the search. This is still in production and we're aiming to get our results as close as…
Nathan Smith
36 votes, 1 answer

solr search for documents where a field doesn't exist

How do I search for documents in a Solr index that do not contain a specified field?
Midhat
36 votes, 4 answers

Difference between BooleanClause.Occur.Must and BooleanClause.Occur.SHOULD in lucene

Can anyone explain, with an example, the difference between BooleanClause.Occur.MUST and BooleanClause.Occur.SHOULD in a Lucene BooleanQuery?
Jagadesh
34 votes, 4 answers

Indexing .PDF, .XLS, .DOC, .PPT using Lucene.NET

I've heard of Lucene.Net and I've heard of Apache Tika. The question is - how do I index these documents using C# vs Java? I think the issue is that there is no .Net equivalent of Tika which extracts relevant text from these document types. UPDATE…
dana