Lucene - simpleAnalyzer - How to get matched word(s)?

Question

I can't get the offset of the matched word, or the word itself, with the following code. Any help would be appreciated.

    ...
    Analyzer analyzer = new SimpleAnalyzer();
    MemoryIndex index = new MemoryIndex();
    // The document text must be added before searching ('text' is assumed to hold it)
    index.addField("content", text, analyzer);

    QueryParser parser = new QueryParser(Version.LUCENE_30, "content", analyzer);

    float score = index.search(parser.parse("+content:" + target));

    if (score > 0.0f) {
        System.out.println("How to know matched word?");
    }
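To make clear what information is being asked for, here is a plain-Java sketch (no Lucene classes) that roughly imitates SimpleAnalyzer's tokenization, splitting on non-letters and lowercasing, while keeping each token's character offsets, and then reports which tokens match the target. The names `MatchedWords`, `tokenize` and `findMatches` are hypothetical; in Lucene itself the offsets come from the token stream's OffsetAttribute or from term vectors stored with offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java sketch (not Lucene code): roughly what SimpleAnalyzer produces,
// i.e. maximal runs of letters, lowercased, each with its character offsets.
class MatchedWords {

    // One token: the lowercased term plus [start, end) offsets into the text.
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "[" + start + "," + end + ")";
        }
    }

    // Split on non-letters (like LetterTokenizer) and lowercase each token.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetter(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetter(text.charAt(i))) {
                i++;
            }
            tokens.add(new Token(text.substring(start, i).toLowerCase(Locale.ROOT), start, i));
        }
        return tokens;
    }

    // Return every occurrence of the (lowercased) target together with its offsets.
    static List<Token> findMatches(String text, String target) {
        String term = target.toLowerCase(Locale.ROOT);
        List<Token> matches = new ArrayList<>();
        for (Token t : tokenize(text)) {
            if (t.term.equals(term)) {
                matches.add(t);
            }
        }
        return matches;
    }
}
```

For example, `findMatches("Lucene rocks; LUCENE in memory.", "lucene")` returns two tokens at offsets [0,6) and [14,20), and `text.substring(start, end)` recovers the original spellings "Lucene" and "LUCENE".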

Paulius Matulionis · Answer 1 · 2012-03-28T19:01:08.920

Here is a complete in-memory indexing and search example. I have just written it for myself and it works perfectly. I understand that you need to keep the index in memory, but why do you need MemoryIndex for that? You can simply use a RAMDirectory instead: the index is stored in memory, so when you perform your search, the index is read from the RAMDirectory (memory).

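    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.TermFreqVector;
    import org.apache.lucene.index.TermPositionVector;
    import org.apache.lucene.index.TermVectorOffsetInfo;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopScoreDocCollector;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    // Fragment of a method declared "throws IOException"; text, word, size,
    // concordances and processor are defined elsewhere in my code.
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34);
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer);
    RAMDirectory directory = new RAMDirectory();
    try {
        // Index one document, storing term vectors with character offsets
        IndexWriter indexWriter = new IndexWriter(directory, config);
        Document doc = new Document();
        doc.add(new Field("content", text, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_OFFSETS));
        indexWriter.addDocument(doc);
        indexWriter.optimize();
        indexWriter.close();

        QueryParser parser = new QueryParser(Version.LUCENE_34, "content", analyzer);
        IndexSearcher searcher = new IndexSearcher(directory, true);
        IndexReader reader = IndexReader.open(directory, true);

        Query query = parser.parse(word);
        TopScoreDocCollector collector = TopScoreDocCollector.create(10000, true);
        searcher.search(query, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;
        for (ScoreDoc hit : hits) {
            int docId = hit.doc;
            Document hitDoc = searcher.doc(docId);

            // The term vector holds the analyzed (e.g. lowercased) terms of the field
            TermFreqVector termFreqVector = reader.getTermFreqVector(docId, "content");
            TermPositionVector termPositionVector = (TermPositionVector) termFreqVector;
            int termIndex = termFreqVector.indexOf(word);
            if (termIndex < 0) {
                continue; // word does not equal the analyzed form of any indexed term
            }
            TermVectorOffsetInfo[] termVectorOffsetInfos = termPositionVector.getOffsets(termIndex);

            // Each entry gives the start/end character offsets of one occurrence
            for (TermVectorOffsetInfo termVectorOffsetInfo : termVectorOffsetInfos) {
                concordances.add(processor.processConcordance(hitDoc.get("content"), word, termVectorOffsetInfo.getStartOffset(), size));
            }
        }

        analyzer.close();
        reader.close();
        searcher.close();
        directory.close();
    } catch (ParseException e) {
        throw new IllegalArgumentException("Cannot parse query: " + word, e);
    }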
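The answer calls `processor.processConcordance(...)` without showing it. A hypothetical version might cut a keyword-in-context snippet around each offset; in this sketch the class name `Concordance` is made up, `size` is assumed to be the number of context characters kept on each side, and the match is bracketed.

```java
// Hypothetical stand-in for the answer's processor.processConcordance(...).
// Assumption: `size` is the number of context characters kept on each side.
class Concordance {

    static String processConcordance(String text, String word, int startOffset, int size) {
        if (startOffset < 0 || startOffset > text.length()) {
            throw new IllegalArgumentException("offset out of range: " + startOffset);
        }
        // End of the match; termVectorOffsetInfo.getEndOffset() would be more
        // exact when the analyzed term differs in length from the surface text.
        int end = Math.min(text.length(), startOffset + word.length());
        int from = Math.max(0, startOffset - size);
        int to = Math.min(text.length(), end + size);
        // Bracket the matched word so it stands out in the snippet
        return text.substring(from, startOffset)
                + "[" + text.substring(startOffset, end) + "]"
                + text.substring(end, to);
    }
}
```

For example, `processConcordance("The quick brown fox jumps", "fox", 16, 6)` returns `"brown [fox] jumps"`; the context is clipped at the text boundaries, so a match at offset 0 simply has no left context.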

Hi, thanks for your comment. Can you convert your sample to use MemoryIndex? I use MemoryIndex for full-text search, so I cannot use hits or documents the way your code does. — Javatar, Mar 21 '12 at 12:09
Hi, thanks. I use MemoryIndex because of performance and memory concerns: I have learned that MemoryIndex is more efficient and convenient than RAMDirectory, which is why I prefer it. — Javatar, Mar 29 '12 at 08:44
Then my suggestion is to get the book Lucene in Action. It will save you a lot of time and trouble when working with Lucene. — Paulius Matulionis, Mar 29 '12 at 10:47
This class is a replacement/substitute for a large subset of RAMDirectory functionality. It is designed to enable maximum efficiency for on-the-fly matchmaking combining structured and fuzzy fulltext search in realtime streaming applications such as Nux XQuery based XML message queues, publish-subscribe systems for Blogs/newsfeeds, text chat, data acquisition and distribution systems, application level routers, firewalls, classifiers, etc... http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/api/all/org/apache/lucene/index/memory/MemoryIndex.html — Javatar, Dec 04 '12 at 09:59
The close statements should be in a finally block, each preceded by a null check. — Paulo Merson, Mar 11 '13 at 20:14
Please note that they are all created with `new`, so they will never be null. — Paulius Matulionis, Mar 11 '13 at 20:19
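The null checks in that suggestion matter when the variables are declared before the try block and assigned inside it: if one constructor throws, the later variables are never assigned and are still null in finally. A minimal stdlib sketch of the pattern, with a hypothetical `Resource` class standing in for IndexSearcher, IndexReader and the like:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch of close-in-finally: declare resources before the try, assign them
// inside it, and close whatever was actually opened in the finally block.
class CloseInFinally {

    // Hypothetical resource that records its name in `log` when closed.
    static final class Resource implements Closeable {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log, boolean failOnOpen) throws IOException {
            if (failOnOpen) {
                throw new IOException("cannot open " + name);
            }
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add(name);
        }
    }

    // Opens two resources (the second may fail) and returns the close order.
    static List<String> run(boolean secondFails) {
        List<String> closed = new ArrayList<>();
        Resource first = null;
        Resource second = null;
        try {
            first = new Resource("first", closed, false);
            second = new Resource("second", closed, secondFails);
            // ... work with the resources here ...
        } catch (IOException e) {
            // Real code would log or rethrow; the sketch just proceeds to cleanup
        } finally {
            // Close in reverse order of opening, skipping anything never assigned
            if (second != null) {
                second.close();
            }
            if (first != null) {
                first.close();
            }
        }
        return closed;
    }
}
```

`run(false)` closes both resources (second, then first), while `run(true)` closes only the first, because `second` was never assigned. Since Java 7, try-with-resources performs this reverse-order cleanup automatically.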
You can use MemoryIndex with a searcher: MemoryIndex provides a createSearcher() method, and with it you have all the TopDocs functionality you need. — nbz, Apr 16 '14 at 10:36
@nbz could you provide an example of searching a MemoryIndex for documents? — nhaberl, May 05 '17 at 15:17
