I have a large Apache Jena TDB store, and I want to build a Lucene index for it using Apache Jena 2.10.2, for use with the new text search feature. I find the documentation hard to follow.

I first tried to configure this in code, but had trouble with the dependencies. Every combination of lucene-core and solr-solrj I tried resulted in either 'classNotFound' errors or a 'StandardAnalyzer overrides final method tokenStream' error. Example code:

Dataset ds1 = DatasetFactory.createMem() ;

EntityDefinition entDef = new EntityDefinition("uri", "text", RDFS.label) ;

// In-memory index
Directory dir = new RAMDirectory();

// Have also tried creating the index in a file, in place of the RAMDirectory above:
// File indexDir = new File("luceneIndexes");
// Directory dir = FSDirectory.open(indexDir);

// Fails on this line
Dataset ds = TextDatasetFactory.createLucene(ds1, dir, entDef) ;

I think the only solution may be to create a text dataset assembler, but I would prefer to do this in code, so any advice on that approach is welcome.

bmoran
  • When reporting errors, it helps if you give details e.g. "classNotFound" - which class? – AndyS Jul 31 '13 at 08:01

1 Answer

The example is exactly the one from Jena, which does work.

It looks like you have a mix of incompatible jar versions. Have you tried using Maven to resolve the dependencies? Running "mvn dependency:tree" shows which versions are used.

jena-text is built for Lucene 4.3.1 or Solr 4.3.1.

See the POM from: https://repository.apache.org/content/groups/snapshots/org/apache/jena/jena-text/1.0.0-SNAPSHOT/
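
If the build is not managed by Maven, one way to spot a version mix-up is to ask the JVM which jar each class was actually loaded from. The following is only a minimal sketch using the standard library: the helper class `JarLocator` is hypothetical, and the three class names in `main` are illustrative choices for a jena-text plus Lucene 4.x classpath, not a list taken from the Jena documentation.

```java
import java.security.CodeSource;

public class JarLocator {

    /** Describes where the named class was loaded from, or why it could not be loaded. */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null) {
                return "(JDK or bootstrap class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers classes that are present but cannot be loaded or linked
            return "not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Illustrative class names (assumptions); adjust to the classes named in your error
        String[] names = {
            "org.apache.lucene.store.Directory",
            "org.apache.lucene.analysis.standard.StandardAnalyzer",
            "org.apache.jena.query.text.TextDatasetFactory"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the same classpath as the failing program. If the Lucene classes resolve to jars with different version numbers in their file names, that would be consistent with the errors described in the question.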

AndyS
  • Yes, I was missing the Lucene analyzer dependency, thanks. However, I am still unsure how to solve the bigger problem of creating the index for an existing TDB. I believe the problem could be my lack of understanding of EntityDefinition. Could you explain where the entityField and primaryField come from? Should these parameters be specific to my TDB? Another source of error could be in loading the data, if the default model does not match mine: 'Model m = dataset.getDefaultModel(); RDFDataMgr.read(m, DBDirectory);'. Any thoughts? – bmoran Jul 31 '13 at 14:35
  • Hi @bmoran, have you resolved the problem? How do you load your TDB into a model? – Claudio Pomo Jul 07 '14 at 09:47
  • Yes, this problem was resolved. This problem was more about how to create a Lucene Index (used for implementing text search) based on a TDB, and was resolved by switching dependency versions. Some Jena dependency versions were incompatible. If you are having trouble loading a TDB into an existing model, I believe there are examples in the Jena documentation, but I haven't used Jena in a few months. I never worked with loading a TDB into an existing model though. – bmoran Jul 08 '14 at 14:19