29

Does anybody know whether one exists?

I've been googling this for months...

Thanks

Roey
  • It's time the open source community makes one. It seems to me that true stemming is _very_ difficult, to the point of requiring vast amounts of manpower, but that some basic stemming is possible, and perhaps a minimal stemmer is better than zero stemming. I'll probably start working on this on my own. If anyone is interested, please contact me. – Asaf Bartov Jan 27 '10 at 14:15
  • ...And as a first step, I'll try to use hspell(3)'s enumeration. It's effectively a ready stemmer! – Asaf Bartov Jan 27 '10 at 14:19

2 Answers

22

Update: HebMorph

Out of curiosity sparked by your question, I contacted Itamar Syn-Hershko who was active on Lucene mailing lists about a year ago when he was working on a Hebrew analyzer for Lucene. I asked him if he completed his analyzer. Here are some relevant bits from his response:

To make a long story short, no I didn't. There is no decent free / open-source Hebrew analyzer for Lucene, that I can say for sure. I'm not sure what your background on the subject is, but believe me when I say there is no easy way of doing this; it might also be that Lucene isn't built for Hebrew searches, but I do agree a solution has to be given. Granted, the safest way to index and search Hebrew texts is to use a specialized stemmer, and integration with Lucene is not the easiest even after you've done this. There are a few very good solutions for Hebrew search in the market, only one that I know of is using Lucene in its core; I've recently tried contacting them, no response yet...

The commercial Lucene-based product he mentions is called ATTIVIO, and the ATTIVIO website does claim to have Hebrew support. At SIGTRS (a Hebrew Text Retrieval interest group), there has been some discussion of ATTIVIO claiming that it is Lucene-based.

So, apparently, it is possible to create a decent Hebrew analyzer for Lucene, but there is no free analyzer available at this time.

Avram
Naaff
0

dtSearch has a Hebrew stemming plugin called "pensim". It appears to be developed by "wizcomtech.com".

mosheb