
I would like to search product codes, which are a mix of letters and numbers (for example: `A210/444`, `Alexx 1982 X`, ...). (By the way: does anybody have best practices for searching this type of data?)

I have an index with an `index_analyzer` and a `search_analyzer`:

```json
{
    "settings": {
        "analysis": {
            "analyzer": {
                "index_analyzer": {
                    "tokenizer": "standard",
                    "filter": [
                        "standard",
                        "lowercase",
                        "asciifolding",
                        "custom_word_delimiter",
                        "custom_edgengram"
                    ]
                },
                "search_analyzer": {
                    "tokenizer": "standard",
                    "filter": [
                        "standard",
                        "asciifolding",
                        "custom_word_delimiter",
                        "lowercase"
                    ]
                }
            },
            "filter": {
                "custom_word_delimiter": {
                    "type": "word_delimiter",
                    "preserve_original": "true"
                },
                "custom_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": "2",
                    "max_gram": "30"
                }
            }
        }
    }
}
```
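For readers wondering what `custom_word_delimiter` does to a product code, here is a rough Python model of it. This is a simplification I made up for illustration (the function name `word_delimiter` is hypothetical, and the real Lucene filter has many more options and its own token ordering): it splits a token on non-alphanumeric characters and on letter/digit transitions, and with `preserve_original` it also keeps the unsplit token.

```python
import re


def word_delimiter(token: str, preserve_original: bool = True) -> list[str]:
    """Hypothetical, simplified model of a word_delimiter token filter.

    Splits on non-alphanumeric characters and on letter/digit boundaries.
    Case-change splitting, catenation and other real options are ignored.
    """
    # [^\W\d_]+ matches a run of letters, \d+ a run of digits;
    # everything else (e.g. "/") acts as a delimiter and is dropped.
    parts = re.findall(r"[^\W\d_]+|\d+", token)
    out = []
    # A token that needs no splitting is passed through once; otherwise
    # the original is kept in addition to its parts (preserve_original).
    if preserve_original and parts != [token]:
        out.append(token)
    out.extend(parts)
    return out


for tok in ["A210", "A210/444", "alexx", "1982"]:
    print(tok, word_delimiter(tok))
```

Under this model `A210` yields `A210`, `A` and `210`, so a search for `210` alone could match it.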

The problem is the automatic token typing. `index_analyzer` is fine: all tokens are of type `word`.

```
curl -XGET 'http://localhost:9200/myindex/_analyze?analyzer=index_analyzer&pretty' -d 'Alexx 1982 X' | elasticat.rb

+---+------------+------+------+
| 1 | al         | 0–5  | word |
| 1 | ale        | 0–5  | word |
| 1 | alex       | 0–5  | word |
| 1 | alexx      | 0–5  | word |
| 2 | 19         | 6–10 | word |
| 2 | 198        | 6–10 | word |
| 2 | 1982       | 6–10 | word |
+---+------------+------+------+
```
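The n-grams in this output follow directly from `custom_edgengram` (`min_gram` 2, `max_gram` 30): each token is replaced by its prefixes of length 2 up to 30. A minimal Python sketch of that rule (the helper name `edge_ngrams` is hypothetical, not an Elasticsearch API):

```python
def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 30) -> list[str]:
    """Hypothetical model of the edgeNGram filter: all prefixes whose
    length lies between min_gram and max_gram (inclusive)."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


for tok in ["alexx", "1982", "x"]:
    print(tok, edge_ngrams(tok))
# alexx ['al', 'ale', 'alex', 'alexx']
# 1982 ['19', '198', '1982']
# x []
```

This reproduces the prefixes in the table and also explains why the single-character token `x` is missing from it: it is shorter than `min_gram`, so no n-gram is produced for it.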

But `search_analyzer` (which has no edgeNGram filter) ...

```
curl -XGET 'http://localhost:9200/myindex/_analyze?analyzer=search_analyzer&pretty' -d 'Alexx 1982 X' | elasticat.rb
+---+------------+-------+------------+
| 1 | alexx      | 0–5   | <ALPHANUM> |
| 2 | 1982       | 6–10  | <NUM>      |
| 3 | x          | 11–12 | <ALPHANUM> |
+---+------------+-------+------------+
```

... recognizes `1982` as a number (`<NUM>`), and this causes problems when searching (on the `_all` field): searching for just `1982` returns no hits.
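As far as I know, the type column (`word`, `<ALPHANUM>`, `<NUM>`) is only metadata on the token; what gets indexed and compared at query time is the term text. The toy Python model below is an assumption for illustration, not Elasticsearch internals: it reduces the indexed field to the set of term strings from the `index_analyzer` table and checks whether any term from the `search_analyzer` output for `1982` is in it.

```python
# Toy model (assumption): a field is the set of term strings its analyzer
# emitted, and a query matches if any analyzed query term is in that set.
# Token types are carried along but deliberately never compared.
index_tokens = [
    ("al", "word"), ("ale", "word"), ("alex", "word"), ("alexx", "word"),
    ("19", "word"), ("198", "word"), ("1982", "word"),
]
query_tokens = [("1982", "<NUM>")]  # search_analyzer output for "1982"

indexed_terms = {text for text, _type in index_tokens}
matches = any(text in indexed_terms for text, _type in query_tokens)
print(matches)  # True: "1982" matches even though its type is <NUM>
```

If this model holds, a missing hit for `1982` more likely points at which analyzer the searched field (for example `_all`) actually uses than at the token type.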

Is there any way to force the analyzer to emit only a single (string) token type?

Thanks for any ideas!

Martin

Dingo
  • I don't think it should be causing any issues. But you said you're searching on the `_all` field; are you setting it to use your custom analyzers? – Evaldas Buinauskas Oct 27 '15 at 20:01
  • I do get results when searching for `myindex/_search?q=1982`. Can you show the request you're making? – Val Oct 28 '15 at 05:09

0 Answers