I would like to search in product codes, which are a mix of characters and numbers (for example: A210/444, Alexx 1982 X, ...). (By the way: does anybody have best practices for searching this type of data?)
I have an index with index_analyzer and search_analyzer:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "index_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "asciifolding",
            "custom_word_delimiter",
            "custom_edgengram"
          ]
        },
        "search_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "asciifolding",
            "custom_word_delimiter",
            "lowercase"
          ]
        }
      },
      "filter": {
        "custom_word_delimiter": {
          "type": "word_delimiter",
          "preserve_original": "true"
        },
        "custom_edgengram": {
          "type": "edgeNGram",
          "min_gram": "2",
          "max_gram": "30"
        }
      }
    }
  }
}
The problem is with the automatic token typing. index_analyzer is OK: all tokens are of type word.
curl -XGET 'http://localhost:9200/myindex/_analyze?analyzer=index_analyzer&pretty' -d 'Alexx 1982 X' | elasticat.rb
+---+------------+------+------+
| 1 | al | 0–5 | word |
| 1 | ale | 0–5 | word |
| 1 | alex | 0–5 | word |
| 1 | alexx | 0–5 | word |
| 2 | 19 | 6–10 | word |
| 2 | 198 | 6–10 | word |
| 2 | 1982 | 6–10 | word |
+---+------------+------+------+
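Note that the single-character token x does not appear in this output, presumably because custom_edgengram has min_gram set to 2 and x is too short to produce any gram. A minimal Python sketch of that prefix logic (only an illustration, not Elasticsearch's actual implementation; the function name edge_ngrams is made up for this example):

```python
def edge_ngrams(token, min_gram=2, max_gram=30):
    """Return the prefixes of token with length between min_gram and max_gram."""
    upper = min(len(token), max_gram)
    # range() is empty when the token is shorter than min_gram,
    # so such tokens produce no grams at all.
    return [token[:n] for n in range(min_gram, upper + 1)]


# Tokens as they look after the standard tokenizer and lowercase filter.
tokens = ["alexx", "1982", "x"]
grams = [g for t in tokens for g in edge_ngrams(t)]
print(grams)
```

With the defaults taken from the settings above, this yields the same seven terms as the table, and nothing for x.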
But search_analyzer (no edgeNGram) ...
curl -XGET 'http://localhost:9200/myindex/_analyze?analyzer=search_analyzer&pretty' -d 'Alexx 1982 X' | elasticat.rb
+---+------------+-------+------------+
| 1 | alexx | 0–5 | <ALPHANUM> |
| 2 | 1982 | 6–10 | <NUM> |
| 3 | x | 11–12 | <ALPHANUM> |
+---+------------+-------+------------+
... recognizes 1982 as a number, and this causes problems when searching (with the _all field). There is no hit in the search results when I search for only 1982.
Is there any way to force a single string type for all tokens?
Thanks for any ideas!
Martin