5

Im' trying to do a fulltext search with mysql, to match a string. The problem is that it's returning odd results in the first place.

For example, the string 'passat 2.0 tdi' :

            AND MATCH (
            records_veiculos.titulo, records_veiculos.descricao
            )
            AGAINST (
             'passat 2.0 tdi' WITH QUERY EXPANSION
            )

is returning this as the first result (the others are fine) :

Volkswagen Passat Variant 1.9 TDI- ANO 2003

wich is incorrect, since there's no "2.0" in this example.

What could it be?

edit: Also, since this will probably be a large database (expecting up to 500.000 records), will this search method be the best for itself, or would it be better to install any other search engine like Sphinx? Or in case it doesn't, how to show relevant results?

edit2: For the record, despite the question being marked as answered, the problem with the MySQL delimiters persists, so if anyone has a suggestion on how to escape delimiters, it would be appreciated and worth the 500 points at stake. The sollution I found to increase the resultset was to replace WITH QUERY EXPANSION with IN BOOLEAN MODE, using operators to force the engine to get the words I needed, like :

AND MATCH (
records_veiculos.titulo, records_veiculos.descricao
)
AGAINST (
 '+passat +2.0 +tdi' IN BOOLEAN MODE
)

It didn't solve at all, but at least the relevance of the results as changed significantly.

Riedsio
  • 9,165
  • 1
  • 22
  • 33
yoda
  • 9,912
  • 19
  • 61
  • 90

4 Answers4

8

From the MySQL documentation on Fulltext search:

"The FULLTEXT parser determines where words start and end by looking for certain delimiter characters; for example, “ ” (space), “,” (comma), and “.” (period)."

This means that the period is delimiting the 2 and 0. So it's not looking for '2.0'; it's looking for '2' and '0', and not finding it. WITH QUERY EXPANSION is probably causing relevant related words to show up, thus obviating the need for '2' and '0' to be individual words in the result rankings. A character minimum may also be being enforced.

Robert Elwell
  • 6,443
  • 1
  • 26
  • 32
5

By default I believe mysql only indexes and matches words with 4 or more characters. You could also try escaping the period? It might be ignored this or otherwise using it as a stop character.

methodin
  • 6,473
  • 1
  • 21
  • 25
  • 2
    +1 Can not think why this was negative voted. Very true. `2.0` escaped or not will be ignored unless the server wide default settings have been changed for `FULLTEXT` indexes. – Orbling Dec 10 '10 at 02:05
  • 1
    @Yoda: Have a read here, particularly about `ft_max_word_len`: http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html – Orbling Dec 10 '10 at 02:14
  • any suggestions on how to escape chars? I've tryed using '\', with no success. – yoda Dec 10 '10 at 04:03
  • 1
    Take a look at: http://stackoverflow.com/questions/584976/how-do-you-get-your-fulltext-boolean-search-to-pick-up-the-term-c This doesn't provide an optimal solution but it does provide one. You can also just remove the period all together but you'd have to do it both on input into table and on the search query. – methodin Dec 10 '10 at 15:12
  • Thanks @methodin, saved me a lot of problems :) – yoda Dec 16 '10 at 06:48
4

What is the match rank that it returns for that? Does the match have to contain all "words" my understanding was it worked like Google and only needs to match some of the words.

Having said that, have a mind to the effect of adding WITH QUERY EXPANSION, that automatically runs a second search for "related" words, which may not be what you have typed, but which the fulltext engines deems probably related.

Relevant Documentation: http://dev.mysql.com/doc/refman/5.1/en/fulltext-query-expansion.html

Orbling
  • 19,537
  • 3
  • 48
  • 64
1

The "." is what's matching on 2003 in your query results.

If you're going to do searches on 3 character text strings, you should set ft_min_word_len=3 in your mysql config, restart mysql. Otherwise, a search for "tdi" will return results with "TDI-" but not with just "TDI", because rows with "TDI-" will be indexed but "TDI" alone will not.

After making that config change, you'll have to rebuild your index on that table. (Warning: your index might be significantly larger now.)

Michael Jensen
  • 221
  • 1
  • 2
  • thanks for the input, but it has already been discussed most of those things, and the problem remains. – yoda Dec 17 '10 at 18:04