
I'm currently using a LIKE query for an autocomplete box. I want to switch to MATCH ... AGAINST, which should be faster, but I'm running into issues with the sorting.

I want to rank a query like this:

  1. [query] %
  2. [query]%
  3. % [query]%
  4. %[query]%

For now I use

SELECT * FROM table
WHERE name LIKE '%query%'
ORDER BY (CASE
    WHEN name LIKE 'query %' THEN 1
    WHEN name LIKE 'query%' THEN 2
    WHEN name LIKE '% query%' THEN 3
    ELSE 4 END) ASC

When I use...

SELECT * FROM table 
WHERE MATCH(name) AGAINST('query*' IN BOOLEAN MODE)

...all results get the same 'ranking score'.

For example, searching for Natio returns Pilanesberg National Park and National Park Kruger with the same score, while I want the second result to come first because it starts with the query.

How can I achieve this?

NLAnaconda

1 Answer


I had the same problem and had to approach it in a different way.

The documentation of MySQL says:

The term frequency (TF) value is the number of times that a word appears in a document. The inverse document frequency (IDF) value of a word is calculated using the following formula, where total_records is the number of records in the collection, and matching_records is the number of records that the search term appears in.

${IDF} = log10( ${total_records} / ${matching_records} )

When a document contains a word multiple times, the IDF value is multiplied by the TF value:

${TF} * ${IDF}

Using the TF and IDF values, the relevancy ranking for a document is calculated using this formula:

${rank} = ${TF} * ${IDF} * ${IDF}

The documentation follows this with an example that illustrates the formula: it searches for the word 'database' in different fields and returns a rank based on the results.

In your example, "Pilanesberg National Park" and "National Park Kruger" get the same rank against ('Natio*' IN BOOLEAN MODE) because the rank is not based on a common-sense notion of similarity (for that, you would have to tell the database what "similar to" means for you). It is based on the formula above, which is driven by word frequency and not by where the word appears in the name.
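As a rough illustration of the tie, the documented formula can be evaluated by hand for both names. The numbers are hypothetical (10 rows in total, 2 of them containing a word that starts with the query, one occurrence per name), and the sketch assumes the formula applies unchanged to a boolean-mode prefix search:

```sql
-- Hypothetical values: total_records = 10, matching_records = 2, TF = 1.
-- Both rows use identical inputs, so the computed relevance is identical;
-- the position of the word within the name never enters the calculation.
SELECT 'Pilanesberg National Park' AS name,
       1 * LOG10(10 / 2) * LOG10(10 / 2) AS relevance
UNION ALL
SELECT 'National Park Kruger',
       1 * LOG10(10 / 2) * LOG10(10 / 2);
```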

Note also that the frequency value is affected by the storage engine (InnoDB or MyISAM) and by the MySQL version (versions before 5.6 do not support full-text indexes on InnoDB tables).

Regarding your problem, you can use MySQL user-defined variables, functions, or procedures to compute the rank according to your own idea of relevance. Examples here, here or here. And also here.
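One simple variant of this idea, sketched here as an assumption and not taken from the linked examples, is to let the full-text index do the filtering and keep the CASE expression from the question for the ordering. The table name `places` is hypothetical, and a FULLTEXT index on `name` is assumed:

```sql
-- Hypothetical table `places` with a FULLTEXT index on `name`.
SELECT name,
       MATCH(name) AGAINST('natio*' IN BOOLEAN MODE) AS score  -- shown only to inspect ties
FROM places
WHERE MATCH(name) AGAINST('natio*' IN BOOLEAN MODE)   -- full-text index filters the rows
ORDER BY CASE
             WHEN name LIKE 'natio %'  THEN 1   -- query is the complete first word
             WHEN name LIKE 'natio%'   THEN 2   -- name starts with the query
             WHEN name LIKE '% natio%' THEN 3   -- a later word starts with the query
             ELSE 4                             -- fallback tier
         END,
         name;                                  -- stable order within a tier
```

The LIKE tests run only on the rows that the full-text search has already selected, so they do not need to scan the whole table. Keep in mind that 'natio*' matches only words that begin with the query, so names that contain the query in the middle of a word (the %[query]% case) are not returned by this search at all.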

See also:

MySQL match() against() - order by relevance and column?

MYsql FULLTEXT query yields unexpected ranking; why?

Paolo Gibellini