80

Okay, so I'm trying to make a full text search in multiple columns, something simple like this:

SELECT * FROM pages WHERE MATCH(head, body) AGAINST('some words' IN BOOLEAN MODE)

Now i want to order by relevance, (how many of the words are found?) which I have been able to do with something like this:

SELECT * , MATCH (head, body) AGAINST ('some words' IN BOOLEAN MODE) AS relevance 
FROM pages
WHERE MATCH (head, body) AGAINST ('some words' IN BOOLEAN MODE)
ORDER BY relevance

Now here comes the part where I get lost, I want to prioritize the relevance in the head column.

I guess I could make two relevance columns, one for head and one for body, but at that point I'd be doing somewhat the same search in the table three times, and for what i'm making this function, performance is important, since the query will both be joined and matched against other tables.

So, my main question is, is there a faster way to search for relevance and prioritize certain columns? (And as a bonus possibly even making relevance count number of times the words occur in the columns?)

Any suggestions or advice would be great.

Note: I will be running this on a LAMP-server. (WAMP in local testing)

Kristoffer la Cour
  • 2,491
  • 3
  • 23
  • 35
  • Do you really have to put MATCH...AGAINST in both the SELECT clause **and** in the WHERE clause? Can you not alias it in the SELECT clause and refer to the alias in the WHERE clause? I'm trying to use prepared statements and this seems redundant/strange to me. – S. Imp Sep 08 '18 at 00:19
  • 2
    No, as stated in MySQL documentation since 5.5, MATCH ... AGAINST will be computed once when both in SELECT and WHERE, so no extra overhead. – Bob2u Jan 22 '20 at 21:57

4 Answers4

157

This might give the increased relevance to the head part that you want. It won't double it, but it might possibly good enough for your sake:

SELECT pages.*,
       MATCH (head, body) AGAINST ('some words') AS relevance,
       MATCH (head) AGAINST ('some words') AS title_relevance
FROM pages
WHERE MATCH (head, body) AGAINST ('some words')
ORDER BY title_relevance DESC, relevance DESC

-- alternatively:
ORDER BY title_relevance + relevance DESC

An alternative that you also want to investigate, if you've the flexibility to switch DB engine, is Postgres. It allows to set the weight of operators and to play around with the ranking.

Denis de Bernardy
  • 67,991
  • 12
  • 114
  • 140
  • 14
    As an aside, MySQL 5.6 supports full text searches on InnoDB tables! – Jabari Feb 25 '13 at 21:12
  • 1
    Can you provide a SQL fiddle for this? – User Sep 24 '15 at 13:25
  • How much of a negative impact do multiple searches have? I would need 4 matches ion my SELECT as i have 4 differen weight factors. Would that make performance much lower? – ToBe Nov 23 '15 at 09:56
  • @ToBe I have seen on other similar questions more than one person say that there is no extra overhead with using multiple `MATCH` statements, due to the way MySQL works internally. – BadHorsie Jun 08 '16 at 18:27
  • Make sure you run these two. `ALTER TABLE talk_webpages ADD FULLTEXT(head)` and `ALTER TABLE talk_webpages ADD FULLTEXT(head, body)` – Supun Kavinda Apr 18 '19 at 03:40
  • @Denis Thank you for this answer. even In my scenario I want title_relevance to shows as first and the rest next. but even in title_relevance I needed them to order by their posted_date (Latest should shows first). and next group should have that logic too. How to achieve that? – Banjo Nov 06 '20 at 05:12
15

Just adding for who might need.. Don't forget to alter the table!

ALTER TABLE table_name ADD FULLTEXT(column_name);
Termininja
  • 5,689
  • 12
  • 40
  • 45
Camilla
  • 479
  • 5
  • 14
  • 3
    if you execute above command more than once, it will be create multiple indexes for same column(s). So just run this command only once. – hakiko Sep 29 '18 at 19:02
  • Better yet, use CREATE FULLTEXT INDEX indexname on tablename(column_name(s)). You should also really check if the index exists before you try to create it. You can check if it exists using: SELECT INDEX_NAME FROM INFORMATION_SCHEMA.STATISTICS WHERE `TABLE_CATALOG` = 'def' AND `TABLE_SCHEMA` = DATABASE() AND `TABLE_NAME` = 'tablename' AND `INDEX_NAME` = 'indexname'; – Dave Hilditch Apr 01 '20 at 21:19
10

I have never done so, but it seems like

MATCH (head, head, body) AGAINST ('some words' IN BOOLEAN MODE)

Should give a double weight to matches found in the head.


Just read this comment on the docs page, Thought it might be of value to you:

Posted by Patrick O'Lone on December 9 2002 6:51am

It should be noted in the documentation that IN BOOLEAN MODE will almost always return a relevance of 1.0. In order to get a relevance that is meaningful, you'll need to:

SELECT MATCH('Content') AGAINST ('keyword1 keyword2') as Relevance 
FROM table 
WHERE MATCH ('Content') AGAINST('+keyword1+keyword2' IN BOOLEAN MODE) 
HAVING Relevance > 0.2 
ORDER BY Relevance DESC 

Notice that you are doing a regular relevance query to obtain relevance factors combined with a WHERE clause that uses BOOLEAN MODE. The BOOLEAN MODE gives you the subset that fulfills the requirements of the BOOLEAN search, the relevance query fulfills the relevance factor, and the HAVING clause (in this case) ensures that the document is relevant to the search (i.e. documents that score less than 0.2 are considered irrelevant). This also allows you to order by relevance.

This may or may not be a bug in the way that IN BOOLEAN MODE operates, although the comments I've read on the mailing list suggest that IN BOOLEAN MODE's relevance ranking is not very complicated, thus lending itself poorly for actually providing relevant documents. BTW - I didn't notice a performance loss for doing this, since it appears MySQL only performs the FULLTEXT search once, even though the two MATCH clauses are different. Use EXPLAIN to prove this.

So it would seem you may not need to worry about calling the fulltext search twice, though you still should "use EXPLAIN to prove this"

jisaacstone
  • 3,966
  • 2
  • 23
  • 38
  • 1
    Adding head twice to the match() function does not work, sadly. Maybe because the query doesn't count the number of times the words are occurring? And I've been using that page you refer to as well, but i can for some reason not make it work... I have not indexed my columns yet, and therefor can not search without the "IN BOOLEAN MODE' tag... – Kristoffer la Cour Jun 07 '11 at 02:04
  • I think a non-booleen search would return # of occurrences, but booleen does not? – jisaacstone Jun 07 '11 at 02:08
  • I will look more into it tomorrow, but i'm going to hold for now. Thanks for the answer, we'll see if it helps me when i get a hold of this. – Kristoffer la Cour Jun 07 '11 at 02:21
  • I was having a problem using IN BOOLEAN MODE and then ordering by relevance and this solved my problem with relevance always being returned as 1. Thanks. – Jazzy Nov 01 '12 at 21:24
  • Generating a score field solved my issue: I was getting results, but a lot of them were complete noise. Thanks, +1 – Chris Baker Jul 23 '13 at 19:40
  • i have similar problem when search "some words", sometimes results with "some" too comes in top when words with "some words" listed in the below with low relevance nt sure why ? – mahen3d Dec 01 '14 at 06:18
  • .The code "keyword1 keyword2" will just output results where keyword1 is emediate keyword2, so automatically, those keywords not near each other will not be part of the results. – Jones G Feb 19 '18 at 15:53
4

I was just playing around with this, too. One way you can add extra weight is in the ORDER BY area of the code.

For example, if you were matching 3 different columns and wanted to more heavily weight certain columns:

SELECT search.*,
MATCH (name) AGAINST ('black' IN BOOLEAN MODE) AS name_match,
MATCH (keywords) AGAINST ('black' IN BOOLEAN MODE) AS keyword_match,
MATCH (description) AGAINST ('black' IN BOOLEAN MODE) AS description_match
FROM search
WHERE MATCH (name, keywords, description) AGAINST ('black' IN BOOLEAN MODE)
ORDER BY (name_match * 3  + keyword_match * 2  + description_match) DESC LIMIT 0,100;
Noah King
  • 91
  • 1
  • 1
  • Isn't this a really heavy query? – Beanow Oct 15 '13 at 15:29
  • 5
    Move the math into the select statement and it lightens the load a lot. `SELECT search.*, (MATCH (name) AGAINST ('black' IN BOOLEAN MODE) * 3) + (MATCH (keywords) AGAINST ('black' IN BOOLEAN MODE)*2 + MATCH (description) AGAINST ('black' IN BOOLEAN MODE)) AS totalScore , FROM search WHERE MATCH (name, keywords, description) AGAINST ('black' IN BOOLEAN MODE) ORDER BY totalScore DESC LIMIT 0,100;` – invertedSpear Oct 14 '14 at 19:01