1

I want to call insert select and I try to use this select (with help from this INSERT SELECT query when one column is unique)

SELECT minids.userid, username, password, full_name, country, email,
      (select openclipart_files.id from aiki_users, openclipart_files
       where aiki_users.userid = users.userid and avatar like '%svg' AND
             filename = avatar) as avatar,
      homepage, usergroup, notify, nsfwfilter
FROM aiki_users users
INNER JOIN (SELECT MIN(userid) as userid FROM aiki_users GROUP by username) 
           minids ON minids.userid = users.userid;

I use it with insert

INSERT INTO openclipart_users(id, user_name, password, full_name, country,
email, avatar, homepage, user_group, notify, nsfw_filter) SELECT ...

It take very long to execute (I cancel it after few minutes)

aiki_users table have 100k rows and openclipart_files have 30k rows, basically I just copy all stuff from aiki_users to openclipart_users skipping duplicates and I want avatar to be id of the file where in old table is a string (only about 300 users have svg avatars about 1k users have non '' avatar but I only what svg).

IS there any way to quickly insert that avatar (without it INSERT SELECT execute in few seconds) into openclipart_users, any solustion that will work will be fine (by quickly I mean less then a minute).

EDIT output of explain over SELECT

+----+--------------------+-------------------+--------+---------------+----------+---------+----------------------------------+--------+-------------+
| id | select_type        | table             | type   | possible_keys | key      | key_len | ref                              | rows   | Extra       |
+----+--------------------+-------------------+--------+---------------+----------+---------+----------------------------------+--------+-------------+
|  1 | PRIMARY            | <derived3>        | ALL    | NULL          | NULL     | NULL    | NULL                             | 106689 |             |
|  1 | PRIMARY            | users             | eq_ref | PRIMARY       | PRIMARY  | 4       | minids.userid                    |      1 |             |
|  3 | DERIVED            | aiki_users        | index  | NULL          | username | 302     | NULL                             | 111273 | Using index |
|  2 | DEPENDENT SUBQUERY | openclipart_files | ALL    | NULL          | NULL     | NULL    | NULL                             |  37715 |             |
|  2 | DEPENDENT SUBQUERY | aiki_users        | eq_ref | PRIMARY       | PRIMARY  | 4       | openclipart_staging.users.userid |      1 | Using where |
+----+--------------------+-------------------+--------+---------------+----------+---------+----------------------------------+--------+-------------+
Community
  • 1
  • 1
jcubic
  • 51,975
  • 42
  • 183
  • 323
  • How long does it take to run the SELECT only? What does EXPLAIN say about it? – Vatev Jul 26 '12 at 17:15
  • How much time does it take to **fetch all** rows of your select? – hmmftg Jul 26 '12 at 17:16
  • 1
    As with almost any SQL DB out there, subselects are notoriously bad when either your source tables are large or your subselect tables are large. Compound this with using the LIKE operator, which is a very expensive conditional operator compared to a straight comparison (try `RIGHT(avatar, 3) = 'svg'` and see how it compares). See if you can possibly refactor it to using nothing but joins (including joining on subselects). For example, try joining to `SELECT avatar, CASE RIGHT(avatar, 3) WHEN 'svg' THEN 1 ELSE 0 END AS IsSVG FROM openclipart_files` with avatar = filename and IsSVG = 1. – SPFiredrake Jul 26 '12 at 17:21
  • @hmmftg I don't know after few minutes I cancel the query. – jcubic Jul 26 '12 at 17:30
  • 1. Is it really possible to have two users with the same username and different userid values? 2. Which table contains the column `avatar`: `aiki_users` or `openclipart_files`? 3. No matter what you do, you are going to get a scan on the table that contains the `avatar` column because of the use of a leading wildcard. If that table is tiny, then fine, but if that table is large, then the entire query will be slave to that expression. – Thomas Jul 26 '12 at 17:56
  • @Thomas 1 there are lots of them registration system was busted (people click register few times and it was creating few users). 2 aiki_users have avatar (avatar belong to user not file) – jcubic Jul 26 '12 at 17:58
  • @SPFiredrake I was able to fix it using your hint, I create `case RIGHT(avatar, 3) when 'svg' then (select ...) ELSE null END as avatar` – jcubic Jul 26 '12 at 18:00
  • @jcubic - Btw, using `Right(avatar,3)` won't help either. You will still get a table scan as the engine must run that function against every row to know which ones to return. – Thomas Jul 26 '12 at 18:02
  • @Thomas it help because I have only 300 rows that have 'svg' not 100k in inner query. So I scan only 300 rows now it's tiny. Adding my own anwser – jcubic Jul 26 '12 at 18:07
  • @jcubic - It doesn't matter how many rows meet the criteria. What matters is how many rows in the table where the column is located. If there 100K rows but only 300 end with `svg`, the system must run `Right` against 100K rows. – Thomas Jul 26 '12 at 18:09
  • @Thomas Yes, but now the subselect will only run for 300 rows as opposed to 100k, which is the speedup he's noticing. Also, he sped up his query in the final result by getting rid of the `aiki_users` reference in his subselect, which was unnecessary in the first place since he's already working with it as the primary source table. – SPFiredrake Jul 27 '12 at 13:54

4 Answers4

2

Converting to join only syntax (getting rid of the correlated subquery and joining to a subselect instead):

SELECT minids.userid, username, password, full_name, country, email,
      clip.id as avatar,
      homepage, usergroup, notify, nsfwfilter
FROM aiki_users users
INNER JOIN (SELECT MIN(userid) as userid FROM aiki_users GROUP by username) 
           minids ON minids.userid = users.userid
LEFT OUTER JOIN openclipart_files clip ON 
           clip.owner = users.userid AND RIGHT(users.avatar, 3) = 'svg' 
           AND clip.filename = users.avatar

Try that out.

jcubic
  • 51,975
  • 42
  • 183
  • 323
SPFiredrake
  • 3,692
  • 15
  • 26
  • But only 300 rows join by filename = avatar and I need all 100k with NULL values when `avatar not like '%svg'` – jcubic Jul 28 '12 at 07:11
  • That's why it's a left join (left outer join). The join syntax ensures that it only creates a matching record IF AND ONLY IF the join predicates match. Otherwise all the `clip` columns for rows that don't have matches will be null. – SPFiredrake Jul 30 '12 at 13:26
  • I din't know that, I know only subset of SQL, need to learn more about JOINs. I need to test you solution and if it work I'll accept your answer, much better then my hack. – jcubic Jul 30 '12 at 13:41
  • The second join is `LEFT JOIN` shouldn't it be `LEFT OUTER JOIN` like in comment? And why it have `LEFT JOIN (SELECT` if it's join couldn't it be `LEFT OUTER JOIN openclipart_files ON owner = users.user_id ...`? – jcubic Jul 30 '12 at 14:08
  • `LEFT JOIN` is the same as `LEFT OUTER JOIN`. Just like you can use `JOIN` instead of `INNER JOIN`. In regards to the subselect, you're right. I originally did it because I was thinking you were getting the last 3 of a column from `openclipart_files` so I started doing it that way. Then I reexamined your query and realized that wasn't the case at all. I'll change it to a straight join instead of a subselect (since you don't need to perform any UDFs on the data, should be fine). – SPFiredrake Jul 30 '12 at 15:16
1
SELECT 
    MIN(userid), username, password, full_name, country, email,
    openclipart_files.id,
    homepage, usergroup, notify, nsfwfilter
FROM aiki_users 
LEFT JOIN openclipart_files  ON filename = avatar AND avatar like '%svg'
GROUP BY username
Vatev
  • 7,017
  • 1
  • 28
  • 37
1

Using hint from @SPFiredrake with CASE and RIGHT I create this query

SELECT minids.userid, username, password, full_name, country, email,
       case RIGHT(avatar, 3) 
       when 'svg' then 
            (select openclipart_files.id 
             from openclipart_files
             where filename = users.avatar AND users.userid = owner)
       else 
            null 
       end as avatar, homepage, usergroup, first_login, notify, nsfwfilter
FROM aiki_users users 
INNER JOIN (SELECT MIN(userid) as userid FROM aiki_users GROUP by username) 
minids ON minids.userid = users.userid;

which run in 2-4 seconds (inner SQL was run only for those 300 cases).

Community
  • 1
  • 1
jcubic
  • 51,975
  • 42
  • 183
  • 323
0

Some Improvements:

SELECT MINIDS.USERID, USERNAME, PASSWORD, FULL_NAME, COUNTRY, EMAIL,
       (SELECT OPENCLIPART_FILES.ID
          FROM OPENCLIPART_FILES
         WHERE AVATAR LIKE '%svg' AND USERS.FILENAME = AVATAR) AS AVATAR, 
       HOMEPAGE, USERGROUP, NOTIFY, NSFWFILTER
  FROM     AIKI_USERS USERS
       INNER JOIN
           (SELECT MIN(USERID) AS USERID
              FROM AIKI_USERS
            GROUP BY USERNAME) MINIDS
       ON MINIDS.USERID = USERS.USERID;
hmmftg
  • 1,117
  • 1
  • 15
  • 27