20

What is the shortest and/or efficient SQL statement to sort a table with a column of email address by it's DOMAIN name fragment?

That's essentially ignoring whatever is before "@" in the email addresses and case-insensitive. Let's ignore the internationalized domain names for this one.

Target at: mySQL, MSSQL, Oracle

Sample data from TABLE1

id   name           email 
------------------------------------------
 1   John Doe       johndoe@domain.com
 2   Jane Doe       janedoe@helloworld.com
 3   Ali Baba       ali@babaland.com
 4   Foo Bar        foo@worldof.bar.net
 5   Tarrack Ocama  me@am-no-president.org

Order By Email
SELECT * FROM TABLE1 ORDER BY EMAIL ASC

id   name           email 
------------------------------------------
 3   Ali Baba       ali@babaland.com
 4   Foo Bar        foo@worldof.bar.net
 2   Jane Doe       janedoe@helloworld.com
 1   John Doe       johndoe@domain.com
 5   Tarrack Ocama  me@am-no-president.org

Order By Domain
SELECT * FROM TABLE1 ORDER BY ?????? ASC

id   name           email 
------------------------------------------
 5   Tarrack Ocama  me@am-no-president.org
 3   Ali Baba       ali@babaland.com
 1   John Doe       johndoe@domain.com
 2   Jane Doe       janedoe@helloworld.com
 4   Foo Bar        foo@worldof.bar.net

EDIT:
I am not asking for a single SQL statement that will work on all 3 or more SQL engines. Any contribution are welcomed. :)

Old Pro
  • 22,324
  • 4
  • 52
  • 96
o.k.w
  • 24,261
  • 6
  • 60
  • 62
  • Please post the code you have written so far. People generally do not like to just write for code for you. – Mitch Wheat Nov 28 '09 at 04:21
  • String manipulation functions are fairly non-standardized. I'd be surprised if you could find one query that works across all 3. – Donnie Nov 28 '09 at 04:22
  • @Mitch: I've added a couple os SQL statements, as you ca see, pretty straightforward and I do not have one for the one I am asking. :P – o.k.w Nov 28 '09 at 04:24
  • @Donnie: Yes I figured, that's why I do not require this as a generic SQL statement, but targetting the popular 3. Perhaps I should state that more clearly. – o.k.w Nov 28 '09 at 04:26
  • Will three different statements, targeted at the three DBMSs you mention do, or do you need an approach that requires one SQL statement only? If one SQL, then you'll need to add a column, as several answers (mine included) suggest, – lexu Nov 28 '09 at 04:42
  • @lexu: Can be 3, or even 1. :) I've edited my post to highlight this. – o.k.w Nov 28 '09 at 05:36
  • 1
    Domain names (and emails) should be in lowercase - there's no risk in driving the text to lowercase when they are captured because domain names (and the internet) is case insensitive. But domain names resolve from right to left - a 2nd period encountered before the @ symbol indicates a subdomain, and there is no limit to the number of subdomains. For example, in `foo@worldof.bar.net` - `worldof` is a subdomain, `bar` is the domain, net is the root. Does this change how you expect your list to be ordered? – OMG Ponies Nov 28 '09 at 06:23
  • 1
    Given the example data, the email has to be cut up into substrings. But the data is also deterministic - it will always return the same data. This means you could consider using a materialized view (indexed view in SQL Server) to separate the strings you need with the email address, while having indexes to take advantage of. Unfortunately, of the databases you listed - MySQL does not yet support materialized views, Oracle and SQL Server do. – OMG Ponies Nov 28 '09 at 06:24
  • @OMG Ponies: Good point you have there about using View. Care to put it as an answer? I find that very useful, could benefit someone else. :) About the sub-domain, I'm interested in that too, but I felt it is too much for this question. Very grateful for your thoughts. – o.k.w Nov 28 '09 at 08:06
  • @Mitch: Though I agree with you, evidence shows that the code always gets written. – erikkallen Nov 28 '09 at 11:42
  • normal sort should work because once you get to @ symbol it is the smallest in all chars (Upper, lower). so it will be sorted according to domain name only. – Rick_C137 Jun 11 '18 at 13:08

13 Answers13

23

Try this

Query(For Sql Server):

select * from mytbl
order by SUBSTRING(email,(CHARINDEX('@',email)+1),1)

Query(For Oracle):

select * from mytbl
order by substr(email,INSTR(email,'@',1) + 1,1)

Query(for MySQL)

pygorex1 already answered

Output:

id name email

5   Tarrack Ocama   me@am-no-president.org
3   Ali Baba    ali@babaland.com
1   John Doe    johndoe@domain.com
2   Jane Doe    janedoe@helloworld.com
4   Foo Bar foo@worldof.bar.net
priyanka.sarkar
  • 23,174
  • 39
  • 117
  • 161
  • Waited couple days for more answers, seems like this is the one that is most complete. I do appreciate other alternative methods and ideas in terms of scalability and performance. Wye-opening indeed. Thanks all! – o.k.w Nov 30 '09 at 02:55
  • 1
    This query for Postgres: `select * from mytbl order by SUBSTRING(email,(position('@',email)+1),1)` – Elliot Coad Jun 20 '12 at 12:00
  • 4
    For Sql Server, it looks like that would just sort on the 1st letter of the domain, so I'd adjust as follows: `select * from mytbl order by SUBSTRING(email,(CHARINDEX('@',email)+1),LEN(email)-CHARINDEX('@', email))` – Otus Jul 18 '12 at 16:14
  • pygorex1 already answered ye right i could not find it you should have added a link or his text + his name dont make it complicated... – robgha01 Jul 09 '20 at 11:00
17

For MySQL:

select email, SUBSTRING_INDEX(email,'@',-1) AS domain from user order by domain desc;

For case-insensitive:

select user_id, username, email, LOWER(SUBSTRING_INDEX(email,'@',-1)) AS domain from user order by domain desc;
leepowers
  • 35,484
  • 22
  • 93
  • 127
  • SUBSTRING_INDEX is just MySQL. You'll need to Google for Substring / string modification functions in Oracle & SQL Server. Unfortunately, there's no standard ANSI SQL code for this AFAIK. – leepowers Nov 28 '09 at 04:36
  • that's what I thought. I wonder if o.k.w meant those 3 DBMSs as a hard requirement? – lexu Nov 28 '09 at 04:38
  • @lexu,pygorex1: Not hard requirement. I just thought putting 3 popular ones might help others. pygorex1, +1 for your answer :P – o.k.w Nov 28 '09 at 04:43
  • Thanks for the +1. This is a nice trick - though bear in mind you'll get faster performance from a separate domain in it's own indexed column. – leepowers Nov 28 '09 at 05:04
8

If you want this solution to scale at all, you should not be trying to extract sub-columns. Per-row functions are notoriously slow as the table gets bigger and bigger.

The right thing to do in this case is to move the cost of extraction from select (where it happens a lot) to insert/update where it happens less (in most normal databases). By incurring the cost only on insert and update, you greatly increase the overall efficiency of the database, since that's the only point in time where you need to do it (i.e., it's the only time when the data changes).

In order to achieve this, split the email address into two distinct columns in the table, email_user and email_domain). Then you can either split it in your application before insertion/update or use a trigger (or pre-computed columns if your DBMS supports it) in the database to do it automatically.

Then you sort on email_domain and, when you want the full email address, you use email_name|'@'|email_domain.

Alternatively, you can keep the full email column and use a trigger to duplicate just the domain part in email_domain, then you never need to worry about concatenating the columns to get the full email address.

It's perfectly acceptable to revert from 3NF for performance reasons provided you know what you're doing. In this case, the data in the two columns can't get out of sync simply because the triggers won't allow it. It's a good way to trade disk space (relatively cheap) for performance (we always want more of that).

And, if you're the sort that doesn't like reverting from 3NF at all, the email_name/email_domain solution will fix that.

This is also assuming you just want to handle email addresses of the form a@b - there are other valid email addresses but I can't recall seeing any of them in the wild for years.

paxdiablo
  • 772,407
  • 210
  • 1,477
  • 1,841
4

For SQL Server, you could add a computed column to your table with extracts the domain into a separate field. If you persist that column into the table, you can use it like any other field and even put an index on it, to speed things up, if you query by domain name a lot:

ALTER TABLE Table1
  ADD DomainName AS 
     SUBSTRING(email, CHARINDEX('@', email)+1, 500) PERSISTED

So now your table would have an additional column "DomainName" which contains anything after the "@" sign in your e-mail address.

marc_s
  • 675,133
  • 158
  • 1,253
  • 1,388
2

Assuming you really must cater for MySQL, Oracle and MSSQL .. the most efficient way might be to store the account name and domain name in two separate fields. The you can do your ordering:

select id,name,email from table order by name

select id,name,email,account,domain from table order by email

select id,name,email,account,domain from table order by domain,account

as donnie points out, string manipulation functions are non standard .. that is why you will have to keep the data redundant!

I've added account and domain to the third query, since I seam to recall not all DBMSs will sort a query on a field that isn't in the selected fields.

lexu
  • 8,440
  • 5
  • 42
  • 62
2

This will work with Oracle:

select id,name,email,substr(email,instr(email,'@',1)+1) as domain
from table1
order by domain asc
Jonathan
  • 18,696
  • 6
  • 62
  • 66
P Sharma
  • 2,470
  • 11
  • 29
  • 35
2

For postgres the query is:

SELECT * FROM table
ORDER BY SUBSTRING(email,(position('@' in email) + 1),252)

The value 252 is the longest allowed domain (since, the max length of an email is 254 including the local part, the @, and the domain.

See this for more details: What is the maximum length of a valid email address?

zachaysan
  • 1,482
  • 14
  • 28
1

You are going to have to use the text manipulation functions to parse out the domain. Then order by the new column.

monksy
  • 13,745
  • 16
  • 68
  • 121
1

MySQL, an intelligent combination of right() and instr()

SQL Server, right() and patindex()

Oracle, instr() and substr()

And, as said by someone else, if you have a decent to high record count, wrapping your email field in functions in you where clause will make it so the RDBMS can't use any index you might have on that column. So, you may want to consider creating a computed column which holds the domain.

Donnie
  • 41,533
  • 8
  • 62
  • 82
1

If you have million records, I suggest you to create new column with domain name only.

YOU
  • 106,832
  • 29
  • 175
  • 207
1

My suggestion would be (for mysql):

SELECT 
    LOWER(email) AS email,
    SUBSTRING_INDEX(email, '@', + 1) AS account,
 REPLACE(SUBSTRING_INDEX(email, '@', -1), CONCAT('.',SUBSTRING_INDEX(email, '.', -1)),'') -- 2nd part of mail - tld.
  AS domain,
    CONCAT('.',SUBSTRING_INDEX(email, '.', -1)) AS tld
FROM
********
ORDER BY domain, email ASC;
And then just add a WHERE...
Mr_KeyCode
  • 31
  • 3
1

The original answer for SQL Server didn't work for me....

Here is a version for SQL Server...

select SUBSTRING(email,(CHARINDEX('@',email)+1),len(email)), count(*) 
from table_name 
group by SUBSTRING(email,(CHARINDEX('@',email)+1),len(email))
order by count(*) desc
eddievan
  • 88
  • 1
  • 8
0

work smarter not harder:

SELECT REVERSE(SUBSTRING_INDEX(REVERSE(SUBSTRING(emails.email, POSITION('@' IN emails.email)+1)),'.',2)) FROM emails
Akhil Jain
  • 12,808
  • 13
  • 51
  • 88
  • 3
    Calling *reverse* twice is not smart. Smart (and obvious) is to have separate columns for the local part and domain name so the database can use an index. – user359996 Mar 07 '11 at 22:14