32

I have a T-SQL query that takes data from one table and copies it into a new table but only rows meeting a certain condition:

SELECT VibeFGEvents.* 
INTO VibeFGEventsAfterStudyStart 
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON 
    CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
    AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
    AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
ORDER BY VibeFGEvents.id

The code using the table relies on its order, and the copy above does not preserve the order I expected. I.e. the rows in the new table VibeFGEventsAfterStudyStart are not monotonically increasing in the VibeFGEventsAfterStudyStart.id column copied from VibeFGEvents.id.

In T-SQL how might I preserve the ordering of the rows from VibeFGEvents in VibeFGEventsStudyStart?

Luke Girvin
  • 12,672
  • 8
  • 57
  • 79
dumbledad
  • 12,928
  • 20
  • 97
  • 226
  • 1
    for simplicity , couldn't you use shorter names , so it will be clearer ? – Royi Namir Jan 20 '13 at 13:23
  • @RoyiNamir shorter names = less meaningful, so will it be clearer. – Tony Hopkinson Jan 20 '13 at 13:30
  • 2
    @TonyHopkinson seeking for `VibeFGEventsStudyStart.MIN_TitleInstID` and seeking if all others are the same name , is painful. – Royi Namir Jan 20 '13 at 13:31
  • Not for sql server it isn't. – Tony Hopkinson Jan 20 '13 at 13:35
  • 4
    What does it have to do with Sql server ? I'm talking about asking questions. we dont care about his actual names. we care about his problem. That's why he is here. to ask a question.(clearer = for us , the SO users) – Royi Namir Jan 20 '13 at 13:39
  • Royi - I could have shortened the names for the post, sorry, though in my own code I prefer them long for the reason Tony gives. The data is an archive of study data so I can get away with inefficient queries as there's not that much data and it isn't changing. – dumbledad Jan 20 '13 at 13:40
  • Tony - if you mean that there is no way to preserve the ORDER BY ordering in the SELECT INTO for SQL Server do you want to restate that as an answer and I'll mark it as such? – dumbledad Jan 20 '13 at 13:41
  • I've added SQL Server as a tag so it is clearer which database I'm using. – dumbledad Jan 20 '13 at 13:43
  • @RoyiNamir. I see what you are saying, though I (an other SO user) didn't have the same problem. As as far as I can see the column names are irrelevant to the underlying issue, so I ignored them. – Tony Hopkinson Jan 20 '13 at 13:47
  • 7
    Since relational databases per se really don't have any concept of *order* - what's the point of *preserving* the order upon insert? In general, any relational table **is not ordered** by default; a result set can be ordered **if** you explicitly define an `ORDER BY` clause in your `SELECT` – marc_s Jan 20 '13 at 13:49
  • Thanks marc_s. I'll put ORDER BY into the calling code and live without the order I expect in the DB itself. – dumbledad Jan 20 '13 at 13:56
  • There is no other way as a database has no concept of order. Never had in SQL. – TomTom Jan 20 '13 at 13:58
  • Yeah I read this as the order was being overritten, not that it was never there in the first place. – Tony Hopkinson Jan 20 '13 at 14:01

9 Answers9

42

I know this is a bit old, but I needed to do something similar. I wanted to insert the contents of one table into another, but in a random order. I found that I could do this by using select top n and order by newid(). Without the 'top n', order was not preserved and the second table had rows in the same order as the first. However, with 'top n', the order (random in my case) was preserved. I used a value of 'n' that was greater than the number of rows. So my query was along the lines of:

insert Table2 (T2Col1, T2Col2)
  select top 10000 T1Col1, T1Col2
  from Table1
  order by newid()
Michael
  • 8,401
  • 2
  • 27
  • 42
23

What for?

Point is – data in a table is not ordered. In SQL Server the intrinsic storage order of a table is that of the (if defined) clustered index.

The order in which data is inserted is basically "irrelevant". It is forgotten the moment the data is written into the table.

As such, nothing is gained, even if you get this stuff. If you need an order when dealing with data, you HAVE To put an order by clause on the select that gets it. Anything else is random - i.e. the order you et data is not determined and may change.

So it makes no sense to have a specific order on the insert as you try to achieve.

SQL 101: sets have no order.

Luke Girvin
  • 12,672
  • 8
  • 57
  • 79
TomTom
  • 1
  • 9
  • 78
  • 143
  • 2
    It is a core concept in SQL - pretty much SQL is set based. The order HAS tobe imposed when materialzing a set (in a select). Unless one defines an order in a SELECT; the results are arbitrary and can theoretically change between calls. The order of the data or inserts is lost the moment the data is in the table. THere is no "hidden natural" order. THis is the core of set based operations. – TomTom Jan 20 '13 at 15:55
  • Is it more accurate to say that the order is not guaranteed without an order by? Say I bulk insert data from a file into Table1 and then run select * from Table1 - I get the data back in exactly the same order that it was inserted, no?. When will that change? This is important for 3d party apps that migrate / import data via odbc. If the 3d party app doesn't let you apply an order clause, and if you need the data imported "in order" (to avoid record locks, etc.), then you would be well advised to make sure the data is inserted into the table in the desired order? – spioter Sep 09 '14 at 21:21
  • 2
    No, not guaranteed. It MAY happen, it may not. If there is a clustered index on table 1 with another order it likely comes in that order. If there is another index due to a where clause it comes in a random order depending how sql server decides to search for things. In a more complex query you may find paralellism using differen threads then merging results. Not guaranteed means you rely on side effects which MAY break. Called super crappy programming. – TomTom Sep 10 '14 at 00:21
  • 8
    Sometimes it's just nice to see the data in a certain order by default. It's not needed, you're right, but some people prefer it. – David Wilson Jun 08 '16 at 20:39
  • 2
    How about tables without clustered indexes, like those with UUID primary keys ? – NielsK Jun 10 '16 at 10:56
  • 2
    Or dealing with code that uses the sequence of primary keys in its functionality. – Bon Jul 20 '16 at 18:26
  • Order seems to matter if the place you are inserting into has a constraint and you are using INSERT IGNORE – William Entriken Jul 01 '19 at 19:24
  • @NielsK the clustered index will only cause data to be returned in order if the *query plan* uses it. For a simple `SELECT * FROM JustThisOneTable` it will work as expected. It is still a code smell however, simply because of the complexities we have seen with it leading to nontransparent behavior if you omit an `ORDER BY` clause (in other words, write readable code by not relying on sophisticated assumptions). Also, a query hint to force use the clustered index is another bad idea, because you should leverage the statistics SQL computes to estimate an optimal query plan. – Elaskanator Oct 11 '19 at 19:42
  • 9
    "What for" doesn't help anyone. There were many scenarios where I'd needed to save data into a temporary table so I can repair the data a few days later. It wasn't meant as a permanent storage but as a temporary solution for data repair. Anyway, the next answer should be vote up. – jjthebig1 Oct 27 '19 at 19:55
  • Literally ran into this writing inserts with expected ordering, such that the consumer of a temp table expects to iterate over it in a certain order. Ordering the insert means less intermediary and redundant code to complete the same operation. – Captain Prinny Nov 19 '19 at 16:56
  • 2
    Thing that is getting me is that, when using columnstore, insert order _does_ matter... it allows for rowgroup omission and much faster queries... in fact leveraging insert order is recommended by MS here for that very purpose: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-query-performance?view=sql-server-ver15#1-organize-data-to-eliminate-more-rowgroups-from-a-full-table-scan – Kram Jul 16 '20 at 23:34
4

I've found a specific scenario where we want the new table to be created with a specific order in the columns' content:

  • Amount of rows is very big (from 200 to 2000 millions of rows), so we are using SELECT INTO instead of CREATE TABLE + INSERT because needs to be loaded as fast as possible (minimal logging). We have tested using the trace flag 610 for loading an already created empty table with a clustered index but still takes longer than the following approach.
  • We need the data to be ordered by specific columns for query performances, so we are creating a CLUSTERED INDEX just after the table is loaded. We discarded creating a non-clustered index because it would need another read for the data that's not included in the ordered columns from the index, and we discarded creating a full-covering non-clustered index because it would practically double the amount of space needed to hold the table.

It happens that if you manage to somehow create the table with columns already "ordered", creating the clustered index (with the same order) takes a lot less time than when the data isn't ordered. And sometimes (you will have to test your case), ordering the rows in the SELECT INTO is faster than loading without order and creating the clustered index later.

The problem is that SQL Server 2012+ will ignore the ORDER BY column list when doing INSERT INTO or when doing SELECT INTO. It will consider the ORDER BY columns if you specify an IDENTITY column on the SELECT INTO or if the inserted table has an IDENTITY column, but just to determine the identity values and not the actual storage order in the underlying table. In this case, it's likely that the sort will happen but not guaranteed as it's highly dependent on the execution plan.

A trick we have found is that doing a SELECT INTO with the result of a UNION ALL makes the engine perform a SORT (not always an explicit SORT operator, sometimes a MERGE JOIN CONCATENATION, etc.) if you have an ORDER BY list. This way the select into already creates the new table in the order we are going to create the clustered index later and thus the index takes less time to create.

So you can rewrite this query:

SELECT
    FirstColumn = T.FirstColumn,
    SecondColumn = T.SecondColumn
INTO
    #NewTable
FROM
    VeryBigTable AS T
ORDER BY            -- ORDER BY is ignored!
    FirstColumn,
    SecondColumn

to

SELECT
    FirstColumn = T.FirstColumn,
    SecondColumn = T.SecondColumn
INTO
    #NewTable
FROM
    VeryBigTable AS T

UNION ALL

-- A "fake" row to be deleted
SELECT
    FirstColumn = 0,
    SecondColumn = 0

ORDER BY
    FirstColumn,
    SecondColumn

We have used this trick a few times, but I can't guarantee it will always sort. I'm just posting this as a possible workaround in case someone has a similar scenario.

EzLo
  • 12,897
  • 10
  • 28
  • 33
1

You cannot do this with ORDER BY but if you create a Clustered Index on VibeFGEvents.id after your SELECT INTO the table will be sorted on disk by VibeFGEvents.id.

Cyndi Baker
  • 576
  • 6
  • 14
1

Just add top to your sql with a number that is greater than the actual number of rows:

SELECT top 25000 * 
into spx_copy
  from SPX
  order by date
Greg Gum
  • 25,941
  • 27
  • 127
  • 194
  • 1
    By adding TOP, I get the SQL to run. However, they are still loaded into the destination table as if the "ORDER BY" was not there .... I tested the command separately, and as a select it still works correctly, and orders it. Any ideas? – JosephDoggie Feb 01 '21 at 17:24
  • Hmm, no idea. It works for me. – Greg Gum Feb 02 '21 at 20:44
0

The reason why one would desire this (a specific order) is because you cannot define the order in a subquery, so, the idea is that, if you create a table variable, THEN make a query from that table variable, you would think you would retain the order(say, to concatenate rows that must be in order- say for XML or json), but you can't. So, what do you do? The answer is to force SQL to order it by using TOP in your select (just pick a number high enough to cover all your rows).

MC9000
  • 1,295
  • 7
  • 35
  • 59
  • [Another answer](https://stackoverflow.com/a/23237448/2799848) mentioned this already. Please elaborate why using `TOP` makes it work – Elaskanator Oct 11 '19 at 21:05
  • 1
    This is by design (and it makes intuitive sense since TOP would make no sense without an ordering of the return set). Like I clarified: "... force SQL to order it by using TOP ..." in my comment above (yes, this was mentioned already, but I hoped to clarify it for others. – MC9000 Oct 16 '19 at 02:30
0

I have run into the same issue and one reason I have needed to preserve the order is when I try to use ROLLUP to get a weighted average based on the raw data and not an average of what is in that column. For instance, say I want to see the average of profit based on number of units sold by four store locations? I can do this very easily by creating the equation Profit / #Units = Avg. Now I include a ROLLUP in my GROUP BY so that I can also see the average across all locations. Now I think to myself, "This is good info but I want to see it in order of Best Average to Worse and keep the Overall at the bottom (or top) of the list)." The ROLLUP will fail you in this so you take a different approach.

Why not create row numbers based on the sequence (order) you need to preserve?

    SELECT OrderBy = ROW_NUMBER() OVER(PARTITION BY 'field you want to count' ORDER BY 'field(s) you want to use ORDER BY')
    , VibeFGEvents.*  
    FROM VibeFGEvents
    LEFT OUTER JOIN VibeFGEventsStudyStart
    ON 
        CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
        AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
        AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
    WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL

Now you can use the OrderBy field from your table to set the order of values. I removed the ORDER BY statement from the query above since it does not affect how the data is loaded to the table.

MHeath
  • 71
  • 7
0

I'v made a test on MS SQL 2012, and it clearly shows me, that insert into ... select ... order by makes sense. Here is what I did:

create table tmp1 (id int not null identity, name sysname);
create table tmp2 (id int not null identity, name sysname);

insert into tmp1 (name) values ('Apple');
insert into tmp1 (name) values ('Carrot');
insert into tmp1 (name) values ('Pineapple');
insert into tmp1 (name) values ('Orange');
insert into tmp1 (name) values ('Kiwi');
insert into tmp1 (name) values ('Ananas');
insert into tmp1 (name) values ('Banana');
insert into tmp1 (name) values ('Blackberry');

select * from tmp1 order by id;

And I got this list:

  • 1 Apple
  • 2 Carrot
  • 3 Pineapple
  • 4 Orange
  • 5 Kiwi
  • 6 Ananas
  • 7 Banana
  • 8 Blackberry

No surprises here. Then I made a copy from tmp1 to tmp2 this way:

insert into tmp2 (name)
select name
from tmp1
order by id;

select * from tmp2 order by id;

I got the exact response like before. Apple to Blackberry. Now reverse the order to test it:

delete from tmp2;

insert into tmp2 (name)
select name
from tmp1
order by id desc;

select * from tmp2 order by id;
  • 9 Blackberry
  • 10 Banana
  • 11 Ananas
  • 12 Kiwi
  • 13 Orange
  • 14 Pineapple
  • 15 Carrot
  • 16 Apple

So the order in tmp2 is reversed too, so order by made sense when there is a identity column in the target table!

CLS
  • 451
  • 5
  • 7
-1

Try using INSERT INTO instead of SELECT INTO

INSERT INTO VibeFGEventsAfterStudyStart 
SELECT VibeFGEvents.* 
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON 
    CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
    AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
    AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
ORDER BY VibeFGEvents.id`
dumbledad
  • 12,928
  • 20
  • 97
  • 226
Geo2013
  • 11
  • 1
  • 1
    A table has no order. See the other comments in this question. – usr Jan 20 '13 at 14:05
  • 1
    I do understand table has no order in this scenario. The point is rows can be inserted using INSERT INTO with a sub query that can be ordered. – Geo2013 Jan 20 '13 at 14:38
  • A simple observation: In this case, the `ORDER BY` clause is for `SELECT VibeFGEvents.* FROM ...` statement and not for `INSERT` statement. – Bogdan Sahlean Jan 20 '13 at 14:58
  • 2
    It does not matter what order you insert in. When selecting, that order is gone. You cannot get it "out" again. – usr Jan 20 '13 at 15:13
  • Except there is implicit ordering by the first column which often can be an sequential integer identity primary key. – Bon Jul 20 '16 at 18:28
  • Using an `ORDER BY` with an `INSERT` statement [appears to only matter](https://stackoverflow.com/a/56005568/2799848) when populating the `IDENTITY` column. – Elaskanator Oct 11 '19 at 21:09