138

Possible Duplicate:
Split string in SQL

I have seen a couple of questions related to string concatenation in SQL. I wonder how you would approach the opposite problem: splitting a comma-delimited string into rows of data:

Let's say I have the tables:

userTypedTags(userID, commaSeparatedTags) -- one entry per user
tags(tagID,name)

And I want to insert data into the table

userTag(userID, tagID) -- multiple entries per user

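For illustration, given a hypothetical table-valued dbo.Split function like the ones in the answers (names and the output column s are assumptions), the insert I am after would look something like:

```sql
-- Sketch only: dbo.Split(sep, s) is assumed to return one row per piece in column s
INSERT INTO userTag (userID, tagID)
SELECT u.userID, t.tagID
FROM userTypedTags u
CROSS APPLY dbo.Split(',', u.commaSeparatedTags) s
JOIN tags t ON t.name = LTRIM(RTRIM(s.s));
```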
Inspired by the question Which tags are not in the database?

EDIT

Thanks for the answers. Actually, more than one deserves to be accepted, but I can only pick one, and the solution presented by Cade Roux with recursion seems pretty clean to me. It works on SQL Server 2005 and above.

For earlier versions of SQL Server, the solution provided by miies can be used. For working with the text data type, wcm's answer will be helpful. Thanks again.

kristof
  • Just use a Split routine for it. Plenty of folks have posted code for it on SO and elsewhere. – Kevin Fairchild Nov 24 '08 at 17:24
  • Kevin, could you please provide some links? – wcm Nov 24 '08 at 18:35
  • See http://stackoverflow.com/questions/2647/split-string-in-sql as well. – Jarrod Dixon Mar 08 '10 at 19:43
  • Sounds like you need to separate that column into its own table. If tags are only stored in a delimited list, how would you write efficient SQL to find records associated with a specific tag? – Joel Coehoorn Nov 24 '08 at 17:25
  • That is the idea behind the question: separating the tags string into a set of records. – kristof Nov 25 '08 at 12:25
  • Okay, that makes more sense then. So I should be hounding your predecessor for creating that mess in the first place ;) – Joel Coehoorn Nov 25 '08 at 14:39
  • I see that doing it in SQL is possibly the requirement, but isn't creating a CLR table-valued split function that simply calls string.Split with parameters an efficient, elegant, and easy-to-deploy-and-maintain solution (including, but not limited to, CultureInfo, StringSplitOptions, etc.)? – luckyluke Mar 04 '11 at 14:20
  • In SQL Server 2016 you have STRING_SPLIT function: https://msdn.microsoft.com/en-us/library/mt684588.aspx – Jovan MSFT Mar 08 '16 at 21:18

11 Answers

148

There is a wide variety of solutions to this problem documented here, including this little gem:

CREATE FUNCTION dbo.Split (@sep char(1), @s varchar(512))
RETURNS table
AS
RETURN (
    WITH Pieces(pn, start, stop) AS (
      SELECT 1, 1, CHARINDEX(@sep, @s)
      UNION ALL
      SELECT pn + 1, stop + 1, CHARINDEX(@sep, @s, stop + 1)
      FROM Pieces
      WHERE stop > 0
    )
    SELECT pn,
      SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS s
    FROM Pieces
  )
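
A quick sanity check of the function might look like this (a sketch; note that the MAXRECURSION limit discussed in the comments applies to longer lists, and the hint has to go on the outer query, since it is not allowed inside the function definition):

```sql
SELECT pn, s
FROM dbo.Split(',', 'red,green,blue')
OPTION (MAXRECURSION 0);
-- pn  s
-- 1   red
-- 2   green
-- 3   blue
```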
Cade Roux
  • Fantastic function. Could do with using nchar() and nvarchar(). Also see below for suggestion with variable-length delimiter. – Rory Oct 17 '09 at 16:47
  • On SQL Server 2008, this version fails when there are more than 101 items in the list: "The statement terminated. The maximum recursion 100 has been exhausted before statement completion." – Mike Schenk Feb 12 '10 at 17:23
  • @MikeSchenk You can use the OPTION (MAXRECURSION n) hint (http://msdn.microsoft.com/en-us/library/ms181714.aspx) to change the level of recursion - however, it's not allowed in UDF definitions. This question (http://social.msdn.microsoft.com/forums/en-US/transactsql/thread/7c7d5fea-38ad-4bc5-9038-a157e640561f) would imply that you can specify it outside the UDF and have it still work. – Cade Roux Feb 12 '10 at 18:20
  • Warning: fails with larger input strings (above about 1000 characters). "The maximum recursion 100 has been exhausted before statement completion." – cbp Apr 14 '10 at 05:39
  • I realize I'm jumping into this incredibly late... like 1.5 years, but there's a pretty interesting solution that doesn't require any recursion that wasn't mentioned here. I've posted it below. – Nathan Wheeler May 14 '10 at 21:11
  • How can I call this function from a select query? I get the following error: Cannot find either column "dbo" or the user-defined function or aggregate "dbo.fn_Split", or the name is ambiguous. – Lajos Arpad Dec 21 '12 at 15:36
  • @LajosArpad You should post your entire example (with the actual code you are trying) in another question. – Cade Roux Dec 21 '12 at 20:40
85

You can also achieve this effect using XML, as seen here, which removes the limitation of the other answers, which all seem to rely on recursion in some fashion. The particular version I've written here allows for up to a 32-character delimiter, but that could be increased to whatever size is needed.

create FUNCTION [dbo].[Split] (@sep VARCHAR(32), @s VARCHAR(MAX))
RETURNS TABLE
AS
    RETURN
    (
        SELECT r.value('.','VARCHAR(MAX)') as Item
        FROM (SELECT CONVERT(XML, N'<root><r>' + REPLACE(REPLACE(REPLACE(@s,'& ','&amp; '),'<','&lt;'), @sep, '</r><r>') + '</r></root>') as valxml) x
        CROSS APPLY x.valxml.nodes('//root/r') AS RECORDS(r)
    )

Then you can invoke it using:

SELECT * FROM dbo.Split(' ', 'I hate bunnies')

Which returns:

-----------
|I        |
|---------|
|hate     |
|---------|
|bunnies  |
-----------


I should note, I don't actually hate bunnies... it just popped into my head for some reason.
The following is the closest thing I could come up with using the same method in an inline table-valued function. DON'T USE IT, IT'S HORRIBLY INEFFICIENT! It's just here for reference's sake.
CREATE FUNCTION [dbo].[Split] (@sep VARCHAR(32), @s VARCHAR(MAX))
RETURNS TABLE
AS
    RETURN
    (
        SELECT r.value('.','VARCHAR(MAX)') as Item
        FROM (SELECT CONVERT(XML, N'<root><r>' + REPLACE(@s, @sep, '</r><r>') + '</r></root>') as valxml) x
        CROSS APPLY x.valxml.nodes('//root/r') AS RECORDS(r)
    )
Nathan Wheeler
  • @md5sum: Great solution! – p.campbell May 14 '10 at 21:13
  • @md5sum - if you could get this into an inline table-valued function, I'd be onboard. Typically scalar functions perform horribly on SQL Server. I'd love to see this benchmarked against the ITVF solution. – Cade Roux May 14 '10 at 22:06
  • @Cade Roux - I'm sorry, but I don't think I understand exactly what you're asking here... this is a table-valued function. Unless you're just wanting it to return the selected value rather than inserting it into the temp table? – Nathan Wheeler May 17 '10 at 16:38
  • @md5sum - inline table valued functions (without a BEGIN) generally perform poorer and are handled completely different than multi-statement table-valued functions (as this is). Of course you never can tell untill you benchmark them... – Cade Roux May 17 '10 at 16:59
  • @md5sum I see I mentioned scalar functions - I probably forgot to finish my thought - performance on UDFs generally is best on inline TVF, then multi-statement TVF, then scalar functions (which are completely horrible). – Cade Roux May 17 '10 at 17:01
  • @Cade Roux - The inline version (the best incarnation I could come up with) is a complete joke in comparison. I even tried it a couple extra times just to ensure that something hadn't kicked off on the server to screw with my results. Version posted above splitting 3168 rows of "I HATE BUNNIES" consistently takes ~110ms. The inline version using the same parameters consistently takes ~1:25.175. I can post the inline version if you'd like. I may have done something horribly wrong when I built it, but as far as I can tell I did it the only proper way possible. – Nathan Wheeler May 17 '10 at 17:27
  • @md5sum I'm late voting you up, but wanted to note that I'd be interested to see how it scales for a lot more rows. I know that even simple scalar functions can perform horribly versus inline code on hundreds of thousands of rows. – Cade Roux Aug 26 '10 at 12:54
  • @Cade Roux - I'm not really sure. I know that I've used it for around 1000 and it performs fine. However, I would argue that SQL Server isn't the place to do that amount of string manipulation anyway (at least not in any reasonable application architecture design patterns). – Nathan Wheeler Aug 26 '10 at 14:21
  • @md5sum I agree this is probably a modeling/design issue. However, if you've got terabytes of data sometimes you don't want to pull them into yet another system to do something with them. Obviously the design of the data should be improved to facilitate the anticipated processing. Sometimes it's not so easy to stream millions of rows through something better than T-SQL. Which is why design defects are 1000x times more difficult to compensate for when discovered in production stage of life-cycle. – Cade Roux Aug 26 '10 at 20:05
  • @md5sum thanks for the solutions, but the first example needs the parameters renamed to @sep and @s respectively. (couldn't change it myself) – Tabloo Quijico Nov 10 '10 at 16:38
  • @Tabloo Quijico - amazing nobody else caught the mismatched parameter names. Thanks! – Nathan Wheeler Nov 10 '10 at 17:07
  • In this example, you're limiting the text to 5 characters. So your output should return `I | hate | bunni` – arviman Oct 25 '11 at 20:31
  • If you have ampersands (&) in your data this will not work unless you string-replace them with '&amp;' – jkirkwood Oct 28 '11 at 18:25
  • @arviman - Yeah, A friend of mine pointed that out... I just never got around to updating the answer here. – Nathan Wheeler Feb 02 '12 at 22:26
  • @jkirkwood - Yes, since you're converting it to XML, you have to format all XML "special characters" to their appropriate "escaped" value (can't think of the right word for that right now). So, it would apply to < and > as well. – Nathan Wheeler Feb 02 '12 at 22:26
  • This is a great solution! I just changed the VARCHAR(5) to NVARCHAR(MAX) and it works great for me! Thanks! – Ryan Jan 14 '13 at 16:42
  • It actually returns **I hate bunni** due to the `VARCHAR(5)` :) – Jeremy Thompson Apr 17 '13 at 23:55
  • @JeremyThompson - I could have sworn that was fixed at one time... you can see from the comments above that it was mentioned before :) – Nathan Wheeler Apr 18 '13 at 16:50
  • got bad performance when changed to uniqueidentifier, cost me 2s to split 250 guid values, don't know why... – Colin Niu Nov 15 '13 at 10:41
  • @ColinNiu - What did you change to uniqueidentifier? You could check the execution, but I'm betting that the process of converting the strings to uniqueidentifiers is what will be eating your performance, rather than the process of splitting them up. – Nathan Wheeler Feb 28 '14 at 22:35
  • Edit number 7 changed the first function to an inline function. Which is confusing because the answer discusses the differences between the two solutions. Just in case anyone else is confused and sees this comment – JoshBerke Aug 12 '14 at 22:15
  • Strange, in my testing I find that the table valued function version (I went into the history of this response to get it) performed more than 8 times WORSE than the inline version. I will stick with the inline version thank you. – Tim Friesen Aug 18 '15 at 17:21
  • @JoshBerke, I have edited this answer to fix the issue you mentioned. – Tim Friesen Aug 18 '15 at 17:29
  • Nevermind, my edit was rejected even though the goal of my edit was to bring back the original intent that the author was conveying. Poor reviewing in my opinion. – Tim Friesen Aug 18 '15 at 19:12
18

I use this function (SQL Server 2005 and above).

create function [dbo].[Split]
(
    @string nvarchar(4000),
    @delimiter nvarchar(10)
)
returns @table table
(
    [Value] nvarchar(4000)
)
begin
    declare @nextString nvarchar(4000)
    declare @pos int, @nextPos int

    set @nextString = ''
    set @string = @string + @delimiter

    set @pos = charindex(@delimiter, @string)
    set @nextPos = 1
    while (@pos <> 0)
    begin
        set @nextString = substring(@string, 1, @pos - 1)

        insert into @table
        (
            [Value]
        )
        values
        (
            @nextString
        )

        set @string = substring(@string, @pos + len(@delimiter), len(@string))
        set @nextPos = @pos
        set @pos = charindex(@delimiter, @string)
    end
    return
end
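
A usage sketch: note that the argument order here is (string, delimiter), the reverse of the other answers, and that the delimiter may be longer than one character:

```sql
SELECT [Value]
FROM dbo.Split('one||two||three', '||');
-- three rows: 'one', 'two', 'three'
```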
user39603
  • Thanks for that. I think that will also work in SQL Server 2000 – kristof Nov 25 '08 at 12:47
  • You're right. I thought table-valued functions were introduced with SQL Server 2005, but they weren't. – user39603 Nov 26 '08 at 12:24
  • @commaCheck is not used, other than an assignment, so it can be removed. – David May 04 '10 at 00:31
  • Also, set @string = substring(@string, @pos + 1, len(@string)) should be set @string = substring(@string, @pos + len(@delimiter), len(@string)) – David May 04 '10 at 00:43
11

For the particular case of splitting strings into words I've come across another solution for SQL Server 2008.

with testTable AS
(
SELECT 1 AS Id, N'how now brown cow' AS txt UNION ALL
SELECT 2, N'she sells sea shells upon the sea shore' UNION ALL
SELECT 3, N'red lorry yellow lorry' UNION ALL
SELECT 4, N'the quick brown fox jumped over the lazy dog'
)

SELECT display_term, COUNT(*) As Cnt
 FROM testTable
CROSS APPLY sys.dm_fts_parser('"' + txt + '"', 1033, 0,0)
GROUP BY display_term
HAVING COUNT(*) > 1
ORDER BY Cnt DESC

Returns

display_term                   Cnt
------------------------------ -----------
the                            3
brown                          2
lorry                          2
sea                            2
Martin Smith
  • Interesting, but one must note that it **requires** "Full-Text Search" to be installed & available – quetzalcoatl Jan 08 '13 at 12:32
  • @quetzalcoatl - And it also requires `sysadmin` permissions as well. Still might be useful for somebody. – Martin Smith Jan 08 '13 at 12:37
  • Absolutely perfect. This is like the opposite of the "Stuff" command. No need for testing a barrage of split functions only to doubt whether you really want to install it on production. Perfect for my requirements. Thanks! – Peter PitLock Oct 15 '14 at 09:49
7

Slight modification of the solution above so it works with variable-length delimiters.

create FUNCTION dbo.fn_Split2 (@sep nvarchar(10), @s nvarchar(4000))
RETURNS table
AS
RETURN (
    WITH Pieces(pn, start, stop) AS (
      SELECT 1, 1, CHARINDEX(@sep, @s)
      UNION ALL
      SELECT pn + 1, stop + (datalength(@sep)/2), CHARINDEX(@sep, @s, stop + (datalength(@sep)/2))
      FROM Pieces
      WHERE stop > 0
    )
    SELECT pn,
      SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 4000 END) AS s
    FROM Pieces
  )

NB: I've used datalength() since len() reports incorrectly if there are trailing spaces.
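
For instance, with a two-character delimiter (a sketch):

```sql
SELECT pn, s
FROM dbo.fn_Split2('->', 'a->bc->d');
-- pn 1..3 with s = 'a', 'bc', 'd'
```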

Rory
7

Here's a Split function that is compatible with SQL Server versions prior to 2005.

CREATE FUNCTION dbo.Split(@data nvarchar(4000), @delimiter nvarchar(100))  
RETURNS @result table (Id int identity(1,1), Data nvarchar(4000)) 
AS  
BEGIN 
    DECLARE @pos   INT
    DECLARE @start INT
    DECLARE @len   INT
    DECLARE @end   INT

    SET @len   = LEN('.' + @delimiter + '.') - 2
    SET @end   = LEN(@data) + 1
    SET @start = 1
    SET @pos   = 0

    WHILE (@pos < @end)
    BEGIN
        SET @pos = CHARINDEX(@delimiter, @data, @start)
        IF (@pos = 0) SET @pos = @end

        INSERT @result (data) SELECT SUBSTRING(@data, @start, @pos - @start)
        SET @start = @pos + @len
    END

    RETURN
END
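
Usage has the same shape as the other answers; note that empty elements between consecutive delimiters are preserved (a sketch):

```sql
SELECT Id, Data
FROM dbo.Split('a,b,,c', ',');
-- Id 1..4, Data: 'a', 'b', '', 'c'
```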
Tomalak
  • +1 for avoiding recursion (since SQL Server does it so poorly), avoiding XML (since SQL does not have an easy API for escaping special XML characters), and also avoiding CLR code (since some companies' data centers don't allow custom code on shared SQL Server instances). – Zarepheth Dec 18 '13 at 21:33
7

Using CLR, here's a much simpler alternative that works in all cases, yet is 40% faster than the accepted answer:

using System;
using System.Collections;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

public class UDF
{
    [SqlFunction(FillRowMethodName="FillRow")]
    public static IEnumerable RegexSplit(SqlString s, SqlString delimiter)
    {
        return Regex.Split(s.Value, delimiter.Value);
    }

    public static void FillRow(object row, out SqlString str)
    {
        str = new SqlString((string) row);
    }
}

Of course, it is still 8 times slower than PostgreSQL's regexp_split_to_table.
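
To call this from T-SQL, the compiled assembly has to be deployed and bound to a function. A hedged sketch of the deployment (the assembly name, DLL path, and class name are illustrative and must match your build):

```sql
-- CLR must be enabled on the instance first
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO
-- Register the compiled assembly (name and path are illustrative)
CREATE ASSEMBLY RegexSplitAsm FROM 'C:\clr\RegexSplitAsm.dll';
GO
-- Bind a T-SQL table-valued function to the CLR method above
CREATE FUNCTION dbo.RegexSplit(@s nvarchar(max), @delimiter nvarchar(255))
RETURNS TABLE (str nvarchar(max))
AS EXTERNAL NAME RegexSplitAsm.UDF.RegexSplit;
```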

sayap
  • EXPLAIN ANALYSE with PostgreSQL, and the poor man's version of it, i.e. checking "Include Actual Execution Plan" in SSMS with SQL Server. Exact same table with millions of records in both databases. – sayap Mar 18 '12 at 21:16
  • I like this (although I haven't benchmarked it yet), but you should include the TableDefinition property on the SqlFunction attribute so that data tools can generate a proper function definition. – karaken12 Jul 30 '12 at 14:13
6
SELECT substring(commaSeparatedTags,0,charindex(',',commaSeparatedTags))

will give you the first tag. You can proceed similarly to get the second one and so on by combining substring and charindex one layer deeper each time. That's an immediate solution, but it works only with very few tags, as the query grows very quickly in size and becomes unreadable. Then move on to functions, as outlined in the other, more sophisticated answers to this post.
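
Nesting one level deeper, the second tag could be pulled out like this (illustrative only; it misbehaves when a row has fewer than two tags):

```sql
-- Strip the first tag and its comma, then take the next piece
SELECT substring(rest, 0, charindex(',', rest)) AS secondTag
FROM (
    SELECT substring(commaSeparatedTags,
                     charindex(',', commaSeparatedTags) + 1,
                     len(commaSeparatedTags)) AS rest
    FROM userTypedTags
) x
```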

Yann Semet
2

I wrote this a while back. It assumes the delimiter is a comma and that the individual values aren't bigger than 127 characters. It could be modified pretty easily.

It has the benefit of not being limited to 4,000 characters.

Good luck!

ALTER Function [dbo].[SplitStr] ( 
        @txt text 
) 
Returns @tmp Table 
        ( 
                value varchar(127)
        ) 
as 
BEGIN 
        declare @str varchar(8000) 
                , @Beg int 
                , @last int 
                , @size int 

        set @size=datalength(@txt) 
        set @Beg=1 


        set @str=substring(@txt,@Beg,8000) 
        IF len(@str)<8000 set @Beg=@size 
        ELSE BEGIN 
                set @last=charindex(',', reverse(@str)) 
                set @str=substring(@txt,@Beg,8000-@last) 
                set @Beg=@Beg+8000-@last+1 
        END 

        declare @workingString varchar(25) 
                , @stringindex int 



        while @Beg<=@size Begin 
                WHILE LEN(@str) > 0 BEGIN 
                        SELECT @StringIndex = CHARINDEX(',', @str) 

                        SELECT 
                                @workingString = CASE 
                                        WHEN @StringIndex > 0 THEN SUBSTRING(@str, 1, @StringIndex-1) 
                                        ELSE @str 
                                END 

                        INSERT INTO 
                                @tmp(value)
                        VALUES 
                                (cast(rtrim(ltrim(@workingString)) as varchar(127)))
                        SELECT @str = CASE 
                                WHEN CHARINDEX(',', @str) > 0 THEN SUBSTRING(@str, @StringIndex+1, LEN(@str)) 
                                ELSE '' 
                        END 
                END 
                set @str=substring(@txt,@Beg,8000) 

                if @Beg=@size set @Beg=@Beg+1 
                else IF len(@str)<8000 set @Beg=@size 
                ELSE BEGIN 
                        set @last=charindex(',', reverse(@str)) 
                        set @str=substring(@txt,@Beg,8000-@last) 
                        set @Beg=@Beg+8000-@last+1 

                END 
        END     

        return
END 
wcm
2

I up-voted Nathan Wheeler's answer, as I found Cade Roux's answer did not work above a certain string size.

A couple of points:

- I found adding the DISTINCT keyword improved performance for me.

- Nathan's answer only works if your identifiers are 5 characters or less. (Of course you can adjust that.) If the items you are splitting are INT identifiers, as mine are, you can use the same approach as below:

CREATE FUNCTION [dbo].Split
(
    @sep VARCHAR(32), 
    @s VARCHAR(MAX)
)
RETURNS 
    @result TABLE (
        Id INT NULL
    )   
AS
BEGIN
    DECLARE @xml XML
    SET @XML = N'<root><r>' + REPLACE(@s, @sep, '</r><r>') + '</r></root>'

    INSERT INTO @result(Id)
    SELECT DISTINCT r.value('.','int') as Item
    FROM @xml.nodes('//root//r') AS RECORDS(r)

    RETURN
END
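
For example (a sketch; because of DISTINCT, duplicates in the input collapse to a single row, and row order is not guaranteed):

```sql
SELECT Id FROM dbo.Split(',', '3,1,4,1,5');
-- returns 3, 1, 4, 5 (the duplicate 1 appears once), in no guaranteed order
```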
Darren
  • Depending upon what is being split, there could be negative consequences to using `DISTINCT`. Namely, that maybe the resulting table is supposed to contain some repeated values, but with `DISTINCT` it only has unique values. – Zarepheth Dec 18 '13 at 21:18
0

I usually do this with the following code:

create function [dbo].[Split](@string varchar(max), @separator varchar(10))
returns @splited table ( stringPart varchar(max) )
with execute as caller
as
begin
    declare @stringPart varchar(max);
    set @stringPart = '';

    while charindex(@separator, @string) > 0
    begin
        set @stringPart = substring(@string, 0, charindex(@separator, @string));
        insert into @splited (stringPart) values (@stringPart);
        set @string = substring(@string, charindex(@separator, @string) + len(@separator), len(@string) + 1);
    end

    return;
end
go

You can test it with this query:

declare @example varchar(max);
set @example = 'one;string;to;rule;them;all;;';

select * from [dbo].[Split](@example, ';');
Marek