Extract values which do not end by specific words

Question

I have a table with some data. It could look for example like this:

7   Gelb    
8   Schwarz
9   Weiß my color
10  Grau
16  Gelb I  
17  Gelb II 
18  Gelb III    
19  Gelb IV 
27  Schwarz I   
28  Schwarz II  
29  Schwarz III 
30  Schwarz IV  
31  Schwarz V   
32  Schwarz VI  
39  Weiß my color III   
40  Weiß my color IV    
41  Weiß my color V 
42  Weiß my color VI

As you can see, in some records we have roman numbers in convention <name><space><roman number>

For instance, there are "Gelb", "Weiß my color" and "Schwarz" and there are also records for them in roman convention. For some, like "Grau", there are no duplicates.

So there will be record with unique color name without a roman number e.g record "Grau" and in the table it could contain or not some records with it and roman numbers for it.

Roman numbers would be always at the end like: <name><space><romannumber>

My goal is only to get unique names. So out of example i want to extract only:

7   Gelb    
8   Schwarz    
9   Weiß my color   
10  Grau

How can i achieve that?

I started with this, would it be enough?

Select Id, Name From MyTable Where Name Not Like = '%<space><anyromancharacter>'

I cannot change structure of the database.

I think you should first find the last index of the whitespace. Then use the `SUBSTRING` function to get a substring from index 0 to the last index of a whitespace (if it exists. if not use the whole string). Last but not least use `SELECT DISTINCT` to get the unique names. For finding the last index of a character you could for example look here: http://stackoverflow.com/questions/15709712/what-is-best-way-to-get-last-indexof-character-in-sql-2008 — schlonzo, Apr 28 '16 at 06:33
This might help: `SELECT * FROM MyTable WHERE Name NOT LIKE '% [I II IV V VI IX X XI]%'` — Felix Pamittan, Apr 28 '16 at 06:35
Select ... From MyTable Where Name Not Like '%[^A-Z]%' Collate SQL_Latin1_General_CP1_CS_AS try some regex — backtrack, Apr 28 '16 at 06:38
Are there definitely no names that may resemble roman numbers? E.g. one could imagine that, in some languages, `liv` could be an actual word. — Damien_The_Unbeliever, Apr 28 '16 at 06:38
@Backtrack working nice but it will also exclude "Weiß my color". How to solve this? — , Apr 28 '16 at 06:53
@Backtrack would be nice to fix a bit your input then you could mark as answer — , Apr 28 '16 at 06:56
@JimmyJimm Select ... From MyTable Where Name Not Like '%([IVX]+)%' Collate SQL_Latin1_General_CP1_CS_AS — backtrack, Apr 28 '16 at 06:57

backtrack · Accepted Answer · 2016-04-28T07:23:26.613

4

update

select * from dbo.test
Where value not Like '%[MDILXV]_' Collate SQL_Latin1_General_CP1_CS_AS

Step 1 :

select * from dbo.test

    id  value
    1   Gelb
    2   Gelb I
    3   Weiß my color III
    4   Weiß my color

When i give

   select * from dbo.test
    Where value not Like '%[IXLV]' Collate SQL_Latin1_General_CP1_CS_AS

id  value
1   Gelb
4   Weiß my color

edited Apr 28 '16 at 07:23

answered Apr 28 '16 at 07:01

backtrack

7,500
5
43
91

@JimmyJimm, Check it now – backtrack Apr 28 '16 at 07:10
seems now it works, but note that i can have romanian numbers even up to 4000 so not only up to X, will it also work whatever is after last space it will be excluded? e.g i can have at the end MMMDLXIV – Apr 28 '16 at 07:14
ok great, can you just explain this: MDILXV , why _ at the end and why Collate SQL_Latin1_General_CP1_CS_AS ? – Apr 28 '16 at 07:40
Hi @JimmyJimm, The reason for _ is to match empty space at the end of the string. In the data set you have provided, When i inserted into SQL i found the empty space at the end, Also i will match any character at the end of the string – backtrack Apr 28 '16 at 07:56
@JimmyJimm, Consider upvoting and accepting if it looks okay – backtrack Apr 28 '16 at 07:57
@JimmyJimm, Which part, '_' ? or entire answer ? – backtrack Apr 28 '16 at 08:02
so you mean _ at the end of this: '%[MDILXV]_' i dont need to put it there correct? In db there will be not spaces after romanian number. But let's say it will appear one or more spaces after this will handle that or? Still dont get it. – Apr 28 '16 at 08:04
by the way when i modified from this '%[IXLV]' to this: '%[MDILXV]_' i get all records again.. ;/ – Apr 28 '16 at 08:07
use this , '%[IXLV]_' i tired to add all possible character. I will help – backtrack Apr 28 '16 at 08:11
i think i not need this last _ if i remove it it works again. So if i am sure after romanian there will be no space is this correct to use?: '%[MDILXV]' – Apr 28 '16 at 08:11
Ji @JimmyJimm, Okay. As i mentioned, I had some white space issue in my data set – backtrack Apr 28 '16 at 08:11
ah ok. So should i use either this: '%[MDILXV]' or this: '%[IXLV]' – Apr 28 '16 at 08:13

score 3 · Answer 2 · edited Jan 18 '21 at 12:31

Here is my solution:

First, generate a list of Roman Numerals up to a specified limit. Then, extract the last word from your table and check if it exists in the list of Roman Numerals:

ONLINE DEMO

;WITH E1(N) AS( 
    SELECT 1 FROM(VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)
),
E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b), 
E4(N) AS(SELECT 1 FROM E2 a CROSS JOIN E2 b), 
CteTally(N) AS(
    SELECT TOP(1000)  -- Replace value inside TOP for MAX roman numbers
        ROW_NUMBER() OVER(ORDER BY(SELECT NULL))
    FROM E4
),
CteRoman(N, Roman) AS(
    SELECT *
    FROM CteTally t
    CROSS APPLY(
        SELECT
            REPLICATE('M', t.N/1000)
             + REPLACE(REPLACE(REPLACE(
                  REPLICATE('C', t.N%1000/100),
                  REPLICATE('C', 9), 'CM'),
                  REPLICATE('C', 5), 'D'),
                  REPLICATE('C', 4), 'CD')
             + REPLACE(REPLACE(REPLACE(
                  REPLICATE('X', t.N%100 / 10),
                  REPLICATE('X', 9),'XC'),
                  REPLICATE('X', 5), 'L'),
                  REPLICATE('X', 4), 'XL')
             + REPLACE(REPLACE(REPLACE(
                  REPLICATE('I', t.N%10),
                  REPLICATE('I', 9),'IX'),
                  REPLICATE('I', 5), 'V'),
                  REPLICATE('I', 4),'IV')
    ) r(a)
),
CteLastWord AS(
    SELECT *,
        LastWord = CASE 
                    WHEN CHARINDEX(' ', Name) = 0 THEN Name
                    ELSE REVERSE(LEFT(REVERSE(Name), CHARINDEX(' ', REVERSE(Name)) - 1))
                   END
    FROM MyTable
)
SELECT
    id, Name    
FROM CteLastWord w
WHERE 
    NOT EXISTS(
        SELECT 1 
        FROM CteRoman
        WHERE
            Roman = w.LastWord              
    )
ORDER BY w.Id

Reference:

Convert Number to Roman Numerals in Sql Server

score 1 · Answer 3 · answered Apr 28 '16 at 06:58

I would do this. First of all create the ToRomanNumerals function from here

And now create the numbers table with the roman numbers (I have created it from 1..100) and then use the LEFT(CHARINDEX) to remove the roman numerals from the name, like this:

DROP TABLE #Table1

CREATE TABLE #Table1
    ([ID] int, [name] varchar(17))
;

INSERT INTO #Table1
    ([ID], [name])
VALUES
    (7, 'Gelb'),
    (8, 'Schwarz'),
    (9, 'Weiß my color'),
    (10, 'Grau'),
    (16, 'Gelb I'),
    (17, 'Gelb II'),
    (18, 'Gelb III'),
    (19, 'Gelb IV'),
    (27, 'Schwarz I'),
    (28, 'Schwarz II'),
    (29, 'Schwarz III'),
    (30, 'Schwarz IV'),
    (31, 'Schwarz V'),
    (32, 'Schwarz VI'),
    (39, 'Weiß my color III'),
    (40, 'Weiß my color IV'),
    (41, 'Weiß my color V'),
    (42, 'Weiß my color VI')
;

--select name, patindex('%M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})', name) from #Table1
--select name, patindex('% [IVXLC]%', name) from #Table1

;with n as 
(select 1 as n
union all
select n.n+1 as n
from n where n < 100),
nr as
(select n, dbo.ToRomanNumerals(n) r
from n)
select name, nr.r, COALESCE(LEFT(name, PATINDEX('% ' + nr.r, name)), name) from #Table1 t
LEFT JOIN nr ON t.name LIKE '% ' + nr.r

Result:

name              r        
----------------- -------- -----------------
Gelb              NULL     Gelb
Schwarz           NULL     Schwarz
Weiß my color     NULL     Weiß my color
Grau              NULL     Grau
Gelb I            I        Gelb 
Gelb II           II       Gelb 
Gelb III          III      Gelb 
Gelb IV           IV       Gelb 
Schwarz I         I        Schwarz 
Schwarz II        II       Schwarz 
Schwarz III       III      Schwarz 
Schwarz IV        IV       Schwarz 
Schwarz V         V        Schwarz 
Schwarz VI        VI       Schwarz 
Weiß my color III III      Weiß my color 
Weiß my color IV  IV       Weiß my color 
Weiß my color V   V        Weiß my color 
Weiß my color VI  VI       Weiß my color 

(18 row(s) affected)

have to work on what i have, i cannot change database structure. is there really no way to extract based on this table structure? Annyway thanks for help. — , Apr 28 '16 at 07:00
but i cannot also make functions... I have to have it in one query without functions, new tables etc.. If so i would do this ;/ that's sh** customer db — , Apr 28 '16 at 07:05
Is it just no so simple just to find last space character and if something after that exclude this record? — , Apr 28 '16 at 07:05
A function can easily be converted to a CROSS APPLY query the way Felix Pamittan made. In addition to that, you can create a function in the tempdb database and drop it afterwards. Finding last space may not work, as you need to be sure that everything that follows that last space is indeed a roman numeral — cha, Apr 28 '16 at 23:08

Munna Extreme · Answer 4 · 2016-04-28T06:52:52.573

0

I hope this solves your issue. Add one more column in your table which contains only Roman Number of the name, if no Roman number save empty string.

select distinct left(NAME,LEN(NAME)-CHARINDEX(RomanNumberColumn,REVERSE(NAME))) FROM TABLE

edited Apr 28 '16 at 06:52

answered Apr 28 '16 at 06:39

Munna Extreme

368
1
11

Unfortunately, `Weiß my` will be returned by this query. – Felix Pamittan Apr 28 '16 at 06:42
Okay, but I guess we can't check Roman Numbers clearly in your requirement because some names end with " I" which is not a Roman in that case but it looks like Roman number and also like name. – Munna Extreme Apr 28 '16 at 06:47
yes for this will be wrong, and also i added distinct but still see many Gelb's, Schwarz's ... select DISTINCT left(NAME,LEN(NAME)- CHARINDEX(' ',REVERSE(NAME))) as Name, Id, FK_Variation_ID FROM tbVariation_Attribute So we have still two errors to be solved. – Apr 28 '16 at 06:48
i think we do not have to check romanian numbers i think, we have to just look for last space and if there is something after that space we have to exclude that record – Apr 28 '16 at 06:49
@Backtrack answer is working but aso excluded "Weiß my color" – Apr 28 '16 at 06:52
try which suits to your requirement, please check updated answer with extra column to maintain Roman Number details. – Munna Extreme Apr 28 '16 at 06:54
its not working also with "Weiß my color" it will be excluded – Apr 28 '16 at 06:54
for that just add one column name like Roman Number, save the roman numbers of the name in that column, if no roman number save empty string, and replace Name column "CHARINDEX(NAME" to "CHARINDEX(RomanNumberColumn" – Munna Extreme Apr 28 '16 at 06:56
i cannot because i cannot change table structure have to work on what there is :( – Apr 28 '16 at 06:56

irakliG. · Answer 5 · 2016-04-28T07:25:45.477

0

this should be work:

select distinct ID, name from YourTable
where right (name,charindex(' ',REVERSE(name))) not like '%[IVXLCDM]%' COLLATE SQL_Latin1_General_CP1_CS_AS

Where checks only last word in NAME column if it contains X V I L characters.

edited Apr 28 '16 at 07:25

answered Apr 28 '16 at 07:02

irakliG.

166
10

no "Weiß my color" appeared after your query (note that there could be higher romanian numbers if you do this way – Apr 28 '16 at 07:08
With this query "Weiß my color" should appear. For higher romanian numbers you can add '%C%', '%D%' and '%M%' in 'WHERE' clause. – irakliG. Apr 28 '16 at 07:15

score 0 · Answer 6 · answered Apr 28 '16 at 07:43

Replace cte with your table name, same to columns. I use id for numeric code and name for names.

SELECT DISTINCT c.id,
                t.name 
FROM (
    SELECT  c1.name,
            DENSE_RANK() OVER (PARTITION BY c2.name ORDER BY c1.name) as DR
    FROM cte c1
    LEFT JOIN cte c2 
        ON c2.name LIKE c1.name + '%' 
) as t
INNER JOIN cte c
    ON c.name = t.name 
WHERE t.DR = 1

Output:

id          name
----------- -----------------
7           Gelb
8           Schwarz
9           Wei? my color
10          Grau

(4 row(s) affected)

Extract values which do not end by specific words

6 Answers6