-2

I'm using BigQuery and I want to extract string after the specific match strings, in my case, the strings is sc

I have a string like this :

www.xxss.com?psct=T-EST2%20.coms&.com/u[sc'sc(mascscin', sc'.c(scscossccnfiscg.scjs']-/ci=1(sctitis)

My expected result is:

titis)

Is this possible?

Wiktor Stribiżew
  • 484,719
  • 26
  • 302
  • 397

2 Answers2

1

In general, across all RDBMS finding the index of the last instance of a match in a string is easy to compute by first reversing the string. Then we are only looking for the first match.

Update: BigQuery

Follow the documentation for REGEXP_EXTRACT in the String Functions documentation for BigQuery

NOTE: BigQuery provides regular expression support using the re2 library; see that documentation for its regular expression syntax.

However, this problem can be solved without RegEx.

BigQuery supports array processing and has a SPLIT function, so you could split by the lookup variable and capture only the last result:

SELECT ARRAY_REVERSE(SPLIT(  !YOUR COLUMN HERE!  , "sc"))[OFFSET(1)]

The following adaptation from my original submission may still work:

SELECT REVERSE(SUBSTR(REVERSE(@text), 1, STRPOS(REVERSE(@text), "cs") -1))

For those who have a similar requirement in MS SQL Server the following syntax can be used.
other RDBMS can use a similar query, you will have to use the appropriate platform functions to acheive the result.

DECLARE @text varchar(200) = 'www.xxss.com?psct=T-EST2%20.coms&.com/u[sc''sc(mascscin'', sc''.c(scscossccnfiscg.scjs'']-/ci=1(sctitis)'
SELECT REVERSE(LEFT(REVERSE(@text), CharIndex('cs', REVERSE(@text),1) -1))

Produces: titis)

You could achieve a similar result by obtaining the last index of 'sc' as above and using that value in a SUBSTRING however for that to work you need to re-compute the Length, this solution instead uses the LEFT function and then REVERSE's the result , reducing the functional complexity of the query by 1 (1 less function call)


Step this through:

  1. Reverse the value:

     SELECT REVERSE(@text)
    

    Results in:

     )sititcs(1=ic/-]'sjcs.gcsifnccssocscs(c.'cs ,'nicscsam(cs'cs[u/moc.&smoc.02%2TSE-T=tcsp?moc.ssxx.www
    
  2. Now we find the first Index of 'cs'
    Note: we have to reverse the sequece of the lookup string as well!

     SELECT CharIndex('cs', REVERSE(@text),1)
    

    Result: 7

  3. Select the characters before this index:
    Note: we must use -1 here because SQL uses 1-based index result from CharIndex so we must reduce it by 1

     SELECT LEFT(REVERSE(@text), CharIndex('cs', REVERSE(@text),1) -1)
    
  4. Finally, we reverse the result:

     SELECT REVERSE(LEFT(REVERSE(@text), CharIndex('cs', REVERSE(@text),1) -1))
    
Community
  • 1
  • 1
Chris Schaller
  • 6,580
  • 3
  • 34
  • 56
-2

Guess you could use 'sc' as seperator, define (if constant string length) string length in your query (wildcard),

 STRING_SPLIT ( string , separator )
Fr0stY
  • 179
  • 1
  • 2
  • 11
  • This is still a valid response, OP has updated the question to indicate BigQuery is being used, that engine supports 'SPLIT' and array processing. – Chris Schaller Jan 09 '20 at 12:55