How to use regex to find url in the css string

Question

I'm having a hard time extracting a URL from a piece of text using Python.

I got the text from the style attribute of a tag with Beautiful Soup. The text is always:

background:url(//somedomaine.com/annonces/103028/large.jpg) no-repeat center center

My goal is to extract "//somedomaine.com/annonces/103028/large.jpg", but I'm new to regex. I tried to use the "$" modifier with "url" but it didn't help.

Show what you tried. – Daniel Roseman Oct 13 '19 at 18:30

score 3 · Answer 1 · answered Oct 13 '19 at 18:32

background:url\(([^)]+)\)

This regex looks for the literal text background:url( and then captures everything up to the first ) it encounters.
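As a minimal sketch of how that pattern could be used from Python (the variable names are mine, and re.search is just one reasonable way to apply it):

```python
import re

style = "background:url(//somedomaine.com/annonces/103028/large.jpg) no-repeat center center"

# \( and \) match literal parentheses; ([^)]+) captures one or more
# characters that are not ')', i.e. everything up to the first ')'.
pattern = re.compile(r"background:url\(([^)]+)\)")

match = pattern.search(style)
url = match.group(1) if match else None  # None if the style has no url(...)
print(url)
```

Checking the match before calling group(1) avoids an AttributeError on tags whose style has no background URL.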

answered Oct 13 '19 at 18:32

Nick Reed


Couldn't you just do a non-greedy match there, so the character group wouldn't be necessary? `\((.+?)\)`? – Green Cloak Guy Oct 13 '19 at 18:34

You absolutely could, and it would satisfy OP's requirements, too (hence my upvote on your answer). My answer was just force of habit - whenever HTML/CSS is involved, I often use negated character classes, since they'll match across lines. It improves flexibility, since HTML/CSS lets the tag opening and closing be on different lines. – Nick Reed Oct 13 '19 at 18:37
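To illustrate the difference discussed in these comments, here is a small sketch. The style value with a line break inside url(...) is a made-up input, not something from the question:

```python
import re

# Hypothetical style value where the URL is split across two lines.
multiline_style = "background:url(//somedomaine.com/annonces/\n103028/large.jpg) no-repeat"

# A negated character class matches any character except ')', newlines included.
negated = re.search(r"url\(([^)]+)\)", multiline_style)

# By default '.' does not match a newline, so the lazy version finds nothing...
lazy_dot = re.search(r"url\((.+?)\)", multiline_style)

# ...unless re.DOTALL is passed, which lets '.' match newlines too.
lazy_dotall = re.search(r"url\((.+?)\)", multiline_style, re.DOTALL)

print(negated is not None, lazy_dot is not None, lazy_dotall is not None)
```

On single-line input like the one in the question, both forms should capture the same text.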

score 2 · Accepted Answer · answered Oct 13 '19 at 18:32

Here's an incredibly generic match:

import re

text = "background:url(//somedomaine.com/annonces/103028/large.jpg) no-repeat center center"
regstr = r"background:url\((.*)\) no-repeat center center"

x = re.match(regstr, text)
print(x.group(1))  # '//somedomaine.com/annonces/103028/large.jpg'

The regex here is very straightforward - match the largest possible set of arbitrary characters (.*) surrounded by the given text ("background:url(" in the front, ") no-repeat center center" in the back).
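Note that the fixed trailing text is what stops the greedy (.*) from running too far. A rough sketch of what can happen without it, using a made-up style string that contains a second closing parenthesis:

```python
import re

# Hypothetical variant of the input with another ')' later in the string.
text = ("background:url(//somedomaine.com/annonces/103028/large.jpg) "
        "no-repeat center center; color:rgb(0,0,0)")

# Greedy: .* extends to the LAST ')' in the string.
greedy = re.search(r"url\((.*)\)", text).group(1)

# Lazy: .*? stops at the FIRST ')' after "url(".
lazy = re.search(r"url\((.*?)\)", text).group(1)

print(greedy)
print(lazy)
```

If the text after the URL can vary, the lazy form (or a negated character class) is probably the safer choice.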

score 1 · Answer 3 · answered Oct 13 '19 at 18:35

If you want a non-regex solution and just want to search for a substring:

url = text[text.find('url(') + 4: text.find(')')]

This is not robust for URLs that themselves contain ")" or "url(".
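A slightly more defensive sketch of the same idea is below. The function name extract_css_url is hypothetical, and it still assumes the URL itself contains no ")":

```python
def extract_css_url(style):
    """Return the text inside the first url(...) in style, or None."""
    start = style.find("url(")
    if start == -1:
        return None  # no url( at all
    start += len("url(")
    # Search for ')' only AFTER "url(", so an earlier ')' is ignored.
    end = style.find(")", start)
    if end == -1:
        return None  # unterminated url(
    # Drop optional surrounding quotes and spaces.
    return style[start:end].strip("\"' ")


text = "background:url(//somedomaine.com/annonces/103028/large.jpg) no-repeat center center"
print(extract_css_url(text))
```

Passing start as the second argument to find avoids picking up a ")" that appears earlier in the string, for example from an rgb(...) value.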

answered Oct 13 '19 at 18:35

modesitt
