Ruby 2.3 - Add strings after and before certain regexp match

Question

I have a string

text 6ffdfd <a href="http://worldnews.com" target="_blank">toto</a> sdsdsd

I would like to find a regex that would

1. add an opening span tag just after the opening a tag of the link (to be precise, after the string target="_blank">)
2. add a closing span tag just before the closing a tag

The desired end result would be:

 <a href="http://worldnews.com" target="_blank"><span>toto</span></a> sdsdsd

For the moment, I can't find how to achieve 1, and I have only partially managed 2, because my current code wrongly adds a white space that I don't want between </span> and the closing a tag.

Current code

orig_string = 'text 6ffdfd <a href="http://example.com" target="_blank">toto</a> sdsdsd'
end_result = orig_string.gsub(/<\/a>/, '</span> \\0')
print end_result

I have set up an online editable demo here: https://repl.it/repls/SecondCapitalPika

Do not use regex to parse HTML. Use a proper HTML parser and add the `<span>` tag to the DOM. — Stefan, Dec 06 '17 at 14:14
Hi Stefan, I am doing it as admin on Active Admin inputs. Why shouldn't I do it with a regexp? I'm a Ruby newbie so I thought that could work. Are there security issues? I want to change the input that I enter so that the one with `<span>`s is saved to the database (instead of the one without `<span>`s) — Mathieu, Dec 06 '17 at 14:15
See [The Stack Overflow Regular Expressions FAQ](https://stackoverflow.com/a/22944075/477037), there are several links explaining why you should not use regular expressions to parse HTML. — Stefan, Dec 06 '17 at 14:20
Thanks a lot. I went through the very "intense" debate between people for and against using regexp to parse HTML. In the end I'll go for using it, agreeing with this person: "If you have a small set of HTML pages that you want to scrape data from and then stuff into a database, regexes might work fine. For example, I recently wanted to get the names, parties, and districts of Australian federal Representatives, which I got off of the Parliament's web site. This was a limited, one-time job" (source: https://stackoverflow.com/a/1733489/1467802) — Mathieu, Dec 06 '17 at 15:15
Indeed, it's not like I am parsing 10k pages; it's just me wanting to add a tag when I add an input I control 100% in my Active Admin Rails panel — Mathieu, Dec 06 '17 at 15:16
But the reading was very interesting and definitely deterred me from ever using regexp when the parsing is on volatile/uncontrollable/random/mass HTML — Mathieu, Dec 06 '17 at 15:17

Nermin · Accepted Answer · 2017-12-06T15:58:43.493

1

orig_string =~ /(?<=>)([^<]*)(?=<\/a>)/
# `present?` comes from ActiveSupport (available in Rails)
if $1.present?
  end_result = orig_string.gsub(/(?<=>)([^<]*)(?=<\/a>)/, '<span>\1</span>')
end

Break down

(?<=>)    # lookbehind: the match must be preceded by >
([^<]*)   # capture everything up to the next <, i.e. the text inside the a tag
(?=<\/a>) # lookahead: the match must be followed by </a>

Will result in

print end_result
'text 6ffdfd <a href="http://example.com" target="_blank"><span>toto</span></a> sdsdsd'
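
For anyone trying this outside Rails, here is a minimal plain-Ruby sketch of the same substitution. It drops the `present?` guard, which needs ActiveSupport, and only assumes the input looks like the sample string from the question; the variable name `anchor_text` is introduced here for illustration.

```ruby
orig_string = 'text 6ffdfd <a href="http://example.com" target="_blank">toto</a> sdsdsd'

# Same lookarounds as in the breakdown above: the captured text must be
# preceded by ">" and followed by "</a>".
anchor_text = /(?<=>)([^<]*)(?=<\/a>)/

# \1 in the single-quoted replacement refers to the captured link text.
end_result = orig_string.gsub(anchor_text, '<span>\1</span>')

puts end_result
```

Because the lookbehind and lookahead are zero-width, only the link text itself is replaced, so no extra white space ends up between `</span>` and `</a>`.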

edited Dec 06 '17 at 15:58

answered Dec 06 '17 at 14:14

Nermin


Could you add just a little more info: indeed this works the first time, but if I change the text/input in my admin panel, then the script kicks in again and I end up with 4 spans instead of 2 :) Could I use something like "if the string contains no span yet, then do this"? – Mathieu Dec 06 '17 at 15:37
Awesome, let me check it. I must really try learning regexp, it seems powerful! – Mathieu Dec 06 '17 at 16:00
How does $1.present? work? I mean, don't you have to say where you check the presence of $1, like on orig_string? – Mathieu Dec 06 '17 at 16:01
i get undefined method `present?' for nil:NilClass:) – Mathieu Dec 06 '17 at 16:02
`nil` should return false for `present?`. Can you try `unless $1.blank?` – Nermin Dec 06 '17 at 16:21
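
The two problems raised in these comments (the extra spans on a second run, and `present?` being undefined where ActiveSupport is presumably not loaded) can both be sidestepped in plain Ruby. The sketch below is one possible approach, not the answer's own code: it swaps `[^<]*` for `[^<]+`, and `wrap_anchor_text` is a hypothetical helper name. It assumes the link text contains no nested tags.

```ruby
# Hypothetical helper (not from the original answer): wraps the text of each
# link in a span and leaves already-wrapped links untouched.
def wrap_anchor_text(html)
  # [^<]+ (one or more characters) instead of [^<]* avoids the empty match
  # between "</span>" and "</a>" that would insert an extra "<span></span>"
  # on a second run.
  html.gsub(/(?<=>)([^<]+)(?=<\/a>)/, '<span>\1</span>')
end

orig_string = 'text 6ffdfd <a href="http://example.com" target="_blank">toto</a> sdsdsd'

once  = wrap_anchor_text(orig_string)
twice = wrap_anchor_text(once)

puts once
puts once == twice
```

Since an empty string can no longer match, no separate check on `$1` is needed, and running the helper again on its own output should leave the string unchanged.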

score 1 · Answer 2 · answered Dec 06 '17 at 14:14

1

If you don't necessarily need a regex, then you could use Nokogiri:

require 'nokogiri'

text = <<-TEXT
  text 6ffdfd <a href="http://worldnews.com" target="_blank">toto</a> sdsdsd
  6ffdfd text <a href="http://worldnews.com" target="_blank">tete</a> sdsdsd
  6ffdfd text <a href="http://worldnews.com">titi</a> sdsdsd
TEXT

doc = Nokogiri.HTML text
doc.css('a[target="_blank"]').each { |anchor| anchor.add_next_sibling '<span>span</span>' }

answered Dec 06 '17 at 14:14

Sebastian Palma


Hi Sebastian, thanks for your help. Please refer to my comments under my question, where I explain why in the end I still opted for a regexp. On top of those reasons, I also did not want to make my Ruby on Rails 4 app heavier than it already is, for a very simple need, by importing/requiring another library (nokogiri) – Mathieu Dec 06 '17 at 15:18
