-1

The expression I have now is: (<div class=\"oembed.*)V(.*?)<\/div>

https://regexr.com/56l4q

How can I match only up to the first "</div>"? Right now the match runs to the last (third) "</div>".

UPDATE #1:

I'm looking for a V that is inside a div whose class attribute starts with "oembed".

The result I want to get with regex:

<div class="oembed  oembed-type-instagram oembed-pre-frame" data-oembed-medialink="https://www.instagram.com/p/B_BEXwtp-V7/" style="margin:10px auto;" data-oembed-url="https://api.instagram.com/oembed/?url=https%3A%2F%2Fwww.instagram.com%2Fp%2FB_BEXwtp-V7&format=json&maxwidth=500&embed=widget&width=1" data-oembed-id="B_BEXwtp-V7" data-oembed-options='{"maxwidth":"500","embed":"widget","width":true}'>1st div</div>

UPDATE #2:

I'm using this PHP code: https://paiza.io/projects/aU_FO4ihErlQngYFy6xvJg

The result is:

'   Vがある<br>
<div id="body-top" class="content-moki clearfix">
   Vがある<br>
<span class="headline">
<br>
にモデル「V」を入れると、プレビューでエンベットが崩れる<br>
<br>
<div class="oembed  oembed-type-instagram" data-oembed-medialink="https://www.instagram.com/p/B5mtrL3p3X@CHANGE@/" style="margin:10px auto;max-width: 500px;" data-oembed-url="https://api.instagram.com/oembed/?url=https%3A%2F%2Fwww.instagram.com%2Fp%2FB5mtrL3p3X@CHANGE@&format=json&maxwidth=500&width=1" data-oembed-id="B5mtrL3p3X@CHANGE@" data-oembed-options=\'{"maxwidth":"500","width":true}\'><figure class="moki-embed-instagram"><img src="https://instagram.com/p/B5mtrL3p3X@CHANGE@/media/?size=l"><figcaption><i class="fa fa-instagram icon"></i></figcaption></figure></div>    @CHANGE@がある<br>a </div> @CHANGE@...</div>'

But I have to get this result:

'   Vがある<br>
<div id="body-top" class="content-moki clearfix">
   Vがある<br>
<span class="headline">
<br>
にモデル「V」を入れると、プレビューでエンベットが崩れる<br>
<br>
<div class="oembed  oembed-type-instagram" data-oembed-medialink="https://www.instagram.com/p/B5mtrL3p3X@CHANGE@/" style="margin:10px auto;max-width: 500px;" data-oembed-url="https://api.instagram.com/oembed/?url=https%3A%2F%2Fwww.instagram.com%2Fp%2FB5mtrL3p3X@CHANGE@&format=json&maxwidth=500&width=1" data-oembed-id="B5mtrL3p3X@CHANGE@" data-oembed-options=\'{"maxwidth":"500","width":true}\'><figure class="moki-embed-instagram"><img src="https://instagram.com/p/B5mtrL3p3X@CHANGE@/media/?size=l"><figcaption><i class="fa fa-instagram icon"></i></figcaption></figure></div>    Vがある<br>a </div> V...</div>'

You can see the difference here - https://www.diffchecker.com/n4LIOMtH

whitesiroi
  • @ChrisRuehlemann thank you for your comment, but still the same result – whitesiroi Jun 13 '20 at 08:38
  • Your requirement is unclear. Please explain in detail. It'll help us understand the problem statement better. –  Jun 13 '20 at 08:41
  • @Mandy8055 Thank you for your help, I did update my question – whitesiroi Jun 13 '20 at 08:44
  • If the HTML is dynamic, I suggest using a parser, because regexes may backfire. Otherwise, does [**this**](https://regex101.com/r/WWx7Mp/4) help? –  Jun 13 '20 at 08:45
  • @Mandy8055 Thank you for your help, I'm using it in php, so I need to make it with brackets, like - "/^(
    )/su" => "$1{$replace}$2"
    – whitesiroi Jun 13 '20 at 08:52
  • @Mandy8055 I'm looking for something like this (
    then with php I'll change it to "$1{$replace}$2"
    – whitesiroi Jun 13 '20 at 08:56
  • @Mandy8055 can you please look at this url https://regexr.com/56l6g – whitesiroi Jun 13 '20 at 09:02
  • Why don't you use a parser? [Parsing HTML with regex is a hard job](https://stackoverflow.com/a/4234491/372239). HTML and regex are not good friends. Use a parser; it is simpler, faster, and much more maintainable. – Toto Jun 13 '20 at 09:32
  • @Toto & Mandy8055 - yes, a parser is awesome, but I'm working on a project that has a lot of legacy code. "V" is just an example; I have a bunch of other expressions in an array, and I just need to add this one (but it's greedy) – whitesiroi Jun 13 '20 at 09:42
  • @mickmackusa I disagree with your assumption of intentionally defying SO page design. So, to borrow from your comment: your attempt at mind reading is not good. But I do agree with your critique of my regex and have therefore decided to delete it. – Chris Ruehlemann Jun 13 '20 at 15:28

2 Answers

2

Sheesh, I had to silence HTML warnings and jump through a bunch of UTF-8 hoops to get this DOM parser technique to spit out the right result, but here goes... I adjusted your sample HTML a little and wrapped it all in a parent div for stability. I assume this is okay to do since your sample string looks like a fragment of the actual document.

My XPath expression finds every <div> whose class attribute contains oembed, anywhere in the document, then selects the text nodes directly inside it that contain the targeted substring (V). For each qualifying text node, the body of the foreach replaces the substring as desired.

As long as your document can be parsed, this will be more accurate and reliable than regex, which is a DOM-ignorant tool, and it will be easier to maintain as well.

Code: (Demo)

$html = <<<HTML
<div>
   Vがある<br>
   <div id="body-top" class="content-moki clearfix">
       Vがある<br>
       <span class="headline">
           <br>
           にモデル「V」を入れると、プレビューでエンベットが崩れる<br>
           <br>
           <div class="oembed  oembed-type-instagram" data-oembed-medialink="https://www.instagram.com/p/B5mtrL3p3XV/" style="margin:10px auto;max-width: 500px;" data-oembed-url="https://api.instagram.com/oembed/?url=https%3A%2F%2Fwww.instagram.com%2Fp%2FB5mtrL3p3XV&format=json&maxwidth=500&width=1" data-oembed-id="B5mtrL3p3XV" data-oembed-options='{"maxwidth":"500","width":true}'>
               <figure class="moki-embed-instagram">
                   <img src="https://instagram.com/p/B5mtrL3p3XV/media/?size=l">
                   <figcaption>
                       <i class="fa fa-instagram icon"></i>
                   </figcaption>
               </figure>
               Vがある
           </div>
           <br>a 
        </span>
        V...
    </div>
</div>
HTML;

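$needle = 'V';
$replace = '@CHANGE@';

// Collect libxml's warnings about the HTML internally instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'utf-8');
// Convert multibyte characters to entities so loadHTML() does not mangle the UTF-8 text
// (note: the 'HTML-ENTITIES' target encoding is deprecated as of PHP 8.2)
$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
// Select text nodes that are direct children of an oembed div and contain the needle
foreach ($xpath->query("//div[contains(@class, 'oembed')]/text()[contains(.,'$needle')]") as $node) {
    $node->nodeValue = str_replace($needle, $replace, $node->nodeValue);
}
// Serialize from the root element to skip the XML declaration
echo $dom->saveXML($dom->documentElement);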

Output: (notice that only the V in the targeted div's own text is replaced; attribute values and text outside that div are left untouched)

<div>
   Vがある<br/>
   <div id="body-top" class="content-moki clearfix">
       Vがある<br/>
       <span class="headline">
           <br/>
           にモデル「V」を入れると、プレビューでエンベットが崩れる<br/>
           <br/>
           <div class="oembed  oembed-type-instagram" data-oembed-medialink="https://www.instagram.com/p/B5mtrL3p3XV/" style="margin:10px auto;max-width: 500px;" data-oembed-url="https://api.instagram.com/oembed/?url=https%3A%2F%2Fwww.instagram.com%2Fp%2FB5mtrL3p3XV&amp;format=json&amp;maxwidth=500&amp;width=1" data-oembed-id="B5mtrL3p3XV" data-oembed-options="{&quot;maxwidth&quot;:&quot;500&quot;,&quot;width&quot;:true}">
               <figure class="moki-embed-instagram">
                   <img src="https://instagram.com/p/B5mtrL3p3XV/media/?size=l"/>
                   <figcaption>
                       <i class="fa fa-instagram icon"/>
                   </figcaption>
               </figure>
               @CHANGE@がある
           </div>
           <br/>a 
        </span>
        V...
    </div>
</div>
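
Two details of the XPath query may matter on other documents: contains(@class, 'oembed') is a substring test, so it would also match a class such as my-oembed-wrapper, and /text() selects only text nodes that are direct children of the div. What follows is a hedged sketch of a stricter variant, not a replacement for the code above. The sample markup and the class name my-oembed-wrapper are made up for illustration, and the concat/normalize-space idiom is a swapped-in technique for matching a whole class token.

```php
<?php
// Hypothetical sample: one real oembed div (with nested text) and one look-alike class.
$html = '<div><div class="oembed oembed-type-x"><p>V nested</p>V direct</div>'
      . '<div class="my-oembed-wrapper">V elsewhere</div></div>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Pad the class list with spaces so ' oembed ' matches only a whole class token;
// //text() reaches text nodes at any depth below the matched div.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' oembed ')]"
       . "//text()[contains(., 'V')]";

foreach ($xpath->query($query) as $node) {
    $node->nodeValue = str_replace('V', '@CHANGE@', $node->nodeValue);
}

$out = $dom->saveHTML();
echo $out;
```

Either way, text() never selects attribute values, so a V inside data-oembed-id or a URL stays as it is.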
mickmackusa
0

I'd use something like:

(?:<div class="oembed|\G)(?:(?!</div>).)*?\KV

Demo & explanation

Code:

$res = preg_replace('~(?:<div class="oembed|\G)(?:(?!</div>).)*?\KV~', '@CHANGE@', $text);
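
A minimal sketch of how this pattern works piece by piece, under a few stated assumptions. The function name replaceInsideOembed and the sample string are hypothetical. The sketch also makes three changes that are not in the line above: it adds the s and u flags (the question's own pattern uses /su), it guards \G with (?!\A), because an unguarded \G also matches at the very start of the subject and could pick up a V that comes before the first oembed div, and it passes the needle through preg_quote().

```php
<?php
// Hypothetical helper: replace $needle only between '<div class="oembed' and the first '</div>'.
function replaceInsideOembed(string $text, string $needle, string $replacement): string
{
    // <div class="oembed  : the first match must start at the opening tag
    // \G(?!\A)            : later matches continue where the previous one ended,
    //                       but never from the very start of the subject
    // (?:(?!</div>).)*?   : lazily consume characters without crossing a </div>
    // \K                  : drop everything consumed so far from the reported match
    // s flag: . also matches newlines; u flag: treat pattern and subject as UTF-8
    $pattern = '~(?:<div class="oembed|\G(?!\A))(?:(?!</div>).)*?\K'
             . preg_quote($needle, '~') . '~su';

    // preg_replace() returns null on error; fall back to the unchanged text.
    return preg_replace($pattern, $replacement, $text) ?? $text;
}

// Made-up sample modelled on the question's markup.
$sample = <<<'HTML'
V outside<br>
<div class="oembed oembed-type-instagram" data-oembed-id="B5mtrL3p3XV"><img src="https://instagram.com/p/B5mtrL3p3XV/media/?size=l"></div> V after<br>a </div> V...</div>
HTML;

echo replaceInsideOembed($sample, 'V', '@CHANGE@'), PHP_EOL;
```

In this sample only the two V characters between the opening div tag and the first </div> change. Unlike a DOM text-node replacement, this also rewrites V inside attribute values, which is what the expected result in the question shows. Because the match stops at the first </div>, a nested div inside the oembed block would cut it short.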
Toto