
I'm not very good with regex, but I believe it's the only option I currently have, so I'd highly appreciate help creating a pattern.

Basically, I can have one OR more of the following tags:

[start:**uniqueID**:data1:data2:(..)]**###More Data**[/end:**uniqueID**]

**###More Data** is enclosed between two tags, **start** and **end**. The start tag carries a uniqueID that is repeated in the end tag; it identifies where the data I wish to extract begins and ends. data1 and data2 are just placeholders: I have no way of knowing what will be written there, except that the opening tag starts with "[start" and the closing tag starts with "[/end".

I'd like to first extract the data inside the start tag (everything enclosed in the [] brackets), then parse **###More Data** (which ends at the [/end:uniqueID] tag).

Hopefully I explained it well. I'd appreciate any help. Thanks.

Karizan
  • give us the exact string and expected result – Christopher H. Mar 29 '18 at 17:49
  • Are the double asterisks your attempt at formatting or are they part of the string? – Amy Mar 29 '18 at 17:49
  • And you want the `uniqueId` only? – Christopher H. Mar 29 '18 at 17:50
  • I've changed up the format since the forum editor messed with my example, please let me know if it's easily readable now. – Karizan Mar 29 '18 at 17:50
  • @Karizan if you'll answer my question i'll help you with readability. –  Mar 29 '18 at 17:51
  • I'd like to extract data1, data2 and basically anything else inside the tag. the uniqueID is only used to identify the ending of the string, since there will be other tags in the same line. – Karizan Mar 29 '18 at 17:52
  • What about `\[[^]]*start:([^:]+)[^]]*]([^[]*)`? – ctwheels Mar 29 '18 at 17:52
  • @Karizan Enclose your example in back apostrophe and show exactly what it is to be. – NetMage Mar 29 '18 at 17:52
  • @Amy They are part of the string that I wish to extract! – Karizan Mar 29 '18 at 17:53
  • Why was my answer downvoted? Did I miss anything? – Christopher H. Mar 29 '18 at 17:55
  • @Karizan So the string literally has 12 asterisks in it? – NetMage Mar 29 '18 at 17:59
  • @ctwheels I've tried the pattern, but it seems to match everything within those two tags, but it does not separate them. There is data in both tags, so I want to get this data, THEN extract what's between the tags, if that makes any sense. – Karizan Mar 29 '18 at 18:00
  • @Karizan it's not clear what you actually want as output. Can you clarify in your question? Post sample output. – ctwheels Mar 29 '18 at 18:01
  • @NetMage Yes, 12 in total. 4 - 4 - 4 (If it proves troublesome, I can remove it) – Karizan Mar 29 '18 at 18:02
  • @ctwheels From [start:**uniqueID**:data1:data2:(..)] I want uniqueID, data1, data2 and anything that may follow them until the end of the tag at "]". Then, after retrieving this data, I want all the data shown as ###More Data. – Karizan Mar 29 '18 at 18:03
  • FYI: You could use one of the many online regex testers/visualizers that make it easier to construct (complex) regexes and test them on individual test strings. Examples of such services: https://www.debuggex.com/, http://regexstorm.net/tester, https://regex101.com/ – elgonzo Mar 29 '18 at 18:06
  • @elgonzo Thanks for telling me, I've been using regex101.com but I had no luck with it, which is why I sought help. – Karizan Mar 29 '18 at 18:08
  • @WiktorStribiżew How do you make sure that uniqueid in start tag is the same in the end tag? – Eser Mar 29 '18 at 18:39
  • Like [`\[start:\*\*(\w+)\*\*:([^:]+):([^:]+):[^][]*]\*\*#+(.*?)\*\*\[/end:\*\*\1\*\*]`](http://regexstorm.net/tester?p=%5c%5bstart%3a%5c*%5c*%28%5cw%2b%29%5c*%5c*%3a%28%5b%5e%3a%5d%2b%29%3a%28%5b%5e%3a%5d%2b%29%3a%5b%5e%5d%5b%5d*%5d%5c*%5c*%23%2b%28.*%3f%29%5c*%5c*%5c%5b%2fend%3a%5c*%5c*%5c1%5c*%5c*%5d&i=%5bstart%3a**uniqueID**%3adata1%3adata2%3a%28..%29%5d**%23%23%23More+Data**%5b%2fend%3a**uniqueID**%5d) – Wiktor Stribiżew Mar 29 '18 at 18:41
  • @WiktorStribiżew now what is better than BurnsBA's answer below? (I ask this because your first comment started with *use this*) – Eser Mar 29 '18 at 18:45
  • @WiktorStribiżew I would guess the `data`n are each preceded by colon? @Karizan is your `(...)` to indicate you can have more than two `data`n, each preceded by colon? – NetMage Mar 29 '18 at 18:47
  • I do not quite get the requirements, so what I suggest is just a hint for OP. – Wiktor Stribiżew Mar 29 '18 at 19:48

1 Answer


Something like

var matchExpr = @"\[start:([^:]+):[^]]+\](.*)\[/end:\1\]";

Breaking it down:

// Every piece is a verbatim string (@"...") so that the
// backslashes reach the regex engine unchanged.

// Match the starting tag
var matchExpr = @"\[start:" +

// capture the unique id
@"([^:]+):" +

// move over everything until the closing character of the start
// tag, which is the "]" character
@"[^]]+\]" +

// the capture group of interest
@"(.*)" +

// The end tag. Note the back reference to the unique id
// from the start tag.
@"\[/end:\1\]";

try it here
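Since the question also asks for data1, data2 and whatever else sits inside the start tag, here is a rough sketch of how that could be pulled out in the same pass. It is a variation on the pattern above, not the exact same expression: the group names `id`, `fields` and `body`, the sample input, the lazy `.*?` and `RegexOptions.Singleline` are all my own assumptions, chosen so that several tags on one line are matched separately.

```csharp
using System;
using System.Text.RegularExpressions;

class TagDemo
{
    static void Main()
    {
        // Hypothetical input: two tagged blocks on the same line.
        string input =
            "[start:id1:a:b]first payload[/end:id1] noise " +
            "[start:id2:c]second payload[/end:id2]";

        // id     : the unique id right after "[start:"
        // fields : the rest of the start tag (colon-separated, may be absent)
        // body   : text up to the nearest end tag carrying the same id
        var pattern = new Regex(
            @"\[start:(?<id>[^:\]]+)(?::(?<fields>[^\]]*))?\]" +
            @"(?<body>.*?)" +
            @"\[/end:\k<id>\]",
            RegexOptions.Singleline);

        foreach (Match m in pattern.Matches(input))
        {
            string id = m.Groups["id"].Value;
            string body = m.Groups["body"].Value;

            // Split the remaining tag content into data1, data2, ...
            string[] fields = m.Groups["fields"].Value
                .Split(new[] { ':' }, StringSplitOptions.RemoveEmptyEntries);

            Console.WriteLine("id=" + id);
            Console.WriteLine("fields=" + string.Join(", ", fields));
            Console.WriteLine("body=" + body);
        }
    }
}
```

Each match should give you the id, the list of fields from the start tag, and the text between the tags. If the asterisks really are part of the string, they simply end up inside the captured `id` and `body` values, and the back reference still requires the end tag to repeat them exactly.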

BurnsBA