
I am receiving blobs of text from an IRC server, and the data is not always sent in a consistent format.

This is the format that I'm expecting.

:hey!hey.tmi.twitch.tv PRIVMSG #stream :message here\r\n

I've developed this regex pattern to match it:

:[^()]+![^()]+.tmi.twitch.tv PRIVMSG #[^()]+ :[^()]+\\r\\n

And I'm doing this to assign the matches into a list:

newlist = re.findall(':[^()]+![^()]+.tmi.twitch.tv PRIVMSG #[^()]+ :[^()]+\\r\\n',str(string))

But when two messages arrive in one blob (it doesn't happen often, but it does happen), for example:

:hey!hey.tmi.twitch.tv PRIVMSG #stream :message here\r\n:otherhey!otherhey!tmi.twitch.tv PRIVMSG #otherstream :othermessage here\r\n

The pattern matches the whole string as a single result instead of two.

So I'm trying to combine this regex, ^[^PRIVMSG]*PRIVMSG[^PRIVMSG]*$, with the one above, so that findall returns every instance in the string where the full pattern matches.

But the result is that I don't get any matches. What am I missing? Any help appreciated.
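One likely reason the second regex cannot help: [^PRIVMSG] is a character class, so it excludes the individual letters P, R, I, V, M, S and G rather than the word. A minimal check, assuming the sample input from above (the names single, double and one_privmsg are made up for this illustration):

```python
import re

single = ":hey!hey.tmi.twitch.tv PRIVMSG #stream :message here\r\n"
double = single + (
    ":otherhey!otherhey!tmi.twitch.tv PRIVMSG #otherstream :othermessage here\r\n"
)

# [^PRIVMSG] means "one character that is none of P, R, I, V, M, S, G".
# With ^ and $ the whole input must therefore contain those letters
# only inside a single PRIVMSG.
one_privmsg = re.compile(r"^[^PRIVMSG]*PRIVMSG[^PRIVMSG]*$")

on_single = one_privmsg.search(single) is not None
on_double = one_privmsg.search(double) is not None

print(on_single)  # one PRIVMSG in the input
print(on_double)  # second PRIVMSG contains excluded letters
```

The anchored pattern accepts the single message but rejects the two-message blob, because the letters of the second PRIVMSG are themselves excluded by the class.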

Konrad Rudolph
Terik Brunson
  • 3
    Maybe it would be easier to split the input at `\r\n` and work on the individual lines. – Wups Oct 03 '20 at 10:18
  • 2
    `[^PRIVMSG]` means match one character which is not `G` or `I` or `M` or `P` or `R` or `S` or `V`. You probably want a negative lookahead `(?!PRIVMSG)` but your regex around it needs a bigger overhaul. Perhaps see also the discussion of negation in the beginner tips section in the [Stack Overflow `regex` tag info page.](/ags/regex/info) – tripleee Oct 03 '20 at 10:24
  • Sorry for the typo; the link should be [/tags/regex/info](/tags/regex/info) – tripleee Oct 03 '20 at 11:57
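
The suggestion to split the input at `\r\n` first can be sketched as below. This is only an illustration: `LINE` and `parse_chunk` are hypothetical names, the per-line pattern is one possible choice rather than the asker's, and it assumes every blob contains only complete lines.

```python
import re

# Hypothetical per-line pattern: prefix up to '!', the rest of the prefix,
# the channel, and the message text. No field can contain CR or LF because
# the input is split on "\r\n" before matching.
LINE = re.compile(r":([^!\s]+)!(\S+) PRIVMSG #(\S+) :(.*)")


def parse_chunk(chunk: str):
    """Return (nick, host, channel, message) tuples for each PRIVMSG line."""
    messages = []
    for line in chunk.split("\r\n"):
        m = LINE.fullmatch(line)
        if m:  # skips the empty string after the final "\r\n"
            messages.append(m.groups())
    return messages


blob = (
    ":hey!hey.tmi.twitch.tv PRIVMSG #stream :message here\r\n"
    ":otherhey!otherhey!tmi.twitch.tv PRIVMSG #otherstream :othermessage here\r\n"
)
parsed = parse_chunk(blob)
print(parsed)
```

Because each line is matched on its own, greedy quantifiers can no longer run from one message into the next.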

1 Answer


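This happens because the quantifiers in your regex are greedy. [^()]+ accepts any character except parentheses, including \r and \n, so each one grabs as much as it can and the match runs to the last \r\n in the string. Making the quantifiers non-greedy (adding ? after each +) fixes it: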

:[^()]+?![^()]+?.tmi.twitch.tv PRIVMSG #[^()]+? :[^()]+?\\r\\n

Demo

Python example

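import re

# Two messages delivered in one blob, as in the question.
text = ":hey!hey.tmi.twitch.tv PRIVMSG #stream :message here\r\n:otherhey!otherhey!tmi.twitch.tv PRIVMSG #otherstream :othermessage here\r\n"
# Raw string: \r\n reaches the regex engine as the CR and LF escapes.
print(re.findall(r":[^()]+?![^()]+?.tmi.twitch.tv PRIVMSG #[^()]+? :[^()]+?\r\n", text))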

Output

[':hey!hey.tmi.twitch.tv PRIVMSG #stream :message here\r\n',
 ':otherhey!otherhey!tmi.twitch.tv PRIVMSG #otherstream :othermessage here\r\n']
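
For comparison, here is a small sketch that runs the original greedy pattern next to an alternative that keeps greedy quantifiers but excludes CR and LF from every field. The bounded pattern is my own variation, not part of the question, and the names greedy and bounded are just for this illustration.

```python
import re

text = (
    ":hey!hey.tmi.twitch.tv PRIVMSG #stream :message here\r\n"
    ":otherhey!otherhey!tmi.twitch.tv PRIVMSG #otherstream :othermessage here\r\n"
)

# Greedy version from the question: [^()]+ may consume "\r\n",
# so the single match stretches to the last "\r\n" in the blob.
greedy = re.findall(
    r":[^()]+![^()]+.tmi.twitch.tv PRIVMSG #[^()]+ :[^()]+\r\n", text
)

# Alternative sketch: no field is allowed to contain CR or LF,
# so a match cannot cross a line boundary even with greedy quantifiers.
bounded = re.findall(r":[^\r\n]+ PRIVMSG #[^\r\n ]+ :[^\r\n]*\r\n", text)

print(len(greedy))
print(len(bounded))
```

Restricting the character classes can be a safer choice than relying on laziness alone, since a lazy [^()]+? is still allowed to cross \r\n whenever that is the only way to complete a match.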
Liju