
I'm trying to program a vocabulary game.

I am using a regular expression to hide the word that has to be guessed. I'm not comfortable with regular expression syntax - outwith the simple examples, I get very confused.

Take for example the verb

'to crank (sth) up'

I want to transform that into:

to   _ _ _ _ _   (sth)  _ _ 

The program will read its words from a vocabulary CSV file. My convention is to add (sth) or (smb) to transitive verbs. I don't want to hide those bracketed markers. Likewise, I don't want to hide the "to" that marks the infinitive.

The transformations I'm applying so far are:

import re

chosen_word = "to crank (sth) up"

# Double the space between words for better legibility
hidden_word = re.sub(r"\s", "  ", chosen_word)

# Hide the letters of the word
hidden_word = re.sub(r"[a-z]", "_ ", hidden_word)

But that results in:

_ _    _ _ _ _ _   ( _ _ _ )  _ _

How can I write a re.sub() call that replaces every alphabetical character with _ except those in the patterns to, (sth) and (smb)?

mkrieger1
  • Does this answer your question? [Regex: match everything but specific pattern](https://stackoverflow.com/questions/1687620/regex-match-everything-but-specific-pattern) – Chayim Friedman Jan 10 '21 at 23:31
  • Although I flagged this as a duplicate, I want to say that regex is probably not the right way to solve this – Chayim Friedman Jan 10 '21 at 23:32
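
The comment above does not say which alternative it has in mind. One possible reading, sketched here purely as an assumption, is to split the phrase into tokens and decide per token whether to hide it. The function name `hide_word_tokens` and the `keep` tuple are hypothetical and not from the thread. Unlike `[a-z]`, `str.isalpha()` also hides uppercase and non-ASCII letters.

```python
def hide_word_tokens(chosen_word, keep=("to", "(sth)", "(smb)")):
    """Hide every letter except in tokens listed in `keep` (hypothetical helper)."""
    parts = []
    for token in chosen_word.split():
        if token in keep:
            # Leave the infinitive marker and the bracketed markers untouched.
            parts.append(token)
        else:
            # Replace each letter with "_ "; keep any other character as it is.
            parts.append("".join("_ " if ch.isalpha() else ch for ch in token))
    # Two spaces between words, matching the spacing used in the question.
    return "  ".join(parts)


print(repr(hide_word_tokens("to crank (sth) up")))
# 'to  _ _ _ _ _   (sth)  _ _ '
```

Whether this is easier to maintain than a single pattern is a judgment call; it mainly avoids having to reason about alternation order and word boundaries.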

1 Answer


You can capture the exclusions in a group and then pass a function as the replacement argument, so that the replacement is computed for each match:

hidden_word = re.sub(r"(\bto\b|\(s(?:th|b)\))|[a-z]", lambda x: x.group(1) or "_ ", hidden_word)

See the Python demo. Regex details:

  • (\bto\b|\(s(?:th|b)\)) - Group 1: either a whole word to or (sth) or (sb)
  • | - or
  • [a-z] - a lowercase ASCII letter
  • lambda x: x.group(1) or "_ " - the match is either replaced with Group 1 value (if it was matched) or with an underscore plus space otherwise.
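
To see the two steps together, here is a minimal end-to-end sketch. The pattern as written above keeps (sth) and (sb), while the question's convention uses (smb), so this sketch assumes an optional `m` (`m?b`) to cover both spellings. The names `KEEP_OR_LETTER` and `hide_word` are hypothetical.

```python
import re

# Pattern from the answer, with "m?" added (an assumption) so that "(smb)"
# from the question is kept as well as "(sth)" and "(sb)".
KEEP_OR_LETTER = re.compile(r"(\bto\b|\(s(?:th|m?b)\))|[a-z]")


def hide_word(chosen_word):
    # Double the whitespace between words for legibility, as in the question.
    spaced = re.sub(r"\s", "  ", chosen_word)
    # If Group 1 matched, put it back unchanged; otherwise a single lowercase
    # letter matched, so replace it with an underscore plus a space.
    return KEEP_OR_LETTER.sub(lambda m: m.group(1) or "_ ", spaced)


print(repr(hide_word("to crank (sth) up")))
# 'to  _ _ _ _ _   (sth)  _ _ '
print(repr(hide_word("to ask (smb) out")))
# 'to  _ _ _   (smb)  _ _ _ '
```

Note that `\bto\b` keeps every standalone "to", not only the leading one, and that `[a-z]` leaves uppercase letters visible.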
Wiktor Stribiżew
  • Many thanks Wiktor, it works wonderfully! To ensure I understand this bit: `lambda x: x.group(1) or "_ "` What is the relationship between the integer '1' and the previous RegEx? You talk about Group 1 - is that the RegEx _captured_ with the outermost, non-escaped parentheses, which is to the left of the '|' symbol? – Xavier Villà Aguilar Jan 11 '21 at 19:52
  • @XavierVillàAguilar See the `(\bto\b|\(s(?:th|b)\))` explanation, it is *Group 1*. `x.group(1)` refers to the part of a match captured with that group. Capturing groups are set with a pair of unescaped parentheses. – Wiktor Stribiżew Jan 11 '21 at 20:55
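
The group numbering discussed in these comments can be inspected directly. The following is a small illustrative sketch using the pattern from the answer; the variable names are arbitrary.

```python
import re

pattern = re.compile(r"(\bto\b|\(s(?:th|b)\))|[a-z]")

# (?:...) groups without capturing, so only the outer unescaped pair of
# parentheses counts: the pattern has exactly one capturing group.
print(pattern.groups)  # 1

for m in pattern.finditer("to crank (sth)"):
    # group(0) is the whole match; group(1) is None whenever the [a-z]
    # branch matched instead of the capturing group.
    print(repr(m.group(0)), repr(m.group(1)))
```

Because `m.group(1)` is `None` for single-letter matches, the expression `m.group(1) or "_ "` falls through to the underscore in exactly those cases.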