How to convert any string into a valid custom pattern using Python?

Question

I want my string to contain only alphanumeric characters, dashes (-), and underscores. That's it. I am trying to write a method that takes a user input string and converts it so that it follows this guideline.

My regex is obviously [a-zA-Z0-9_-]. What I want to do is replace all the spaces with -, and remove all the other characters that don't match my regex.

So, the string 'Hello, world!' would get converted into 'Hello-world'. The special characters get removed, and the space is replaced with a -.

What would be the most efficient way to do this using Python? Do I have to iterate over the entire string character by character, or is there a better way? Thanks!

Does your output include digits? They are alphanumeric, but fail your regex — Patrick Haugh, Jan 31 '17 at 15:54
Could it be, that you need this for forming a url of a title? — ppasler, Jan 31 '17 at 15:54
@PatrickHaugh No digits, just A to Z (in both upper and lowercase), with dash (-) and underscore (_) allowed. I made a mistake before. It's fixed now. — darkhorse, Jan 31 '17 at 15:56
Then have a look at this: http://stackoverflow.com/questions/5574042/string-slugification-in-python — ppasler, Jan 31 '17 at 15:57

Psidom · Accepted Answer · 2017-01-31T16:02:54.603

You can do it with two subs: 1) replace spaces with -; 2) remove other unwanted characters:

s = 'Hello, world!'

import re
re.sub(r"[^a-zA-Z_-]", "", re.sub(r"\s+", "-", s))
# 'Hello-world'

If you want to keep digits in your string:

re.sub(r"[^a-zA-Z0-9_-]", "", re.sub(r"\s+", "-", s))
# 'Hello-world'

Here [^a-zA-Z_-] matches a single character that is not a letter (upper or lower case), an underscore, or a dash. The dash is placed at the end of the character class [] so that it is treated as a literal dash rather than as a range operator (placing it first or escaping it as \- also works).
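
If the conversion is called often, the two substitutions can be wrapped in a helper with precompiled patterns. This is only a minimal sketch: the names to_pattern and keep_digits are made up for illustration, and only the regexes themselves come from the answer above.

```python
import re

# Hypothetical helper; only the regexes are taken from the answer.
_WHITESPACE = re.compile(r"\s+")
_NOT_ALLOWED = re.compile(r"[^a-zA-Z_-]")
_NOT_ALLOWED_KEEP_DIGITS = re.compile(r"[^a-zA-Z0-9_-]")

def to_pattern(s, keep_digits=False):
    # Step 1: replace each run of whitespace with a single dash.
    dashed = _WHITESPACE.sub("-", s)
    # Step 2: remove every character outside the allowed set.
    unwanted = _NOT_ALLOWED_KEEP_DIGITS if keep_digits else _NOT_ALLOWED
    return unwanted.sub("", dashed)

print(to_pattern("Hello, world!"))                     # Hello-world
print(to_pattern("Route 66 rocks", keep_digits=True))  # Route-66-rocks
print(to_pattern("Route 66 rocks"))                    # Route--rocks
```

Note the last case: removing characters after the dashes have been inserted can leave two dashes next to each other.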

Tristan · Answer 2 · 2017-01-31T16:12:13.920

What you want is also often used when generating URL names for content. It is implemented in django.utils.text.slugify, although that function also converts the result to lowercase. Here is a simplified version of Django's slugify function that preserves case:

import re
def slugify(value):
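    # Drop everything except letters, underscores, whitespace and dashes, then trim the ends.
    value = re.sub(r'[^A-Za-z_\s-]', '', value, flags=re.U).strip()
    # Collapse each run of dashes and/or whitespace into a single dash.
    return re.sub(r'[-\s]+', '-', value, flags=re.U)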
print(slugify("Hello World!"))
# Hello-World
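
This version does not always give the same result as the two-sub approach from the accepted answer, because it strips unwanted characters first and collapses dashes afterwards. The comparison below is a rough sketch; two_subs is a hypothetical name for the accepted answer's one-liner, and the re.U flag is left out because str patterns are Unicode-aware by default in Python 3.

```python
import re

def two_subs(s):
    # Accepted answer's order: whitespace to dash first, then remove the rest.
    return re.sub(r"[^a-zA-Z_-]", "", re.sub(r"\s+", "-", s))

def slugify(value):
    # This answer's order: remove unwanted characters, trim, then collapse.
    value = re.sub(r"[^A-Za-z_\s-]", "", value).strip()
    return re.sub(r"[-\s]+", "-", value)

for text in ["Hello, world!", "  Hello - world  ", "rock & roll"]:
    print(repr(two_subs(text)), repr(slugify(text)))
# 'Hello-world' 'Hello-world'
# '-Hello---world-' 'Hello-world'
# 'rock--roll' 'rock-roll'
```

The two agree on the example from the question, but differ when the input has leading or trailing whitespace, an existing dash surrounded by spaces, or a removed character between two spaces.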
