Questions tagged [regex]

Regular expressions provide a declarative language to match patterns within strings. They are commonly used for string validation, parsing, and transformation. Because regular expressions are not fully standardized, all questions with this tag should also include a tag specifying the applicable programming language or tool. Do not post questions asking for an explanation of what a symbol means or what a particular regular expression will match.

IMPORTANT NOTE: Requests to explain a regular expression pattern or construct will be closed as duplicates of the canonical post "What does this regex mean", which covers regular expression constructs in detail. The post also links to many popular online regular expression testers, such as Regex101, where the meanings of regex constructs can be looked up.


Regular expressions are a powerful formalism for pattern matching in strings. They are available in a variety of dialects (also known as flavors) in a number of programming languages and text-processing tools, as well as many specialized applications. The term "Regular expression" is typically abbreviated as "RegEx" or "regex".

Before asking a question here, please take the time to review the following brief guidelines.

How To Ask

  • Be clear about what you need.

    Always indicate which platform you need or want to use (programming language, tool, occasionally even version information). Keep in mind that regex dialects are different; the lowest common denominator will usually be quite different from what is possible and recommended for a tool with a modern, souped-up regex engine.

    Also, are you looking for a regular expression for input validation (which needs to be rather strict), or do you need one for information extraction (which can be somewhat relaxed)?

    If your question relates to regular expressions in the strict computer science/automata theory sense, please state this explicitly.

    For most other questions, you should always include sample input, expected output, and an outline of what you have tried and where you are stuck. An example of what you do not want to match is often just as important.

  • Show us what you tried.

    A link to one of the many online regex testing tools (see link section) with your attempt and some representative data can do wonders.

    Even if you cannot post your problem online, showing us your best attempt helps us focus on what you need help with.

  • Search for duplicates.

    Before posting, check if your issue has already been solved by somebody else asking something similar. The following section outlines some common recurring topics.

Avoid Common Problems and Pitfalls

  • Do not assume that the tool you are using supports precisely the syntax of another tool.

    While modern Perl/Ruby/Python/PHP/Java regular expression support is widespread, you cannot assume that it is universal. In particular, many older tools (Awk, sed, grep, lex, etc.), as well as some newer ones (JavaScript, many text editors), use different dialects, some of which do not support, for example, non-capturing parentheses (?:...), non-greedy quantifiers *?, backreferences (\1, \2, etc.), escape sequences and character class abbreviations (\t, \d, POSIX character classes [[:class:]]), arbitrary repetition {m,n}, or lookarounds (?=...), (?<=...), (?!...).

    If your question is not specific to any particular implementation, try the tag. This will generally imply a fairly minimal set of operators, corresponding to the ones specified in the common mathematical definition of regular languages.

  • Understand the difference between "glob" expressions and true regular expressions.

    Glob patterns are a less potent pattern matching language, which is commonly used for file name wildcards. In glob, * means "anything", while a lone * in a regular expression is, in fact, a syntax error in some dialects (though many engines will silently ignore it, rather than issue a warning; and others still will see it as a literal *).

    For the record, the regex way to say (as much as possible of) "anything" is .* where the "any single character (except newline, usually)" metacharacter . is repeated zero or more times (*). But see below about how "any character" and greediness are often problematic.

    See also What are the differences between glob-style pattern and regular expression?
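
As a rough illustration of the glob/regex difference described above, here is a minimal sketch using Python's standard fnmatch and re modules. The file names are made up for the example, and the behavior shown is that of Python's dialect only; other engines may treat a lone * differently, as noted above.

```python
import fnmatch
import re

# Hypothetical file names, chosen only for illustration.
names = ["report.txt", "notes.txt", "txt", "image.png"]

# Glob: * means "any run of characters", so *.txt selects names ending in .txt.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]

# Regex: .* means "any character, zero or more times"; the literal dot
# has to be escaped, otherwise it would match any character.
regex_hits = [n for n in names if re.fullmatch(r".*\.txt", n)]

# In Python's dialect a leading lone * has nothing to repeat and is rejected.
try:
    re.compile("*.txt")
    lone_star_ok = True
except re.error:
    lone_star_ok = False

print(glob_hits, regex_hits, lone_star_ok)
```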

  • Specifying a single repetition is unnecessary.

    Using {1} as a single-repetition quantifier is harmless but never useful. It is basically an indication of inexperience and/or confusion.

    h{1}t{1}t{1}p{1} matches the same string as the simpler expression http (or ht{2}p for that matter) but as you can see, the redundant {1} repetitions only make it harder to read.
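
To check the equivalence claimed above, the following sketch (Python's re module, with a few made-up sample strings) runs all three spellings against the same inputs and compares the outcomes.

```python
import re

verbose = re.compile(r"h{1}t{1}t{1}p{1}")  # redundant {1} quantifiers
plain = re.compile(r"http")                # the readable spelling
compact = re.compile(r"ht{2}p")            # {2} is a meaningful quantifier

# Sample strings are illustrative only.
samples = ["http", "https://example.com", "htp", "htttp"]

# For each sample, record whether each of the three patterns finds a match.
results = {
    s: [bool(p.search(s)) for p in (verbose, plain, compact)] for s in samples
}

for s, flags in results.items():
    print(s, flags)
```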

  • Square brackets are commonly misunderstood or misused.

    Beginners often attempt to use square brackets for everything, including grouping. While [Jun][Jul] may look like a regex for matching months, it actually matches JJ, Ju, Jl, uJ, uu, ul, nJ, nu, or nl; not Jun or Jul. [Jun|Jul] is a wasteful way to write the functionally identical [|Junl]—it matches any one character from the set comprising |, J, u, l, and n.

    For the record, [abc] defines a character class which matches a single character which can be a or b or c. The proper way to express alternation is (Jun|Jul|Aug) in many dialects (though BRE and related dialects will need backslashes; \(Jun\|Jul\|Aug\) for traditional grep et al.) or, somewhat more parsimoniously, (Ju[nl]|Aug)
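
The difference between a character class and alternation can be sketched in Python as follows. This assumes Python's re dialect, where alternation needs no backslashes; the candidate strings are chosen for illustration.

```python
import re

two_classes = re.compile(r"[Jun][Jul]")   # two single-character classes
pipe_class = re.compile(r"[Jun|Jul]")     # one class: J, u, n, l or |
alternation = re.compile(r"Ju[nl]|Aug")   # real alternation

def full(pattern, text):
    """Return True if the whole of text is matched by pattern."""
    return pattern.fullmatch(text) is not None

# The two-class pattern matches two-character strings, not month names.
print(full(two_classes, "Jun"), full(two_classes, "Ju"), full(two_classes, "nl"))

# The class containing | matches exactly one character, including | itself.
print(full(pipe_class, "Jun"), full(pipe_class, "|"), full(pipe_class, "l"))

# Alternation matches the intended month abbreviations.
print([m for m in ("Jun", "Jul", "Aug", "Ju") if full(alternation, m)])
```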

  • Negation is tricky.

    Related to the previous, beginners will use negated character classes to attempt to restrict what can be matched. For example, to match turn but not turned, the following does not do what you want: turn[^ed]. It matches turn followed by any single character which is not e or d, so it will not match turner, for example, nor turn at the very end of the input, because the class still requires one character.

    In fact, the traditional regex does not allow for this to be expressed easily. With ERE, you could say turn($|[^e]|e$|e[^d]) to say that turn can be followed by nothing, or a character which is not e, or by e if it is not in turn followed by d. Modern regular expression dialects have an extension called lookarounds which allow you to say turn(?!ed)—but make sure your tool supports this syntax before plunging ahead.

    Notice also how the character class negation operator is distinct from the beginning of line anchor (^[abc] matches a, b, or c at beginning of the line, whereas [^abc] matches a single character which is not a, b, or c).

    See also the next bullet point.
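
The three approaches to "turn but not turned" discussed above can be compared side by side. This is a sketch in Python's re dialect, which happens to accept both the ERE-style workaround and the lookahead; the word list is made up for the example.

```python
import re

# Illustrative word list.
words = ["turn", "turned", "turner", "turns", "turning"]

# Negated class: demands one more character that is neither e nor d.
negated_class = re.compile(r"turn[^ed]")
# ERE-style workaround quoted in the text above.
ere_style = re.compile(r"turn($|[^e]|e$|e[^d])")
# Negative lookahead: turn, provided "ed" does not follow.
lookahead = re.compile(r"turn(?!ed)")

def hits(pattern):
    """Return the words in which pattern finds a match."""
    return [w for w in words if pattern.search(w)]

print(hits(negated_class))
print(hits(ere_style))
print(hits(lookahead))
```

Only the last two patterns accept turn and turner while rejecting turned.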

  • If there is a way to match, the engine will find it.

    A common beginner's mistake is to supply useless optional leading or trailing elements. The trailing s? in dogs? does nothing to prevent a match on doggone or endogenous. If you want to prevent those, you will need to elaborate—perhaps something like dogs?\> (provided your dialect supports the final word boundary operator and provided that's what you mean).

    As it is, the regular expression dogs? will match exactly the same strings as just dog (though if your application captures the match, only the former will capture a trailing s if there is one).
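
A small sketch of the point above, in Python. Python's re module has no \> operator, so the example assumes \b (a generic word boundary) as the closest equivalent; the sample text is made up.

```python
import re

text = "doggone dogs endogenous dog"

# The optional s does not stop matches inside longer words.
optional_s = re.findall(r"dogs?", text)
plain = re.findall(r"dog", text)

# Word boundaries on both sides restrict matches to the words dog and dogs.
bounded = re.findall(r"\bdogs?\b", text)

print(optional_s)
print(plain)
print(bounded)
```

Both unanchored patterns find a match at the same four positions; only the captured text differs where an s follows.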

  • Matches are greedy.

    The regex a.*b will match the entire string "abbbbbb" because * will always match as much as possible. Say a[^ab]*b if that's what you mean, or use non-greedy matching if your dialect supports it.
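
The following sketch shows the three variants mentioned above on the same input, assuming Python's re dialect, which supports the non-greedy *? quantifier.

```python
import re

s = "abbbbbb"

# Greedy: .* grabs as much as possible, then backs off to the last b.
greedy = re.search(r"a.*b", s).group()

# Negated class: stops at the first b because b is excluded from the class.
negated = re.search(r"a[^ab]*b", s).group()

# Non-greedy: .*? takes as little as possible before the first b.
lazy = re.search(r"a.*?b", s).group()

print(greedy, negated, lazy)
```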

  • Watch what you capture.

    If you use grouping parentheses, the parentheses define what is captured into a backreference. If you edit in parentheses for grouping purposes, make sure you are not renumbering your backreferences.

    Also, in particular, watch out for (abc){2,3} which only captures the last occurrence of abc in the matched string. If you want the repetition to be part of the capture, it needs to be inside the parentheses, like this: ((abc){2,3})
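
Both capture pitfalls above can be seen in a short Python sketch: the repeated group keeps only its last iteration, and wrapping it in an outer group shifts the group numbers. The input string is chosen for illustration.

```python
import re

s = "abcabcabc"

# Group 1 is overwritten on every repetition; only the last abc survives.
last_only = re.fullmatch(r"(abc){2,3}", s)
print(last_only.group(1), last_only.span(1))

# With an outer group, group 1 is the whole repetition and the inner
# group is renumbered to 2.
whole = re.fullmatch(r"((abc){2,3})", s)
print(whole.group(1), whole.group(2))
```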

  • Don't use regex for everything!

    In particular, using (typically line-oriented) traditional regex tools to handle structured formats like HTML, XML, JSON, configuration files with block structure (Apache, nginx, many name servers, etc.) is likely to fail, or to produce incorrect results in numerous corner cases.

    Asking for HTML regexes tends to be met with negative reactions. The reasoning extends to all structured formats. If there is a parser for it, use that instead.
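
As one possible illustration of "use a parser instead", here is a minimal sketch that collects link targets with Python's standard html.parser module. The class name LinkCollector, the helper extract_links, and the sample markup are all hypothetical and exist only for this example.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

# Mixed case, mixed quoting, and a commented-out link: all awkward for a
# naive regex, but handled by the parser.
sample = (
    '<p><a class="x" href="/one">1</a> '
    "<A HREF='/two' >2</A>"
    '<!-- <a href="/no">hidden</a> --></p>'
)

print(extract_links(sample))
```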

Further Reading

Learning regular expressions

Books

Documentation for JavaScript

Online sandboxes (for testing and publishing regexes online)

  • RegexPlanet (supports a variety of flavors to choose from)
  • Regexpal (ECMAScript flavor, as implemented by JavaScript)
  • Regexhero (.NET flavor)
  • RegexStorm.net (.NET flavor with link sharing capability)
  • RegExr v2.1 (in JavaScript)
  • RegExr v1.0 (ECMAScript flavor, as implemented by Adobe Flash)
  • reFiddle (in JavaScript, à la jsFiddle)
  • Rubular (Ruby flavor)
  • myregexp.com (Java-applet with source code)
  • regexe.com (German; probably Java flavor)
  • regex101 (in ECMAScript (JavaScript), Python, PHP (PCRE 16-bit), Golang, generates explanation of pattern)
  • regexper.com (generates graphical representation for ECMAScript flavor)
  • debuggex (generates graphical representation and shows processing of pattern – JavaScript, Python, and PCRE-compatible)
  • pyregex.com (Web validator for Python regular expressions)
  • regviz.org (Visual debugging of regular expressions for JavaScript)
  • Ultrapico Expresso (a standalone tool for testing .NET regular expressions)
  • Pythex (Quick way to test your Python regular expressions)

Online Regex generator (for building Regular Expressions via simplified input)

Other links

Regex Uses:

Regular expressions are useful in a wide variety of text processing tasks, and more generally string processing, where the data need not be textual. Common applications include data validation, data scraping (especially web scraping), data wrangling, simple parsing, the production of syntax highlighting systems, and many other tasks.

While regular expressions would be useful in Internet search engines, processing them across the entire database could consume excessive computing resources, depending on the complexity and design of the regex. Although system administrators can in many cases run regex-based queries internally, most search engines do not offer regex support to the public. Notable exceptions are searchcode and, previously, Google Code Search, which was shut down in 2012.
Google also offers RE2, a fast, safe, thread-friendly C++ alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python: it does not backtrack and guarantees that running time grows linearly with input size.

240636 questions
4865
votes
99 answers

How to validate an email address in JavaScript

Is there a regular expression to validate an email address in JavaScript?
pix0r
  • 30,601
  • 18
  • 82
  • 102
4643
votes
31 answers

Regular expression to match a line that doesn't contain a word

I know it's possible to match a word and then reverse the matches using other tools (e.g. grep -v). However, is it possible to match lines that do not contain a specific word, e.g. hede, using a regular expression?…
knaser
  • 1,371
  • 7
  • 18
  • 16
3567
votes
79 answers

How to validate an email address using a regular expression?

Over the years I have slowly developed a regular expression that validates MOST email addresses correctly, assuming they don't use an IP address as the server part. I use it in several PHP programs, and it works most of the time. However, from time…
acrosman
  • 12,375
  • 10
  • 36
  • 54
1994
votes
16 answers

What is a non-capturing group in regular expressions?

How are non-capturing groups, i.e. (?:), used in regular expressions and what are they good for?
never_had_a_name
  • 80,383
  • 96
  • 257
  • 374
1567
votes
23 answers

How do you use a variable in a regular expression?

I would like to create a String.replaceAll() method in JavaScript and I'm thinking that using a regex would be most terse way to do it. However, I can't figure out how to pass a variable in to a regex. I can do this already which will replace all…
JC Grubbs
  • 34,411
  • 27
  • 64
  • 74
1498
votes
23 answers

How do you access the matched groups in a JavaScript regular expression?

I want to match a portion of a string using a regular expression and then access that parenthesized substring: var myString = "something format_abc"; // I want "abc" var arr = /(?:^|\s)format_(.*?)(?:\s|$)/.exec(myString); console.log(arr); //…
nickf
  • 499,078
  • 194
  • 614
  • 709
1290
votes
5 answers

\d less efficient than [0-9]

I made a comment yesterday on an answer where someone had used [0123456789] in a regex rather than [0-9] or \d. I said it was probably more efficient to use a range or digit specifier than a character set. I decided to test that out today and found…
weston
  • 51,132
  • 20
  • 132
  • 192
1094
votes
3 answers

Negative matching using grep (match lines that do not contain foo)

I have been trying to work out the syntax for this command: grep ! error_log | find /home/foo/public_html/ -mmin -60 OR: grep '[^error_log]' | find /home/baumerf/public_html/ -mmin -60 I need to see all files that have been modified except for…
jerrygarciuh
  • 18,340
  • 23
  • 77
  • 126
1076
votes
9 answers

Is there a regular expression to detect a valid regular expression?

Is it possible to detect a valid regular expression with another regular expression? If so please give example code below.
psytek
  • 8,281
  • 3
  • 15
  • 7
969
votes
42 answers

How to validate phone numbers using regex

I'm trying to put together a comprehensive regex to validate phone numbers. Ideally it would handle international formats, but it must handle US formats, including the following: 1-234-567-8901 1-234-567-8901 x1234 1-234-567-8901 ext1234 1 (234)…
Nicholas Trandem
  • 2,795
  • 5
  • 28
  • 32
875
votes
11 answers

Check whether a string matches a regex in JS

I want to use JavaScript (can be with jQuery) to do some client-side validation to check whether a string matches the regex: ^([a-z0-9]{5,})$ Ideally it would be an expression that returned true or false. I'm a JavaScript newbie, does match() do…
Richard
  • 26,935
  • 26
  • 98
  • 142
872
votes
56 answers

What is the best regular expression to check if a string is a valid URL?

How can I check if a given string is a valid URL address? My knowledge of regular expressions is basic and doesn't allow me to choose from the hundreds of regular expressions I've already seen on the web.
Vitor Silva
  • 15,284
  • 8
  • 30
  • 27
803
votes
14 answers

Regular Expressions: Is there an AND operator?

Obviously, you can use the | (pipe?) to represent OR, but is there a way to represent AND as well? Specifically, I'd like to match paragraphs of text that contain ALL of a certain phrase, but in no particular order.
hugoware
  • 33,265
  • 24
  • 58
  • 70
709
votes
12 answers

How to negate specific word in regex?

I know that I can negate group of chars as in [^bar] but I need a regular expression where negation applies to the specific word - so in my example how do I negate an actual bar, and not "any chars in bar"?
Bostone
  • 34,822
  • 38
  • 158
  • 216