regex 

Send to Kindle
home » snippets » python » regex





Code Meaning
\d a digit
\D a non-digit
\s whitespace (tab, space, newline, etc.)
\S non-whitespace
\w alphanumeric
\W non-alphanumeric

Anchoring

Code Meaning
^ start of string, or line
$ end of string, or line
\A start of string
\Z end of string
\b empty string at the beginning or end of a word
\B empty string not at the beginning or end of a word
Syntax Description
(?P<name>...) Capture group with name "name". To refer to this in the same regex, use (?P=name) and to refer to it in a substitution, use \g<name>
(?=...) (Positive) Lookahead assertion:  Matches if ... matches next, but doesn’t consume any of the string.
(?!...) Negative Lookahead assertion:  Matches if ... does not match next, but doesn’t consume any of the string.
(?<=...) Positive lookbehind assertion:  Succeeds only when the current position is preceded by a match for ....  The contained ... must only match strings of some fixed length, meaning that abc or a
(?<!...) Negative lookbehind assertion:  Similar to the Positive lookbehind assertion but requires ... to not precede the current position in the string.


Match objects


Named capture groups


Specifing flags / compilation options.

Ref: Compilation Flags

Flag Meaning
DOTALL, S Make . match any character, including newlines
IGNORECASE, I Do case-insensitive matches
LOCALE, L Do a locale-aware match
MULTILINE, M Multi-line matching, affecting ^ and $
VERBOSE, X Enable verbose REs, which can be organized more cleanly and understandably.
UNICODE, U Makes several escapes like \w, \b, \s and \d dependent on the Unicode character database.


Snippets

text = "There are 24 hours in a day, 7 days in a week, 4 weeks in a month"

r = re.compile(r'\d+')

m = r.search("text")

c = itertools.count(1)
re.sub(r'\d+', lambda m: str(c.next()), in_this_text)
re.sub(r'index = (?P<counter>\d+)', lambda m: "index = {0}".format(c.next()), in_this_text)


# Verbose / multiline regex.
regex = re.compile(ur"""
    \$ (?:
      (?P<name>\w+) |
      # this is incorrect - it doesn't handle } inside the expression.
      \{(?P<expression>[^}]+)\}
    )
    """, re.VERBOSE)

# You can also use the (?x) flag instead of using re.VERBOSE
regex = re.compile(ur"""
    (?x)
    Verbose multiline regex # comment
    """)