python正则表达式的括号

 

今天用到python的正则表达式时,遇到括号()的用法问题,折腾了一晚上,这里记录一下。

python的正则表达式中,括号()的用法是比较特殊的,分为捕获(capturing)和非捕获(non-capturing)两个版本,官方文档是这么说的:

(…) Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals ‘(‘ or ‘)’, use ( or ), or enclose them inside a character class: [(], [)].

(?:…)

A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.

感觉不在说人话……直接上例子:

  • ()示例:
    >>> s = r"13:52:02.075781"
    >>> print(re.findall(r"(\d{2}:){2}\d{2}", s))
    ['52:']
    
  • (?:)示例:
    >>> s = r"13:52:02.075781"
    >>> print(re.findall(r"(?:\d{2}:){2}\d{2}", s))
    ['13:52:02']
    

    给了一个表示时间的字符串,我想要匹配hh-mm-ss,两个例子中只有括号的用法不一样:(?:...)只是将括号里的pattern作为一个整体来看,最终返回的结果是满足整个pattern的,在例子中即(\d{2}:){2}\d{2}。而()多了一层“捕获”,即找到满足整个pattern的子串后,返回()中的捕获内容,在例子中,先匹配到了13:52:02,又因为捕获模式是\d{2}:,所以最后返回的结果是满足该捕获模式的52:

总结:(?:...)用来将括号内pattern作为一个整体,方便多次匹配;而(...)用来捕获pattern其中的一小部分

ref:

  1. https://docs.python.org/3/library/re.html
  2. https://blog.csdn.net/Leonard_wang/article/details/79813425