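Today, while using Python regular expressions, I ran into a problem with how parentheses `()` work. It took me a whole evening to sort out, so I'm writing it down here.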
Parentheses `()` are a bit special in Python regular expressions: they come in a capturing and a non-capturing version. The official documentation puts it this way:
(…)
Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(], [)].
(?:…)
A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.
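A small sketch of what these two definitions mean in practice, using only the standard `re` module; the sample string `"13:13:02"` is made up for illustration. The capturing group's text can be read back and reused with `\1`; the non-capturing group records nothing, so there is no group for `\1` to refer to.

```python
import re

# Capturing group: the text it matched can be read back, and \1 must repeat it.
cap = re.search(r"(\d{2}):\1", "13:13:02")
print(cap.group(0))  # 13:13
print(cap.groups())  # ('13',)

# Non-capturing group: it still groups the pattern, but records nothing.
non = re.search(r"(?:\d{2}):", "13:13:02")
print(non.group(0))  # 13:
print(non.groups())  # ()

# With only a non-capturing group, \1 has nothing to point at: compiling fails.
try:
    re.compile(r"(?:\d{2}):\1")
except re.error as exc:
    print("re.error:", exc)
```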
Not exactly plain language... so straight to the examples:
- `()` example:

    >>> import re
    >>> s = r"13:52:02.075781"
    >>> print(re.findall(r"(\d{2}:){2}\d{2}", s))
    ['52:']
- `(?:)` example:

    >>> import re
    >>> s = r"13:52:02.075781"
    >>> print(re.findall(r"(?:\d{2}:){2}\d{2}", s))
    ['13:52:02']
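The difference comes from a documented rule of `re.findall`: with no groups it returns the full matches, with one group it returns only that group's text, and with several groups it returns a tuple of groups per match. A quick sketch on the same string (the variable names are just for illustration):

```python
import re

s = "13:52:02.075781"

# No groups: findall returns the full matches.
no_group = re.findall(r"\d{2}:\d{2}:\d{2}", s)
print(no_group)      # ['13:52:02']

# One group: findall returns only what that group captured.
one_group = re.findall(r"(\d{2}):\d{2}:\d{2}", s)
print(one_group)     # ['13']

# Several groups: findall returns one tuple of groups per match.
three_groups = re.findall(r"(\d{2}):(\d{2}):(\d{2})", s)
print(three_groups)  # [('13', '52', '02')]
```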
Given a string representing a time, I want to match hh:mm:ss. The two examples differ only in how the parentheses are used:
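`(?:...)` only treats the pattern inside the parentheses as a single unit. The result is whatever matches the entire pattern, which in the example is `(?:\d{2}:){2}\d{2}`, so `findall` returns `13:52:02`.

`()` adds a layer of "capturing": once a substring matching the entire pattern is found, `findall` returns what the group captured instead of the whole match. In the example, `13:52:02` is matched first. The capture pattern is `\d{2}:`, and the group matches twice (`13:`, then `52:`); a repeated capturing group keeps only its last repetition, so the final result is `52:`.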
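To see both pieces of the capturing example at once, `re.search` returns a match object where `group(0)` is the whole match and `group(1)` is the group. This minimal sketch shows that the repeated group ends up holding only its last repetition:

```python
import re

s = "13:52:02.075781"
m = re.search(r"(\d{2}:){2}\d{2}", s)

print(m.group(0))  # 13:52:02  (the whole match)
print(m.group(1))  # 52:       (only the last repetition of the group)
print(m.span(1))   # (3, 6)    (where that last repetition sits in s)
```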
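Summary: `(?:...)` groups the pattern inside the parentheses into one unit, which makes it easy to repeat (e.g. with `{2}`); `(...)` does the same but also captures the text it matched, and when a pattern contains capturing groups, `re.findall` returns the captured parts rather than the full match.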
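The two can be combined. As a sketch, assuming you want the whole hh:mm:ss back from `findall`, wrap the non-capturing repetition in one outer capturing group; and if you need both the full match and a captured part, iterate over match objects with `re.finditer`:

```python
import re

s = "13:52:02.075781"

# (?:...) handles the repetition; the single outer (...) decides what findall returns.
whole = re.findall(r"((?:\d{2}:){2}\d{2})", s)
print(whole)  # ['13:52:02']

# finditer yields match objects, so the full match and the group are both available.
pairs = [(m.group(0), m.group(1)) for m in re.finditer(r"(?:\d{2}:){2}(\d{2})", s)]
print(pairs)  # [('13:52:02', '02')]
```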
ref:
- https://docs.python.org/3/library/re.html
- https://blog.csdn.net/Leonard_wang/article/details/79813425