Finding out why a regular expression does not match a given string can be very tedious. I would like to write a utility that identifies the sub-expression causing the non-match. My idea is to use a parser to create a tree representing the complete regular expression. Then I could simplify the expression by dropping sub-expressions one by one, from right to left and from bottom to top, until the remaining regex matches.

The last sub-expression dropped should be (part of) the problem.
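
To make the idea concrete, here is a rough, character-level approximation that needs no parser at all: it shortens the pattern from the right one character at a time and skips every prefix that is not a valid regex on its own. The function name `longest_matching_prefix` is made up for this illustration, and anchoring with `re.match` is an assumption; a tree-based version would drop whole sub-expressions instead of single characters.

```python
import re


def longest_matching_prefix(pattern, text, flags=0):
    """Return the longest prefix of `pattern` that compiles and matches at the start of `text`."""
    for end in range(len(pattern), -1, -1):
        candidate = pattern[:end]
        try:
            compiled = re.compile(candidate, flags)
        except re.error:
            # This prefix is not a valid regex by itself (e.g. unclosed group, dangling backslash).
            continue
        if compiled.match(text):
            return candidate
    return ""  # not reached in practice: the empty pattern always matches


pattern = r"\d+-[a-z]+-\d+"
prefix = longest_matching_prefix(pattern, "123-abc-xyz")
suspect = pattern[len(prefix):]  # the part of the pattern where matching starts to fail
```

Whatever follows the returned prefix is where the match starts to fail. Cutting by characters can split a sub-expression in the middle, which is exactly why I would prefer to work on a real parse tree.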

As a first step, I am looking for a parser for Python regular expressions, or a Python regex grammar to create a parser from.
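
One candidate I am aware of is the parser that CPython's own `re` module uses internally. The documented way to look at its output is `re.compile(pattern, re.DEBUG)`, which prints the parse tree. Getting the tree as an object seems to require the undocumented internal module (`sre_parse` in older versions, `re._parser` from Python 3.11 on), so the sketch below rests on the assumption that these internals stay as they are; the helper name `top_level_nodes` is hypothetical.

```python
# NOTE: relies on CPython's undocumented regex parser; names and structure may change.
try:
    from re import _parser as sre_parser  # Python 3.11 and later
except ImportError:
    import sre_parse as sre_parser  # earlier versions


def top_level_nodes(pattern, flags=0):
    """Return the top-level (opcode, argument) pairs of the parsed pattern."""
    return list(sre_parser.parse(pattern, flags))


for opcode, argument in top_level_nodes(r"\d+-[a-z]+"):
    # Nested sub-patterns (repeats, groups, branches) appear inside `argument`.
    print(opcode, argument)
```

The tree holds opcodes rather than the original pattern text, so turning a pruned tree back into a regex string would still need extra work.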

But maybe my idea is flawed? Or does a similar (or better) tool already exist? Any advice would be highly appreciated!