Regular Expression to Find HTML Comments in Java

Problem

You would like to select the contents of all comments in an HTML (or XML) document using Java regular expressions.

For instance:

Solution

There is a powerful regular expression, which can be found on this page.

static final String commentRegex = "(// )?\\<![ \\r\\n\\t]*(--([^\\-]|[\\r\\n]|-[^\\-])*--[ \\r\\n\\t]*)\\>";
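As a minimal sketch of how this pattern might be applied, the class below compiles it once and collects the text of every comment it matches. The class name `CommentFinder` and the method `findComments` are hypothetical names chosen for illustration, and the sketch assumes well-formed input in which every comment is properly closed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class CommentFinder {

    // The regular expression from the article, unchanged.
    static final String commentRegex =
        "(// )?\\<![ \\r\\n\\t]*(--([^\\-]|[\\r\\n]|-[^\\-])*--[ \\r\\n\\t]*)\\>";

    // Compile once and reuse; Pattern instances are immutable and thread-safe.
    static final Pattern COMMENT_PATTERN = Pattern.compile(commentRegex);

    // Returns the text inside each comment, without the "--" delimiters.
    static List<String> findComments(String html) {
        List<String> comments = new ArrayList<>();
        Matcher matcher = COMMENT_PATTERN.matcher(html);
        while (matcher.find()) {
            // Group 2 spans from the opening "--" to the closing "--",
            // plus any whitespace before the final ">".
            String delimited = matcher.group(2).trim();
            // Drop the leading and trailing "--", then surrounding whitespace.
            comments.add(delimited.substring(2, delimited.length() - 2).trim());
        }
        return comments;
    }

    public static void main(String[] args) {
        String html = "<p>Hi</p><!-- first --><div><!-- second\ncomment --></div>";
        for (String comment : findComments(html)) {
            System.out.println("[" + comment + "]");
        }
    }
}
```

Group 2 of the pattern still includes the `--` delimiters, which is why the sketch strips two characters from each end before trimming.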

However, this regular expression might cause your application to 'hang' when it is given a (bad, bad input) document that opens a comment without a matching comment end, like: