Linux – sed -r vs. sed – exactly how are the regex possibilities extended

bash linux regex sed

In bash, as I understand it, I can use characters like . & ^ * in regular expressions with sed, but the -r option changes the nature of how regular expressions are, uh, expressed, kinda like grep vs. grep -E. But I can't find any summary of exactly HOW the syntax changes. Is there a list somewhere? Am I being naive in thinking this is the kind of thing that it ought to be possible to summarize in a table that could be printed on a couple of pages?

Do the characters that work in plain old non-extended sed regular expressions still work the same way with the -r option? In other words, are expressions that are valid WITHOUT the -r option still valid, and do they still mean the same thing, WITH the -r option? As if they were a subset of the expressions valid WITH the -r option?

I keep thinking there must be a pithy summary of the difference with examples somewhere.

Best Answer

According to info sed: "Extended regexps are those that 'egrep' accepts; they can be clearer because they usually have less backslashes, but are a GNU extension and hence scripts that use them are not portable." Here egrep is a synonym for grep -E.

This is indeed the case. Without -r:

echo "abcdef" | sed 's/\([cd]\+\)/\U\1/'
abCDef

With -r:

echo "abcdef" | sed -r 's/([cd]+)/\U\1/'
abCDef
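The same pairing holds for the other operators. Here is a minimal sketch, assuming GNU sed (in basic syntax, \+, \? and \| are GNU extensions rather than POSIX). The helper names bre and ere and the sample input are made up for illustration. Each pair of commands should print the same line:

```shell
# Hypothetical sample input and helpers, for illustration only.
input='aaa bb color colour'
bre() { printf '%s\n' "$input" | sed "$1"; }     # basic syntax (the default)
ere() { printf '%s\n' "$input" | sed -r "$1"; }  # extended syntax

# Grouping with a backreference: \( \) in basic, ( ) in extended.
bre 's/\(b\)\1/[\1\1]/'
ere 's/(b)\1/[\1\1]/'

# Interval: \{2\} in basic, {2} in extended.
bre 's/a\{2\}/X/'
ere 's/a{2}/X/'

# Optional character: \? in basic (GNU extension), ? in extended.
bre 's/colou\?r/C/g'
ere 's/colou?r/C/g'

# Alternation: \| in basic (GNU extension), | in extended.
bre 's/aaa\|bb/X/g'
ere 's/aaa|bb/X/g'
```

In every pair the extended form is the basic form with the backslashes in front of the operators removed.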

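Some expressions may be valid with both, but in many cases they will be interpreted differently. The characters + ? | ( ) { } are the main case: unescaped, they are literal in basic syntax and operators in extended syntax, and a preceding backslash reverses that (\+, \? and \| in basic syntax are GNU extensions). The character escaping logic in regular, POSIX-compliant sed totally escapes me.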
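To see one expression change meaning between the two modes, here is a small sketch, again assuming GNU sed. The helper names basic and extended are hypothetical. Each helper takes an input string and a sed script:

```shell
# Hypothetical helpers: $1 is the input text, $2 is the sed script.
basic()    { printf '%s\n' "$1" | sed "$2"; }
extended() { printf '%s\n' "$1" | sed -r "$2"; }

# "a+": a literal "a+" in basic syntax, "one or more a" in extended.
basic    'a+b' 's/a+/X/'     # prints Xb
extended 'a+b' 's/a+/X/'     # prints X+b

# "(x)": literal parentheses in basic syntax, a group in extended.
basic    '(x)' 's/(x)/Y/'    # prints Y
extended '(x)' 's/(x)/Y/'    # prints (Y)

# "a|b": a literal "a|b" in basic syntax, "a or b" in extended.
basic    'a|b' 's/a|b/Z/'    # prints Z
extended 'a|b' 's/a|b/Z/'    # prints Z|b

# To match the literal characters in extended syntax, escape them.
extended '(x)' 's/\(x\)/Y/'  # prints Y
```

So a script written for plain sed is not guaranteed to keep its meaning once -r is added, even when it still runs without error.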