Sunday, November 13, 2011

grep()-ing for decimal points with regular expressions

Decimal points (periods) are metacharacters with a special meaning in regular expressions: an unescaped period matches any single character.

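You normally have to "escape" metacharacters in R regular expressions if you want to search for the literal character. Escaping is done with a double backslash "\\" before the metacharacter. Two backslashes are needed because R's string parser consumes the first one, so the regular expression engine receives a single backslash followed by the metacharacter.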
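As a quick illustration of why the escape matters, here is a minimal sketch. The vector `tags` is made-up sample data, not something from the original data set.

```r
# Hypothetical sample data (an assumption for illustration only)
tags <- c("v1.2", "v12", "abc")

# Unescaped: "." matches any single character, so every non-empty string matches
unescaped_hits <- grep(".", tags)

# Escaped: only strings that contain a literal period match
escaped_hits <- grep("\\.", tags)

# The R string "\\." holds just two characters: a backslash and a period
pattern_length <- nchar("\\.")

print(unescaped_hits)
print(escaped_hits)
print(pattern_length)
```

With this sample data the unescaped pattern matches all three elements, while the escaped pattern matches only the first.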

Inside brackets, however, a period is treated literally, so you don't have to escape it there. The same holds for most other metacharacters, with exceptions such as ^, -, ] and the backslash itself. All of the following grep() statements will find decimal points.

grep("[\\.]", test$orig_tag)
grep("[.]", test$orig_tag)
grep("\\.", test$orig_tag)
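To check that the three patterns really do agree, the following sketch runs them on a small made-up vector. `orig_tag` here is a hypothetical stand-in for test$orig_tag, whose actual contents are not shown in this post. It also tries `fixed = TRUE`, a standard grep() argument that treats the pattern as a plain string rather than a regular expression, which may be a simpler option when no other regex features are needed.

```r
# Hypothetical stand-in for test$orig_tag (an assumption for illustration only)
orig_tag <- c("a.1", "b2", "c.3.4", "d")

# The three patterns from above
hits_bracket_escaped <- grep("[\\.]", orig_tag)
hits_bracket <- grep("[.]", orig_tag)
hits_escaped <- grep("\\.", orig_tag)

# Alternative: skip regular expressions entirely and match the literal string "."
hits_fixed <- grep(".", orig_tag, fixed = TRUE)

print(hits_bracket_escaped)
print(hits_bracket)
print(hits_escaped)
print(hits_fixed)
```

All four calls return the indices of the elements that contain a period, here the first and third.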
