Regex Tutorial [Part – 4]

In the previous post, we discussed groups and alternation in regex. This post explains common shorthand symbols and inline modifiers.

Shorthand Symbols

Regex provides many shorthand symbols. Here are a few of them:

Shorthand Symbol   Meaning
\t                 A tab
\n                 A new line
\r                 A carriage return
\s                 A whitespace character (vertical space, horizontal space, new line etc.)
\S                 A non-whitespace character (equivalent to [^\s])
\d                 A digit (equivalent to [0-9])
\D                 A non-digit (equivalent to [^0-9] or [^\d])
\w                 An alphanumeric character or _ (equivalent to [a-zA-Z0-9_])
\W                 Anything other than an alphanumeric character or _ (equivalent to [^a-zA-Z0-9_] or [^\w])
\b                 Word boundary
\B                 Non-word boundary
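
A minimal Python sketch of a few of these shorthand symbols in action. The sample string and variable names are made up for illustration. In Python 3, \d, \w and \s also match non-ASCII Unicode characters by default, so the re.ASCII flag is passed here to get the bracket equivalents listed in the table.

```python
import re

# Hypothetical sample text containing a tab and a trailing new line.
text = "Order 66:\tship_to = Room 12B\n"

# re.ASCII restricts \d, \w and \s to their ASCII definitions.
digits = re.findall(r"\d+", text, re.ASCII)      # runs of digits
words = re.findall(r"\w+", text, re.ASCII)       # runs of alphanumerics or _
spaces = re.findall(r"\s", text, re.ASCII)       # single whitespace characters
non_spaces = re.findall(r"\S+", text, re.ASCII)  # runs of non-whitespace

print(digits)      # ['66', '12']
print(words)       # ['Order', '66', 'ship_to', 'Room', '12B']
print(spaces)      # [' ', '\t', ' ', ' ', ' ', '\n']
print(non_spaces)  # ['Order', '66:', 'ship_to', '=', 'Room', '12B']
```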

While most of the symbols are self-explanatory, \b needs an explanation. For example, the regex cat matches in the string tom cat as well as in tomcatx. On the other hand, the regex \bcat\b matches in tom cat but not in tomcatx, tomcat or cattom. So \b acts as a word boundary and is zero-width, i.e. it does not consume any characters.
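
The cat example can be checked with a short Python sketch. The list of sample strings comes from the paragraph above; the variable names are only illustrative.

```python
import re

samples = ["tom cat", "tomcatx", "tomcat", "cattom"]

# Without boundaries, cat is found anywhere inside the string.
plain = [s for s in samples if re.search(r"cat", s)]

# With \b on both sides, cat must stand as a whole word.
bounded = [s for s in samples if re.search(r"\bcat\b", s)]

print(plain)    # ['tom cat', 'tomcatx', 'tomcat', 'cattom']
print(bounded)  # ['tom cat']

# \b is zero-width: the match covers only the three letters of cat.
span = re.search(r"\bcat\b", "tom cat").span()
print(span)     # (4, 7)
```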

Positions of word boundary include:

  1. Start of string, if the first character is a word character
  2. End of string, if the last character is a word character
  3. Between two characters where one is a word character (\w) and the other is a non-word character (\W).

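e.g. the string tom cat has four word boundaries, listed here with the rule above that applies to each:

  1. Before the t of tom (rule 1)
  2. Between the m of tom and the space (rule 3)
  3. Between the space and the c of cat (rule 3)
  4. After the t of cat (rule 2)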
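
Word-boundary positions can also be listed programmatically. The sketch below uses a hypothetical helper, boundary_positions, that collects the index of every zero-width \b match in Python. The second sample shows that the start or end of a string counts as a boundary only when a word character sits next to it.

```python
import re

def boundary_positions(text):
    """Return the index of every word boundary (\\b) in text."""
    return [m.start() for m in re.finditer(r"\b", text)]

first = boundary_positions("tom cat")
second = boundary_positions(" cat!")

print(first)   # [0, 3, 4, 7]
print(second)  # [1, 4]
```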

Inline Modifiers

Inline modifiers can be placed in a regex to change how the pattern is matched. The most commonly used ones are:

Inline Modifier   Meaning
(?i)              Ignore case. Matches the pattern without case sensitivity
(?s)              . also matches new line (by default, . does not match line terminators)
(?m)              ^ and $ match at the start and end of each line

An inline modifier is written as part of the regex, usually at its start, e.g. (?i)aBCd
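
As a small Python sketch, the pattern (?i)aBCd matches regardless of letter case. The test strings are made up for illustration.

```python
import re

# (?i) at the start of the pattern turns on case-insensitive matching.
pattern = re.compile(r"(?i)aBCd")

candidates = ["abcd", "ABCD", "AbCd", "abce"]
results = [bool(pattern.fullmatch(s)) for s in candidates]

print(results)  # [True, True, True, False]
```

In Python the same effect can be obtained by passing the re.IGNORECASE flag to re.compile instead of writing (?i) inside the pattern.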

  1. Multiple inline modifiers can be combined in one group, e.g. (?smi).*
  2. Unless specified, the entire string, including new lines, is treated as a single unit: ^ matches only at the start of the string and $ only at its end (in many flavors also just before a trailing new line). With the (?m) modifier, ^ and $ also match at the start and end of each line, where the lines are those obtained by splitting the string on \n.
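
The effect of (?s) and (?m) can be seen in the following Python sketch. The three-line sample string is an assumption made for illustration, and the behaviour shown is that of Python's re module; other regex flavors may differ in detail.

```python
import re

text = "first line\nsecond line\nthird line"

# Without (?s), . stops at the new line; with it, . runs across lines.
dot_default = re.search(r".*", text).group()
dot_all = re.search(r"(?s).*", text).group()

# Without (?m), ^ matches only at the start of the whole string.
starts_default = re.findall(r"^\w+", text)
# With (?m), ^ matches at the start of every line.
starts_multiline = re.findall(r"(?m)^\w+", text)

# Combined modifiers: case-insensitive, line-anchored, . crosses new lines.
combined = re.findall(r"(?smi)^SECOND.*", text)

print(dot_default)       # first line
print(dot_all == text)   # True
print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second', 'third']
print(combined)          # ['second line\nthird line']
```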

Summary

This part explained two of the most useful concepts in regex: shorthand symbols and inline modifiers. The next part will dive further into regex.
