Regex course – part one. Basic concepts.

JavaScript Regex

Regular expressions (regex) are sequences of characters that define a search pattern. Because they are extremely useful in a programmer’s everyday work, JavaScript has built-in support for them. In this series of articles, I will show you how they work and where they are used in real life. Hopefully, by the end of this part of the course, you will be able to create your very own first regular expressions. Let’s go!

Ways to create a regular expression

In JavaScript, you can construct a regular expression in two ways. The first is a regular expression literal, in which the pattern is enclosed in two forward slashes. The slashes are not an actual part of the pattern; they only mark its beginning and end, telling the JS interpreter that it is dealing with a regex.
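A minimal sketch of the literal syntax (the variable name is only illustrative):

```javascript
// A regex literal: the slashes delimit the pattern, they are not part of it.
const dogLiteral = /dog/;

console.log(dogLiteral instanceof RegExp); // true
console.log(dogLiteral.source);            // dog
```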

The other way is to use the RegExp constructor.
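As a small sketch, the constructor receives the pattern as a string without the surrounding slashes. This is probably most useful when the pattern is only known at runtime; the variable names and sample strings here are assumptions made for illustration:

```javascript
// The pattern is passed as a string, without the slashes.
const dogFromConstructor = new RegExp('dog');
console.log(dogFromConstructor.source); // dog

// Building a pattern from a value that is only known at runtime.
const searchedWord = 'cat';
const dynamicPattern = new RegExp(searchedWord);
console.log(dynamicPattern.test('concatenate')); // true
```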

Once you have created the object, you can call the test method on it. It takes a string and returns true if the pattern was matched, and false otherwise.
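For example (the sample strings are illustrative):

```javascript
const dogPattern = /dog/;

console.log(dogPattern.test('hot dog')); // true
console.log(dogPattern.test('hot cat')); // false
```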

Simple patterns

The simplest type of regular expression is one that looks for a direct match. The expression /dog/ matches only if the characters d, o and g occur together, in that exact order.
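A short sketch of what "together and in that exact order" means in practice (the test strings are made up for illustration):

```javascript
const exactDog = /dog/;

console.log(exactDog.test('The dog barked')); // true: "dog" appears as-is
console.log(exactDog.test('dodge'));          // false: the letters are not adjacent
console.log(exactDog.test('good'));           // false: wrong order
console.log(exactDog.test('Dog'));            // false: the match is case-sensitive
```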

The power of regex lies elsewhere, though. In many situations, you might want to use a more complex pattern.

Special characters

We can do more than just look for simple occurrences of a certain string. One way to do this is to use special characters. They are not interpreted as a literal part of the searched string, but describe it in a more generic way.

Any character

It is represented by a dot. It matches any single character except line terminators such as the newline character.
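A minimal sketch of the dot in action (the pattern and strings are illustrative):

```javascript
const anyMiddle = /d.g/;

console.log(anyMiddle.test('dog'));   // true
console.log(anyMiddle.test('dig'));   // true
console.log(anyMiddle.test('d g'));   // true: a space is also a character
console.log(anyMiddle.test('dg'));    // false: the dot needs exactly one character
console.log(anyMiddle.test('d\ng'));  // false: the dot does not match a newline
```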

The wildcard is one of the special characters. What if we want to actually match a single dot, not any character?

Backslash

A backslash placed before a special character turns it into a regular one. Thanks to that, we can search for an actual dot in a text by writing \. and it will not be interpreted as a special character.
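The difference between an escaped and an unescaped dot can be sketched as follows. Note that when the pattern is passed to the RegExp constructor as a string, the backslash itself has to be doubled; the variable names are illustrative:

```javascript
const unescapedDot = /a.b/;
const escapedDot = /a\.b/;

console.log(unescapedDot.test('a.b')); // true
console.log(unescapedDot.test('axb')); // true: the dot matches any character
console.log(escapedDot.test('a.b'));   // true
console.log(escapedDot.test('axb'));   // false: only a literal dot is accepted

// In a string, "\\" produces a single backslash for the regex engine.
const escapedFromString = new RegExp('a\\.b');
console.log(escapedFromString.test('axb')); // false
```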

Character set

It is represented by square brackets. It matches a single character that can be any one of the characters listed inside the brackets.
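A small example, using an illustrative pattern that accepts two spellings of the same word:

```javascript
const grayOrGrey = /gr[ae]y/;

console.log(grayOrGrey.test('gray')); // true
console.log(grayOrGrey.test('grey')); // true
console.log(grayOrGrey.test('groy')); // false: "o" is not in the set
```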

Just note that special characters like the dot are not special anymore inside a character set, so a backslash is not needed there. We can even go a little further and define a range of characters with a hyphen, for example [a-z].
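Both points can be sketched like this (the patterns and names are illustrative):

```javascript
// Inside a set, the dot is just a dot.
const dotOrComma = /[.,]/;
console.log(dotOrComma.test('a.b')); // true
console.log(dotOrComma.test('ab'));  // false

// A range of characters.
const lowercaseLetter = /[a-z]/;
console.log(lowercaseLetter.test('regex')); // true
console.log(lowercaseLetter.test('123'));   // false

// Several ranges can be combined in one set.
const hexDigit = /[0-9a-f]/;
console.log(hexDigit.test('7')); // true
console.log(hexDigit.test('z')); // false
```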

Be aware that ranges follow character codes, in which the capital letters come before the lowercase ones. This means that /[a-Z]/ would actually throw an error:

Uncaught SyntaxError: Invalid regular expression: /[a-Z]/: Range out of order in character class
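The error can be reproduced safely with the RegExp constructor and a try/catch, since a literal /[a-Z]/ would stop the whole script from parsing. The sketch also shows the [A-z] pitfall and the [A-Za-z] alternative mentioned in the comment under this article; the variable names are illustrative:

```javascript
// "Z" (code 90) comes before "a" (code 97), so the range a-Z is out of order.
let rangeError = null;
try {
  new RegExp('[a-Z]');
} catch (error) {
  rangeError = error;
}
console.log(rangeError instanceof SyntaxError); // true

// [A-z] is a valid range, but it also covers [ \ ] ^ _ and `
console.log(/[A-z]/.test('_')); // true

// A safer way to match any ASCII letter.
const anyLetter = /[A-Za-z]/;
console.log(anyLetter.test('_')); // false
console.log(anyLetter.test('q')); // true
```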

How about a little challenge? Create a regular expression that would match strings in a fiddle, but would not match certain strings:

You can easily get a negated character set by adding the ^ sign right after the opening bracket. It matches any single character that is not listed in the brackets.
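A minimal sketch of a negated set (the pattern is illustrative). Keep in mind that the pattern matches as soon as one character outside the set is found anywhere in the string:

```javascript
const notADigit = /[^0-9]/;

console.log(notADigit.test('123')); // false: every character is a digit
console.log(notADigit.test('12a')); // true: "a" is not a digit
```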

Multiple repetitions

A very useful feature is to match an exact number of occurrences of some expression. You can achieve that with curly brackets. Let’s create a function that will take a string and decide if it is a valid phone number. The format that we will use is this:

where x is a digit from 0 to 9.

Note that we have some customization here:

  • {x}    matches exactly x occurrences
  • {x,}   matches at least x occurrences
  • {x,y} matches at least x occurrences and not more than y occurrences
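A sketch of such a function is shown below. It assumes a hypothetical format of xxx-xxx-xxx purely for illustration, and the function name isValidPhoneNumber is also an assumption:

```javascript
// Hypothetical format used for illustration: xxx-xxx-xxx, where x is a digit.
// test() only checks that the pattern occurs somewhere in the string.
function isValidPhoneNumber(text) {
  return /[0-9]{3}-[0-9]{3}-[0-9]{3}/.test(text);
}

console.log(isValidPhoneNumber('123-456-789')); // true
console.log(isValidPhoneNumber('12-456-789'));  // false

// The other two forms of the curly brackets:
console.log(/a{2,}/.test('caat'));      // true: at least two "a" in a row
console.log(/ba{1,3}d/.test('baaaad')); // false: four "a" is more than three
```

Because the pattern is not tied to the beginning and end of the string, a longer text that merely contains such a number would pass as well.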

Zero or more repetitions

With the asterisk * we can match an expression zero or more times. It is equivalent to {0,}.

With that, we can easily construct a pattern that will match any number of any characters:  /.*/
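A few illustrative cases (the patterns and strings are made up for this sketch):

```javascript
const zeroOrMoreO = /go*d/;

console.log(zeroOrMoreO.test('gd'));    // true: zero occurrences of "o"
console.log(zeroOrMoreO.test('god'));   // true
console.log(zeroOrMoreO.test('goood')); // true
console.log(zeroOrMoreO.test('gad'));   // false: "a" is not "o"

// .* matches any run of characters, including an empty one.
console.log(/.*/.test(''));          // true
console.log(/a.*z/.test('a to z'));  // true
```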

Flags

You can add a little more to your regex than a pattern. Flags are additional modifiers that affect the search. If you use slashes to define your regular expression, you add them after the closing slash. If you use the RegExp constructor, you pass them as the second argument. The most significant flags are:

i: ignore case

With this flag, the search is case-insensitive.
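Both ways of passing the flag, as a short sketch with illustrative strings:

```javascript
console.log(/dog/.test('DOG'));  // false: case-sensitive by default
console.log(/dog/i.test('DOG')); // true

// The same flag passed to the constructor as the second argument.
const caseInsensitiveDog = new RegExp('dog', 'i');
console.log(caseInsensitiveDog.test('Dog')); // true
```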

As simple as that!

g: global match

With this flag, all matches are found. Without it, the search stops after the first match.
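One way to see the difference is String.prototype.match, which is used here only to make the results visible; the sample text is an assumption:

```javascript
const dogText = 'dog, dog and dog';

// Without the g flag, only the first match is reported.
const firstOnly = dogText.match(/dog/);
console.log(firstOnly.length); // 1
console.log(firstOnly.index);  // 0

// With the g flag, every match is returned.
const allMatches = dogText.match(/dog/g);
console.log(allMatches.length); // 3
```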

String.prototype.replace

You might get an opportunity to use flags quite quickly. Chances are that you already know the replace function. It returns a new string in which the part matching a pattern is replaced. You can provide the pattern as a string or as a regular expression. The tricky part is that if you pass a string, only the first occurrence is replaced, not all of them. Using your knowledge of flags, you can easily deal with that by passing a regular expression with the g flag.
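A minimal sketch of both variants (the sample sentence is illustrative):

```javascript
const sentence = 'cat, cat, cat';

// A string pattern replaces only the first occurrence.
console.log(sentence.replace('cat', 'dog'));  // dog, cat, cat

// A regex with the g flag replaces every occurrence.
console.log(sentence.replace(/cat/g, 'dog')); // dog, dog, dog

// The original string is left untouched.
console.log(sentence); // cat, cat, cat
```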

There are more flags, which we will talk about in the next part of the course.

Summary

With all this information, you can start writing your own regular expressions and put them to use. There are great tools that I highly recommend that will help you with this. In future parts of the course, we will learn much more advanced concepts with which regular expressions can shine even more, including digging deeper into the RegExp object that JavaScript provides. Until then, try exercising the knowledge that you already have, and you will see that regular expressions really come in handy. See you next time!

Comments (1)

  1. Hi Marcin, thanks for sharing the knowledge! Just an important note: the range [A-z] will actually match more than letters. As you can see on the ASCII table (http://www.asciitable.com), A to z will also match the symbols [ \ ] ^ _ and `, so be careful and play safe with [A-Za-z] instead or use a flag to ignore the case.
