- 1. Regex course – part one. Basic concepts.
- 2. Regex course – part two. Writing more elegant and precise patterns.
- 3. Regex course – part three. Grouping and using ES6 features.
- 4. Regex course – part four. Avoiding catastrophic backtracking using lookahead
Regular expressions (regex) are sequences of characters that define a search pattern. Because they are extremely useful in a programmer's everyday life, they are built into JavaScript. In this series of articles, I will show you how they work and where they are used in real life. Hopefully, by the end of this part of the course, you will be able to create your very own first regular expressions. Let’s go!
Ways to create a regular expression
In JavaScript, you can construct a regular expression in two ways. The first one is a regular expression literal, in which the pattern is enclosed in two forward slashes. The slashes are not an actual part of the pattern, but they indicate its beginning and end. They tell the JS interpreter that it is dealing with a regex:
    const regex = /dog/;
The other way is to use the RegExp constructor:

    const regex = new RegExp('dog');
Once you have created the object, you can call the test method on it. It takes a string and returns true if the pattern is found anywhere in that string:

    regex.test('dog'); // true
    regex.test('hot-dog'); // true
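The constructor form is handy when the pattern is not known up front. Below is a minimal sketch, assuming a hypothetical `animal` variable that holds the word to look for; it also shows that both forms end up describing the same pattern:

```javascript
// `animal` is a hypothetical value, e.g. one that arrives at runtime
const animal = 'dog';

const literal = /dog/;                  // pattern fixed in the source code
const constructed = new RegExp(animal); // pattern built from a string

// Both objects describe the same pattern
console.log(literal.source === constructed.source); // true
console.log(constructed.test('hot-dog'));           // true
```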
Simple patterns
The simplest type of regular expression is one that looks for a direct match. The expression /dog/ matches only if the characters d, o and g occur together, in that exact order.

    /dog/.test('hot-dog'); // true
    /dog/.test('do games'); // false
The power of regex lies elsewhere, though. In many situations, you might want to use a more complex pattern.
Special characters
We can do more than just look for simple occurrences of a certain string. One way to do this is to use special characters. They are not interpreted as a literal part of the searched string, but describe it in a more generic way.
Any character
The wildcard is represented by a dot. It matches any single character except line terminators such as the newline character.

    const regex = /.og/;
    regex.test('fog'); // true
    regex.test('dog'); // true
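To make the "any single character" rule more tangible, here is a small sketch with a few edge cases. The input strings are made up for illustration:

```javascript
const wildcard = /.og/;

console.log(wildcard.test(' og'));  // true, a space counts as a character
console.log(wildcard.test('og'));   // false, nothing precedes "og" for the dot to match
console.log(wildcard.test('\nog')); // false, the dot does not match a newline
```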
The wildcard is one of the special characters. What if we want to actually match a single dot, not any character?
Backslash
The backslash switches the meaning of a special character to a regular one. Thanks to that, we can search for an actual dot in a text, and it will not be interpreted as the wildcard.

    const regex1 = /dog./;
    regex1.test('dog.'); // true
    regex1.test('dog1'); // true

    const regex2 = /dog\./;
    regex2.test('dog.'); // true
    regex2.test('dog1'); // false
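A related detail, added here as a side note: when the pattern is passed to the RegExp constructor as a string, the backslash has to be doubled, because a single backslash is already consumed by the string literal itself. A small sketch with illustrative variable names:

```javascript
const withLiteral = /dog\./;
const withConstructor = new RegExp('dog\\.'); // the string contains: dog\.
const unescaped = new RegExp('dog\.');        // the string contains: dog.

console.log(withLiteral.source === withConstructor.source); // true
console.log(withConstructor.test('dog1')); // false, a literal dot is required
console.log(unescaped.test('dog1'));       // true, the dot is a wildcard again
```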
Character set
It is represented by square brackets. This pattern matches one character that might be any of the characters from the brackets.
    /[dfl]og/.test('dog'); // true
    /[dfl]og/.test('fog'); // true
    /[dfl]og/.test('log'); // true
Note that most special characters, like the dot, lose their special meaning inside a character set, so a backslash is not needed for them there. We can even go a little further and define a range of characters:

    /[A-z]/.test('a'); // true
    /[A-z]/.test('Z'); // true
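Be aware that ranges follow character codes, in which capital letters come before lowercase ones. It means that /[a-Z]/ would actually throw an error. For the same reason, [A-z] also covers the few symbols that sit between Z and a, so [A-Za-z] is the safer choice when you want letters only.

    const pattern = /[a-Z]/;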
Uncaught SyntaxError: Invalid regular expression: /[a-Z]/: Range out of order in character class
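Since ranges are compared by character codes, two consequences are worth trying out. The sketch below is only an illustration: it checks the underscore, which sits between Z and a, and it builds the invalid range with the constructor so that the error can be caught at runtime instead of stopping the script:

```javascript
// The underscore sits between "Z" and "a" in the character table
console.log(/[A-z]/.test('_'));    // true
console.log(/[A-Za-z]/.test('_')); // false

// With the constructor, the invalid range fails at runtime, so it can be caught
let errorName = null;
try {
  new RegExp('[a-Z]');
} catch (error) {
  errorName = error.name;
}
console.log(errorName); // 'SyntaxError'
```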
You can easily get a negated character set by adding a ^ sign right after the opening bracket. It will match any single character that is not listed in the brackets.

    /[^df]og/.test('dog'); // false
    /[^df]og/.test('fog'); // false
    /[^df]og/.test('log'); // true
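It is easy to forget that a negated set still has to match exactly one character. A short sketch with a few made-up inputs:

```javascript
const notDorF = /[^df]og/;

console.log(notDorF.test('og'));   // false, there is no character before "og"
console.log(notDorF.test(' og'));  // true, a space is neither "d" nor "f"
console.log(notDorF.test('frog')); // true, "rog" matches because "r" is allowed
```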
Multiple repetitions
A very useful feature is to match an exact number of occurrences of some expression. You can achieve that with curly brackets. Let’s create a function that will take a string and decide if it is a valid phone number. The format that we will use is this:
    +xx xxx xxx xxx

where x is a digit between 0 and 9.
    function isPhoneNumber(number){
      return /\+[0-9]{2} [0-9]{3} [0-9]{3} [0-9]{3}/.test(number);
    }

    isPhoneNumber('+12 123 123 123'); // true
    isPhoneNumber('123212'); // false
Note that curly brackets come in a few variants:
- {x} matches exactly x occurrences
- {x,} matches at least x occurrences
- {x,y} matches at least x occurrences and not more than y occurrences
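The three variants can be compared side by side. The sketch below uses String.prototype.match, which this part of the course does not cover, only to reveal which fragment of the text was actually matched:

```javascript
const text = 'aaaaa';

// match() returns an array whose first element is the matched text
console.log(text.match(/a{2}/)[0]);   // 'aa'
console.log(text.match(/a{2,}/)[0]);  // 'aaaaa'
console.log(text.match(/a{2,3}/)[0]); // 'aaa'
console.log(/a{6}/.test(text));       // false, there are only five letters
```

Each pattern takes as many repetitions as its upper limit allows, which is why {2,} consumes the whole string.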
Zero or more repetitions
With the asterisk * we can match an expression 0 or more times. It is actually equivalent to {0,}.
With that, we can easily construct a pattern that will match any number of any characters (other than line terminators): /.*/
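A quick sketch of the asterisk in action. The pattern /lo*l/ is just an illustrative choice:

```javascript
const lol = /lo*l/;

console.log(lol.test('ll'));      // true, zero "o" characters are fine
console.log(lol.test('lol'));     // true
console.log(lol.test('loooool')); // true
console.log(lol.test('lal'));     // false, "a" is not "o"

// /.*/ accepts anything, including an empty string
console.log(/.*/.test(''));       // true
```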
Flags
You can add a little more to your regex than a pattern. Flags are additional modifiers affecting the search. If you use slashes to define your regular expressions, you add them after the closing slash. If you use the RegExp constructor, you pass them as a second argument. The most significant flags are:
i: ignore case
With this flag, the search is case-insensitive:
    /dog/i.test('dog'); // true
    new RegExp('dog', 'i').test('DoG'); // true
As simple as that!
g: global match
Thanks to this flag, all matches are found. Without it, the search stops after the first match.
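The difference is easiest to see with String.prototype.match, used here only as an illustration. The sketch also shows that flags can be combined:

```javascript
const sentence = 'dog Dog DOG';

// Without g, only the first match is reported
console.log(sentence.match(/dog/i)[0]); // 'dog'

// With g, match() returns every match
console.log(sentence.match(/dog/g));    // ['dog']
console.log(sentence.match(/dog/gi));   // ['dog', 'Dog', 'DOG']
```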
String.prototype.replace
You might get an opportunity to use the g flag quite quickly. Chances are that you already know the replace function. It returns a new string in which the parts matching a pattern are replaced. You can provide the pattern as a string or as a regular expression. The tricky part is that if you pass a string, you can’t replace all occurrences of the pattern, just the first one. Using the knowledge of the flags, you can easily deal with it:
    const lorem = 'lorem_ipsum_dolor_sit_amet';

    lorem.replace('_', ' '); // 'lorem ipsum_dolor_sit_amet'

    lorem.replace(/_/g, ' '); // 'lorem ipsum dolor sit amet'
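The i and g flags work together with replace as well. A minimal sketch, with a made-up sentence, comparing the three combinations:

```javascript
const note = 'Dog, dog and DOG';

console.log(note.replace(/dog/, 'cat'));   // 'Dog, cat and DOG'
console.log(note.replace(/dog/i, 'cat'));  // 'cat, dog and DOG'
console.log(note.replace(/dog/gi, 'cat')); // 'cat, cat and cat'

// replace returns a new string, the original is left untouched
console.log(note); // 'Dog, dog and DOG'
```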
There are more flags which we will talk about in the next part of the course.
Summary
With all this information, you can start writing your own regular expressions and put them to use. There are great tools that I highly recommend and that will help you with this. In future parts of the course, we will learn much more advanced concepts, where regular expressions can shine even more, including a deeper look at the RegExp object that JavaScript provides. Till then, try exercising the knowledge that you already have, and you will see that regular expressions really come in handy. See you next time!
Hi Marcin, thanks for sharing the knowledge! Just an important note: the range [A-z] will actually match more than letters. As you can see on the ASCII table (http://www.asciitable.com), A to z will also match the symbols [ \ ] ^ _ and `, so be careful and play safe with [A-Za-z] instead or use a flag to ignore the case.