Monday, 7 November 2011

Careful when reusing Javascript RegExp objects

I had this operation on a Javascript object that was using a complex regular expression to test for something. Usually, when you want to do that, you use the regular expression inline or as a local variable. However, given the complexity of the expression I thought it would be more efficient to cache the object and reuse it anytime.

Now, there are two gotchas when using regular expressions in Javascript. The first is that if you want to match more than once in the same string, you need to use the global flag. For example, the code
var reg=new RegExp('a',''); //the same as: var reg=/a/;
alert('aaa'.replace(reg,'b'));
will alert 'baa', because without the global flag the replace operation stops after the first match. That is why I normally use the global flag on all my regular expressions, like this:
var reg=new RegExp('a','g'); //the same as: var reg=/a/g;
alert('aaa'.replace(reg,'b'));
(alerts 'bbb')

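The second gotcha is that if you use the global flag, the lastIndex property of the RegExp object is not reset between calls: a successful match stores the position where it ended, and the next match starts from there. Without the global flag, lastIndex stays at 0. So code like this: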
var reg=new RegExp('a',''); //same as: /a/;
 
reg.test('aaa');
alert(reg.lastIndex);
 
reg.test('aaa');
alert(reg.lastIndex);
will alert 0 both times. Using the global flag will lead to alerting 1 and 2.
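For comparison, here is a minimal sketch of the same test with the global flag set. It uses console.log instead of alert so that it also runs outside a browser; the variable names first and second are only for illustration.

```javascript
// Same test as above, but with the global flag set.
var reg = /a/g;

reg.test('aaa');
var first = reg.lastIndex;   // the match at index 0 ended at index 1

reg.test('aaa');
var second = reg.lastIndex;  // the search resumed at index 1 and ended at index 2

console.log(first, second);  // 1 2
```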

The problem is that the solution to the first gotcha leads to the second, as it did in my case. I stored the RegExp object as a field in my object, then used it repeatedly to test for a pattern in several different strings. It would work once, then fail, then work again. Once I removed the global flag, it all worked like a charm.
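The work/fail/work pattern can be reproduced with a small sketch. The checker object, its isTag method and the tag_ pattern are hypothetical stand-ins, not my original code; the only thing that matters is that a global RegExp is cached and reused across different strings.

```javascript
// Hypothetical object that caches a global RegExp in a field.
var checker = {
  _reg: /tag_\d+/g,
  isTag: function (input) {
    // test() starts searching at _reg.lastIndex, which is left over
    // from the previous call when the regex is global.
    return this._reg.test(input);
  }
};

var results = [
  checker.isTag('tag_1'), // matches; lastIndex becomes 5
  checker.isTag('tag_2'), // search starts at 5, finds nothing; lastIndex resets to 0
  checker.isTag('tag_3')  // matches again
];

console.log(results); // [ true, false, true ]
```

Dropping the g from the pattern in this sketch makes all three calls return true, because a non-global regex always starts matching at index 0.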

The moral of the story is to be careful with constructs like _reg.test(input);
when _reg is a global regular expression. It will attempt to match starting from lastIndex, that is, from the position where the last successful match ended in whatever string was tested before.


Also, in order to use a global RegExp multiple times without redeclaring it every time, one can just manually reset the lastIndex property: reg.lastIndex=0;
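A minimal sketch of that reset, assuming a hypothetical isTag helper and tag_ pattern chosen only for illustration:

```javascript
// A global RegExp that is declared once and reused.
var reg = /tag_\d+/g;

function isTag(input) {
  // Forget where the previous match ended before testing a new string.
  reg.lastIndex = 0;
  return reg.test(input);
}

var results = [isTag('tag_1'), isTag('tag_2'), isTag('tag_3')];

console.log(results); // [ true, true, true ]
```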

Update: Here is a case that was totally weird. Imagine a Javascript function that returns an array of strings based on a regular expression match inside a loop. In Firefox it would return half the number of items that it should have. If one opened Firebug and placed a breakpoint inside the loop, the list would be OK! If the breakpoint was placed outside the loop, the bug would occur. Here is the code. Try to see what is wrong with it:
types.forEach(function (type) {
    if (type && type.name) {
        var m = /(\{tag_.*\})/ig.exec(type.name);
        // type is tag
        if (m && m.length) {
            typesDict[type.name] = m[1];
        }
    }
});

3 comments:

  1. Thanks! I was struggling with this issue about an hour until I figured out that it was something wrong with regexp itself.

  2. As I began to narrow in on a bug I had previously noticed, I quickly recognized the behavior, and your article was the first result on a search for this behavior, quickly confirming my suspicion and resolving the issue. Thanks!

  3. Awesome. Super thanks. I was stuck on this just now and the lastIndex=0 was exactly what I needed!
