Am I the only one shocked to learn that to find something at the end of a string it starts at the beginning? Perhaps it’s because of the simplicity of the example but I expected it to start at the end.
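To make the surprise concrete, here is a rough sketch of what a left-to-right search does for a pattern like `/\s+$/`. The pattern is my own choice of example and not necessarily the one from the article, and `countStepsTrailingWhitespace` is a hypothetical helper that models the search in a simplified way; it is not how V8 is actually implemented.

```javascript
// Simplified model of a left-to-right search for /\s+$/.
// It only counts character checks; a real backtracking engine does
// more work per start position, so this is a lower bound at best.
function countStepsTrailingWhitespace(input) {
  let steps = 0;
  for (let start = 0; start < input.length; start++) {
    let i = start;
    // Greedily consume whitespace from this start position.
    while (i < input.length) {
      steps++;
      if (!/\s/.test(input[i])) break;
      i++;
    }
    // \s+ needs at least one character and $ needs the end of input.
    if (i > start && i === input.length) {
      return { matched: true, index: start, steps };
    }
    // Otherwise the attempt fails and the search moves one position right.
  }
  return { matched: false, index: -1, steps };
}

for (const n of [10, 100, 1000]) {
  // n spaces followed by a non-space: the pattern can never match.
  const result = countStepsTrailingWhitespace(" ".repeat(n) + "x");
  console.log(n, result.steps);
}
```

In this model the step count for n spaces followed by one other character is (n + 1)(n + 2) / 2, so it grows quadratically even though the only possible match would be at the very end.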
Regular expressions are great and can always be matched in linear time with respect to the input string length.
The problem is that JS standard library RegExps aren’t actually regular expressions, but rather a much broader language, which is impossible to implement efficiently. If RegExp switched to proper regular expressions, they would match much faster but supporting backreferences like /(.*)x\1/ would be impossible.
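For anyone unsure what the backreference buys you, here is a small sketch. I have anchored the pattern with `^` and `$` (my addition) so that it tests the whole string; the variable name is just for illustration.

```javascript
// /^(.*)x\1$/ accepts strings of the form w + "x" + w, where w has
// no line terminators (because of `.`). \1 must repeat whatever
// group 1 captured, which a finite automaton cannot remember.
const repeatedAroundX = /^(.*)x\1$/;

console.log(repeatedAroundX.test("abcxabc")); // true
console.log(repeatedAroundX.test("abcxabd")); // false
console.log(repeatedAroundX.test("x"));       // true: w is empty
```

The set of strings `w x w` is a textbook example of a non-regular language, which is why a strictly regular engine cannot support this pattern.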
Only if you insist on the definition from formal language theory. In practice, "regex" is widely used to mean the pattern-matching tool that also supports backreferences.
Wikipedia suggests using the term “regular expressions” for the language theory thing and “regex” for the programming language (PCRE) thing. I agree and would even go further and say that any time one wants to refer to the concept as it is used in formal language theory they should explicitly specify that they are talking about the theoretical concept, not the regex implementation as it is found in most programming languages.
The visualization was great! The double loops jump out immediately and make it easy to recognize problematic expressions.
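For reference, this is the kind of pattern I take "double loops" to mean: a quantifier nested inside another quantifier. It is a minimal sketch, assuming a backtracking engine such as the default one in Node.js; `timeNestedMatch` is a hypothetical helper, and the timings will vary by machine and engine version.

```javascript
// A quantified group whose body is itself quantified: the "double loop".
const nested = /^(a+)+$/;

function timeNestedMatch(n) {
  // n letters "a" followed by "b": the match must fail, but a
  // backtracking engine first tries many ways to split the run of "a"s.
  const input = "a".repeat(n) + "b";
  const start = Date.now();
  const matched = nested.test(input);
  return { n, matched, ms: Date.now() - start };
}

// Kept small on purpose; each extra character roughly doubles the work.
for (const n of [10, 14, 18]) {
  console.log(timeNestedMatch(n));
}
```

Raising `n` into the high twenties is usually enough to make the failing match take seconds on a backtracking engine, while a matching input such as `"aaaa"` returns immediately.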
Although I haven’t fully read this article, feel free to crosspost in:
Ah, I didn’t realise there was a regex channel here. Thanks!
This is why we need regex licenses https://regexlicensing.org/
/s
That’s brilliant!
guide to the dangers of JavaScript, no?
While this article is about JavaScript specifically, these issues certainly exist in other regex engines too.
No
Is there one thing not screwed up in this language? I mean it’s regex, there are so many good implementations for it.
JavaScript’s regex engine isn’t the only one to have these problems. There certainly are other implementations, like RE2 and Rust’s regex implementation, that don’t have this issue. But they also lack some of the features of the JS implementation, such as backreferences.
Ok thanks for the clarification.
I would argue that the gold standard of regex is perlre, or even Python’s re. I have never heard anyone discourage using them. Do you know something I don’t?
Both Perl and Python use backtracking regex engines and are thus susceptible to similar problems as discussed in the OP.