Improve regex performance

I need help improving the performance (speed of execution) of the following ECMAScript regex (JavaScript on Node.js 20):

/[\u0000-\u001f\u0022\u005c\ud800-\udfff]|[\ud800-\udbff](?![\udc00-\udfff])|(?:[^\ud800-\udbff]|^)[\udc00-\udfff]/

This regular expression is designed to match certain Unicode characters in a string. Let’s break it down:

[\u0000-\u001f\u0022\u005c\ud800-\udfff]: This part matches any
character in the range \u0000 to \u001f (the control characters),
the characters \u0022 (quotation mark ") and \u005c (backslash \),
and any code unit in the range \ud800 to \udfff. The range
\ud800-\udfff covers the surrogates, which UTF-16 uses in pairs to
encode characters outside of the Basic Multilingual Plane (BMP).
|: OR operator.
[\ud800-\udbff](?![\udc00-\udfff]): This part matches the first
half of a surrogate pair (a high surrogate, \ud800 to \udbff) but
only if it is not followed by the second half (a low surrogate,
\udc00 to \udfff). In other words, it matches an unpaired high
surrogate.
|: Again, an OR operator.
(?:[^\ud800-\udbff]|^)[\udc00-\udfff]: This part matches the second
half of a surrogate pair (\udc00 to \udfff) but only if it is not
preceded by the first half, or if it is at the beginning of the
string. In other words, it matches an unpaired low surrogate. The ^
inside the square brackets [^...] denotes negation, meaning any
character other than the ones specified; the ^ outside the brackets
anchors the match to the start of the string.
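
The breakdown above can be checked against a few sample strings. The following is a minimal sketch, assuming each "u" in the pattern was originally a "\u" escape; the sample strings and the FIRST_ONLY pattern are made up for illustration. It also suggests that, because the first character class already contains the whole surrogate range, dropping the other two alternatives should not change the result of test() (the position of the match may differ).

```javascript
// Pattern from the question, assuming each "u" was originally "\u".
const REGEX = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]|[\ud800-\udbff](?![\udc00-\udfff])|(?:[^\ud800-\udbff]|^)[\udc00-\udfff]/

// Hypothetical simplification: only the first alternative.
const FIRST_ONLY = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]/

// Illustrative inputs, one per case discussed in the breakdown.
const samples = {
  plain: 'hello',
  quote: 'say "hi"',
  backslash: 'a\\b',
  control: 'line\nbreak',
  validPair: '\ud83d\ude00', // one astral character, two code units
  loneHigh: 'a\ud83d',
  loneLow: '\ude00b'
}

// Run both patterns over every sample and collect the results.
const results = Object.entries(samples).map(([name, s]) => ({
  name,
  full: REGEX.test(s),
  firstOnly: FIRST_ONLY.test(s)
}))

console.table(results)
```

Only the plain sample is reported as not matching. Note that the valid surrogate pair matches as well, since neither pattern uses the u flag and the first class tests each code unit on its own.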

In summary, this regular expression detects the characters that need escaping when a string is serialized as JSON: control characters, the quotation mark, the backslash, and UTF-16 surrogate code units (the last two alternatives single out unpaired surrogates).

Speed test over a string of about 10,000 chars

const REGEX = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]|[\ud800-\udbff](?![\udc00-\udfff])|(?:[^\ud800-\udbff]|^)[\udc00-\udfff]/

const start = performance.now()
console.log(REGEX.test('A'.repeat(10000) + '"')) // true
console.log(performance.now() - start)
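
A single timed call mostly measures startup effects such as regex compilation and the first console.log. A slightly more stable measurement could look like the sketch below. The bench helper, the iteration counts, and the input strings are assumptions chosen for illustration, not part of any library, and the numbers it prints will vary by machine and engine version.

```javascript
// Pattern from the question, assuming each "u" was originally "\u".
const REGEX = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]|[\ud800-\udbff](?![\udc00-\udfff])|(?:[^\ud800-\udbff]|^)[\udc00-\udfff]/

// Hypothetical helper: warms up, then times `iterations` calls of fn.
// Returns how many calls returned a truthy value, so the work is not
// trivially discarded and the result can be sanity-checked.
function bench (label, fn, iterations = 2000) {
  for (let i = 0; i < 200; i++) fn() // warm-up, not timed

  let hits = 0
  const start = performance.now()
  for (let i = 0; i < iterations; i++) {
    if (fn()) hits++
  }
  const elapsedMs = performance.now() - start

  // elapsedMs / iterations is ms per call; times 1000 gives microseconds.
  console.log(`${label}: ${((elapsedMs / iterations) * 1000).toFixed(2)} us/op`)
  return hits
}

const clean = 'A'.repeat(10000) // nothing to escape: worst case for the scan
const dirty = clean + '"' // escaping needed, found only at the very end

bench('regex, clean input', () => REGEX.test(clean))
bench('regex, needs escaping', () => REGEX.test(dirty))
bench('JSON.stringify baseline', () => JSON.stringify(dirty).length > 0)
```

The JSON.stringify line gives a rough baseline for the goal stated in the question, since that is the cost paid anyway when the input requires escaping.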

Are there improvements that can be made?

The purpose is to have a very fast check for strings that do not need escaping while not being much slower than JSON.stringify() in case the input requires escaping.
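
One alternative worth measuring against the regex is a plain loop over char codes. The sketch below is an assumption about how such a check could be written, not a confirmed improvement: needsEscape and quote are hypothetical names, and whether the loop beats the regex depends on the engine and on string length, so it should be benchmarked on the target inputs.

```javascript
// Hypothetical check: true if the string contains any code unit that the
// regex in the question would match (control characters, quotation mark,
// backslash, or any surrogate).
function needsEscape (str) {
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i)
    if (c < 0x20 || c === 0x22 || c === 0x5c || (c >= 0xd800 && c <= 0xdfff)) {
      return true
    }
  }
  return false
}

// Hypothetical wrapper: take the fast path when nothing needs escaping,
// otherwise defer to JSON.stringify().
function quote (str) {
  return needsEscape(str) ? JSON.stringify(str) : '"' + str + '"'
}

console.log(quote('hello'))
console.log(quote('say "hi"'))
```

Like the regex, this treats valid surrogate pairs as needing the slow path; that is conservative but still correct, because JSON.stringify() leaves well-formed pairs untouched.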