Improve regex performance

I need help improving the execution speed of the following regular expression in ECMAScript (JavaScript on Node.js 20):

/[\u0000-\u001f\u0022\u005c\ud800-\udfff]|[\ud800-\udbff](?![\udc00-\udfff])|(?:[^\ud800-\udbff]|^)[\udc00-\udfff]/

This regular expression is designed to match certain Unicode characters in a string. Let’s break it down:

  • /[\u0000-\u001f\u0022\u005c\ud800-\udfff]/: This part matches any
    character in the range \u0000 to \u001f (control characters), the
    characters \u0022 (quotation mark ") and \u005c (backslash \), and
    any code unit in the range \ud800 to \udfff. That range covers all
    UTF-16 surrogates, both the high (first) and the low (second) halves
    of the surrogate pairs used to encode characters outside the Basic
    Multilingual Plane (BMP).
  • |: OR operator
  • [\ud800-\udbff](?![\udc00-\udfff]): This part matches the first
    half of a surrogate pair (a high surrogate, \ud800 to \udbff) only
    if it is not followed by a second half (a low surrogate, \udc00 to
    \udfff). In other words, it matches a lone, unpaired high surrogate.
  • |: Again, an OR operator.
  • (?:[^\ud800-\udbff]|^)[\udc00-\udfff]: This part matches the second
    half of a surrogate pair (a low surrogate, \udc00 to \udfff) only if
    it is not preceded by a high surrogate: either it follows some other
    character (which becomes part of the match) or it is at the
    beginning of the string. The ^ inside the square brackets [^...]
    denotes negation (any character other than the ones listed), while
    the ^ in the group (?:...|^) anchors to the start of the input.
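To check the breakdown above, here is a small sketch that splits the pattern into its three alternatives and tests each one separately. The names ALT1, ALT2, ALT3 and the sample strings are my own, for illustration only. Without the u flag the pattern works on UTF-16 code units, so a surrogate pair is seen as two separate units.

```javascript
// Each alternative of the original pattern as a separate regex
// (same escapes, no flags, so matching is per UTF-16 code unit).
const ALT1 = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]/ // control chars, '"', '\', any surrogate
const ALT2 = /[\ud800-\udbff](?![\udc00-\udfff])/ // high surrogate not followed by a low one
const ALT3 = /(?:[^\ud800-\udbff]|^)[\udc00-\udfff]/ // low surrogate not preceded by a high one

// Hypothetical sample strings
const samples = {
  plain: 'hello',
  quote: 'say "hi"',
  newline: 'a\nb',
  validPair: 'x\ud83d\ude00y', // U+1F600 encoded as a surrogate pair
  loneHigh: 'x\ud83dy', // high surrogate followed by 'y'
  loneLow: 'x\ude00y' // low surrogate preceded by 'x'
}

for (const [name, s] of Object.entries(samples)) {
  console.log(name, ALT1.test(s), ALT2.test(s), ALT3.test(s))
}
```

For these samples ALT1 matches everything except plain, including validPair, while ALT2 matches only loneHigh and ALT3 matches only loneLow.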

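In summary, this regular expression is meant to detect characters that must be escaped when a string is serialized as JSON: control characters, the quotation mark, the backslash, and lone (unpaired) surrogates. Note, however, that the first character class already matches every surrogate, so strings containing valid surrogate pairs are flagged as well.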
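One consequence worth checking: every match of the second or third alternative contains a surrogate, and the first character class matches any surrogate, so the full pattern and its first class alone should always give the same test() result. Below is a brute-force sketch over all strings of length 1 to 3 built from a hypothetical list of representative code units. It is a sanity check, not a proof.

```javascript
// Full pattern from the question.
const FULL = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]|[\ud800-\udbff](?![\udc00-\udfff])|(?:[^\ud800-\udbff]|^)[\udc00-\udfff]/
// First alternative on its own: a single character class.
const FIRST = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]/

// Hypothetical sample of code units: control chars, space, '"', 'A', '\',
// DEL, 'é', last unit before the surrogates, high and low surrogate edges,
// start of the private use area, and U+FFFF.
const units = [0x00, 0x1f, 0x20, 0x22, 0x41, 0x5c, 0x7f, 0xe9, 0xd7ff,
  0xd800, 0xdbff, 0xdc00, 0xdfff, 0xe000, 0xffff]

// Yields every string of length 1..maxLen made of those code units.
function* strings (maxLen) {
  let level = ['']
  for (let len = 1; len <= maxLen; len++) {
    const next = []
    for (const prefix of level) {
      for (const u of units) next.push(prefix + String.fromCharCode(u))
    }
    yield* next
    level = next
  }
}

let checked = 0
let mismatches = 0
for (const s of strings(3)) {
  checked++
  // No g flag, so test() does not carry lastIndex between calls.
  if (FULL.test(s) !== FIRST.test(s)) mismatches++
}
console.log(checked, mismatches) // 3615 0
```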


Speed test over a string of about 10,000 characters:

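// Characters that need escaping in JSON: control chars, '"', '\' and surrogates
const REGEX = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]|[\ud800-\udbff](?![\udc00-\udfff])|(?:[^\ud800-\udbff]|^)[\udc00-\udfff]/

const input = 'A'.repeat(10000) + '"' // built outside the timed section
const start = performance.now()
console.log(REGEX.test(input)) // true: the trailing '"' is \u0022
console.log(performance.now() - start) // elapsed time in milliseconds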
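A single performance.now() around one call mostly measures JIT warm-up and timer resolution. A slightly more robust sketch, assuming Node.js 20 (where performance is a global), runs each candidate many times after a warm-up. The bench() helper and the iteration counts are hypothetical choices, and the printed timings will vary between machines.

```javascript
// Full pattern from the question.
const FULL = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]|[\ud800-\udbff](?![\udc00-\udfff])|(?:[^\ud800-\udbff]|^)[\udc00-\udfff]/
// First alternative only. Its test() result equals FULL's, because any match
// of the other two alternatives contains a surrogate, which this class matches.
const FIRST = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]/
const input = 'A'.repeat(10000) + '"'

// Hypothetical helper: warms up fn, then reports the average time per call.
function bench (label, fn, iterations = 1000) {
  let result
  for (let i = 0; i < 100; i++) result = fn(input) // warm-up so the JIT can optimize
  const start = performance.now()
  for (let i = 0; i < iterations; i++) result = fn(input)
  const perCallMs = (performance.now() - start) / iterations
  console.log(`${label}: ${(perCallMs * 1000).toFixed(2)} µs per call`)
  return result // last result, returned so the caller can check it
}

const fullResult = bench('full pattern', (s) => FULL.test(s))
const firstResult = bench('first class only', (s) => FIRST.test(s))
const jsonResult = bench('JSON.stringify', (s) => JSON.stringify(s))
```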

Are there improvements that can be made?

The goal is a very fast check for strings that need no escaping, without being much slower than JSON.stringify() when the input does require escaping.
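A minimal sketch of how such a check can serve as a fast path, assuming the pattern from the question (quoteJsonString is a hypothetical name): when the regex finds nothing to escape, the JSON form is the string wrapped in quotes; otherwise it falls back to JSON.stringify(). Because the regex also flags valid surrogate pairs, those strings take the slower path, but the output is still correct.

```javascript
// Pattern from the question: control characters, '"', '\' and surrogates.
const REGEX = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]|[\ud800-\udbff](?![\udc00-\udfff])|(?:[^\ud800-\udbff]|^)[\udc00-\udfff]/

// Hypothetical helper: returns the JSON string literal for str.
function quoteJsonString (str) {
  // Fast path: nothing to escape, so wrapping in quotes is already valid JSON.
  if (!REGEX.test(str)) return '"' + str + '"'
  // Slow path: let the engine do the escaping (including lone surrogates).
  return JSON.stringify(str)
}

console.log(quoteJsonString('hello')) // "hello"
console.log(quoteJsonString('say "hi"')) // "say \"hi\""
```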