How do I improve benchmark accuracy in JavaScript?

I am looking for a way to benchmark the performance of a function in JavaScript so that I can optimise it more effectively. The resulting number (average tick duration) should be consistent across runs, ideally with no more than 2% deviation between runs, but the code I have written frequently deviates by more than 30%.

Here is a simplified version of my code:

let j = 0;
while (true) {
   // Collect garbage from previous run
   // (global.gc is only defined when node is started with --expose-gc)
   for (let i = 0; i < 10; i++) {
      global.gc();
   }
   
   // Reset the board state
   Board.reset();

   // Seed the random number generator
   SRandom.seed(40404040404);
   
   Board.setup();
   SERVER.setup();

   // Warm up the JIT
   for (let i = 0; i < 50; i++) {
      SERVER.tick();
   }
   
   const numTicks = 5000;
   
   const startTime = performance.now();

   for (let i = 0; i < numTicks; i++) {
      SERVER.tick();
   }

   const timeElapsed = performance.now() - startTime;
   const averageTickTimeMS = timeElapsed / numTicks;
   console.log("(#" + (j + 1) + ") Average tick MS: " + averageTickTimeMS);
   j++;
}

The code runs on Node.js, and when I run it I close as many other programs as I can, typically leaving only Chrome, VS Code (which I use to run the code), and a terminal open.

This does benchmark a game, so there is a lot of random number generation involved, but before the benchmark I override the built-in Math.random with a seeded random number generator (Math.random = () => SRandom.next()) and re-seed it each run.
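A minimal sketch of that kind of override is shown here. The real SRandom implementation is not part of this question, so the mulberry32-style generator below is only an assumed stand-in that exposes the same seed() and next() methods:

```javascript
// Hypothetical stand-in for SRandom (the real one is not shown in this question).
// Uses the mulberry32 algorithm, which keeps a single 32-bit state.
const SRandom = {
  state: 0,
  seed(value) {
    // Keep only the low 32 bits of the seed as the internal state.
    this.state = value >>> 0;
  },
  next() {
    // Advance the state and scramble it into a float in [0, 1).
    this.state = (this.state + 0x6D2B79F5) >>> 0;
    let t = this.state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  },
};

// Route every Math.random call through the seeded generator.
Math.random = () => SRandom.next();

// Re-seeding with the same value replays the same sequence.
SRandom.seed(40404040404);
const firstPass = [Math.random(), Math.random(), Math.random()];
SRandom.seed(40404040404);
const secondPass = [Math.random(), Math.random(), Math.random()];
console.log(firstPass.join() === secondPass.join());
```

With an override like this, every run should see the same sequence of random values, so the random number generator itself should not be a source of run-to-run variation.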

Below is sample output from the benchmark:

(#1) Average tick MS: 4.773744156002999
(#2) Average tick MS: 3.103633259952068
(#3) Average tick MS: 3.431657537043095
(#4) Average tick MS: 3.3931038970351217
(#5) Average tick MS: 3.557662303030491
(#6) Average tick MS: 3.6041946840286254
(#7) Average tick MS: 3.570515029013157
(#8) Average tick MS: 3.8610670589804648
(#9) Average tick MS: 3.758602159976959
(#10) Average tick MS: 3.6722980710268023

I am not sure why the first run takes so much longer, but excluding that run there is a deviation of 24% between run 2 and run 8.
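For clarity, this is roughly how that deviation figure is derived. The sketch below uses the values for runs 2 to 10 from the output above, rounded to four decimal places. The function name spreadStats is made up for illustration, and the coefficient of variation is included only as one alternative way of expressing the spread:

```javascript
// Average tick times (ms) for runs 2-10 from the output above, rounded.
const runs = [3.1036, 3.4317, 3.3931, 3.5577, 3.6042, 3.5705, 3.8611, 3.7586, 3.6723];

// Hypothetical helper: summarises how far apart a set of run averages are.
function spreadStats(samples) {
  const min = Math.min(...samples);
  const max = Math.max(...samples);
  const mean = samples.reduce((sum, x) => sum + x, 0) / samples.length;
  // Sample variance (divides by n - 1).
  const variance =
    samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (samples.length - 1);
  return {
    // How much slower the slowest run is than the fastest, in percent.
    maxOverMinPercent: (max / min - 1) * 100,
    // Standard deviation relative to the mean, in percent.
    coefficientOfVariationPercent: (Math.sqrt(variance) / mean) * 100,
  };
}

const stats = spreadStats(runs);
console.log(stats.maxOverMinPercent.toFixed(1) + "% between fastest and slowest run");
console.log(stats.coefficientOfVariationPercent.toFixed(1) + "% coefficient of variation");
```

The first figure compares only the fastest and slowest runs (run 2 and run 8 here), which is the roughly 24% quoted above.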

What can I do to make the benchmark more accurate? Or is this a fundamental limitation of JavaScript?