Issue with SHA mismatch despite matching Base64 content in JavaScript program

I’m working on a JavaScript app in Node.js v22.11.0 with @octokit/rest 21.0.2 and crypto-js 4.2.0, and the text content of a file with accented characters (e.g., in Spanish) gets corrupted when pushed and pulled via the GitHub API. Specifically, if I push a file containing the string "Quedar con la tía María" (“meet up with aunt María” in Spanish, if you are curious) and then pull it back, I end up with incorrect characters in the decoded output. Depending on the decoding method used, I get one of these results:

"Quedar con la t�a Mar�a"
"Quedar con la tía MarÃa"
This issue occurs whether I use Buffer.from(repoFile.base64content, 'base64').toString('utf-8') or atob(repoFile.base64content).

Additionally, the SHA hash calculated for the file after decoding is different from the original GitHub SHA. The SHA calculation works fine when there are no accented characters.
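
For reference, Git computes a blob SHA-1 over the file's bytes, prefixed with a `blob <byte length>\0` header, so the string length and the byte length stop agreeing once a character needs more than one byte in UTF-8. The following is only a minimal sketch of that calculation. It assumes Node's built-in `node:crypto` module rather than crypto-js, and `gitBlobSha` is a hypothetical helper name, not something from the program in this question:

```javascript
import { createHash } from 'node:crypto';

// Hypothetical helper, not part of the program in this question: computes a
// Git blob SHA-1 over raw bytes, using the byte length in the header.
function gitBlobSha(bytes) {
  const header = Buffer.from(`blob ${bytes.length}\0`, 'utf8');
  return createHash('sha1').update(header).update(bytes).digest('hex');
}

// normalize('NFC') is only a guard so that each "í" is a single code point.
const text = 'Quedar con la tía María'.normalize('NFC');
const utf8Bytes = Buffer.from(text, 'utf8');

console.log(text.length);      // 23 (UTF-16 code units)
console.log(utf8Bytes.length); // 25 (each "í" takes two bytes in UTF-8)
console.log(gitBlobSha(utf8Bytes));
```

The same helper could be applied to the decoded repository bytes, e.g. `gitBlobSha(Buffer.from(repoFile.base64content, 'base64'))`, to check whether the result agrees with the SHA that GitHub reports.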

Here’s a minimal example to reproduce the issue:

import { Octokit } from "@octokit/rest";
import CryptoJS from 'crypto-js';

const octokit = new Octokit({ auth: 'personal-access-token' });

// Local file content with accented characters
const localFileContentString = 'Quedar con la tía María';
const localFile = { 
  path: 'Recordar.md', 
  sha: getSha(localFileContentString), 
  content: localFileContentString, 
  base64Content: btoa(localFileContentString) 
};

// Function to calculate SHA1 of file content
function getSha(fileContents) {
    const size = fileContents.length;
    const blobString = `blob ${size}\0${fileContents}`;
    return CryptoJS.SHA1(blobString).toString(CryptoJS.enc.Hex);
}

// Fetch the file content from GitHub repo
async function getRepoFile() {
    const existingFileResponse = await octokit.repos.getContent({
        owner: 'github-username',
        repo: 'vault-name',
        path: localFile.path
    });
    return { sha: existingFileResponse.data.sha, base64content: existingFileResponse.data.content.replace(/\n/g, '') };
}

const repoFile = await getRepoFile();

console.log('EncodedRepoFile', repoFile, "\n");
console.log('EncodedLocalFile', localFile, "\n");

// Decode the base64 content from both the repo and local file
console.log('DecodedRepoFile', Buffer.from(repoFile.base64content, 'base64').toString());
console.log('DecodedLocalFile', Buffer.from(localFile.base64Content, 'base64').toString());

And the code output:

EncodedRepoFile {
  sha: '9fe35536cd6188e428ee04dcb559d69ecfb4d5d9',
  base64content: 'UXVlZGFyIGNvbiBsYSB0w61hIE1hcsOtYQoK'
} 

EncodedLocalFile {
  path: 'Recordar.md',
  sha: '9860966172762a56f5b3dec12d51d4b1fb1034e8',
  content: 'Quedar con la tía María',
  base64Content: 'UXVlZGFyIGNvbiBsYSB07WEgTWFy7WE='
} 

DecodedRepoFile Quedar con la tía María


DecodedLocalFile Quedar con la t�a Mar�a
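
The two Base64 strings in this output differ only at the accented characters (`7WE` versus `w61h`), which may point at the local encoding step rather than at GitHub. `btoa` encodes each character as a single Latin-1 byte, while the repository content is UTF-8. Below is a small sketch comparing the two, assuming the string is NFC-normalized. The variable names are illustrative only:

```javascript
// normalize('NFC') is only a guard so that each "í" is a single code point.
const text = 'Quedar con la tía María'.normalize('NFC');

// btoa treats every character as one byte (Latin-1), so "í" becomes 0xED.
const latin1Base64 = btoa(text);

// Encoding via a UTF-8 Buffer turns "í" into the two bytes 0xC3 0xAD.
const utf8Base64 = Buffer.from(text, 'utf8').toString('base64');

console.log(latin1Base64); // UXVlZGFyIGNvbiBsYSB07WEgTWFy7WE=
console.log(utf8Base64);   // UXVlZGFyIGNvbiBsYSB0w61hIE1hcsOtYQ==

// Decoding each string with the encoding it was produced with round-trips.
console.log(Buffer.from(utf8Base64, 'base64').toString('utf8'));
console.log(Buffer.from(latin1Base64, 'base64').toString('latin1'));
```

The UTF-8 variant matches the repository's Base64 except for the tail, because the file on GitHub ends with two newline characters (`YQoK` instead of `YQ==`).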

I think the problem lies in how GitHub itself handles these special characters, and I don’t know how to work around it. I’m using UTF-8, and I’ve tried changing the encoding to ISO-8859-1, computing the SHA of the corrupted string to at least check whether I get the same SHA, and checking throughout the code that the encoding is consistent, using libs like iconv-lite and chardet, but none of that works.