I’m using `btoa` to encode a `Uint8Array` to a Base64 string, and I hit a strange case. This works:

```typescript
export function toBase64(data: Uint8Array): string {
  return btoa(String.fromCharCode(...data))
}
```
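As an aside, spreading a large `Uint8Array` into `fromCharCode` can exceed the engine's argument-count limit. A chunked variant avoids that (a sketch; `toBase64Chunked` and the 32 KiB chunk size are my own choices, not from the original code):

```typescript
// Chunked variant of the working toBase64: processes the array in
// slices so String.fromCharCode never receives too many arguments.
function toBase64Chunked(data: Uint8Array): string {
  const CHUNK = 0x8000; // 32 KiB per fromCharCode call; arbitrary choice
  let binary = "";
  for (let i = 0; i < data.length; i += CHUNK) {
    // subarray creates a view, so no bytes are copied here
    binary += String.fromCharCode(...data.subarray(i, i + CHUNK));
  }
  return btoa(binary);
}
```

For small inputs it produces the same result as the one-shot version, e.g. `toBase64Chunked(new Uint8Array([72, 105]))` yields `"SGk="`.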
Whereas this does not (`btoa` will often complain about an unknown character):

```typescript
export function toBase64(data: Uint8Array): string {
  return btoa(new TextDecoder('latin1').decode(data))
}
```
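The failure can be pinned down by scanning the decoded string for code points above 0xFF, which is what `btoa` rejects. A diagnostic sketch (`findNonLatin1` is a hypothetical helper; it assumes an environment where the `'latin1'` label resolves to windows-1252 per the WHATWG Encoding Standard, as it does in browsers and Node):

```typescript
// Returns every position whose code unit btoa cannot encode (> 0xFF).
function findNonLatin1(s: string): Array<{ index: number; codePoint: number }> {
  const bad: Array<{ index: number; codePoint: number }> = [];
  for (let i = 0; i < s.length; i++) {
    const cp = s.charCodeAt(i);
    if (cp > 0xff) bad.push({ index: i, codePoint: cp });
  }
  return bad;
}

const bytes = new Uint8Array([0x41, 0x80, 0x9f]); // 0x80 and 0x9F sit in the problem range
const viaCharCode = String.fromCharCode(...bytes);
const viaDecoder = new TextDecoder("latin1").decode(bytes);

console.log(findNonLatin1(viaCharCode)); // empty: fromCharCode stays within 0x00–0xFF
console.log(findNonLatin1(viaDecoder)); // non-empty: e.g. 0x80 decodes to U+20AC (€)
```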
## Question

What encoding should I use with `TextDecoder` to produce the same string as via `String.fromCharCode`?
## Background

Piecing together various pieces of documentation, the following should be true:

- `btoa` expects a `latin1`-encoded string
- `String.fromCharCode` will convert individual integers to the respective `utf16` character
- for the first 256 characters, `latin1` and `utf16` overlap
## Test

Doing some experiments, it is clear that the two approaches yield different strings. With this setup:

```typescript
const array = Array.from({ length: 256 }, (_, i) => i);
const d = new Uint8Array(array);
```
Running:

```typescript
String.fromCharCode(...d)
```

will yield

```
\x00\x01\x02\x03\x04\x05\x06\x07\b\t\n\v\f\r\x0E\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7F\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
```
Whereas running:

```typescript
(new TextDecoder('latin1')).decode(d)
```

will yield

```
\x00\x01\x02\x03\x04\x05\x06\x07\b\t\n\v\f\r\x0E\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7F€\x81‚ƒ„…†‡ˆ‰Š‹Œ\x8DŽ\x8F\x90‘’“”•–—˜™š›œ\x9DžŸ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
```
The two outputs differ only in the range `0x80`–`0x9F` (copied below, with the identical `0x7F`, for clarity):

```
\x7F\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F
\x7F€\x81‚ƒ„…†‡ˆ‰Š‹Œ\x8DŽ\x8F\x90‘’“”•–—˜™š›œ\x9DžŸ
```
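Rather than eyeballing the dumps, the differing positions can be computed directly. A small sketch (it assumes, as above, that the `'latin1'` label selects windows-1252; with that assumption the mismatches fall exactly in `0x80`–`0x9F`):

```typescript
const array = Array.from({ length: 256 }, (_, i) => i);
const d = new Uint8Array(array);

const a = String.fromCharCode(...d);
const b = new TextDecoder("latin1").decode(d);

// Collect every byte value at which the two decodings disagree.
const diff: number[] = [];
for (let i = 0; i < 256; i++) {
  if (a[i] !== b[i]) diff.push(i);
}
console.log(diff.map((i) => "0x" + i.toString(16).padStart(2, "0")));
```

Note that a handful of bytes in that range (such as `0x81` and `0x8D`) still agree, matching the `\x81`/`\x8D` escapes visible in both dumps above.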