Buffer 与字符编码


Buffer 存入或取出字符串时,需要指定字符编码。

const buf = Buffer.from('hello world', 'ascii');

console.log(buf.toString('hex'));
// 输出: 68656c6c6f20776f726c64

console.log(buf.toString('base64'));
// 输出: aGVsbG8gd29ybGQ=

console.log(Buffer.from('fhqwhgads', 'ascii'));
// 输出: <Buffer 66 68 71 77 68 67 61 64 73>
console.log(Buffer.from('fhqwhgads', 'utf16le'));
// 输出: <Buffer 66 00 68 00 71 00 77 00 68 00 67 00 61 00 64 00 73 00>

Node.js 支持的字符编码有:

  • 'ascii' - 仅支持 7 位 ASCII 数据。

  • 'utf8' - 多字节编码的 Unicode 字符。

  • 'utf16le' - 2 或 4 个字节,小端序编码的 Unicode 字符。支持代理对(U+10000 至 U+10FFFF)。

  • 'ucs2' - 'utf16le' 的别名。

  • 'base64' - Base64 编码。

  • 'latin1' - 将 Buffer 编码成单字节编码的字符串。

  • 'binary' - 'latin1' 的别名。

  • 'hex' - 将每个字节编码成两个十六进制字符。

When string data is stored in or extracted out of a Buffer instance, a character encoding may be specified.

const buf = Buffer.from('hello world', 'ascii');

console.log(buf.toString('hex'));
// Prints: 68656c6c6f20776f726c64
console.log(buf.toString('base64'));
// Prints: aGVsbG8gd29ybGQ=

console.log(Buffer.from('fhqwhgads', 'ascii'));
// Prints: <Buffer 66 68 71 77 68 67 61 64 73>
console.log(Buffer.from('fhqwhgads', 'utf16le'));
// Prints: <Buffer 66 00 68 00 71 00 77 00 68 00 67 00 61 00 64 00 73 00>

The character encodings currently supported by Node.js include:

  • 'ascii' - For 7-bit ASCII data only. This encoding is fast and will strip the high bit if set.

  • 'utf8' - Multibyte encoded Unicode characters. Many web pages and other document formats use UTF-8.

  • 'utf16le' - 2 or 4 bytes, little-endian encoded Unicode characters. Surrogate pairs (U+10000 to U+10FFFF) are supported.

  • 'ucs2' - Alias of 'utf16le'.

  • 'base64' - Base64 encoding. When creating a Buffer from a string, this encoding will also correctly accept "URL and Filename Safe Alphabet" as specified in RFC4648, Section 5.

  • 'latin1' - A way of encoding the Buffer into a one-byte encoded string (as defined by the IANA in RFC1345, page 63, to be the Latin-1 supplement block and C0/C1 control codes).

  • 'binary' - Alias for 'latin1'.

  • 'hex' - Encode each byte as two hexadecimal characters.

Modern Web browsers follow the WHATWG Encoding Standard which aliases both 'latin1' and 'ISO-8859-1' to 'win-1252'. This means that while doing something like http.get(), if the returned charset is one of those listed in the WHATWG specification it is possible that the server actually returned 'win-1252'-encoded data, and using 'latin1' encoding may incorrectly decode the characters.