Buffers and character encodings | Node.js API documentation

Version history

Version	Changes
v15.7.0, v14.18.0	Introduced `base64url` encoding.
v6.4.0	Introduced `latin1` as an alias for `binary`.
v5.0.0	Removed the deprecated `raw` and `raws` encodings.

When converting between Buffers and strings, a character encoding may be specified. If no character encoding is specified, UTF-8 is used as the default.

import { Buffer } from 'node:buffer';

const buf = Buffer.from('hello world', 'utf8');

console.log(buf.toString('hex'));
// Prints: 68656c6c6f20776f726c64
console.log(buf.toString('base64'));
// Prints: aGVsbG8gd29ybGQ=

console.log(Buffer.from('fhqwhgads', 'utf8'));
// Prints: <Buffer 66 68 71 77 68 67 61 64 73>
console.log(Buffer.from('fhqwhgads', 'utf16le'));
// Prints: <Buffer 66 00 68 00 71 00 77 00 68 00 67 00 61 00 64 00 73 00>

const { Buffer } = require('node:buffer');

const buf = Buffer.from('hello world', 'utf8');

console.log(buf.toString('hex'));
// Prints: 68656c6c6f20776f726c64
console.log(buf.toString('base64'));
// Prints: aGVsbG8gd29ybGQ=

console.log(Buffer.from('fhqwhgads', 'utf8'));
// Prints: <Buffer 66 68 71 77 68 67 61 64 73>
console.log(Buffer.from('fhqwhgads', 'utf16le'));
// Prints: <Buffer 66 00 68 00 71 00 77 00 68 00 67 00 61 00 64 00 73 00>

Node.js buffers treat encoding names as case-insensitive. For example, UTF-8 can be specified as 'utf8', 'UTF8', or 'uTf8'.
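The case-insensitivity described above can be checked with a short sketch. The variable names (`lower`, `upper`, `mixed`, `sameBytes`) are illustrative only and do not come from the Node.js documentation.

```javascript
const { Buffer } = require('node:buffer');

// The same text encoded with three spellings of the UTF-8 encoding name.
const lower = Buffer.from('héllo', 'utf8');
const upper = Buffer.from('héllo', 'UTF8');
const mixed = Buffer.from('héllo', 'uTf8');

// All three buffers hold identical bytes.
const sameBytes = lower.equals(upper) && lower.equals(mixed);
console.log(sameBytes);
// Prints: true

// The same holds when decoding a Buffer into a string.
console.log(lower.toString('HEX') === lower.toString('hex'));
// Prints: true
```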

The character encodings currently supported by Node.js are the following:

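'utf8' (alias: 'utf-8'): Multi-byte encoded Unicode characters. Many web pages and other document formats use UTF-8. This is the default character encoding. When decoding a Buffer into a string that does not exclusively contain valid UTF-8 data, the Unicode replacement character U+FFFD � is used to represent those errors.
'utf16le' (alias: 'utf-16le'): Multi-byte encoded Unicode characters. Unlike 'utf8', each character in the string is encoded using either 2 or 4 bytes. Node.js only supports the little-endian variant of UTF-16.
'latin1': Latin-1 stands for ISO-8859-1. This character encoding only supports the Unicode characters from U+0000 to U+00FF. Each character is encoded using a single byte. Characters that do not fit into that range are truncated and mapped to characters in that range.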
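As a rough illustration of the three behaviors listed above (replacement characters, 2-or-4-byte UTF-16 units, and Latin-1 truncation), consider the following sketch. The sample bytes and variable names are chosen for the example and are not part of the original page.

```javascript
const { Buffer } = require('node:buffer');

// Byte 0xff can never appear in valid UTF-8, so decoding yields U+FFFD.
const invalidUtf8 = Buffer.from([0x68, 0x69, 0xff]).toString('utf8');
console.log(invalidUtf8 === 'hi\ufffd');
// Prints: true

// UTF-16LE uses 2 bytes for 'a' and 4 bytes (a surrogate pair) for U+1F600.
const bmpLength = Buffer.from('a', 'utf16le').length;
const astralLength = Buffer.from('\u{1F600}', 'utf16le').length;
console.log(bmpLength, astralLength);
// Prints: 2 4

// Latin-1 keeps only the low byte of each code unit: U+20AC becomes 0xac.
const truncated = Buffer.from('\u20ac', 'latin1');
console.log(truncated);
// Prints: <Buffer ac>
```

The last case shows why 'latin1' is lossy for text outside U+0000 to U+00FF: decoding `truncated` again gives U+00AC, not the original U+20AC.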

Converting a Buffer into a string using one of the above is referred to as decoding, and converting a string into a Buffer is referred to as encoding.

Node.js also supports the following binary-to-text encodings. For binary-to-text encodings, the naming convention is reversed: converting a Buffer into a string is typically referred to as encoding, and converting a string into a Buffer as decoding.

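'base64': Base64 encoding. When creating a Buffer from a string, this encoding also correctly accepts the "URL and Filename Safe Alphabet" specified in RFC 4648, Section 5. Whitespace characters such as spaces, tabs, and new lines contained within the base64-encoded string are ignored.
'base64url': base64url encoding as specified in RFC 4648, Section 5. When creating a Buffer from a string, this encoding also correctly accepts regular base64-encoded strings. When encoding a Buffer to a string, this encoding omits padding.
'hex': Encodes each byte as two hexadecimal characters. Data truncation may occur when decoding strings that do not exclusively consist of an even number of hexadecimal characters. See below for an example.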
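The differences between 'base64' and 'base64url' described above might be sketched as follows. The two input bytes are picked so that the encoded output contains the characters that differ between the two alphabets; all variable names are illustrative.

```javascript
const { Buffer } = require('node:buffer');

const bytes = Buffer.from([0xfb, 0xff]);

// Standard base64 uses '+' and '/' and pads with '='.
const standard = bytes.toString('base64');
console.log(standard);
// Prints: +/8=

// base64url uses '-' and '_' instead and omits the padding.
const urlSafe = bytes.toString('base64url');
console.log(urlSafe);
// Prints: -_8

// Either decoder accepts the other alphabet.
const crossDecoded =
  Buffer.from(urlSafe, 'base64').equals(bytes) &&
  Buffer.from(standard, 'base64url').equals(bytes);
console.log(crossDecoded);
// Prints: true

// Whitespace inside base64 input is ignored.
const spaced = Buffer.from('aGVs bG8g\nd29y bGQ=', 'base64').toString('utf8');
console.log(spaced);
// Prints: hello world
```

Note that 'base64url' requires Node.js v15.7.0 or v14.18.0 and later, per the version history.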

The following legacy character encodings are also supported:

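'ascii': For 7-bit ASCII data only. When encoding a string into a Buffer, this is equivalent to using 'latin1'. When decoding a Buffer into a string, this encoding additionally unsets the highest bit of each byte before decoding as 'latin1'. Generally, there should be no reason to use this encoding, as 'utf8' (or, if the data is known to always be ASCII-only, 'latin1') is a better choice when encoding or decoding ASCII-only text. It is only provided for legacy compatibility.
'binary': Alias for 'latin1'. The name of this encoding can be very misleading, as all of the encodings listed here convert between strings and binary data. For converting between strings and Buffers, 'utf8' is typically the right choice.
'ucs2', 'ucs-2': Aliases of 'utf16le'. UCS-2 used to refer to a variant of UTF-16 that did not support characters with code points larger than U+FFFF. In Node.js, these code points are always supported.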
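A minimal sketch of how the legacy encodings above behave, using U+00E9 (é) and U+1F600 as sample characters. The variable names are illustrative and not taken from the original page.

```javascript
const { Buffer } = require('node:buffer');

// Encoding with 'ascii' behaves like 'latin1': U+00E9 becomes byte 0xe9.
const encodedAscii = Buffer.from('\u00e9', 'ascii');
console.log(encodedAscii);
// Prints: <Buffer e9>

// Decoding with 'ascii' clears the high bit first: 0xe9 becomes 0x69 ('i').
const decodedAscii = Buffer.from([0xe9]).toString('ascii');
console.log(decodedAscii);
// Prints: i

// 'binary' is 'latin1' under another name.
const decodedBinary = Buffer.from([0xe9]).toString('binary');
console.log(decodedBinary === '\u00e9');
// Prints: true

// 'ucs2' is 'utf16le', so code points above U+FFFF still round-trip.
const roundTrip = Buffer.from('\u{1F600}', 'ucs2').toString('ucs2');
console.log(roundTrip === '\u{1F600}');
// Prints: true
```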

import { Buffer } from 'node:buffer';

Buffer.from('1ag123', 'hex');
// Prints <Buffer 1a>, data truncated when first non-hexadecimal value
// ('g') encountered.

Buffer.from('1a7', 'hex');
// Prints <Buffer 1a>, data truncated when data ends in single digit ('7').

Buffer.from('1634', 'hex');
// Prints <Buffer 16 34>, all data represented.

const { Buffer } = require('node:buffer');

Buffer.from('1ag123', 'hex');
// Prints <Buffer 1a>, data truncated when first non-hexadecimal value
// ('g') encountered.

Buffer.from('1a7', 'hex');
// Prints <Buffer 1a>, data truncated when data ends in single digit ('7').

Buffer.from('1634', 'hex');
// Prints <Buffer 16 34>, all data represented.

Modern Web browsers follow the WHATWG Encoding Standard, which aliases both 'latin1' and 'ISO-8859-1' to 'win-1252'. This means that when doing something like http.get(), if the returned charset is one of those listed in the WHATWG specification, the server may actually have returned 'win-1252'-encoded data, and decoding it with 'latin1' may produce incorrect characters.
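The mismatch can be seen with a single byte. The sketch below is illustrative only: it assumes a Node.js build with full ICU data, so that the global TextDecoder accepts the 'windows-1252' label, and the name `body` stands in for a hypothetical response payload.

```javascript
const { Buffer } = require('node:buffer');

// Byte 0x80 is a C1 control character in ISO-8859-1 but the euro sign
// in windows-1252.
const body = Buffer.from([0x80]);

// Node.js 'latin1' maps each byte straight to the same code point.
const asLatin1 = body.toString('latin1');
console.log(asLatin1.codePointAt(0).toString(16));
// Prints: 80

// Assumption: this build includes full ICU data, which TextDecoder needs
// for 'windows-1252'. The WHATWG index maps byte 0x80 to U+20AC.
const asWin1252 = new TextDecoder('windows-1252').decode(body);
console.log(asWin1252.codePointAt(0).toString(16));
```

When a response declares a charset that the WHATWG specification maps to windows-1252, decoding with TextDecoder rather than `buf.toString('latin1')` is one way to match browser behavior.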