Using strings as inputs to cryptographic APIs


For historical reasons, many cryptographic APIs provided by Node.js accept strings as inputs where the underlying cryptographic algorithm works on byte sequences. These instances include plaintexts, ciphertexts, symmetric keys, initialization vectors, passphrases, salts, authentication tags, and additional authenticated data.

When passing strings to cryptographic APIs, consider the following factors.

  • Not all byte sequences are valid UTF-8 strings. Therefore, when a byte sequence of length n is derived from a string, its entropy is generally lower than the entropy of a random or pseudorandom n-byte sequence. For example, no UTF-8 string will result in the byte sequence c0 af. Secret keys should almost exclusively be random or pseudorandom byte sequences.

  • Similarly, when converting random or pseudorandom byte sequences to UTF-8 strings, subsequences that do not represent valid code points may be replaced by the Unicode replacement character (U+FFFD). The byte representation of the resulting Unicode string may, therefore, not be equal to the byte sequence that the string was created from.

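    // 0xc0 0xaf is not a valid UTF-8 sequence.
    const original = [0xc0, 0xaf];
    // Decoding replaces each invalid byte with U+FFFD.
    const bytesAsString = Buffer.from(original).toString('utf8');
    // Re-encoding yields the bytes of U+FFFD, not the original bytes.
    const stringAsBytes = Buffer.from(bytesAsString, 'utf8');
    console.log(stringAsBytes);
    // Prints '<Buffer ef bf bd ef bf bd>'.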

    The outputs of ciphers, hash functions, signature algorithms, and key derivation functions are pseudorandom byte sequences and should not be used as Unicode strings.

  • When strings are obtained from user input, some Unicode characters can be represented in multiple equivalent ways that result in different byte sequences. For example, when passing a user passphrase to a key derivation function, such as PBKDF2 or scrypt, the result of the key derivation function depends on whether the string uses composed or decomposed characters. Node.js does not normalize character representations. Developers should consider using String.prototype.normalize() on user inputs before passing them to cryptographic APIs.
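
The normalization point above can be illustrated with a minimal sketch. It assumes NFC as the normalization form (the text only suggests considering String.prototype.normalize(), so the choice of form is an assumption), and the helper name deriveKey, the sample passphrases, the 16-byte salt, and the 32-byte key length are all hypothetical choices made for illustration. It also shows hex as one encoding that round-trips derived bytes, in contrast to UTF-8.

```javascript
const { randomBytes, scryptSync } = require('node:crypto');

// Two visually identical passphrases: 'é' as a single code point (U+00E9)
// versus 'e' followed by a combining acute accent (U+0301).
const composed = 'caf\u00e9';
const decomposed = 'cafe\u0301';

// Hypothetical helper (not part of Node.js): normalizes the passphrase
// to NFC before deriving a 32-byte key with scrypt.
function deriveKey(passphrase, salt) {
  return scryptSync(passphrase.normalize('NFC'), salt, 32);
}

// Salts should be random bytes, not strings.
const salt = randomBytes(16);

// Without normalization, the UTF-8 encodings differ, so the keys differ.
const rawA = scryptSync(composed, salt, 32);
const rawB = scryptSync(decomposed, salt, 32);
console.log(rawA.equals(rawB)); // false

// With normalization, both spellings derive the same key.
const keyA = deriveKey(composed, salt);
const keyB = deriveKey(decomposed, salt);
console.log(keyA.equals(keyB)); // true

// Hex encoding round-trips the derived bytes; UTF-8 generally would not.
const hex = keyA.toString('hex');
console.log(Buffer.from(hex, 'hex').equals(keyA)); // true
```

Whichever normalization form is chosen, it has to be applied consistently everywhere the passphrase is consumed, since a key derived from an NFC string will not match one derived from the same text in NFD.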