WHATWG API
The WHATWG URL Standard uses a more selective and fine-grained approach to choosing which characters to encode than the Legacy API does.
The WHATWG algorithm defines four "percent-encode sets" that describe ranges of characters that must be percent-encoded:
- The C0 control percent-encode set includes code points in the range U+0000 to U+001F (inclusive) and all code points greater than U+007E (~).
- The fragment percent-encode set includes the C0 control percent-encode set and code points U+0020 SPACE, U+0022 ("), U+003C (<), U+003E (>), and U+0060 (`).
- The path percent-encode set includes the C0 control percent-encode set and code points U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), U+003E (>), U+003F (?), U+0060 (`), U+007B ({), and U+007D (}).
- The userinfo percent-encode set includes the path percent-encode set and code points U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+0040 (@), U+005B ([) to U+005E (^), and U+007C (|).
The userinfo percent-encode set is used exclusively for usernames and passwords encoded within the URL. The path percent-encode set is used for the path of most URLs. The fragment percent-encode set is used for URL fragments. The C0 control percent-encode set is used for the host and path under certain specific conditions, in addition to all other cases.
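As a rough illustration of how the sets differ, the sketch below assigns the same string to the username, path, and fragment of one URL and prints what each component becomes. The sample string and the `example.org` host are assumptions chosen for illustration only; the outputs in the comments follow from the set definitions listed above.

```javascript
// Illustrative input: contains a space, braces, and a colon.
const input = 'a b{c}:d';

const u = new URL('https://example.org/');
u.username = input;        // encoded with the userinfo percent-encode set
u.pathname = '/' + input;  // encoded with the path percent-encode set
u.hash = input;            // encoded with the fragment percent-encode set

console.log(u.username);
// Prints a%20b%7Bc%7D%3Ad  (space, braces, and colon are all encoded)
console.log(u.pathname);
// Prints /a%20b%7Bc%7D:d   (space and braces are encoded, colon is not)
console.log(u.hash);
// Prints #a%20b{c}:d       (only the space is encoded)
```

The username is encoded most aggressively because the userinfo set is a superset of the path set, while the fragment set is the narrowest of the three.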
When non-ASCII characters appear within a host name, the host name is encoded using the Punycode algorithm. Note, however, that a host name may contain both Punycode-encoded and percent-encoded characters:
const myURL = new URL('https://%CF%80.example.com/foo');
console.log(myURL.href);
// Prints https://xn--1xa.example.com/foo
console.log(myURL.origin);
// Prints https://xn--1xa.example.com
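A short follow-up sketch, assuming the same `π.example.com` host: the percent-encoded bytes `%CF%80` are the UTF-8 encoding of `π`, so the literal and percent-encoded spellings should resolve to the same Punycode host name. `domainToUnicode` from the `node:url` module is used here only to display the decoded form.

```javascript
const { domainToUnicode } = require('node:url');

// Same host written two ways: literal non-ASCII and percent-encoded UTF-8.
const literal = new URL('https://π.example.com/foo');
const escaped = new URL('https://%CF%80.example.com/foo');

console.log(literal.hostname);
// Prints xn--1xa.example.com
console.log(literal.hostname === escaped.hostname);
// Prints true

// Convert the Punycode host name back to its Unicode form for display.
const readable = domainToUnicode(escaped.hostname);
console.log(readable);
// Prints π.example.com
```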