How to Work with Different Filesystems
Node.js exposes many features of the filesystem, but not all filesystems are alike. The following are suggested best practices to keep your code simple and safe when working with different filesystems.
Filesystem Behavior
Before you can work with a filesystem, you need to know how it behaves. Different filesystems behave differently and have more or fewer features than others: case sensitivity, case insensitivity, case preservation, Unicode form preservation, timestamp resolution, extended attributes, inodes, Unix permissions, alternate data streams, etc.
Be wary of inferring filesystem behavior from process.platform. For example, do not assume that because your program is running on Darwin you are therefore working on a case-insensitive filesystem (HFS+), as the user may be using a case-sensitive filesystem (HFSX). Similarly, do not assume that because your program is running on Linux you are therefore working on a filesystem which supports Unix permissions and inodes, as you may be on a particular external drive, USB or network drive which does not.
The operating system may not make it easy to infer filesystem behavior, but all is not lost. Instead of keeping a list of every known filesystem and behavior (which is always going to be incomplete), you can probe the filesystem to see how it actually behaves. The presence or absence of certain features which are easy to probe is often enough to infer the behavior of other features which are more difficult to probe.
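As a minimal sketch of such a probe, the following checks whether a directory sits on a case-sensitive filesystem by writing a file under a lowercase name and looking it up under an uppercase one. The helper name isCaseSensitive and the probe file names are illustrative assumptions, not part of Node.js.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: returns true if `dir` appears to be on a
// case-sensitive filesystem.
function isCaseSensitive(dir) {
  // Work inside a fresh scratch directory so no existing file is touched.
  const probeDir = fs.mkdtempSync(path.join(dir, 'probe-'));
  try {
    const lower = path.join(probeDir, 'case-probe');
    const upper = path.join(probeDir, 'CASE-PROBE');
    fs.writeFileSync(lower, '');
    // On a case-insensitive filesystem the uppercase path resolves to the
    // file that was just written under the lowercase name.
    return !fs.existsSync(upper);
  } finally {
    fs.rmSync(probeDir, { recursive: true, force: true });
  }
}

console.log(isCaseSensitive(os.tmpdir()));
```

Run the probe against the directory you actually intend to work in, since the result says nothing about other mount points.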
Remember that some users may have different filesystems mounted at various paths in the working tree.
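One rough way to notice a mount boundary is to compare the device ID that fs.statSync reports for two paths. This is only a sketch: it assumes that different mounts report different dev values, which holds in common setups but is not guaranteed everywhere, and onSameDevice is a hypothetical helper name.

```javascript
const fs = require('node:fs');
const os = require('node:os');

// Hypothetical helper: true if both paths report the same device ID.
// Assumption: distinct mounted filesystems expose distinct `dev` values.
function onSameDevice(pathA, pathB) {
  return fs.statSync(pathA).dev === fs.statSync(pathB).dev;
}

console.log(onSameDevice(os.tmpdir(), process.cwd()));
```

When the two paths differ, it is safer to probe each location separately than to reuse the result from one of them.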
Avoid a Lowest Common Denominator Approach
You might be tempted to make your program act like a lowest common denominator filesystem by normalizing all filenames to uppercase, normalizing all filenames to NFC Unicode form, and normalizing all file timestamps to, say, 1-second resolution. This would be the lowest common denominator approach.
Do not do this. You would only be able to interact safely with a filesystem which has the exact same lowest common denominator characteristics in every respect. You would be unable to work with more advanced filesystems in the way that users expect, and you would run into filename or timestamp collisions. You would most certainly lose and corrupt user data through a series of complicated dependent events, and you would create bugs that would be difficult if not impossible to solve.
What happens when you later need to support a filesystem that only has 2-second or 24-hour timestamp resolution? What happens when the Unicode standard advances to include a slightly different normalization algorithm (as has happened in the past)?
A lowest common denominator approach would tend to try to create a portable program by using only "portable" system calls. This leads to programs that are leaky and not in fact portable.
Adopt a Superset Approach
Make the best use of each platform you support by adopting a superset approach. For example, a portable backup program should sync btimes (the created time of a file or folder) correctly between Windows systems, and should not destroy or alter btimes, even though btimes are not supported on Linux systems. The same portable backup program should sync Unix permissions correctly between Linux systems, and should not destroy or alter Unix permissions, even though Unix permissions are not supported on Windows systems.
Handle different filesystems by making your program act like a more advanced filesystem. Support a superset of all possible features: case-sensitivity, case-preservation, Unicode form sensitivity, Unicode form preservation, Unix permissions, high-resolution nanosecond timestamps, extended attributes, etc.
Once you have case-preservation in your program, you can always implement case-insensitivity if you need to interact with a case-insensitive filesystem. But if you forgo case-preservation in your program, you cannot interact safely with a case-preserving filesystem. The same is true for Unicode form preservation and timestamp resolution preservation.
If a filesystem provides you with a filename in a mix of lowercase and uppercase, then keep the filename in the exact case given. If a filesystem provides you with a filename in mixed Unicode form or NFC or NFD (or NFKC or NFKD), then keep the filename in the exact byte sequence given. If a filesystem provides you with a millisecond timestamp, then keep the timestamp in millisecond resolution.
When you work with a lesser filesystem, you can always downsample appropriately, using comparison functions as required by the behavior of the filesystem on which your program is running. If you know that the filesystem does not support Unix permissions, then you should not expect to read the same Unix permissions you write. If you know that the filesystem does not preserve case, then you should be prepared to see ABC in a directory listing when your program creates abc. But if you know that the filesystem does preserve case, then you should consider ABC to be a different filename from abc when detecting file renames, or when the filesystem is case-sensitive.
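A sketch of what choosing a comparison function from probed behavior might look like is shown here. The factory makeNameEquals and its caseSensitive option are hypothetical names, and the toUpperCase() fold is only an approximation of whatever case table the filesystem really uses.

```javascript
// Hypothetical factory: build a filename equality check from probed behavior.
// `caseSensitive` is assumed to come from a probe of the target filesystem.
function makeNameEquals({ caseSensitive }) {
  if (caseSensitive) {
    return (a, b) => a === b;
  }
  // Fold case for comparison only; the original names are never modified.
  return (a, b) => a.toUpperCase() === b.toUpperCase();
}

const strict = makeNameEquals({ caseSensitive: true });
const relaxed = makeNameEquals({ caseSensitive: false });

console.log(strict('abc', 'ABC'));  // false
console.log(relaxed('abc', 'ABC')); // true
```

The names themselves are passed through untouched; only the comparison changes with the filesystem.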
Case Preservation
You may create a directory called test/abc and be surprised to see sometimes that fs.readdir('test') returns ['ABC']. This is not a bug in Node.js. Node.js returns the filename as the filesystem stores it, and not all filesystems support case-preservation. Some filesystems convert all filenames to uppercase (or lowercase).
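The behavior described above can be observed directly. The sketch below creates a mixed-case entry in a scratch directory and reads back the name the filesystem reports; storedNameFor is a hypothetical helper, and the mixed-case name 'AbC' is chosen so that both uppercasing and lowercasing filesystems would show a difference.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: create a directory called `name` inside a fresh
// scratch directory under `dir`, then return the name reported by readdir.
function storedNameFor(dir, name) {
  const probeDir = fs.mkdtempSync(path.join(dir, 'probe-'));
  try {
    fs.mkdirSync(path.join(probeDir, name));
    return fs.readdirSync(probeDir)[0];
  } finally {
    fs.rmSync(probeDir, { recursive: true, force: true });
  }
}

const stored = storedNameFor(os.tmpdir(), 'AbC');
console.log(stored === 'AbC' ? 'case preserved' : `stored as ${stored}`);
```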
Unicode Form Preservation
Case preservation and Unicode form preservation are similar concepts. To understand why Unicode form should be preserved, make sure that you first understand why case should be preserved. Unicode form preservation is just as simple when understood correctly.
Unicode can encode the same characters using several different byte sequences. Several strings may look the same, but have different byte sequences. When working with UTF-8 strings, be careful that your expectations are in line with how Unicode works. Just as you would not expect all UTF-8 characters to encode to a single byte, you should not expect several UTF-8 strings that look the same to the human eye to have the same byte representation. This may be an expectation that you can have of ASCII, but not of UTF-8.
You may create a directory called test/café (NFC Unicode form with byte sequence <63 61 66 c3 a9> and string.length === 5) and be surprised to see sometimes that fs.readdir('test') returns ['café'] (NFD Unicode form with byte sequence <63 61 66 65 cc 81> and string.length === 6). This is not a bug in Node.js. Node.js returns the filename as the filesystem stores it, and not all filesystems support Unicode form preservation.
HFS+, for example, will normalize all filenames to a form almost always the same as NFD form. Do not expect HFS+ to behave the same as NTFS or EXT4 and vice-versa. Do not try to change data permanently through normalization as a leaky abstraction to paper over Unicode differences between filesystems. This would create problems without solving any. Rather, preserve Unicode form and use normalization as a comparison function only.
Unicode Form Insensitivity
Unicode form insensitivity and Unicode form preservation are two different filesystem behaviors often mistaken for each other. Just as case-insensitivity has sometimes been incorrectly implemented by permanently normalizing filenames to uppercase when storing and transmitting filenames, so Unicode form insensitivity has sometimes been incorrectly implemented by permanently normalizing filenames to a certain Unicode form (NFD in the case of HFS+) when storing and transmitting filenames. It is possible and much better to implement Unicode form insensitivity without sacrificing Unicode form preservation, by using Unicode normalization for comparison only.
Comparing Different Unicode Forms
Node.js provides string.normalize('NFC' / 'NFD') which you can use to normalize a UTF-8 string to either NFC or NFD. You should never store the output from this function but only use it as part of a comparison function to test whether two UTF-8 strings would look the same to the user.
You can use string1.normalize('NFC') === string2.normalize('NFC') or string1.normalize('NFD') === string2.normalize('NFD') as your comparison function. Which form you use does not matter.
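Wrapped in a function, that comparison might look like the sketch below. The name looksTheSame is a hypothetical choice; the inputs are left unmodified and only the normalized copies are compared.

```javascript
// Hypothetical helper: true if two names are equal after NFC normalization.
// The normalized strings are temporary and are never stored.
function looksTheSame(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}

console.log(looksTheSame('caf\u00e9', 'cafe\u0301')); // true
console.log(looksTheSame('cafe', 'caf\u00e9'));       // false
```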
Normalization is fast but you may want to use a cache as input to your comparison function to avoid normalizing the same string many times over. If the string is not present in the cache then normalize it and cache it. Be careful not to store or persist the cache; use it only as a cache.
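One possible shape for such a cache is a plain in-memory Map keyed by the original string. The names normalizedCache, normalizedKey and sameName are assumptions made for this sketch, and the cache is unbounded here for brevity.

```javascript
// In-memory only: this Map is never written to disk or sent anywhere.
const normalizedCache = new Map();

// Return the NFC form of `name`, normalizing at most once per distinct string.
function normalizedKey(name) {
  let key = normalizedCache.get(name);
  if (key === undefined) {
    key = name.normalize('NFC');
    normalizedCache.set(name, key);
  }
  return key;
}

// Compare two names by their cached normalized forms.
function sameName(a, b) {
  return normalizedKey(a) === normalizedKey(b);
}

console.log(sameName('caf\u00e9', 'cafe\u0301')); // true
console.log(normalizedCache.size);                // 2
```

A long-running program would probably want to bound the size of the Map or clear it periodically.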
Note that using normalize() requires that your version of Node.js include ICU (otherwise normalize() will just return the original string). If you download the latest version of Node.js from the website then it will include ICU.
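If you are unsure how a given Node.js binary was built, a quick check along the following lines may help. It reads process.versions.icu, which is present when Node.js is built with ICU, and also tests the behavior of normalize() directly; the variable names are illustrative.

```javascript
// process.versions.icu is set only when Node.js was built with ICU support.
const hasIcu = typeof process.versions.icu === 'string';

// Behavioral check: NFD should split "é" into "e" plus a combining accent,
// giving a string of length 2. A no-op normalize() would leave length 1.
const normalizeWorks = '\u00e9'.normalize('NFD').length === 2;

console.log({ hasIcu, normalizeWorks });
```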
Timestamp Resolution
You may set the mtime (the modified time) of a file to 1444291759414 (millisecond resolution) and be surprised to see sometimes that fs.stat returns the new mtime as 1444291759000 (1-second resolution) or 1444291758000 (2-second resolution). This is not a bug in Node.js. Node.js returns the timestamp as the filesystem stores it, and not all filesystems support nanosecond, millisecond or 1-second timestamp resolution. Some filesystems even have very coarse resolution for the atime timestamp in particular, e.g. 24 hours for some FAT filesystems.
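To see what a particular filesystem does with a millisecond timestamp, you could set one on a scratch file and read it back, as in this sketch. probeMtime is a hypothetical helper; it uses fs.utimesSync to set the time and the mtimeMs field of fs.statSync to read it, and rounds the result to whole milliseconds.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: set a millisecond-precision mtime on a scratch file
// under `dir` and report what the filesystem stored, in milliseconds.
function probeMtime(dir, requestedMs = 1444291759414) {
  const probeDir = fs.mkdtempSync(path.join(dir, 'probe-'));
  try {
    const file = path.join(probeDir, 'mtime-probe');
    fs.writeFileSync(file, '');
    const when = new Date(requestedMs);
    fs.utimesSync(file, when, when); // sets both atime and mtime
    const storedMs = Math.round(fs.statSync(file).mtimeMs);
    return { requestedMs, storedMs, lostMs: requestedMs - storedMs };
  } finally {
    fs.rmSync(probeDir, { recursive: true, force: true });
  }
}

console.log(probeMtime(os.tmpdir()));
```

A lostMs of 0 suggests millisecond resolution or better at that location; a single sample cannot distinguish truncation from rounding, so treat the result as a hint.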
Do Not Corrupt Filenames and Timestamps Through Normalization
Filenames and timestamps are user data. Just as you would never automatically rewrite user file data to uppercase or normalize CRLF to LF line-endings, so you should never change, interfere with, or corrupt filenames or timestamps through case / Unicode form / timestamp normalization. Normalization should only ever be used for comparison, never for altering data.
Normalization is effectively a lossy hash code. You can use it to test for certain kinds of equivalence (e.g. whether several strings look the same even though they have different byte sequences) but you can never use it as a substitute for the actual data. Your program should pass on filename and timestamp data as is.
Your program can create new data in NFC (or in any combination of Unicode form it prefers) or with a lowercase or uppercase filename, or with a 2-second resolution timestamp, but your program should not corrupt existing user data by imposing case / Unicode form / timestamp normalization. Rather, adopt a superset approach and preserve case, Unicode form and timestamp resolution in your program. That way, you will be able to interact safely with filesystems which do the same.
Use Normalization Comparison Functions Appropriately
Make sure that you use case / Unicode form / timestamp comparison functions appropriately. Do not use a case-insensitive filename comparison function if you are working on a case-sensitive filesystem. Do not use a Unicode form insensitive comparison function if you are working on a Unicode form sensitive filesystem (e.g. NTFS and most Linux filesystems, which preserve both NFC and NFD or mixed Unicode forms). Do not compare timestamps at 2-second resolution if you are working on a nanosecond timestamp resolution filesystem.
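For timestamps, a comparison function parameterized by resolution could be sketched as follows. sameTimestamp is a hypothetical helper, and it assumes that the filesystem truncates (rather than rounds) timestamps to a multiple of the given resolution.

```javascript
// Hypothetical helper: compare two millisecond timestamps at the resolution
// the filesystem is believed to support.
// Assumption: the filesystem truncates to a multiple of `resolutionMs`.
function sameTimestamp(aMs, bMs, resolutionMs) {
  return Math.floor(aMs / resolutionMs) === Math.floor(bMs / resolutionMs);
}

console.log(sameTimestamp(1444291759414, 1444291758000, 2000)); // true
console.log(sameTimestamp(1444291759414, 1444291759000, 1000)); // true
console.log(sameTimestamp(1444291759414, 1444291759000, 1));    // false
```

Pass the finest resolution that the filesystem in question actually supports, so that genuinely different timestamps are not treated as equal.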
Be Prepared for Slight Differences in Comparison Functions
Be careful that your comparison functions match those of the filesystem (or probe the filesystem if possible to see how it would actually compare). Case-insensitivity for example is more complex than a simple toLowerCase() comparison. In fact, toUpperCase() is usually better than toLowerCase() (since it handles certain foreign language characters differently). But better still would be to probe the filesystem since every filesystem has its own case comparison table baked in.
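A small illustration of how the two folds can disagree is given below. It only demonstrates JavaScript's own case mappings; neither fold is claimed to reproduce the comparison table of any particular filesystem, and foldUpper and foldLower are illustrative names.

```javascript
const foldUpper = (a, b) => a.toUpperCase() === b.toUpperCase();
const foldLower = (a, b) => a.toLowerCase() === b.toLowerCase();

// German sharp s uppercases to "SS", so only the uppercase fold matches.
console.log(foldUpper('stra\u00dfe', 'STRASSE')); // true
console.log(foldLower('stra\u00dfe', 'STRASSE')); // false

// The Kelvin sign (U+212A) lowercases to "k", so only the lowercase fold matches.
console.log(foldUpper('\u212a', 'K')); // false
console.log(foldLower('\u212a', 'K')); // true
```

Because the folds disagree in both directions, probing the filesystem remains the more reliable option where it is possible.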
As an example, Apple's HFS+ normalizes filenames to NFD form but this NFD form is actually an older version of the current NFD form and may sometimes be slightly different from the latest Unicode standard's NFD form. Do not expect HFS+ NFD to be exactly the same as Unicode NFD all the time.