Warning: Don't ignore errors!


Domain error handlers are not a substitute for closing down a process when an error occurs.

By the very nature of how throw works in JavaScript, there is almost never a way to safely "pick up where it left off" without leaking references or leaving the program in some other undefined, brittle state.
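
The following minimal sketch shows the problem. It uses a hypothetical lock object and withLock() helper, neither of which is a Node.js API. The code between acquiring and releasing the lock was never written to expect an exception, so catching the error further up leaves the lock marked as held. A try/finally would fix this one spot. Unexpected errors, however, tend to arrive where no such cleanup was planned.

```javascript
// Hypothetical resource: stands in for a mutex, file descriptor,
// pooled connection, or anything else that must be released.
const lock = { held: false };

function withLock(work) {
  lock.held = true;  // acquire
  work();            // if this throws...
  lock.held = false; // ...this release line is never reached
}

try {
  withLock(() => {
    throw new Error('unexpected failure');
  });
} catch (err) {
  // The error was "handled", but the program is now in a state nobody
  // planned for: the lock is still marked as held.
  console.log(`caught "${err.message}"; lock.held = ${lock.held}`);
}
```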

The safest way to respond to a thrown error is to shut down the process. Of course, in a normal web server, there may be many open connections, and it is not reasonable to abruptly shut those down because an error was triggered by someone else.

The better approach is to send an error response to the request that triggered the error, stop listening for new requests in that worker, and let the other in-flight requests finish in their normal time.
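
Here is a stripped-down sketch of the "stop listening, but finish the current request" step. It uses plain node:http and leaves out domains and cluster. Calling server.close() only stops the server from accepting new connections. The request that is already being handled still gets its 500 response, and the close callback runs once the remaining connections have ended. The port (0, meaning an OS-assigned port) and the /error path are arbitrary choices for this demo.

```javascript
const http = require('node:http');

let received = null; // what the client got back
let closed = false;  // set once the server has fully shut down

// Stands in for the error handler: stop accepting new connections,
// then still answer the request that is already in flight.
const server = http.createServer((req, res) => {
  server.close(() => {
    // Called only after every open connection has ended.
    closed = true;
    console.log('server closed');
  });
  res.statusCode = 500;
  res.setHeader('content-type', 'text/plain');
  res.end('Oops, there was a problem!\n');
});

server.listen(0, '127.0.0.1', () => {
  const { port } = server.address();
  // agent: false gives this request its own non-keep-alive agent,
  // so the socket is closed after the response.
  http.get({ host: '127.0.0.1', port, path: '/error', agent: false }, (res) => {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => {
      received = { status: res.statusCode, body };
      console.log(received);
    });
  });
});
```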

In this way, domain usage goes hand-in-hand with the cluster module, since the primary process can fork a new worker when a worker encounters an error. For Node.js programs that scale to multiple machines, the terminating proxy or service registry can take note of the failure, and react accordingly.
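
A primary that forks a replacement on every worker disconnect can end up in a tight restart loop if a bug makes each new worker fail immediately. One option is a sliding-window restart budget that the primary checks before calling cluster.fork(). This sketch assumes a fixed budget suits the deployment. RestartBudget, tryConsume(), maxRestarts and windowMs are hypothetical names and are not part of node:cluster.

```javascript
// Hypothetical helper, not part of node:cluster.
// Allows at most `maxRestarts` respawns within a sliding `windowMs`.
class RestartBudget {
  constructor(maxRestarts, windowMs) {
    this.maxRestarts = maxRestarts;
    this.windowMs = windowMs;
    this.timestamps = [];
  }

  // Returns true (and records the restart) if another fork is allowed at `now`.
  tryConsume(now = Date.now()) {
    // Forget restarts that have slid out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.maxRestarts) return false;
    this.timestamps.push(now);
    return true;
  }
}

// Possible use in the primary (sketch):
//   const budget = new RestartBudget(5, 60000);
//   cluster.on('disconnect', () => {
//     if (budget.tryConsume()) cluster.fork();
//     else console.error('too many worker restarts; not forking');
//   });

const budget = new RestartBudget(2, 1000);
console.log(budget.tryConsume(0));    // true
console.log(budget.tryConsume(100));  // true
console.log(budget.tryConsume(200));  // false: already 2 restarts in the last 1000 ms
console.log(budget.tryConsume(1100)); // true: both earlier restarts have left the window
```

Refusing to fork leaves the cluster degraded rather than thrashing. An orchestrator or alerting system can then take over, as described above for multi-machine setups.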

For example, this is not a good idea:

// XXX WARNING! BAD IDEA!

const PORT = +process.env.PORT || 1337;
const d = require('node:domain').create();
d.on('error', (er) => {
  // The error won't crash the process, but what it does is worse!
  // Though we've prevented abrupt process restarting, we are leaking
  // a lot of resources if this ever happens.
  // This is no better than process.on('uncaughtException')!
  console.log(`error, but oh well ${er.message}`);
});
d.run(() => {
  require('node:http').createServer((req, res) => {
    handleRequest(req, res);
  }).listen(PORT);
});
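
The following minimal sketch makes the leak concrete. It uses the same "log and carry on" handler. The openHandles set is a hypothetical stand-in for per-request resources such as sockets or file descriptors. Three simulated requests run inside the domain, and the second throws before releasing its handle. The domain keeps the process alive, but that handle is never released, and each further failure would leak another one.

```javascript
const domain = require('node:domain');

// Hypothetical stand-in for per-request resources (sockets, file
// descriptors, pooled DB connections) that a handler must release.
const openHandles = new Set();
const swallowed = [];

const d = domain.create();
d.on('error', (er) => {
  // Same "oh well" handler as above: log and carry on.
  swallowed.push(er.message);
  console.log(`error, but oh well ${er.message}`);
});

d.run(() => {
  for (let id = 0; id < 3; id++) {
    const handle = { id };
    openHandles.add(handle);
    setTimeout(() => {
      if (id === 1) throw new Error(`bug while serving request ${id}`);
      openHandles.delete(handle); // skipped for request 1
    }, 10 * (id + 1));
  }
});

// Created outside the domain; by now every simulated request has run.
setTimeout(() => {
  // Request 1's handle is still in the set and will never be removed.
  console.log(`handles still open: ${openHandles.size}`);
}, 100);
```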

By using the context of a domain, and the resilience of separating our program into multiple worker processes, we can react more appropriately, and handle errors with much greater safety.

// Much better!

const cluster = require('node:cluster');
const PORT = +process.env.PORT || 1337;

if (cluster.isPrimary) {
  // A more realistic scenario would have more than 2 workers,
  // and perhaps not put the primary and worker in the same file.
  //
  // It is also possible to get a bit fancier about logging, and
  // implement whatever custom logic is needed to prevent DoS
  // attacks and other bad behavior.
  //
  // See the options in the cluster documentation.
  //
  // The important thing is that the primary does very little,
  // increasing our resilience to unexpected errors.

  cluster.fork();
  cluster.fork();

  cluster.on('disconnect', (worker) => {
    console.error('disconnect!');
    cluster.fork();
  });

} else {
  // the worker
  //
  // This is where we put our bugs!

  const domain = require('node:domain');

  // See the cluster documentation for more details about using
  // worker processes to serve requests. How it works, caveats, etc.

  const server = require('node:http').createServer((req, res) => {
    const d = domain.create();
    d.on('error', (er) => {
      console.error(`error ${er.stack}`);

      // We're in dangerous territory!
      // By definition, something unexpected occurred,
      // which we probably didn't want.
      // Anything can happen now! Be very careful!

      try {
        // Make sure we close down within 30 seconds
        const killtimer = setTimeout(() => {
          process.exit(1);
        }, 30000);
        // But don't keep the process open just for that!
        killtimer.unref();

        // Stop taking new requests.
        server.close();

        // Let the primary know we're dead. This will trigger a
        // 'disconnect' in the cluster primary, and then it will fork
        // a new worker.
        cluster.worker.disconnect();

        // Try to send an error to the request that triggered the problem
        res.statusCode = 500;
        res.setHeader('content-type', 'text/plain');
        res.end('Oops, there was a problem!\n');
      } catch (er2) {
        // Oh well, not much we can do at this point.
        console.error(`Error sending 500! ${er2.stack}`);
      }
    });

    // Because req and res were created before this domain existed,
    // we need to explicitly add them.
    // See the explanation of implicit vs explicit binding below.
    d.add(req);
    d.add(res);

    // Now run the handler function in the domain.
    d.run(() => {
      handleRequest(req, res);
    });
  });
  server.listen(PORT);
}

// This part is not important. Just an example routing thing.
// Put fancy application logic here.
function handleRequest(req, res) {
  switch (req.url) {
    case '/error':
      // We do some async stuff, and then...
      setTimeout(() => {
        // Whoops! flerb is deliberately undefined, so this throws a
        // ReferenceError inside the domain.
        flerb.bark();
      }, 100);
      break;
    default:
      res.end('ok');
  }
}
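
The worker example relies on two kinds of domain binding. This minimal sketch shows both side by side, outside of HTTP. An EventEmitter created before the domain exists (as req and res are) is attached explicitly with d.add(). A timer created inside d.run() is bound implicitly. The seen array exists only to record what the handler receives. The er.domainThrown flag that node:domain sets on each error tells emitted errors (false) apart from thrown ones (true).

```javascript
const domain = require('node:domain');
const { EventEmitter } = require('node:events');

const seen = [];

// Created before any domain exists, like req and res in the worker above.
const early = new EventEmitter();

const d = domain.create();
d.on('error', (er) => {
  seen.push({ message: er.message, thrown: er.domainThrown });
});

// Explicit binding: an unhandled 'error' event on `early` now goes to d.
d.add(early);
early.emit('error', new Error('emitted by an explicitly added emitter'));

// Implicit binding: the timer is created inside d.run(), so an
// exception thrown from its callback is routed to d as well.
d.run(() => {
  setTimeout(() => {
    throw new Error('thrown from a timer created inside d.run()');
  }, 10);
});

setTimeout(() => console.log(seen), 50);
```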