Flame Graphs

What's a flame graph useful for?

Flame graphs are a way of visualizing the CPU time spent in functions. They can help you pin down where you spend too much time on synchronous operations.

How to create a flame graph

You might have heard that creating a flame graph for Node.js is difficult, but that's no longer true. Solaris VMs are no longer needed for flame graphs!

Flame graphs are generated from perf output, and perf is not a Node.js-specific tool. While it's the most powerful way to visualize CPU time spent, it may have issues with how JavaScript code is optimized in Node.js 8 and above. See the perf output issues section below.

Use a pre-packaged tool

If you want a single step that produces a flame graph locally, try 0x.

For diagnosing production deployments, read these notes: 0x production servers.

Create a flame graph with system perf tools

The purpose of this guide is to show the steps involved in creating a flame graph and to keep you in control of each step.

If you want to understand each step better, take a look at the sections that follow, where we go into more detail.

Now let's get to work.

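  1. Install perf (if it is not already installed, it is usually available through the linux-tools-common package).
  2. Try running perf. It might complain about missing kernel modules; install them too.
  3. Run node with perf enabled (see the perf output issues section for tips specific to Node.js versions):
     perf record -e cycles:u -g -- node --perf-basic-prof --interpreted-frames-native-stack app.js
  4. Disregard warnings unless they say you can't run perf because of missing packages. You may get some warnings about not being able to access kernel module samples, which you're not after anyway.
  5. Run perf script > perfs.out to generate the data file you'll visualize in a moment. Applying some cleanup first (see the section on filtering out Node.js internal functions) makes the graph more readable.
  6. Clone Brendan Gregg's FlameGraph tools: https://github.com/brendangregg/FlameGraph
  7. Run cat perfs.out | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl --colors=js > profile.svg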
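
The perf script and FlameGraph steps above can be chained in a small shell function. This is only a sketch under stated assumptions: the function name render_flamegraph and the FLAMEGRAPH_DIR variable are made up for illustration, perf record is assumed to have already written perf.data in the current directory, and the FlameGraph repository is assumed to be cloned into ./FlameGraph unless FLAMEGRAPH_DIR says otherwise.

```shell
# Hypothetical helper: turns an existing perf.data into an SVG flame graph.
# FLAMEGRAPH_DIR is an assumed variable pointing at the cloned FlameGraph repo.
FLAMEGRAPH_DIR="${FLAMEGRAPH_DIR:-./FlameGraph}"

render_flamegraph() {
  out_svg="${1:-profile.svg}"
  # Dump the recorded samples as text (reads perf.data in the current directory).
  perf script > perfs.out || return 1
  # Fold the stacks, then render them with the JavaScript color palette.
  "$FLAMEGRAPH_DIR/stackcollapse-perf.pl" < perfs.out \
    | "$FLAMEGRAPH_DIR/flamegraph.pl" --colors=js > "$out_svg"
}
```

Calling render_flamegraph profile.svg after a recording should leave both perfs.out and profile.svg in the current directory, so the intermediate text file stays available for cleanup or inspection.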

Now open the flame graph file in your favorite browser and watch it burn. It's color-coded, so you can focus on the most saturated orange bars first. They're likely to represent CPU-heavy functions.

Worth mentioning: if you click an element of a flame graph, it zooms in on the section you clicked.

Using perf to sample a running process

This is great for recording flame graph data from an already running process that you don't want to interrupt. Imagine a production process with a hard-to-reproduce issue.

perf record -F99 -p `pgrep -n node` -g -- sleep 3

Wait, what is that sleep 3 for? It keeps perf running. Even though the -p option points perf at a different PID, perf still needs a command to execute, and it stops when that command exits. perf runs for the life of the command you pass to it, whether or not you're actually profiling that command, so sleep 3 makes perf sample for 3 seconds.

Why is -F (profiling frequency) set to 99? It's a reasonable default, and you can adjust it if you want. -F99 tells perf to take 99 samples per second; increase the value for more precision. Lower values produce less output with less precise results. The precision you need depends on how long your CPU-intensive functions really run. If you're looking for the cause of a noticeable slowdown, 99 samples per second should be more than enough.
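
To make the trade-off concrete, here is the arithmetic for the command above as a short sketch. The variable names are illustrative, and the figures are rough upper bounds for a single busy thread, since perf only collects samples while the process is actually on a CPU.

```shell
# Illustrative arithmetic only: how -F and the sleep duration bound the sample count.
freq=99      # value passed to -F (samples per second)
seconds=3    # value passed to sleep

max_samples=$((freq * seconds))   # rough upper bound for one busy thread
interval_ms=$((1000 / freq))      # whole milliseconds between samples

echo "at most $max_samples samples, roughly one every ${interval_ms} ms"
```

A function that only runs for a millisecond or two at a time can easily fall between samples at this rate, which is when raising -F starts to pay off.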

After you get that 3-second perf record, generate the flame graph using the last two steps from the list above.
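
If you sample running processes often, the command above can be wrapped in a function. This is a sketch rather than an established tool: the name sample_node and its two optional arguments are assumptions for illustration, and it relies on pgrep -n node picking the process you actually want to profile.

```shell
# Hypothetical wrapper around the sampling command shown above.
# Usage: sample_node [seconds] [frequency]
sample_node() {
  seconds="${1:-3}"
  freq="${2:-99}"
  # Newest process whose name matches "node"; fail early if none is running.
  pid="$(pgrep -n node)" || { echo "no running node process found" >&2; return 1; }
  # sleep only bounds how long perf keeps sampling the target pid.
  perf record -F"$freq" -p "$pid" -g -- sleep "$seconds"
}
```

For example, sample_node 10 would sample the newest node process for 10 seconds at the default 99 samples per second.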

Filtering out Node.js internal functions

Usually you just want to look at the performance of your own calls, so filtering out Node.js and V8 internal functions can make the graph much easier to read. You can clean up your perf file with:

sed -i -r \
  -e "/( __libc_start| LazyCompile | v8::internal::| Builtin:| Stub:| LoadIC:|\[unknown\]| LoadPolymorphicIC:)/d" \
  -e 's/ LazyCompile:[*~]?/ /' \
  perfs.out

If you read your flame graph and it seems odd, as if something is missing from the key function that takes up most of the time, try generating the flame graph without the filters. You may have hit a rare issue with Node.js itself.
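
Because you may want to fall back to the unfiltered data, it can help to write the filtered output to a separate file instead of editing perfs.out in place. The following is a minimal sketch using the same sed expressions as above; the function name filter_perf_output and the default output file name are made up for illustration.

```shell
# Hypothetical helper: write a filtered copy and keep the raw perf output intact.
# Same expressions as above; -E is accepted by both GNU and BSD sed (GNU also takes -r).
filter_perf_output() {
  in_file="${1:-perfs.out}"
  out_file="${2:-perfs.filtered.out}"
  sed -E \
    -e '/( __libc_start| LazyCompile | v8::internal::| Builtin:| Stub:| LoadIC:|\[unknown\]| LoadPolymorphicIC:)/d' \
    -e 's/ LazyCompile:[*~]?/ /' \
    "$in_file" > "$out_file"
}
```

You can then render one flame graph from each file and compare them if the filtered graph looks suspicious.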

Node.js's profiling options

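--perf-basic-prof-only-functions and --perf-basic-prof are the two options that are useful for debugging your JavaScript code. Other options are used for profiling Node.js itself, which is outside the scope of this guide.

--perf-basic-prof-only-functions produces less output, so it's the option with the least overhead.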

Why do I need them at all?

Well, without these options you'll still get a flame graph, but most of its bars will be labeled v8::Function::Call.

perf output issues

Node.js 8.x V8 pipeline changes

Node.js 8.x and above ship with new optimizations to the JavaScript compilation pipeline in the V8 engine, which sometimes make function names and references unreachable for perf. (The new pipeline is known as Turbofan.)

As a result, you might not get correct function names in the flame graph.

You'll notice ByteCodeHandler: where you'd expect function names.

0x has some mitigations built in.

For details see:

Node.js 10+

Node.js 10.x addresses the Turbofan issue with the --interpreted-frames-native-stack flag.

Run node --interpreted-frames-native-stack --perf-basic-prof-only-functions to get function names in the flame graph regardless of which pipeline V8 used to compile your JavaScript.
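
If the same profiling script has to run against several Node.js versions, the flag selection described above could be captured in a helper like the following. This is a sketch: node_perf_flags is a hypothetical name, and the version cut-off simply encodes this guide's statement that the flag applies from Node.js 10 onward.

```shell
# Hypothetical helper: print the node flags suggested above for a given major version.
node_perf_flags() {
  major="$1"
  flags="--perf-basic-prof-only-functions"
  # Per the text above, the interpreted-frames flag applies to Node.js 10 and later.
  if [ "$major" -ge 10 ]; then
    flags="--interpreted-frames-native-stack $flags"
  fi
  echo "$flags"
}
```

One possible use is perf record -e cycles:u -g -- node $(node_perf_flags 18) app.js, where 18 stands in for the major version you are running; you can read it from node -p 'process.versions.node.split(".")[0]'.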

Broken labels in the flame graph

If you're seeing labels that look like this:

node`_ZN2v88internal11interpreter17BytecodeGenerator15VisitStatementsEPNS0_8ZoneListIPNS0_9StatementEEE

it means the Linux perf you're using was not compiled with demangle support. See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1396654 for an example.
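
If rebuilding or replacing perf is not an option, one possible workaround is to demangle the text output after the fact. The sketch below is an assumption rather than something taken from the linked bug report: it relies on c++filt from GNU binutils being installed, and demangle_labels is a made-up name.

```shell
# Possible workaround (an assumption, not from the linked bug report):
# pass perf's text output through c++filt to demangle C++ symbol names.
demangle_labels() {
  # c++filt reads stdin and rewrites any mangled names it recognizes.
  c++filt
}
```

It would be used as a filter, for example perf script | demangle_labels > perfs.out, before the stacks are collapsed.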

Examples

Practice capturing flame graphs yourself with a flame graph exercise!