Flame Graphs
What's a flame graph useful for?
Flame graphs are a way of visualizing CPU time spent in functions. They can help you pin down where you spend too much time doing synchronous operations.
How to create a flame graph
You might have heard that creating a flame graph for Node.js is difficult, but that's no longer true. Solaris VMs are no longer needed for flame graphs!
Flame graphs are generated from perf output, and perf is not a Node.js-specific tool. While it's the most powerful way to visualize CPU time spent, it may have issues with how JavaScript code is optimized in Node.js 8 and above. See the perf output issues section below.
Use a pre-packaged tool
If you want a single step that produces a flame graph locally, try 0x.
For diagnosing production deployments, read these notes: 0x production servers.
Create a flame graph with system perf tools
The purpose of this guide is to show the steps involved in creating a flame graph and keep you in control of each step.
If you want to understand each step better, take a look at the sections that follow, where we go into more detail.
Now let's get to work.
- Install perf (usually available through the linux-tools-common package if not already installed).
- Try running perf. It might complain about missing kernel modules; install them too.
- Run node with perf enabled (see perf output issues for tips specific to Node.js versions):
perf record -e cycles:u -g -- node --perf-basic-prof app.js
- Disregard warnings unless they say you can't run perf due to missing packages; you may get some warnings about not being able to access kernel module samples, which you're not after anyway.
- Run perf script > perfs.out to generate the data file you'll visualize in a moment. It's useful to apply some cleanup for a more readable graph (see Filtering out Node.js internal functions below).
- Install stackvis if not yet installed: npm i -g stackvis
- Run stackvis perf < perfs.out > flamegraph.htm
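The steps above can be chained in a small wrapper. The following is a minimal sketch rather than an official tool: the names step and make_flamegraph and the DRY_RUN variable are made up for illustration, and the script assumes perf, node, and stackvis are already on the PATH. With DRY_RUN=1 it only prints the commands, which is a cheap way to check what would run.

```shell
#!/bin/sh
# Hypothetical wrapper around the steps above (all names are illustrative).

# Run one shell command line, or just print it when DRY_RUN=1.
step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$1"
  else
    sh -c "$1"
  fi
}

# Record, dump, and render a flame graph for the given script.
# Assumes the script path contains no spaces or shell metacharacters.
make_flamegraph() {
  app="${1:-app.js}"
  step "perf record -e cycles:u -g -- node --perf-basic-prof $app" &&
    step "perf script > perfs.out" &&
    step "stackvis perf < perfs.out > flamegraph.htm"
}
```

Calling make_flamegraph server.js with DRY_RUN=1 set prints the three commands; without DRY_RUN it runs them in order and stops at the first failure.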
Now open the flame graph file in your favorite browser and watch it burn. It's color-coded, so you can focus on the most saturated orange bars first. They're likely to represent CPU-heavy functions.
Worth mentioning: if you click an element of a flame graph, a zoomed-in view of its surroundings is displayed above the graph.
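As a text-only cross-check of what the graph shows, you can count which function was innermost on the stack when each sample was taken. This is a rough sketch, assuming perfs.out has the usual perf script layout (one unindented header line per sample, followed by indented frames with the innermost frame first); the name top_leaf_functions is made up here.

```shell
# Count how often each function is the innermost frame of a sample.
# Reads perf script output on stdin; prints "count name", highest first.
# Only the first word of each symbol is used, so this is an approximation.
top_leaf_functions() {
  awk '
    /^[^ \t]/ { want = 1; next }                      # header line: new sample
    want && NF >= 2 { count[$2]++; want = 0 }         # first frame after header
    END { for (f in count) print count[f], f }
  ' | sort -rn
}
```

For example, top_leaf_functions < perfs.out | head lists the functions that were most often caught executing.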
Using perf to sample a running process
This is great for recording flame graph data from an already running process that you don't want to interrupt. Imagine a production process with a hard-to-reproduce issue.
perf record -F99 -p `pgrep -n node` -g -- sleep 3
Wait, what is that sleep 3 for? It's there to keep perf running. Even though the -p option points at a different pid, perf still needs a command to execute, and it stops recording when that command exits. perf runs for the life of the command you pass to it, whether or not you're actually profiling that command. sleep 3 ensures that perf runs for 3 seconds.
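Because the trailing command only sets how long perf keeps recording, the duration is easy to parameterize. The helper below is a hypothetical sketch (perf_sample_cmd is not part of perf): it prints the command instead of running it, so you can inspect it first.

```shell
# Print a perf command that samples an existing process.
# Arguments: PID, duration in seconds (default 3), frequency in Hz (default 99).
perf_sample_cmd() {
  pid="$1"
  seconds="${2:-3}"
  freq="${3:-99}"
  case "$pid" in
    '' | *[!0-9]*)
      echo "usage: perf_sample_cmd PID [SECONDS] [FREQ]" >&2
      return 1
      ;;
  esac
  printf 'perf record -F%s -p %s -g -- sleep %s\n' "$freq" "$pid" "$seconds"
}
```

To record the newest node process for 10 seconds, something like eval "$(perf_sample_cmd "$(pgrep -n node)" 10)" should work once the printed command looks right.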
Why is -F (profiling frequency) set to 99? It's a reasonable default, and you can adjust it if you want. -F99 tells perf to take 99 samples per second; for more precision, increase the value. Lower values should produce less output with less precise results. The precision you need depends on how long your CPU-intensive functions really run. If you're looking for the reason for a noticeable slowdown, 99 samples per second should be more than enough.
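To get a feel for these numbers: at F samples per second over T seconds, one thread that stays busy on a CPU yields at most about F × T samples, and consecutive samples are roughly 1000 / F milliseconds apart, so work that finishes much faster than that is only caught occasionally. A small arithmetic sketch (the function names are illustrative, and the math is integer-only):

```shell
# Upper bound on samples for one busy thread: frequency (Hz) * duration (s).
max_samples() {
  echo $(( $1 * $2 ))
}

# Approximate gap between two samples in milliseconds (rounded down).
sample_interval_ms() {
  echo $(( 1000 / $1 ))
}

max_samples 99 3        # prints 297
sample_interval_ms 99   # prints 10
```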
After you get that 3-second perf record, proceed with generating the flame graph: run perf script and then stackvis, as in the steps above.
Filtering out Node.js internal functions
Usually, you just want to look at the performance of your own calls, so filtering out Node.js and V8 internal functions can make the graph much easier to read. You can clean up your perf file with:
sed -i -r \
-e "/( __libc_start| LazyCompile | v8::internal::| Builtin:| Stub:| LoadIC:|\[unknown\]| LoadPolymorphicIC:)/d" \
-e 's/ LazyCompile:[*~]?/ /' \
perfs.out
If you read your flame graph and it seems odd, as if something is missing in the key function taking up the most time, try generating your flame graph without the filters. Maybe you hit a rare case of an issue with Node.js itself.
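To compare filtered and unfiltered graphs without editing perfs.out in place, the same expressions can be applied as a stream filter. This is a sketch: filter_perf is a made-up name, and it uses sed -E instead of -r, which both GNU and BSD sed accept.

```shell
# Apply the cleanup expressions above to stdin and write the result to stdout,
# leaving the original file untouched.
filter_perf() {
  sed -E \
    -e '/( __libc_start| LazyCompile | v8::internal::| Builtin:| Stub:| LoadIC:|\[unknown\]| LoadPolymorphicIC:)/d' \
    -e 's/ LazyCompile:[*~]?/ /'
}
```

For instance, filter_perf < perfs.out > perfs.filtered.out keeps perfs.out intact, so you can render both files with stackvis and compare the results.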
Node.js's profiling options
--perf-basic-prof-only-functions and --perf-basic-prof are the two options that are useful for debugging your JavaScript code. Other options are used for profiling Node.js itself, which is outside the scope of this guide.
--perf-basic-prof-only-functions produces less output, so it's the option with the least overhead.
Why do I need them at all?
Well, without these options, you'll still get a flame graph, but with most bars labeled v8::Function::Call.
perf output issues
Node.js 8.x V8 pipeline changes
Node.js 8.x and above ship with new optimizations to the JavaScript compilation pipeline in the V8 engine (the TurboFan compiler), which sometimes make function names and references unreachable for perf.
The result is that you might not get your function names right in the flame graph.
You'll notice ByteCodeHandler: where you'd expect function names.
0x has some mitigations for that built in.
For details see:
Node.js 10+
Node.js 10.x addresses the issue with TurboFan using the --interpreted-frames-native-stack flag.
Run node --interpreted-frames-native-stack --perf-basic-prof-only-functions to get function names in the flame graph regardless of which pipeline V8 used to compile your JavaScript.
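One way to pick the flags per Node.js version is sketched below. It assumes, following the text above, that --interpreted-frames-native-stack is available from Node.js 10 onward; node_perf_flags is a hypothetical helper, not a Node.js feature.

```shell
# Print profiling flags for a Node.js version string such as "v18.17.0".
node_perf_flags() {
  version="${1#v}"          # drop the leading "v", if any
  major="${version%%.*}"    # keep everything before the first dot
  case "$major" in
    '' | *[!0-9]*)
      echo "unrecognized version: $1" >&2
      return 1
      ;;
  esac
  if [ "$major" -ge 10 ]; then
    printf '%s\n' "--interpreted-frames-native-stack --perf-basic-prof-only-functions"
  else
    printf '%s\n' "--perf-basic-prof-only-functions"
  fi
}
```

It could then be used as perf record -e cycles:u -g -- node $(node_perf_flags "$(node --version)") app.js, leaving the command substitution unquoted on purpose so the two flags are passed as separate arguments.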
Broken labels in the flame graph
If you're seeing labels that look like this
node`_ZN2v88internal11interpreter17BytecodeGenerator15VisitStatementsEPNS0_8ZoneListIPNS0_9StatementEEE
it means the Linux perf you're using was not compiled with demangle support. See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1396654 for an example.
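If replacing perf is not an option, one possible workaround is to demangle the names after the fact with c++filt from GNU binutils. This is not something the guide itself prescribes; it is a sketch under the assumption that c++filt is installed, it has not been checked against every perf or stackvis output format, and demangle_perf is a made-up name.

```shell
# Demangle C++ symbol names in text read from stdin.
# Requires c++filt (GNU binutils); lines without mangled names pass through.
demangle_perf() {
  if ! command -v c++filt >/dev/null 2>&1; then
    echo "c++filt not found; install binutils" >&2
    return 1
  fi
  c++filt
}
```

A typical use would be demangle_perf < perfs.out > perfs.demangled.out before running stackvis on the demangled file.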
Examples
Practice capturing flame graphs yourself with a flame graph exercise!