Lock and Context Switch Tracing with KTR on FreeBSD

Introduction

KTR(9) and ktrdump(8) are powerful tools for explaing system behavior in FreeBSD, as they allow tracing of a broad range of "interesting" kernel events. This includes scheduling decisions, lock events, and more.

The way it works is the you compile KTR support into the kernel, along with an indication of the events of interest. Masking events at compile time avoids the introduction of unnecessary cost (and therefore distortion) for tracing checks for uninteresting events. At run-time, you can select subsets of events and CPUs, and dump the results using ktrdump(8) in human-readable format.

Using the optional KTR_ALQ facility, you can have KTR write records directly to disk as they are generated, which is suitable for use when generating extremely long trace runs.

I've made a number of tweaks to KTR(9) and ktrdump(8) to make it easier to use KTR for scheduler and lock tracing; make sure you have the most recent versions of both before starting.

Configuring the Kernel

Add the following options to your kernel configuration:

options KTR
options KTR_COMPILE=(KTR_LOCK|KTR_PROC|KTR_INTR|KTR_CALLOUT|KTR_UMA|KTR_SYSC|KTR_CRITICAL)
options KTR_ENTRIES=131072

KTR_CRITICAL is no longer required in 7-CURRENT.

You can customize the set of compile-time records by tweaking the KTR_COMPILE line. Generally, you want to pick the smallest set of interest. The above captures most events of interest for analyzing the cost of locking and ithreads by instrumenting lock events, high level scheduler events, and interrupts.

Tracing at Run-Time

The "debug.ktr" sysctl tree can be used to set up tracing properties. Many of the sysctls also have tunables of the same name so they can be set prior to boot to enable early tracing. Two sysctls need to be set to start tracing: debug.ktr.mask, which sets the runtime event mask. Typically, you'll be interested in setting that mask field to the maximum set compiled in, which can be queried using debug.ktr.compile. You'll also need to set debug.ktr.cpumask to indicate which CPUs are of interest. A value like 0xff is a good starting point.

sysctl debug.ktr.cpumask=0xff
sysctl debug.ktr.mask=`sysctl -n debug.ktr.compile`

This should generate a lot of trace data.

Dumping the Trace

After running a test with the masks set to collect records of interest, suspend tracing by setting the cpumask to 0:

sysctl debug.ktr.cpumask=0

The ktrdump(8) tool dumps the trace buffer to standard output. Using a variety of flags, you can add additional optional fields to the content. Note that the records are dumped in the reverse order of generation. I find the following trace command most useful up front:

ktrdump -c -q -t | sort -n > dumpfile

This includes the CPU number, leaves out the file/line number, leaves out the header (which would get sorted poorly), and prints an absolute time stamp. It also sorts by the first field, which is the event ID, and results in an approximately chronological ordering (subject to SMP issues). You can use the "-r" parameter to get relative time stamps; be aware that they are relative to the previous event on any CPU, not necessarily the same CPU.

Some Files

20040723-ktr-dump.txt
20040804-ktr-dump.txt
20040805-ktr-dump.txt
20040805a-ktr-dump.txt
20040805b-ktr-dump.txt
20040806-ktr-dump.txt
20040806-mi_switch.ps
20040806-msleep.txt
20040811-smp-netperf.txt
20041021-dd-ar0s1-geom-ktr
20041107-netblast-crit-ktr
20050126-netblast-ktr-SMP.gz
20050126-netblast-ktr-UP.gz