The following are additional notes on the cputimes command.


* How cputimes works

cputimes accurately measures time consumed by the kernel, idle threads
and processes, by tracking the activity of the scheduler. In particular, we
track on-cpu and off-cpu events for kernel threads, recording a timestamp
at each event.
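
As a rough illustration of the bookkeeping described above, the following
Python sketch replays a made-up list of on-cpu and off-cpu events and sums
the time each category of thread spent on a CPU. The event layout, the
sample values and the function name sum_cpu_time are assumptions made for
this example. cputimes itself does this work with DTrace, and its details
may differ.

```python
from collections import defaultdict

def sum_cpu_time(events):
    """Sum on-CPU nanoseconds per thread category.

    events: list of (timestamp_ns, cpu_id, kind, category) tuples in time
    order, where kind is "on-cpu" or "off-cpu". This layout is hypothetical.
    """
    started = {}               # cpu_id -> timestamp of the last on-cpu event
    totals = defaultdict(int)  # category -> total nanoseconds on CPU
    for ts, cpu, kind, category in events:
        if kind == "on-cpu":
            started[cpu] = ts
        elif kind == "off-cpu" and cpu in started:
            # Elapsed time between stepping onto and leaving this CPU.
            totals[category] += ts - started.pop(cpu)
    return dict(totals)

# Hypothetical trace for one CPU: a process runs, then a kernel thread,
# then the idle thread.
events = [
    (1000, 0, "on-cpu", "PROCESS"),
    (4000, 0, "off-cpu", "PROCESS"),
    (4000, 0, "on-cpu", "KERNEL"),
    (4500, 0, "off-cpu", "KERNEL"),
    (4500, 0, "on-cpu", "IDLE"),
    (9000, 0, "off-cpu", "IDLE"),
]
print(sum_cpu_time(events))
```

An off-cpu event with no matching on-cpu event is skipped, since the start
time of that run is unknown.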


* Why cputimes?

If you are interested in how much time processes are consuming, the data
given by "prstat" or "prstat -m" is fine. However, there is no easy way to
see kernel-consumed time, which is the idea behind cputimes.


* What does it mean?

The output shows categories of threads by the sum of time, in nanoseconds.

A nanosecond is 10^-9 seconds, or 0.000000001 of a second. This program uses
nanoseconds as units, but does not have nanosecond accuracy. It would be
reasonable to assume that it has microsecond accuracy (10^-6 seconds), so in
practice ignore the last three digits of the times.
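
Ignoring the last three digits is the same as an integer division by 1000.
A minimal Python sketch, where the helper name to_microseconds is made up
for this example and the input is a KERNEL value from a sample later in
these notes:

```python
def to_microseconds(ns):
    """Drop the last three digits of a nanosecond count."""
    return ns // 1000

print(to_microseconds(436022725))  # 436022 microseconds
```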

The sections reported are,

PROCESS - the sum of all the process time on the CPU.
KERNEL - the sum of the time spent in the kernel.
IDLE - the time the kernel spent in the idle thread, waiting for some work.

If your system isn't doing much, then the idle time will be quite large. If
your system is running many applications, then there may be no idle time
at all, with most of the time appearing under processes instead.


* When is there a problem?

Expect to see most of the time in processes or idle, depending on how busy
your server is. Seeing a considerable amount of time in kernel would
definitely be interesting.

The kernel generally doesn't use much CPU time, usually less than 5%. 
If it were using more, that may indicate heavy activity from an interrupt 
thread, or activity caused by DTrace. 

For example,

   # cputimes 1
   2005 Apr 27 23:49:32,
            THREADS        TIME (ns)
               IDLE         28351679
             KERNEL        436022725
            PROCESS        451304688

In this sample the kernel is using a massive amount of CPU time, around 47%.
This sample was taken during heavy network utilisation, and the time was
consumed by the TCP/IP and network driver threads (and DTrace). The
"intrstat" command could be used for further analysis of the interrupt
threads responsible for servicing the network interface.
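
The share of each category in the sample above can be checked with a few
lines of Python. This is only a sketch of the arithmetic: it assumes the
percentage is taken against the sum of the three reported times rather than
against the full one second interval.

```python
# Times in nanoseconds, copied from the sample output above.
sample = {"IDLE": 28351679, "KERNEL": 436022725, "PROCESS": 451304688}

total = sum(sample.values())
shares = {name: 100 * ns / total for name, ns in sample.items()}

for name, pct in shares.items():
    print(f"{name:>8} {pct:5.1f}%")
```

With these numbers the kernel share works out to a little under 48%.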


* Problems with cputimes

The way cputimes measures scheduler activity turns out to be a lot of work.
There are many scheduling events per second where one thread steps onto a
CPU and another leaves. As a result, cputimes itself causes some degree
of kernel load.

Here we run one instance of cputimes,

   # cputimes 1
   2005 May 15 12:00:41,
            THREADS        TIME (ns)
             KERNEL         12621985
            PROCESS        982751579
   2005 May 15 12:00:42,
            THREADS        TIME (ns)
             KERNEL         12267577
            PROCESS        983513765
   [...]

Now a second cputimes is run at the same time,

   # cputimes 1
   2005 May 15 12:02:06,
            THREADS        TIME (ns)
             KERNEL         17366426
            PROCESS        978804165
   2005 May 15 12:02:07,
            THREADS        TIME (ns)
             KERNEL         17614829
            PROCESS        978671601
   [...]

And now a third,

   # cputimes 1
   2005 May 15 12:03:09,
            THREADS        TIME (ns)
             KERNEL         21303089
            PROCESS        974925124
   2005 May 15 12:03:10,
            THREADS        TIME (ns)
             KERNEL         21222992
            PROCESS        975152727
   [...]

Each extra cputimes is consuming an extra 4 to 5 ms of CPU per second as
kernel time, or around 0.5%. This can be used as an estimate of the kernel
load caused by running cputimes, and a similar strategy could be used to
measure the kernel load of other DTrace scripts.
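
The estimate can be reproduced from the three runs above. The following
Python sketch averages the two KERNEL samples of each run and reports the
increase from one run to the next. The function name overhead_per_instance
is made up for this example, and the percentage assumes a single CPU and
the one second interval used above.

```python
# KERNEL times (ns) from the runs above, keyed by the number of cputimes
# instances running at the time. Two samples per run.
kernel_ns = {
    1: [12621985, 12267577],
    2: [17366426, 17614829],
    3: [21303089, 21222992],
}
INTERVAL_NS = 1_000_000_000  # cputimes was run with a 1 second interval

def mean(values):
    return sum(values) / len(values)

def overhead_per_instance(runs):
    """Return (instances, extra_ms, extra_pct) for each added instance."""
    counts = sorted(runs)
    result = []
    for prev, cur in zip(counts, counts[1:]):
        extra = mean(runs[cur]) - mean(runs[prev])
        result.append((cur, extra / 1e6, 100 * extra / INTERVAL_NS))
    return result

for n, ms, pct in overhead_per_instance(kernel_ns):
    print(f"instance {n}: +{ms:.1f} ms of kernel time per second ({pct:.2f}%)")
```

The same approach should work for other DTrace scripts: sample the KERNEL
time with and without the script running, and compare the averages.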

However, the following CPU characteristics must be taken into consideration,

   # psrinfo -v
   Status of virtual processor 0 as of: 05/15/2005 12:06:05
     on-line since 04/30/2005 13:32:32.
     The i386 processor operates at 867 MHz,
           and has an i387 compatible floating point processor.

as well as the type of activity running on the system that cputimes was
monitoring (in particular, the frequency of scheduling events).

A system with a slower CPU will use a larger proportion of kernel time to
perform the same tasks. Also, a system that is context switching more 
(switching between different processes) is likely to consume more kernel time
as well.