iotop_notes.txt


The following are additional notes on the iotop program.


* When using -P, how can a process exceed 100% I/O?

These percentages are based on disk time, and are in terms of a single disk:
100% means one disk's worth of time was consumed during the interval.

200% could mean 2 disks @ 100%, or 4 @ 50%, or some such combination.

I could have capped it at 100% by dividing by the disk count. I didn't. Disk
utilisation is an asymmetric resource (unlike CPUs, which are (mostly)
symmetric), so it's unfair to divide by the capacity of all the disks, as an
application cannot use every disk's capacity (eg, one writing to a /opt disk only).

Would it be wise to report utilisation as 10% of overall capacity, if it
could mean that 1 disk was SATURATED out of ten? A value of 10% could
understate the problem.

Instead I add the utilisations and don't divide. 1 disk saturated out of 10
would be reported as 100% utilisation. This has the danger of overstating
the problem (consider all ten disks at 10% utilisation; this would also be
reported as 100%).

Nothing is perfect when you are summarising to a single value!
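
As an illustration of the above, here is a minimal D sketch (not the iotop
source; the 5 second interval and the output format are my own assumptions)
that sums disk time per PID across every disk and prints it as a percentage
of one disk's interval, so a process keeping two disks busy for the whole
interval would show around 200%:

    #!/usr/sbin/dtrace -s
    /*
     * Sketch only: sum disk time per PID across all disks, then print it
     * as a percentage of a single disk's 5 second interval.
     */
    #pragma D option quiet

    io:::start
    {
            /* remember when this I/O was issued, and by which PID */
            start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
            iopid[args[0]->b_edev, args[0]->b_blkno] = pid;
    }

    io:::done
    /start[args[0]->b_edev, args[0]->b_blkno]/
    {
            /* add the elapsed disk time to the issuing PID, whichever disk served it */
            @disktime[iopid[args[0]->b_edev, args[0]->b_blkno]] =
                sum(timestamp - start[args[0]->b_edev, args[0]->b_blkno]);
            start[args[0]->b_edev, args[0]->b_blkno] = 0;
            iopid[args[0]->b_edev, args[0]->b_blkno] = 0;
    }

    profile:::tick-5sec
    {
            /* 5 secs = 5,000,000,000 ns; divide by 50,000,000 for a percentage */
            normalize(@disktime, 50000000);
            printf("%8s %s\n", "PID", "DISK%");
            printa("%8d %@d%%\n", @disktime);
            trunc(@disktime);
    }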



* Beware of overcounting metadevices, such as SVM and Veritas.

The current version of iotop reports on anything the kernel believes to be
a block disk device. A problem arises when a metadevice contains physical
disk devices: iotop reports activity to both the metadevice and the
physical devices, which overcounts activity.

Consider a metadevice that contains two physical disks, both running at
100% utilisation. iotop -P may report 300% utilisation: 200% for the disks
plus 100% for the metadevice. We'd probably want to see a value of 200%,
not 300%. Eliminating the counting of metadevices in DTrace isn't easy
(without inelegant "hardwiring" of device types); however, I do intend to
find a way to fix this in future versions.
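
As an illustration of what such hardwiring might look like (a sketch only,
assuming SVM metadevices register under the "md" driver name; Veritas would
need its own match, which is part of why this is inelegant), the predicate
below skips metadevice I/O and sums disk time for the physical disks only:

    #!/usr/sbin/dtrace -s
    /*
     * Sketch only: measure disk time per device, skipping I/O to SVM
     * metadevices by matching the "md" driver name (assumed).
     */
    #pragma D option quiet

    io:::start
    /args[1]->dev_name != "md"/
    {
            start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
    }

    io:::done
    /start[args[0]->b_edev, args[0]->b_blkno]/
    {
            /* sum disk time by device name, physical disks only */
            @disktime[args[1]->dev_statname] =
                sum(timestamp - start[args[0]->b_edev, args[0]->b_blkno]);
            start[args[0]->b_edev, args[0]->b_blkno] = 0;
    }

    dtrace:::END
    {
            printa("%-12s %@d ns\n", @disktime);
    }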