Thursday, April 3, 2014

Using blktrace, blkparse and btt to analyze block device performance in Linux

Collect blktrace
(Caution: tracing will increase the load on the system.)

mount -t debugfs debugfs /sys/kernel/debug
df /sys/kernel/debug
cd /var/tmp
mkdir blktrace_data
cd blktrace_data
# It will create one file per device per CPU
blktrace -d /dev/sdbs /dev/sdq  # dump binary trace data for one or more disks
blkrawverify sdbs  # it will create sdbs.verify.out file
less sdbs.verify.out
grep invalid  sdbs.verify.out
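
As a quick sanity check after collection, the per-CPU files can be counted for each device. This is only a sketch: it assumes blktrace's default output naming, <device>.blktrace.<cpu>, and the function name count_trace_files is made up for this example.

```shell
# count_trace_files DEVICE [DIR]
# Prints how many per-CPU trace files exist for DEVICE in DIR (default: .).
# Assumes blktrace's default naming: DEVICE.blktrace.CPU
count_trace_files() {
    dev=$1
    dir=${2:-.}
    n=0
    for f in "$dir/$dev".blktrace.*; do
        # An unmatched glob is returned unexpanded, so test that the file exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

For example, count_trace_files sdbs run inside blktrace_data should normally print the same number as nproc.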

For many devices, use blkparse to combine the blktrace data, then use btt to create a report.

# combine all the per-CPU files into one binary, time-ordered stream of traces
blkparse -i sdbs -d bp.sdbs.bin # the .blktrace.* suffix is not required
btt -A -i bp.sdbs.bin > bp.sdbs.txt
less bp.sdbs.txt


blkparse sdbs > bp.sdbs.txt  # text file
less bp.sdbs.txt

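The text output can be summarized with awk. The sketch below tallies records by action code (Q, G, I, M, D, C and so on). It assumes blkparse's default line layout, where the first field is major,minor and the sixth field is the action; count_actions is a hypothetical helper name.

```shell
# count_actions FILE
# Tallies blkparse text records by action code.
# Assumes the default layout: dev cpu seq time pid action rwbs ...
count_actions() {
    awk '$1 ~ /^[0-9]+,[0-9]+$/ && NF >= 7 { n[$6]++ }   # skip summary lines
         END { for (a in n) print a, n[a] }' "$1" | LC_ALL=C sort
}
```

Usage: count_actions bp.sdbs.txt. A large gap between the Q and C counts may mean the trace was stopped while requests were still in flight.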

# iostat-like output
btt -I bp.sdbs.iostat -i bp.sdbs.bin
less bp.sdbs.iostat


Interpreting Information


Note: btt reports these times in seconds; iostat reports await in milliseconds.
Q2Q: time between requests sent to the block layer
Q2G: how long it takes from the time a block I/O is queued to the time it gets a request allocated for it
G2I: how long it takes from the time a request is allocated to the time it is inserted into the device's queue
Q2M: how long it takes from the time a block I/O is queued to the time it gets merged with an existing request
I2D: how long it takes from the time a request is inserted into the device's queue to the time it is actually issued to the device
M2D: how long it takes from the time a block I/O is merged with an existing request until the request is issued to the device
D2C: service time of the request by the device
Q2C: total time spent in the block layer for a request

Q------->G------------>I--------->M------------------->D----------------------------->C
|-Q time-|-Insert time-|
|--------- merge time ------------|-merge with other IO|
|------------------ scheduler time --------------------|--driver,adapter,storage time--|

|----------------------- await time in iostat output ----------------------------------|

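To make the definitions concrete, the phases of a single non-merged request can be derived from the timestamps of its Q, G, I, D and C events. The sketch below uses made-up timestamps in seconds; phase_times is a hypothetical helper, not part of blktrace or btt.

```shell
# phase_times Q G I D C
# Given the timestamps (in seconds) of the Q, G, I, D and C events of one
# request, print each phase and the Q2C total.
phase_times() {
    awk -v q="$1" -v g="$2" -v i="$3" -v d="$4" -v c="$5" 'BEGIN {
        printf "Q2G=%.6f G2I=%.6f I2D=%.6f D2C=%.6f Q2C=%.6f\n",
               g - q, i - g, d - i, c - d, c - q
    }'
}

# Hypothetical timestamps, for illustration only.
phase_times 0.000000 0.000002 0.000005 0.000105 0.002105
# prints: Q2G=0.000002 G2I=0.000003 I2D=0.000100 D2C=0.002000 Q2C=0.002105
```

In this example the four phases add up to Q2C, and D2C accounts for almost all of it.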

  • If Q2Q is much larger than Q2C, the application is not issuing I/O in rapid succession. Thus, any performance problems you have may not be related to the I/O subsystem at all.
  • If D2C is very high, the device is taking a long time to service requests. This can indicate that the device is simply overloaded (which may be because it is a shared resource), or that the workload sent down to the device is sub-optimal.
  • If Q2G is very high, it means that there are a lot of requests queued concurrently. This could indicate that the storage is unable to keep up with the I/O load.
  • await in iostat output = Q2C = Q2I + I2D + D2C
  • Q2I + I2D == scheduler time
  • The I2D time can include a lot of apparent extra time due to plug and unplug events (not shown above), which are used to improve merging of I/O within the scheduler's sort queue.
  • D2C time covers driver time, adapter time, transport time, and storage service time (and back)
  • So if D2C/Q2C approaches 1, most of the time is being spent in the storage component.
  • If D2C times are high, examine the underlying storage transport, such as switch counters or the maintenance interfaces of the storage boxes themselves.
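
The D2C/Q2C check can be scripted against the btt report. This is a rough sketch: it assumes the report contains rows of the form "<NAME> <MIN> <AVG> <MAX> <N>" and that the first Q2C and D2C rows belong to the all-devices summary. The name d2c_share is made up for this example.

```shell
# d2c_share FILE
# Prints the average D2C / average Q2C ratio from a btt report,
# using the first D2C and Q2C rows found (assumed: the all-devices summary).
d2c_share() {
    awk '$1 == "D2C" && NF >= 5 && !seen_d { d = $3; seen_d = 1 }
         $1 == "Q2C" && NF >= 5 && !seen_q { q = $3; seen_q = 1 }
         END {
             if (q + 0 > 0) printf "%.2f\n", d / q
             else           print "n/a"
         }' "$1"
}
```

Usage: d2c_share bp.sdbs.txt, run on the btt report. A value close to 1 points at the driver, adapter or storage rather than the scheduler.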

