Collect blktrace
(Caution: it will increase load on the system)
mount -t debugfs debugfs /sys/kernel/debug
df /sys/kernel/debug
cd /var/tmp
mkdir blktrace_data
cd blktrace_data
# It will create one file per device per CPU
blktrace -d /dev/sdbs /dev/sdq # dump binary traces for one or more disks
blkrawverify sdbs # it will create sdbs.verify.out file
less sdbs.verify.out
grep invalid sdbs.verify.out
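The blktrace invocation above keeps tracing until it is interrupted with Ctrl-C. A time-boxed variant, sketched here with the same device names, avoids leaving the trace running by accident:
# capture for about 30 seconds, then stop automatically
blktrace -w 30 -d /dev/sdbs -d /dev/sdq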
For many devices, use blkparse to combine the blktrace data and then use btt to create a report.
# combine all the files into one binary time-ordered stream of traces
blkparse -i sdbs -d bp.sdbs.bin # .blktrace.* not required
btt -A -i bp.sdbs.bin > bp.sdbs.txt
less bp.sdbs.txt
blkparse sdbs > bp.sdbs.txt # text file
less bp.sdbs.txt
# iostat-like output
btt -I bp.sdbs.iostat -i bp.sdbs.bin
less bp.sdbs.iostat
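btt can also emit per-I/O latency streams that are convenient for plotting. A sketch follows; the d2c_lat/q2c_lat prefixes are arbitrary, and the exact .dat file names it produces depend on the btt version:
# per-I/O D2C and Q2C latencies, one .dat file per traced device
btt -i bp.sdbs.bin -l d2c_lat -q q2c_lat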
Interpreting Information
Note: all times are in milliseconds
Q2Q — time between requests sent to the block layer
Q2G — how long it takes from the time a block I/O is queued to the time it gets a request allocated for it
G2I — how long it takes from the time a request is allocated to the time it is inserted into the device's queue
Q2M — how long it takes from the time a block I/O is queued to the time it gets merged with an existing request
I2D — how long it takes from the time a request is inserted into the device's queue to the time it is actually issued to the device
M2D — how long it takes from the time a block I/O is merged with an existing request until the request is issued to the device
D2C — service time of the request by the device
Q2C — total time spent in the block layer for a request
Q------->G------------>I--------->M------------------->D----------------------------->C
|-Q time-|-Insert time-|
|--------- merge time ------------|-merge with other IO|
|------------------scheduler time-----------------------|--driver, adapter, storage time|
|----------------------- await time in iostat output ----------------------------------|
If Q2Q is much larger than Q2C, that means the application is not issuing I/O in rapid succession. Thus, any performance problems you have may not be related to the I/O subsystem at all.
If D2C is very high, the device is taking a long time to service requests. This can indicate that the device is simply overloaded (which may be because it is a shared resource), or that the workload sent down to the device is sub-optimal.
If Q2G is very high, it means that there are a lot of requests queued concurrently. This could indicate that the storage is unable to keep up with the I/O load.
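One thing worth checking in that case, sketched here for the sdbs device used above, is how many requests the block layer may queue for the device and which scheduler is sorting them:
cat /sys/block/sdbs/queue/nr_requests # how many requests the block layer may queue
cat /sys/block/sdbs/queue/scheduler # active I/O scheduler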
await in iostat output = Q2C = Q2I + I2D + D2C (where Q2I = Q2G + G2I)
Q2I + I2D == scheduler time
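As a rough cross-check, assuming the same device and measurement window, the await column that iostat reports for the device should be in the same ballpark as btt's average Q2C:
iostat -dx sdbs 1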
The I2D time can include a lot of apparent extra time due to plug and unplug events (not shown above), which are used to improve merging of I/O within the scheduler sort queue.
D2C time covers driver time, adapter time, transport time, and storage service time (and back)
So if D2C/Q2C approaches 1, the percentage of time spent in the storage components is high.
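As a worked example with made-up numbers: if Q2G = 0.05 ms, G2I = 0.01 ms, I2D = 0.30 ms and D2C = 5.0 ms for an unmerged request, then Q2C = 0.05 + 0.01 + 0.30 + 5.0 = 5.36 ms and D2C/Q2C = 5.0/5.36 ≈ 0.93, i.e. roughly 93% of the per-request latency is below the block layer, in the driver/adapter/storage path.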
For high D2C times, the underlying storage transport needs to be examined, for example switch counters or the maintenance interfaces of the storage boxes themselves.