It is a challenge even for the most seasoned system admin to explain how
IO will behave when a few paths of a device fail but some paths are still active. What
is your answer ….?
Most people answer this: IO will continue via the remaining
active paths.
And…….. that is wrong !!
Let me explain it with an extreme case: say one of the remote
storage ports (rport) of the storage frame has failed. So if there were 2 active paths to the device,
the device now has only one good path and one failed path.
As soon as the Linux kernel SCSI (sc) driver detects a failed
path, it quiesces the HBA (yes, the complete HBA) and waits for all outstanding IO to complete or
time out [1]. Then the SCSI layer activates the error
handler. NO IO WILL BE SUBMITTED until error recovery completes, even
though only one path has failed and one good path is still available. This design is there to avoid
any data corruption.
1- Abort the command once the SCSI device timeout defined in /sys/block//device/timeout expires [1]
2- Wait several seconds in the hope that the remote port comes back online (only if the device is a Fibre Channel device; not applicable to a SCSI device)
3- Activate the error handler. When the error handler is triggered, it attempts the following operations in order (until one executes successfully or all options are exhausted):
a. Reset the device
b. Reset the bus
c. Reset the host
First, the SCSI driver tries to reset the device and then the bus. If that
is not successful, and the adapter firmware and device driver decide that the adapter
has not completed full recovery, the adapter will be hard reset. This means all
paths of a disk via that adapter will be unavailable for a few moments,
irrespective of whether they have failed or are healthy. A hard reset happens when the I/O
is black-holed with a NOP response in the fabric. Since IO has been frozen by the SCSI driver, there
is no chance of an IO request being dropped or of data corruption.
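To make the host-wide freeze concrete, here is a tiny toy model in Python. It is only an illustration of the behaviour described above, not kernel code: the class ToyHost and its methods are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHost:
    """Hypothetical stand-in for one HBA (SCSI host) with several paths."""
    paths: dict = field(default_factory=dict)  # path name -> healthy?
    in_recovery: bool = False

    def report_failure(self, path: str) -> None:
        # One failed path puts the whole host into error recovery.
        self.paths[path] = False
        self.in_recovery = True

    def submit(self, path: str) -> str:
        # While recovery runs, nothing is dispatched, even on healthy paths.
        if self.in_recovery:
            return "blocked"
        return "sent" if self.paths[path] else "failed"

    def finish_recovery(self) -> None:
        self.in_recovery = False


host = ToyHost(paths={"path_a": True, "path_b": True})
before = host.submit("path_b")   # healthy path, host idle
host.report_failure("path_a")    # the other path fails
during = host.submit("path_b")   # healthy path, but host is recovering
host.finish_recovery()
after = host.submit("path_b")
print(before, during, after)  # sent blocked sent
```

The point of the model is the middle value: the IO on the healthy path is held back only because its host is in recovery.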
Case-1: If all of the above fail, the device is set to
offline. This means the complete device is not available via any path. It needs
manual intervention: look at the system/storage logs, find the problem, fix it, scan
the HBA to detect active paths, and bring the
device back to the online/running state.
Case-2: If recovery succeeds, the path-check heuristics of multipath will mark
the dead paths as failed. IO to the device will then continue via the remaining active
path.
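The escalation and its two outcomes can be sketched as a simple "try each step until one works" loop. This is a simplified model under my own assumptions, not the kernel's actual implementation: run_error_handler and ladder are hypothetical helpers, and each step is just a function that reports success or failure.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_error_handler(steps: List[Step]) -> str:
    """Try each recovery step in order; stop at the first that succeeds."""
    for name, attempt in steps:
        if attempt():
            return name
    # Every step failed: the device ends up offline (Case-1).
    return "offline"


def ladder(succeeds_at: str = "") -> List[Step]:
    """Build the escalation ladder; only the named step reports success."""
    names = ["abort command", "reset device", "reset bus", "reset host"]
    return [(n, (lambda n=n: n == succeeds_at)) for n in names]


print(run_error_handler(ladder("reset bus")))  # reset bus
print(run_error_handler(ladder()))             # offline
```

Any return value other than "offline" corresponds to Case-2, where multipath can then fail the dead paths and resume IO on the good ones.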
Reduce device recovery time
To reduce overall recovery time, upgrade the kernel (to version 2.6.18-371.6.1
or higher) and device-mapper-multipath (to version
0.4.7-63 or higher) to take advantage of time-related parameters
such as:
1- SCSI driver error handling (EH) timeout, eh_timeout (from the default 10 seconds to 5 seconds) [4]
2- HBA port reset time, eh_deadline (from disabled/0 to 5 seconds) [5]
3- Adapter reset time, e.g. QLogic reset time [2]
Add the following to /etc/modprobe.conf and recreate the initrd:
options qla2xxx ql2xextended_error_logging=1 qlport_down_retry=10 ql2xloginretrycount=10
4- Multipath checker_timeout (reduce to 10 seconds from the default 60 seconds) [3]
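Before changing anything, it helps to see what the current values are. Below is a minimal Python sketch that reads them from sysfs. The sysfs locations used here (device/timeout and device/eh_timeout under each sd block device, and eh_deadline under each scsi_host) are my assumptions and depend on the kernel version; attributes that do not exist are simply reported as not available. The helper names read_value and collect_timeouts are made up for this example.

```python
from pathlib import Path
from typing import Dict, Optional


def read_value(path: Path) -> Optional[str]:
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except (OSError, ValueError):
        return None


def collect_timeouts(sys_root: str = "/sys") -> Dict[str, Optional[str]]:
    """Collect per-device and per-host timeout settings (assumed paths)."""
    root = Path(sys_root)
    found: Dict[str, Optional[str]] = {}
    # Per-device command timeout and EH timeout.
    for dev in sorted(root.glob("block/sd*")):
        for attr in ("timeout", "eh_timeout"):
            found[f"{dev.name}/{attr}"] = read_value(dev / "device" / attr)
    # Per-host error-handling deadline.
    for host in sorted(root.glob("class/scsi_host/host*")):
        found[f"{host.name}/eh_deadline"] = read_value(host / "eh_deadline")
    return found


if __name__ == "__main__":
    for name, value in collect_timeouts().items():
        print(f"{name}: {value if value is not None else 'not available'}")
```

The sys_root parameter exists only so the function can be pointed at a test directory; on a real system the default of /sys is what you want.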
[3] /usr/share/doc/device-mapper-multipath-0.4.7/multipath.conf.annotated
[5] Configurable Timeout for Unresponsive Devices
Additional Reference
1- http://events.linuxfoundation.org/sites/events/files/slides/SCSI-EH.pdf (very good presentation)
Thanks for the very useful (and what is more important - very rare) explanation of that complex matter. Could you please explain what happens in the other case.
Let's set SCSI timeout to 120 seconds. It means SCSI error handler will start only after the very first "unanswered" SCSI command exceeds 120 seconds. What happens to multipathing within those 120 seconds during which all the SCSI commands in the presumably failed path are queued? Suppose there are 8 paths to the SAN storage. One controller goes offline unexpectedly or there is another situation that pauses processing SCSI commands on the storage. So 4 paths in fact do not work properly: they just take commands, attempt to send them out and start EH timeout for each of them. What happens to other paths before EH starts and after it gets started?