Ansari's Technical Notes: March 2015

What is crash ?

A tool to interactively analyzing the state of the crash generated by kdump, netdump, diskdump, LKCD, xendump or kvmdump

What are packages required?

yum install crash kernel-debuginfo-$(uname -r) kernel-debuginfo-common-x86_64-$(uname -r)

How to check created kernel crash file vmcore is valid?

If below command will show date when kernel crash was created and exit - then vmcore is created.

# crash -st /usr/lib/debug/lib/modules/$(uname -r)/vmlinux vmcore
Tue Mar 10 04:02:55 2015

Is vmcore created by crash is valid?

It will take couple of minutes to load kernel and show you crash prompt if crash is valid and you are using correct debug package. It will show number of cpu, system name, memory size, time when crash was created, load of system, date when system was crashed.

# crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux vmcore
crash 5.1.8-2.el5_9
Copyright (C) 2002-2011 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

KERNEL: /usr/lib/debug/lib/modules/2.6.18-308.8.2.el5/vmlinux
DUMPFILE: vmcore
CPUS: 2
DATE: Tue Mar 10 04:02:55 2015
UPTIME: 1 days, 00:09:41
LOAD AVERAGE: 9.29, 4.35, 2.80
TASKS: 205
NODENAME: dev001.example.com
RELEASE: 2.6.18-308.8.2.el5
VERSION: #1 SMP Tue May 29 11:54:17 EDT 2012
MACHINE: x86_64 (2699 Mhz)
MEMORY: 3.9 GB
PANIC: "Kernel panic - not syncing: out of memory. panic_on_oom is selected"
PID: 3419
COMMAND: "sshd"
TASK: ffff81013b2e5860 [THREAD_INFO: ffff81012f594000]
CPU: 1
STATE: TASK_RUNNING (PANIC)

How to use crash commands ?

-log: dump lernel log_buf contents in chronological order.Most interesting information will be at the end of log file. kernel thread dump, memory dump, swap usages etc.

crash>log
.....
lowmem_reserve[]: 0 3000 4010 4010
Node 0 DMA32 free:10008kB min:6052kB low:7564kB high:9076kB active:1483204kB inactive:1420176kB present:3072160kB pages_scanned:10951420 all_unreclaimable? yes
lowmem_reserve[]: 0 0 1010 1010
Node 0 Normal free:1980kB min:2036kB low:2544kB high:3052kB active:442232kB inactive:458364kB present:1034240kB pages_scanned:1716084 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 4*4kB 3*8kB 4*16kB 1*32kB 3*64kB 4*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 2*4096kB = 10056kB
Node 0 DMA32: 0*4kB 5*8kB 1*16kB 1*32kB 1*64kB 1*128kB 4*256kB 1*512kB 0*1024kB 0*2048kB 2*4096kB = 10008kB
Node 0 Normal: 5*4kB 5*8kB 0*16kB 0*32kB 2*64kB 4*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1980kB
Node 0 HighMem: empty
10029 pagecache pages
Swap cache: add 127866954, delete 127858677, find 188220155/188834341, race 0+1692
Free swap = 0kB
Total swap = 2031608kB
Free swap: 0kB
1310720 pages of RAM
332619 reserved pages
20496 pages shared
8412 pages swap cached
Kernel panic - not syncing: out of memory. panic_on_oom is selected

INFO: task kjournald:602 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D ffff810009004420 0 602 49 631 578 (L-TLB)

INFO: task nagios:20831 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nagios D ffff810009004420 0 20831 3763 20729 (NOTLB)

-ps : process status at time of crash. Active tasks have a preceding >

crash>ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 ffffffff80319b60 RU 0.0 0 0 [swapper]
0 1 1 ffff8101047460c0 RU 0.0 0 0 [swapper]
1 0 1 ffff8101047347a0 IN 0.0 10372 484 init
2 1 0 ffff810104734040 IN 0.0 0 0 [migration/0]
2979 1 1 ffff81013f5827a0 IN 0.0 28664 356 restorecond
> 3419 1 1 ffff81013b2e5860 RU 0.0 60808 456 sshd
2988 19639 0 ffff810130580820 IN 0.0 5876 388 vmstat
2990 1 1 ffff81013cb267a0 IN 0.0 5932 472 syslogd
crash>

-bt : display a kernel stack backtrace. bt -a - stack tradce of all active tasks

crash> bt
PID: 3419 TASK: ffff81013b2e5860 CPU: 1 COMMAND: "sshd"
#0 [ffff81012f595aa0] crash_kexec at ffffffff800b099c
#1 [ffff81012f595b60] panic at ffffffff80093989
#2 [ffff81012f595c50] out_of_memory at ffffffff800caa5d
#3 [ffff81012f595ca0] __alloc_pages at ffffffff8000f612
#4 [ffff81012f595d10] read_swap_cache_async at ffffffff80032415
#5 [ffff81012f595d50] swapin_readahead at ffffffff800d0777
#6 [ffff81012f595da0] __handle_mm_fault at ffffffff800092d9
#7 [ffff81012f595e60] do_page_fault at ffffffff80067202
#8 [ffff81012f595f50] error_exit at ffffffff8005dde9
RIP: 00002b3e2f66a25a RSP: 00007fff86964840 RFLAGS: 00010206
RAX: 0000000000000000 RBX: 00002b3e2f9261d8 RCX: 00002b3e2f9261d8
RDX: 00002b3e4a2363e0 RSI: 000000000000000a RDI: 0000000000000001
RBP: 00007fff86964860 R8: 00002b3e4a2363e0 R9: 000000000000000a
R10: 00007fff86964e10 R11: 0000000000000246 R12: 00002b3e4a236570
R13: 00007fff86964e50 R14: 0000000000000001 R15: 00002b3e2cd76078
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
crash>

-bt : to backtrace a PID

crash > bt
crash > bt -f

-bt -f : display all stack data contained in a frame; this option can be used to determine the arguments passed to each function

crash>bt -f
#0 [ffff81012f595aa0] crash_kexec at ffffffff800b099c
#1 [ffff81012f595b60] panic at ffffffff80093989
#2 [ffff81012f595c50] out_of_memory at ffffffff800caa5d

-kmem : details of kernel memory location e.g. For #2 above

crash> kmem ffffffff800caa5d
ffffffff800caa5d (T) out_of_memory+593 ../debug/kernel-2.6.18/linux-2.6.18-308.8.2.el5.x86_64/mm/oom_kill.c: 506

PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffff810100009c30 2ca000 0 0 1 400
crash>

-whatis : search symbol table for data or type information

crash> whatis crash_kexec
void crash_kexec(struct pt_regs *);

crash> whatis panic
void panic(const char *, ...);

crash> whatis out_of_memory
void out_of_memory(struct zonelist *, gfp_t, int, int);

-some_more_commands

crash> sys|egrep -i "cpu|date|uptime|load|tasks|node|release|machine|memory|panic"
DUMPFILE: /cores/retrace/tasks/152431766/crash/vmcore [PARTIAL DUMP]
CPUS: 24
DATE: Wed Sep 9 07:54:53 2015
UPTIME: 322 days, 11:00:25
LOAD AVERAGE: 0.13, 0.13, 0.13
TASKS: 755
NODENAME: ms00456
RELEASE: 2.6.32-358.el6.x86_64
MACHINE: x86_64 (2494 Mhz)
MEMORY: 192 GB
PANIC: "Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for details."

crash> rd -a 0xffffffff8201c001 100
ffffffff8201c000: HP
ffffffff8201c004: P70
ffffffff8201c008: 2.8
ffffffff8201c010: 12/20/2013 <<<
ffffffff8201c01c: HP
ffffffff8201c020: ProLiant DL380p Gen8

crash> dis -rl 0xffffffffa00574ca
0xffffffffa00574c8 : callq 0xffffffff8150cf21
/usr/src/debug/kernel-2.6.32-358.el6/linux-2.6.32-358.el6.x86_64/drivers/watchdog/hpwdt.c: 495

crash> mod |grep -E "NAME|"
MODULE NAME SIZE OBJECT FILE
ffffffffa00581a0 hpwdt 7094 /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.32-358.el6.x86_64/kernel/drivers/watchdog/hpwdt.ko.debug

crash> px ((struct module *)0xffffffffa00581a0)->name
$2 = "hpwdt\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"

crash> px ((struct module *)0xffffffffa00581a0)->version
$3 = 0xffff88301782cb20 "1.3.0"

crash> px ((struct module *)0xffffffffa00581a0)->srcversion
$4 = 0xffff8830194fe4c0 "87D39D97B9E0A6F667C8671"

crash> px ((struct module *)0xffffffffa00581a0)->gpgsig_ok
$5 = 0x1

crash> px notify_die
notify_die = $6 =
{int (enum die_val, const char *, struct pt_regs *, long, int, int)} 0xffffffff8109cbd0

crash> eval -b 00000000000000f1

-Other crash command

help : list all supported crash command
help : to know available option and details of a crash command e.g 'help bt'
net : show network interfaces and ip address configured
mount : show mount point and mount option at time of crash
sys : system specific information excatly what you see when crash was started
swap : swap usages
task : task structure of running task. In above example, it is sshd with PID 3419
runq : displays the tasks on the run queues of each cpu
set -v : display the current state of internal crash variables.

How to run crash analysis unattended and save in a text file ?

# cat > inputfile.txt
sys
mount
net
swap
log
ps
runq
bt
bt -a
bt -g
bt -t
bt -al
bt -f
foreach bt
task
kmem
kmem -i
kmem -S
exit

# crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux vmcore -i inputfile.txt > crash-analysys.txt
# less crash-analysys.txt

- References
http://people.redhat.com/anderson/crash_whitepaper/
http://www.dedoimedo.com/computers/crash-book.html
https://codeascraft.com/2012/03/30/kernel-debugging-101/

sar

To know about all options of sar, best place is : $ man sar

ksar

I have created a page on wikipedia : http://en.wikipedia.org/wiki/Ksar_%28Unix_sar_grapher%29

Few points

Always create 1 second interval data for troubleshooting.

below command will run in background to create binary file having 1 second resolution
binary file will be around 25M
text file size will be 40M

sar -A -o /var/tmp/sar-bin-date-systemname.bin 1 3600 >/dev/null 2>&1 &
ls -l /var/tmp/sar-bin-date-systemname.bin
sar -A -f /var/tmp/sar-bin-date-systemname.bin
LC_ALL=C sar -A -p -f /var/tmp/sar-bin-date-systemname.bin /var/tmp/sar-txt-date-systemname.txt

Load sar-txt-date-systemname.txt in ksar to see granular details.

An example:

Application was loading a 3.5G file on a 4 GB system. It was being closed within 10 second but it was enough to generate huge page-out and ~100% utilization of disk !

$ sar -rpdB -s 02:27:29 -e 02:27:45 -f /var/tmp/sar-bin-date-systemname.bin
Linux 2.6.18-308.8.2.el5 (testsys001mysystem.com) 10/03/15 _x86_64_ (2 CPU)

02:27:28 pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
02:27:29 16.16 1826.26 239842.42 1.01 115282.83 0.00 0.00 0.00 0.00
02:27:30 60.00 56.00 331109.00 8.00 204665.00 0.00 0.00 0.00 0.00
02:27:31 0.00 60.00 517659.00 0.00 345388.00 0.00 0.00 0.00 0.00
02:27:32 0.00 327.27 610073.74 0.00 364861.62 0.00 0.00 0.00 0.00
02:27:33 0.00 60.61 330316.16 0.00 181757.58 0.00 0.00 0.00 0.00
02:27:34 0.00 526.53 590945.92 0.00 429371.43 0.00 0.00 0.00 0.00
02:27:35 16.00 1204.00 340554.00 0.00 232284.00 2272.00 4832.00 1080.00 15.20
02:27:36 252.00 704.00 60046.00 4.00 58837.00 3296.00 19806.00 17118.00 74.10
02:27:37 1706.93 334978.22 302981.19 71.29 304998.02 141251.49 44208.91 94489.11 50.95
02:27:38 26574.76 459071.84 349949.51 567.96 359980.58 183487.38 46633.01 115279.61 50.10
02:27:39 7526.53 446142.86 330251.02 262.24 328176.53 198889.80 24097.96 110475.51 49.54
02:27:40 1503.85 413742.31 292217.31 59.62 292017.31 181600.00 24369.23 102852.88 49.94
02:27:41 1619.80 81607.92 56802.97 43.56 979036.63 32285.15 8807.92 20769.31 50.54
02:27:42 3155.56 113.13 8978.79 82.83 1768.69 0.00 0.00 0.00 0.00
02:27:43 13457.43 190.10 23859.41 630.69 13454.46 0.00 0.00 0.00 0.00
02:27:44 1296.97 367.68 128.28 44.44 109.09 0.00 0.00 0.00 0.00
02:27:45 1979.59 36.73 7173.47 36.73 890.82 0.00 0.00 0.00 0.00
Average: 3528.90 103975.52 258296.12 107.83 248765.63 44355.97 10310.59 27579.75 50.45

02:27:28 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
02:27:29 3626448 286212 7.32 14448 109424 587200 9.88
02:27:30 3203408 709252 18.13 14460 109464 980300 16.49
02:27:31 2509876 1402784 35.85 14468 109472 1670800 28.11
02:27:32 1558540 2354120 60.17 14480 109516 2681920 45.12
02:27:33 1002236 2910424 74.38 14492 109480 3295636 55.44
02:27:34 402404 3510256 89.72 14508 109512 3777916 63.56
02:27:35 22300 3890360 99.43 14496 105340 4261476 71.69
02:27:36 22272 3890388 99.43 2832 52344 4266140 71.77
02:27:37 22888 3889772 99.42 520 12892 4640740 78.07
02:27:38 32216 3880444 99.18 872 25420 5006940 84.23
02:27:39 30604 3882056 99.22 1300 27428 5464516 91.93
02:27:40 35468 3877192 99.09 1316 28580 5850808 98.43
02:27:41 3759396 153264 3.92 1336 29276 478424 8.05
02:27:42 3744892 167768 4.29 1548 30996 551996 9.29
02:27:43 3743188 169472 4.33 1812 41488 446000 7.50
02:27:44 3742168 170492 4.36 1840 42508 446000 7.50
02:27:45 3733504 179156 4.58 1936 44268 511600 8.61
Average: 1834812 2077848 53.11 6863 64553 2642260 44.45

02:27:28 DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
02:27:29 sda 56.57 32.32 3652.53 65.14 0.19 3.39 0.61 3.43
02:27:30 sda 11.00 120.00 112.00 21.09 0.04 4.00 4.00 4.40
02:27:31 sda 3.00 0.00 120.00 40.00 0.00 0.33 0.33 0.10
02:27:32 sda 3.03 0.00 80.81 26.67 0.01 2.00 2.00 0.61
02:27:33 sda 6.06 0.00 694.95 114.67 0.00 0.33 0.33 0.20
02:27:34 sda 22.45 0.00 1053.06 46.91 0.02 0.95 0.23 0.51
02:27:35 sda 207.00 32.00 2408.00 11.79 0.24 1.15 0.17 3.60
02:27:36 sda 45.00 504.00 1248.00 38.93 0.38 8.49 3.62 16.30
02:27:37 sda 2812.87 3413.86 666899.01 238.30 7.28 2.58 0.29 81.98
02:27:38 sda 3228.16 53141.75 919541.75 301.31 20.51 6.35 0.30 97.96
02:27:39 sda 4312.24 14995.92 888693.88 209.56 11.21 2.59 0.23 100.10
02:27:40 sda 8043.27 3069.23 832607.69 103.90 13.67 1.70 0.12 97.60
02:27:41 sda 899.01 3239.60 163215.84 185.15 1.58 1.75 0.32 28.51
02:27:42 sda 225.25 6044.44 226.26 27.84 0.41 1.80 1.11 24.95
02:27:43 sda 699.01 27176.24 380.20 39.42 1.44 2.08 1.20 83.76
02:27:44 sda 72.73 2585.86 735.35 45.67 0.12 1.69 1.54 11.21
02:27:45 sda 89.80 3967.35 73.47 45.00 0.15 1.66 1.26 11.33
Average: sda 1242.32 7057.80 207951.03 173.07 3.43 2.76 0.27 33.71

watcher-cron

A simple script provided by redhat to collect system performance data using below commands. Use this along with sar data. See redhat article https://access.redhat.com/articles/279063 for more detail.

iostat
vmstat
top
mpstat

Also add iotop and ps aux --forest commands in script:

sequences=`seq -s ' ' $2`
eval "for i in $sequences; do date;ps aux --forest; sleep $1;done ${_compress_all} > $3/${_time}-info-ps-forest.log${_ext_all} &" ; pspid=$!
eval "iotop -n $1 -d $2 -b ${_compress_all} > $3/${_time}-info-iotop.log${_ext_all} &" ; iotoppid=$!

References:
https://access.redhat.com/articles/1145953 - PCP
Performance monitoring tools

Ansari's Technical Notes

Tuesday, March 31, 2015

Linux kernel - crash utility to analysis vmcore