Tuesday, March 31, 2015

Linux kernel - crash utility to analysis vmcore

What is crash ?


A tool to interactively analyzing the state of the crash generated by kdump, netdump, diskdump, LKCD, xendump or kvmdump


What are packages required?


yum install crash kernel-debuginfo-$(uname -r) kernel-debuginfo-common-x86_64-$(uname -r) 

How to check created kernel crash file vmcore is valid?

If below command will show date when kernel crash was created and exit - then vmcore is created.

# crash -st /usr/lib/debug/lib/modules/$(uname -r)/vmlinux vmcore
Tue Mar 10 04:02:55 2015

Is vmcore created by crash is valid?

It will take couple of minutes to load kernel and show you crash prompt if crash is valid and you are using correct debug package. It will show number of cpu, system name, memory size, time when crash was created, load of system, date when system was crashed.

# crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux vmcore
crash 5.1.8-2.el5_9
Copyright (C) 2002-2011  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.18-308.8.2.el5/vmlinux
    DUMPFILE: vmcore
        CPUS: 2
        DATE: Tue Mar 10 04:02:55 2015
      UPTIME: 1 days, 00:09:41
LOAD AVERAGE: 9.29, 4.35, 2.80
       TASKS: 205
    NODENAME: dev001.example.com
     RELEASE: 2.6.18-308.8.2.el5
     VERSION: #1 SMP Tue May 29 11:54:17 EDT 2012
     MACHINE: x86_64  (2699 Mhz)
      MEMORY: 3.9 GB
       PANIC: "Kernel panic - not syncing: out of memory. panic_on_oom is selected"
         PID: 3419
     COMMAND: "sshd"
        TASK: ffff81013b2e5860  [THREAD_INFO: ffff81012f594000]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)

How to use crash commands ?

-log: dump lernel log_buf contents in chronological order.Most interesting information will be at the end of log file. kernel thread dump, memory dump, swap usages etc.

crash>log
.....
lowmem_reserve[]: 0 3000 4010 4010
Node 0 DMA32 free:10008kB min:6052kB low:7564kB high:9076kB active:1483204kB inactive:1420176kB present:3072160kB pages_scanned:10951420 all_unreclaimable? yes
lowmem_reserve[]: 0 0 1010 1010
Node 0 Normal free:1980kB min:2036kB low:2544kB high:3052kB active:442232kB inactive:458364kB present:1034240kB pages_scanned:1716084 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 4*4kB 3*8kB 4*16kB 1*32kB 3*64kB 4*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 2*4096kB = 10056kB
Node 0 DMA32: 0*4kB 5*8kB 1*16kB 1*32kB 1*64kB 1*128kB 4*256kB 1*512kB 0*1024kB 0*2048kB 2*4096kB = 10008kB
Node 0 Normal: 5*4kB 5*8kB 0*16kB 0*32kB 2*64kB 4*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1980kB
Node 0 HighMem: empty
10029 pagecache pages
Swap cache: add 127866954, delete 127858677, find 188220155/188834341, race 0+1692
Free swap  = 0kB
Total swap = 2031608kB
Free swap:            0kB
1310720 pages of RAM
332619 reserved pages
20496 pages shared
8412 pages swap cached
Kernel panic - not syncing: out of memory. panic_on_oom is selected


INFO: task kjournald:602 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D ffff810009004420     0   602     49           631   578 (L-TLB)

INFO: task nagios:20831 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nagios        D ffff810009004420     0 20831   3763               20729 (NOTLB)



-ps : process status at time of crash. Active tasks have a preceding >

crash>ps
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
      0      0   0  ffffffff80319b60  RU   0.0       0      0  [swapper]
      0      1   1  ffff8101047460c0  RU   0.0       0      0  [swapper]
      1      0   1  ffff8101047347a0  IN   0.0   10372    484  init
      2      1   0  ffff810104734040  IN   0.0       0      0  [migration/0]
  2979      1   1  ffff81013f5827a0  IN   0.0   28664    356  restorecond
>  3419      1   1  ffff81013b2e5860  RU   0.0   60808    456  sshd
   2988  19639   0  ffff810130580820  IN   0.0    5876    388  vmstat
   2990      1   1  ffff81013cb267a0  IN   0.0    5932    472  syslogd
crash>



-bt : display a kernel stack backtrace. bt -a - stack tradce of all active tasks

crash> bt
PID: 3419   TASK: ffff81013b2e5860  CPU: 1   COMMAND: "sshd"
 #0 [ffff81012f595aa0] crash_kexec at ffffffff800b099c
 #1 [ffff81012f595b60] panic at ffffffff80093989
 #2 [ffff81012f595c50] out_of_memory at ffffffff800caa5d
 #3 [ffff81012f595ca0] __alloc_pages at ffffffff8000f612
 #4 [ffff81012f595d10] read_swap_cache_async at ffffffff80032415
 #5 [ffff81012f595d50] swapin_readahead at ffffffff800d0777
 #6 [ffff81012f595da0] __handle_mm_fault at ffffffff800092d9
 #7 [ffff81012f595e60] do_page_fault at ffffffff80067202
 #8 [ffff81012f595f50] error_exit at ffffffff8005dde9
    RIP: 00002b3e2f66a25a  RSP: 00007fff86964840  RFLAGS: 00010206
    RAX: 0000000000000000  RBX: 00002b3e2f9261d8  RCX: 00002b3e2f9261d8
    RDX: 00002b3e4a2363e0  RSI: 000000000000000a  RDI: 0000000000000001
    RBP: 00007fff86964860   R8: 00002b3e4a2363e0   R9: 000000000000000a
    R10: 00007fff86964e10  R11: 0000000000000246  R12: 00002b3e4a236570
    R13: 00007fff86964e50  R14: 0000000000000001  R15: 00002b3e2cd76078
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
crash>


-bt : to backtrace a PID

crash > bt
crash > bt -f



-bt -f : display all stack data contained in a frame; this option can be used to determine the arguments passed to each function

crash>bt -f
 #0 [ffff81012f595aa0] crash_kexec at ffffffff800b099c
 #1 [ffff81012f595b60] panic at ffffffff80093989
 #2 [ffff81012f595c50] out_of_memory at ffffffff800caa5d



-kmem : details of kernel memory location e.g. For #2 above

crash> kmem ffffffff800caa5d
ffffffff800caa5d (T) out_of_memory+593 ../debug/kernel-2.6.18/linux-2.6.18-308.8.2.el5.x86_64/mm/oom_kill.c: 506

      PAGE       PHYSICAL      MAPPING       INDEX CNT FLAGS
ffff810100009c30   2ca000                0        0  1 400
crash>




-whatis : search symbol table for data or type information

crash> whatis crash_kexec
void crash_kexec(struct pt_regs *);

crash> whatis panic
void panic(const char *, ...);

crash> whatis out_of_memory
void out_of_memory(struct zonelist *, gfp_t, int, int);



-some_more_commands

crash> sys|egrep -i "cpu|date|uptime|load|tasks|node|release|machine|memory|panic"
    DUMPFILE: /cores/retrace/tasks/152431766/crash/vmcore  [PARTIAL DUMP]
        CPUS: 24
        DATE: Wed Sep  9 07:54:53 2015
      UPTIME: 322 days, 11:00:25
LOAD AVERAGE: 0.13, 0.13, 0.13
       TASKS: 755
    NODENAME: ms00456
     RELEASE: 2.6.32-358.el6.x86_64
     MACHINE: x86_64  (2494 Mhz)
      MEMORY: 192 GB
       PANIC: "Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for details."

crash> rd -a 0xffffffff8201c001 100
ffffffff8201c000:  HP
ffffffff8201c004:  P70
ffffffff8201c008:  2.8
ffffffff8201c010:  12/20/2013  <<<
ffffffff8201c01c:  HP
ffffffff8201c020:  ProLiant DL380p Gen8


crash> dis -rl 0xffffffffa00574ca
0xffffffffa00574c8 : callq  0xffffffff8150cf21
/usr/src/debug/kernel-2.6.32-358.el6/linux-2.6.32-358.el6.x86_64/drivers/watchdog/hpwdt.c: 495


crash> mod |grep -E "NAME|"
     MODULE       NAME                   SIZE  OBJECT FILE
ffffffffa00581a0  hpwdt                  7094  /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.32-358.el6.x86_64/kernel/drivers/watchdog/hpwdt.ko.debug 


crash> px ((struct module *)0xffffffffa00581a0)->name
$2 = "hpwdt\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"

crash> px ((struct module *)0xffffffffa00581a0)->version
$3 = 0xffff88301782cb20 "1.3.0"


crash> px ((struct module *)0xffffffffa00581a0)->srcversion
$4 = 0xffff8830194fe4c0 "87D39D97B9E0A6F667C8671"


crash> px ((struct module *)0xffffffffa00581a0)->gpgsig_ok
$5 = 0x1


crash> px notify_die
notify_die = $6 = 
 {int (enum die_val, const char *, struct pt_regs *, long, int, int)} 0xffffffff8109cbd0


crash> eval -b 00000000000000f1


-Other crash command

help : list all supported crash command
help : to know available option and details of a crash command e.g 'help bt'
net : show network interfaces and ip address configured
mount : show mount point and mount option at time of crash
sys : system specific information excatly what you see when crash was started
swap : swap usages
task : task structure of running task. In above example, it is sshd with PID 3419
runq : displays the tasks on the run queues of each cpu
set -v : display the current state of internal crash variables.


How to run crash analysis unattended and save in a text file ?


# cat > inputfile.txt
sys
mount
net
swap
log
ps
runq
bt
bt -a
bt -g
bt -t
bt -al
bt -f
foreach bt
task
kmem
kmem -i
kmem -S
exit

# crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux vmcore -i inputfile.txt > crash-analysys.txt
# less crash-analysys.txt

- References
http://people.redhat.com/anderson/crash_whitepaper/
http://www.dedoimedo.com/computers/crash-book.html
https://codeascraft.com/2012/03/30/kernel-debugging-101/

Tuesday, March 10, 2015

sar, ksar and watcher-cron - how to use to troubleshoot a Linux system

sar

To know about all options of sar, best place is : $ man sar

ksar


I have created a page on wikipedia : http://en.wikipedia.org/wiki/Ksar_%28Unix_sar_grapher%29

Few points

Always create 1 second interval data for troubleshooting. 

  •  below command will run in background to create binary file having 1 second resolution
  •  binary file will be around 25M
  •  text file size will be 40M


sar -A -o /var/tmp/sar-bin-date-systemname.bin  1 3600 >/dev/null 2>&1 &
ls -l /var/tmp/sar-bin-date-systemname.bin
sar -A -f /var/tmp/sar-bin-date-systemname.bin
LC_ALL=C sar -A -p -f /var/tmp/sar-bin-date-systemname.bin /var/tmp/sar-txt-date-systemname.txt

Load sar-txt-date-systemname.txt in ksar to see granular details.

An example:

Application was loading a 3.5G file on a 4 GB system. It was being closed within 10 second but it was enough to generate huge page-out and ~100% utilization of disk  !

$ sar -rpdB -s 02:27:29 -e 02:27:45 -f /var/tmp/sar-bin-date-systemname.bin
Linux 2.6.18-308.8.2.el5 (testsys001mysystem.com)      10/03/15        _x86_64_        (2 CPU)

02:27:28     pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
02:27:29        16.16   1826.26 239842.42      1.01 115282.83      0.00      0.00      0.00      0.00
02:27:30        60.00     56.00 331109.00      8.00 204665.00      0.00      0.00      0.00      0.00
02:27:31         0.00     60.00 517659.00      0.00 345388.00      0.00      0.00      0.00      0.00
02:27:32         0.00    327.27 610073.74      0.00 364861.62      0.00      0.00      0.00      0.00
02:27:33         0.00     60.61 330316.16      0.00 181757.58      0.00      0.00      0.00      0.00
02:27:34         0.00    526.53 590945.92      0.00 429371.43      0.00      0.00      0.00      0.00
02:27:35        16.00   1204.00 340554.00      0.00 232284.00   2272.00   4832.00   1080.00     15.20
02:27:36       252.00    704.00  60046.00      4.00  58837.00   3296.00  19806.00  17118.00     74.10
02:27:37      1706.93 334978.22 302981.19     71.29 304998.02 141251.49  44208.91  94489.11     50.95
02:27:38     26574.76 459071.84 349949.51    567.96 359980.58 183487.38  46633.01 115279.61     50.10
02:27:39      7526.53 446142.86 330251.02    262.24 328176.53 198889.80  24097.96 110475.51     49.54
02:27:40      1503.85 413742.31 292217.31     59.62 292017.31 181600.00  24369.23 102852.88     49.94
02:27:41      1619.80  81607.92  56802.97     43.56 979036.63  32285.15   8807.92  20769.31     50.54
02:27:42      3155.56    113.13   8978.79     82.83   1768.69      0.00      0.00      0.00      0.00
02:27:43     13457.43    190.10  23859.41    630.69  13454.46      0.00      0.00      0.00      0.00
02:27:44      1296.97    367.68    128.28     44.44    109.09      0.00      0.00      0.00      0.00
02:27:45      1979.59     36.73   7173.47     36.73    890.82      0.00      0.00      0.00      0.00
Average:      3528.90 103975.52 258296.12    107.83 248765.63  44355.97  10310.59  27579.75     50.45

02:27:28    kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
02:27:29      3626448    286212      7.32     14448    109424    587200      9.88
02:27:30      3203408    709252     18.13     14460    109464    980300     16.49
02:27:31      2509876   1402784     35.85     14468    109472   1670800     28.11
02:27:32      1558540   2354120     60.17     14480    109516   2681920     45.12
02:27:33      1002236   2910424     74.38     14492    109480   3295636     55.44
02:27:34       402404   3510256     89.72     14508    109512   3777916     63.56
02:27:35        22300   3890360     99.43     14496    105340   4261476     71.69
02:27:36        22272   3890388     99.43      2832     52344   4266140     71.77
02:27:37        22888   3889772     99.42       520     12892   4640740     78.07
02:27:38        32216   3880444     99.18       872     25420   5006940     84.23
02:27:39        30604   3882056     99.22      1300     27428   5464516     91.93
02:27:40        35468   3877192     99.09      1316     28580   5850808     98.43
02:27:41      3759396    153264      3.92      1336     29276    478424      8.05
02:27:42      3744892    167768      4.29      1548     30996    551996      9.29
02:27:43      3743188    169472      4.33      1812     41488    446000      7.50
02:27:44      3742168    170492      4.36      1840     42508    446000      7.50
02:27:45      3733504    179156      4.58      1936     44268    511600      8.61
Average:      1834812   2077848     53.11      6863     64553   2642260     44.45

02:27:28          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
02:27:29          sda     56.57     32.32   3652.53     65.14      0.19      3.39      0.61      3.43
02:27:30          sda     11.00    120.00    112.00     21.09      0.04      4.00      4.00      4.40
02:27:31          sda      3.00      0.00    120.00     40.00      0.00      0.33      0.33      0.10
02:27:32          sda      3.03      0.00     80.81     26.67      0.01      2.00      2.00      0.61
02:27:33          sda      6.06      0.00    694.95    114.67      0.00      0.33      0.33      0.20
02:27:34          sda     22.45      0.00   1053.06     46.91      0.02      0.95      0.23      0.51
02:27:35          sda    207.00     32.00   2408.00     11.79      0.24      1.15      0.17      3.60
02:27:36          sda     45.00    504.00   1248.00     38.93      0.38      8.49      3.62     16.30
02:27:37          sda   2812.87   3413.86 666899.01    238.30      7.28      2.58      0.29     81.98
02:27:38          sda   3228.16  53141.75 919541.75    301.31     20.51      6.35      0.30     97.96
02:27:39          sda   4312.24  14995.92 888693.88    209.56     11.21      2.59      0.23    100.10
02:27:40          sda   8043.27   3069.23 832607.69    103.90     13.67      1.70      0.12     97.60
02:27:41          sda    899.01   3239.60 163215.84    185.15      1.58      1.75      0.32     28.51
02:27:42          sda    225.25   6044.44    226.26     27.84      0.41      1.80      1.11     24.95
02:27:43          sda    699.01  27176.24    380.20     39.42      1.44      2.08      1.20     83.76
02:27:44          sda     72.73   2585.86    735.35     45.67      0.12      1.69      1.54     11.21
02:27:45          sda     89.80   3967.35     73.47     45.00      0.15      1.66      1.26     11.33
Average:          sda   1242.32   7057.80 207951.03    173.07      3.43      2.76      0.27     33.71


watcher-cron

A simple script provided by redhat to collect system performance data using below commands. Use this along with sar data. See redhat article https://access.redhat.com/articles/279063 for more detail.

iostat
vmstat
top
mpstat

Also add iotop and ps aux --forest commands in script:

sequences=`seq -s ' ' $2`
eval "for i in $sequences; do date;ps aux --forest; sleep $1;done ${_compress_all} >  $3/${_time}-info-ps-forest.log${_ext_all} &" ; pspid=$!
eval "iotop -n $1 -d $2 -b               ${_compress_all} >  $3/${_time}-info-iotop.log${_ext_all} &" ; iotoppid=$!

References:
https://access.redhat.com/articles/1145953 - PCP
Performance monitoring tools