Tuesday, March 31, 2015

Linux kernel - using the crash utility to analyze a vmcore

What is crash?


A tool for interactively analyzing the state of a crashed system from a dump generated by kdump, netdump, diskdump, LKCD, xendump or kvmdump.


Which packages are required?


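yum install crash kernel-debuginfo-$(uname -r) kernel-debuginfo-common-x86_64-$(uname -r)

Note: the debuginfo packages must match the kernel that crashed. $(uname -r) is right only when the analysis runs on the same kernel release that produced the vmcore.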
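Before starting crash it can help to confirm that the debuginfo vmlinux for the wanted kernel release is actually installed. The following is a minimal sketch; the function name vmlinux_path and the DEBUG_ROOT variable are hypothetical names introduced here for illustration, and the path layout is the one used in the commands of this post.

```shell
# Hypothetical helper: print the debuginfo vmlinux path for a kernel release.
# $1         kernel release (defaults to the running kernel)
# DEBUG_ROOT assumed override for the debug tree, defaults to /usr/lib/debug
vmlinux_path() {
    release="${1:-$(uname -r)}"
    root="${DEBUG_ROOT:-/usr/lib/debug}"
    path="$root/lib/modules/$release/vmlinux"
    if [ -f "$path" ]; then
        printf '%s\n' "$path"
    else
        echo "no debuginfo vmlinux found for $release" >&2
        return 1
    fi
}
```

For example, vmlinux_path 2.6.18-308.8.2.el5 prints the path if the matching kernel-debuginfo package is installed and returns non-zero otherwise.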

How to check that the created vmcore file is valid?

If the command below prints the date of the kernel crash and exits, the vmcore was created successfully.

# crash -st /usr/lib/debug/lib/modules/$(uname -r)/vmlinux vmcore
Tue Mar 10 04:02:55 2015
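When the vmcore was copied to another machine, the kernel release that crashed may not be the one reported by uname -r. A rough sketch for finding it, assuming the dump is a kdump vmcore that carries a vmcoreinfo note with an OSRELEASE= line (other dump formats may not have one); the function name osrelease_from_strings is hypothetical.

```shell
# Hypothetical helper: read text on stdin and print the value of the
# first line that starts with OSRELEASE= (as found in kdump vmcoreinfo).
osrelease_from_strings() {
    sed -n 's/^OSRELEASE=//p' | head -n 1
}
```

A possible use is: strings vmcore | osrelease_from_strings. The result can then be used in place of $(uname -r) when choosing the kernel-debuginfo package and the vmlinux path.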

Can crash load the vmcore?

If the vmcore is valid and you are using the matching debug package, crash takes a couple of minutes to load the kernel and then shows the crash prompt. The banner shows the number of CPUs, the node name, memory size, date and time of the crash, uptime, load average, and the panic message.

# crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux vmcore
crash 5.1.8-2.el5_9
Copyright (C) 2002-2011  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.18-308.8.2.el5/vmlinux
    DUMPFILE: vmcore
        CPUS: 2
        DATE: Tue Mar 10 04:02:55 2015
      UPTIME: 1 days, 00:09:41
LOAD AVERAGE: 9.29, 4.35, 2.80
       TASKS: 205
    NODENAME: dev001.example.com
     RELEASE: 2.6.18-308.8.2.el5
     VERSION: #1 SMP Tue May 29 11:54:17 EDT 2012
     MACHINE: x86_64  (2699 Mhz)
      MEMORY: 3.9 GB
       PANIC: "Kernel panic - not syncing: out of memory. panic_on_oom is selected"
         PID: 3419
     COMMAND: "sshd"
        TASK: ffff81013b2e5860  [THREAD_INFO: ffff81012f594000]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)
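The banner above is plain "KEY: value" text, so single fields are easy to pull out when the output has been saved to a file. A minimal sketch, assuming the banner format shown above; sys_field is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of one banner field.
# Reads saved banner or 'sys' output on stdin; $1 is the exact key,
# for example CPUS, PANIC or "LOAD AVERAGE".
sys_field() {
    awk -v key="$1" '
        { line = $0; sub(/^[ \t]+/, "", line) }     # drop the right-alignment padding
        index(line, key ": ") == 1 {                # the key must start the line
            print substr(line, length(key) + 3)     # text after "KEY: "
            exit
        }
    '
}
```

For example, sys_field PANIC < saved-output.txt prints only the panic message, where saved-output.txt stands for any file holding the banner.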

How to use crash commands?

-log : dump the kernel log_buf contents in chronological order. The most interesting information is usually at the end of the log: kernel thread dumps, memory state, swap usage, etc.

crash> log
.....
lowmem_reserve[]: 0 3000 4010 4010
Node 0 DMA32 free:10008kB min:6052kB low:7564kB high:9076kB active:1483204kB inactive:1420176kB present:3072160kB pages_scanned:10951420 all_unreclaimable? yes
lowmem_reserve[]: 0 0 1010 1010
Node 0 Normal free:1980kB min:2036kB low:2544kB high:3052kB active:442232kB inactive:458364kB present:1034240kB pages_scanned:1716084 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 4*4kB 3*8kB 4*16kB 1*32kB 3*64kB 4*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 2*4096kB = 10056kB
Node 0 DMA32: 0*4kB 5*8kB 1*16kB 1*32kB 1*64kB 1*128kB 4*256kB 1*512kB 0*1024kB 0*2048kB 2*4096kB = 10008kB
Node 0 Normal: 5*4kB 5*8kB 0*16kB 0*32kB 2*64kB 4*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1980kB
Node 0 HighMem: empty
10029 pagecache pages
Swap cache: add 127866954, delete 127858677, find 188220155/188834341, race 0+1692
Free swap  = 0kB
Total swap = 2031608kB
Free swap:            0kB
1310720 pages of RAM
332619 reserved pages
20496 pages shared
8412 pages swap cached
Kernel panic - not syncing: out of memory. panic_on_oom is selected


INFO: task kjournald:602 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D ffff810009004420     0   602     49           631   578 (L-TLB)

INFO: task nagios:20831 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nagios        D ffff810009004420     0 20831   3763               20729 (NOTLB)
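The log can be long, so a filter that keeps only the lines of the kind highlighted above is a convenient first pass. This is a sketch only; log_highlights is a hypothetical name and the pattern list is an assumption based on the messages in this example, not a complete list of interesting kernel messages.

```shell
# Hypothetical filter: keep panic, hung-task and swap summary lines
# from saved 'log' output read on stdin.
log_highlights() {
    grep -E 'Kernel panic|blocked for more than|Free swap|Total swap'
}
```

It is meant to be run over saved log output, for example log_highlights < log.txt, where log.txt is any file holding the output of the log command.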



-ps : process status at the time of the crash. Active tasks are marked with a leading >

crash> ps
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
      0      0   0  ffffffff80319b60  RU   0.0       0      0  [swapper]
      0      1   1  ffff8101047460c0  RU   0.0       0      0  [swapper]
      1      0   1  ffff8101047347a0  IN   0.0   10372    484  init
      2      1   0  ffff810104734040  IN   0.0       0      0  [migration/0]
  2979      1   1  ffff81013f5827a0  IN   0.0   28664    356  restorecond
>  3419      1   1  ffff81013b2e5860  RU   0.0   60808    456  sshd
   2988  19639   0  ffff810130580820  IN   0.0    5876    388  vmstat
   2990      1   1  ffff81013cb267a0  IN   0.0    5932    472  syslogd
crash>
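For an out-of-memory panic like this one, sorting the ps listing by resident memory is a natural next step. A minimal sketch, assuming the column layout shown above (RSS in the eighth column once the leading > is removed); ps_top_rss is a hypothetical helper name.

```shell
# Hypothetical helper: list the N tasks with the largest RSS.
# Reads saved 'ps' output on stdin and prints "RSS PID COMM" lines.
ps_top_rss() {
    awk '
        { sub(/^>/, "") }                        # drop the active-task marker
        $1 ~ /^[0-9]+$/ { print $8, $1, $9 }     # skip the header and prompt lines
    ' | sort -rn | head -n "${1:-10}"
}
```

For example, ps_top_rss 5 < ps.txt prints the five largest tasks, where ps.txt is any file holding the output of the ps command.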



-bt : display a kernel stack backtrace. bt -a displays the stack traces of all active tasks.

crash> bt
PID: 3419   TASK: ffff81013b2e5860  CPU: 1   COMMAND: "sshd"
 #0 [ffff81012f595aa0] crash_kexec at ffffffff800b099c
 #1 [ffff81012f595b60] panic at ffffffff80093989
 #2 [ffff81012f595c50] out_of_memory at ffffffff800caa5d
 #3 [ffff81012f595ca0] __alloc_pages at ffffffff8000f612
 #4 [ffff81012f595d10] read_swap_cache_async at ffffffff80032415
 #5 [ffff81012f595d50] swapin_readahead at ffffffff800d0777
 #6 [ffff81012f595da0] __handle_mm_fault at ffffffff800092d9
 #7 [ffff81012f595e60] do_page_fault at ffffffff80067202
 #8 [ffff81012f595f50] error_exit at ffffffff8005dde9
    RIP: 00002b3e2f66a25a  RSP: 00007fff86964840  RFLAGS: 00010206
    RAX: 0000000000000000  RBX: 00002b3e2f9261d8  RCX: 00002b3e2f9261d8
    RDX: 00002b3e4a2363e0  RSI: 000000000000000a  RDI: 0000000000000001
    RBP: 00007fff86964860   R8: 00002b3e4a2363e0   R9: 000000000000000a
    R10: 00007fff86964e10  R11: 0000000000000246  R12: 00002b3e4a236570
    R13: 00007fff86964e50  R14: 0000000000000001  R15: 00002b3e2cd76078
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
crash>
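When comparing several dumps it is often enough to look at the call chain alone. A small sketch that reduces a saved backtrace to its function names, assuming the frame format shown above ("#N [address] function at address"); bt_functions is a hypothetical name.

```shell
# Hypothetical helper: print one function name per stack frame,
# reading saved 'bt' output on stdin.
bt_functions() {
    awk '$1 ~ /^#[0-9]+$/ && $4 == "at" { print $3 }'
}
```

Applied to the backtrace above, the first line printed is crash_kexec and the last is error_exit; the register lines are ignored because they do not start with a frame number.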


-bt <pid> : backtrace a specific PID

crash> bt <pid>
crash> bt -f <pid>



-bt -f : display all stack data contained in a frame; this option can be used to determine the arguments passed to each function

crash> bt -f
 #0 [ffff81012f595aa0] crash_kexec at ffffffff800b099c
 #1 [ffff81012f595b60] panic at ffffffff80093989
 #2 [ffff81012f595c50] out_of_memory at ffffffff800caa5d



-kmem : show details of a kernel memory address, e.g. for the address in frame #2 above

crash> kmem ffffffff800caa5d
ffffffff800caa5d (T) out_of_memory+593 ../debug/kernel-2.6.18/linux-2.6.18-308.8.2.el5.x86_64/mm/oom_kill.c: 506

      PAGE       PHYSICAL      MAPPING       INDEX CNT FLAGS
ffff810100009c30   2ca000                0        0  1 400
crash>




-whatis : search symbol table for data or type information

crash> whatis crash_kexec
void crash_kexec(struct pt_regs *);

crash> whatis panic
void panic(const char *, ...);

crash> whatis out_of_memory
void out_of_memory(struct zonelist *, gfp_t, int, int);



-Some more commands

crash> sys|egrep -i "cpu|date|uptime|load|tasks|node|release|machine|memory|panic"
    DUMPFILE: /cores/retrace/tasks/152431766/crash/vmcore  [PARTIAL DUMP]
        CPUS: 24
        DATE: Wed Sep  9 07:54:53 2015
      UPTIME: 322 days, 11:00:25
LOAD AVERAGE: 0.13, 0.13, 0.13
       TASKS: 755
    NODENAME: ms00456
     RELEASE: 2.6.32-358.el6.x86_64
     MACHINE: x86_64  (2494 Mhz)
      MEMORY: 192 GB
       PANIC: "Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for details."

crash> rd -a 0xffffffff8201c001 100
ffffffff8201c000:  HP
ffffffff8201c004:  P70
ffffffff8201c008:  2.8
ffffffff8201c010:  12/20/2013  <<<
ffffffff8201c01c:  HP
ffffffff8201c020:  ProLiant DL380p Gen8


crash> dis -rl 0xffffffffa00574ca
0xffffffffa00574c8 : callq  0xffffffff8150cf21
/usr/src/debug/kernel-2.6.32-358.el6/linux-2.6.32-358.el6.x86_64/drivers/watchdog/hpwdt.c: 495


crash> mod |grep -E "NAME|"
     MODULE       NAME                   SIZE  OBJECT FILE
ffffffffa00581a0  hpwdt                  7094  /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.32-358.el6.x86_64/kernel/drivers/watchdog/hpwdt.ko.debug 


crash> px ((struct module *)0xffffffffa00581a0)->name
$2 = "hpwdt\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"

crash> px ((struct module *)0xffffffffa00581a0)->version
$3 = 0xffff88301782cb20 "1.3.0"


crash> px ((struct module *)0xffffffffa00581a0)->srcversion
$4 = 0xffff8830194fe4c0 "87D39D97B9E0A6F667C8671"


crash> px ((struct module *)0xffffffffa00581a0)->gpgsig_ok
$5 = 0x1


crash> px notify_die
notify_die = $6 = 
 {int (enum die_val, const char *, struct pt_regs *, long, int, int)} 0xffffffff8109cbd0


crash> eval -b 00000000000000f1
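The output of the eval command is not shown above. As far as I understand it, the -b option additionally lists the bit positions that are set in the value. The same calculation can be sketched outside crash as follows; bits_set is a hypothetical helper, the input is assumed to be a hexadecimal number of at most 64 bits, and the format of its output is not claimed to match crash's.

```shell
# Hypothetical helper: print the positions of the bits set in a
# hexadecimal value (with or without a 0x prefix), highest bit first.
bits_set() {
    value=$(( 0x${1#0x} ))
    bit=63
    out=""
    while [ "$bit" -ge 0 ]; do
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            out="$out $bit"
        fi
        bit=$((bit - 1))
    done
    echo "${out# }"
}
```

For 00000000000000f1 (binary 11110001) this prints 7 6 5 4 0.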


-Other crash commands

help : list all supported crash commands
help <command> : show the available options and details of a crash command, e.g. 'help bt'
net : show the network interfaces and the IP addresses configured
mount : show the mount points and mount options at the time of the crash
sys : system-specific information, exactly what you see when crash starts
swap : swap usage
task : task structure of the running task. In the above example, it is sshd with PID 3419
runq : display the tasks on the run queues of each CPU
set -v : display the current state of internal crash variables


How to run a crash analysis unattended and save the output in a text file?


# cat > inputfile.txt
sys
mount
net
swap
log
ps
runq
bt
bt -a
bt -g
bt -t
bt -al
bt -f
foreach bt
task
kmem
kmem -i
kmem -S
exit

# crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux vmcore -i inputfile.txt > crash-analysis.txt
# less crash-analysis.txt
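The two steps above can also be wrapped in a small script so that the command list is created without typing it interactively. This is a sketch only: the function names write_crash_commands and run_crash_batch are hypothetical, and the command list is a shortened selection from the one above.

```shell
# Hypothetical helper: write a crash command file to the path given in $1.
write_crash_commands() {
    cat > "$1" <<'EOF'
sys
log
ps
bt -a
kmem -i
exit
EOF
}

# Hypothetical helper: run crash in batch mode.
# $1 vmlinux  $2 vmcore  $3 command file  $4 report file
run_crash_batch() {
    crash -s "$1" "$2" -i "$3" > "$4" 2>&1
}
```

A possible use is write_crash_commands inputfile.txt followed by run_crash_batch with the vmlinux path, the vmcore, inputfile.txt and the name of the report file. The final exit line matters, because without it crash stays at its prompt after the last command.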

- References
http://people.redhat.com/anderson/crash_whitepaper/
http://www.dedoimedo.com/computers/crash-book.html
https://codeascraft.com/2012/03/30/kernel-debugging-101/

3 comments:

  1. Hi Ansari, Thanks for the article, From where we can get usr/lib/debug/lib/modules/$(uname -r)/vmlinux file

  2. Hi Ansari, Thanks for the article, From where we can get usr/lib/debug/lib/modules/$(uname -r)/vmlinux file. Sundaram

  3. Hi Uthavumkarangal,

    (1) http://debuginfo.centos.org/

    (2) https://access.redhat.com/solutions/9907

    For Red Hat Enterprise Linux 5.8+, 6 and 7
    With the release of RHEL 6 the debuginfo packages are no longer provided via the Red Hat public FTP site. They have instead moved to Red Hat Network (RHN) classic or Red Hat Satellite for download.
