Fatal error has occured in: PCIe fabric.(0x2)(0x45)

rmmacias
Level 2

I have a two-node cluster running VCS 5.0 MP3 on Solaris 10, and the nodes keep rebooting with the following error:

 Error for Command: write(10)               Error Level: Retryable                                    
scsi: [ID 107833 kern.notice]  Requested Block: 5226880                   Error Block: 5226880        
scsi: [ID 107833 kern.notice]  Vendor: EMC                                Serial Number: 440C1000F    
scsi: [ID 107833 kern.notice]  Sense Key: Aborted_Command                                             
scsi: [ID 107833 kern.notice]  ASC: 0x44 (internal target failure), ASCQ: 0x0, FRU: 0x0               
genunix: [ID 843051 kern.info] NOTICE: SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
unix: [ID 836849 kern.notice]                                                                          
^Mpanic[cpu12]/thread=2a10139fc80:                                                                     
unix: [ID 198415 kern.notice] Fatal error has occured in: PCIe fabric.(0x2)(0x45)                      
unix: [ID 100000 kern.notice]                                                                          

 

1 ACCEPTED SOLUTION

Accepted Solutions

g_lee
Level 6

The messages provided are from the operating system:

Oct 25 15:50:02 bbdd-04 ^Mpanic[cpu0]/thread=2a10640dc80: 
Oct 25 15:50:02 bbdd-04 unix: [ID 799565 kern.notice] BAD TRAP: type=31 rp=2a10640d530 addr=3007544a000 mmu_fsr=0
Oct 25 15:50:02 bbdd-04 unix: [ID 100000 kern.notice] 
Oct 25 15:50:02 bbdd-04 unix: [ID 839527 kern.notice] sched: 
Oct 25 15:50:02 bbdd-04 unix: [ID 520581 kern.notice] trap type = 0x31
Oct 25 15:50:02 bbdd-04 unix: [ID 381800 kern.notice] addr=0x3007544a000
Oct 25 15:50:02 bbdd-04 unix: [ID 101969 kern.notice] pid=0, pc=0x1081240, sp=0x2a10640cdd1, tstate=0x80001604, context=0x0
Oct 25 15:50:02 bbdd-04 unix: [ID 743441 kern.notice] g1-g7: 84408, e, 14c, 2a10640d4dc, 0, 10, 2a10640dc80
Oct 25 15:50:02 bbdd-04 unix: [ID 100000 kern.notice] 
Oct 25 15:50:02 bbdd-04 genunix: [ID 723222 kern.notice] 000002a10640d250 unix:die+9c (31, 2a10640d530, 3007544a000, 0, 2a10640d310, da2b2000)
Oct 25 15:50:02 bbdd-04 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000001fff 0000000000000031 0000000001000000 0000000000002000
Oct 25 15:50:02 bbdd-04   %l4-7: 0000000000100000 00000000018a6940 0000000000000000 00000000010bb000
Oct 25 15:50:02 bbdd-04 genunix: [ID 723222 kern.notice] 000002a10640d330 unix:trap+9e4 (2a10640d530, 10000, 1fff, 6, 3007544a000, 1)
Oct 25 15:50:02 bbdd-04 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 00000000018a6940 0000000000000031 0000000000000000
Oct 25 15:50:02 bbdd-04   %l4-7: 0000000000001c00 0000000000000001 0000000000000006 0000000000000002
Oct 25 15:50:02 bbdd-04 genunix: [ID 723222 kern.notice] 000002a10640d480 unix:ktl0+64 (300753e2000, 2a10640d6a6, 84408, 30075449ffb, 3007544a000, 3d)
Oct 25 15:50:02 bbdd-04 genunix: [ID 179002 kern.notice]   %l0-3: 000000000180c000 0000000000000000 0000000080001604 000000000102018c
Oct 25 15:50:02 bbdd-04   %l4-7: 00000600128192d8 0000000000030008 0000000000000000 000002a10640d530
Oct 25 15:50:02 bbdd-04 genunix: [ID 723222 kern.notice] 000002a10640d5d0 emlxs:emlxs_dump_table_read+1d0 (60012800000, 19570, 2a10640d858, 2a10640d854, 60012ba02e8, 2a10640d6a0)
Oct 25 15:50:02 bbdd-04 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000002 0000060012fab9c4 0000000000000000 0000060012fab9c0
Oct 25 15:50:02 bbdd-04   %l4-7: 00000300753e2000 0000000000084408 000000007b2b5e98 0000000000000000
Oct 25 15:50:02 bbdd-04 genunix: [ID 723222 kern.notice] 000002a10640d7a0 emlxs:emlxs_dump_hba+44 (60012801030, 60012a0b3f0, 60012a0b408, 1030, 60012800000, 1)
Oct 25 15:50:02 bbdd-04 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000001000 0000000000019558 0000000000019400 0000000000001000
Oct 25 15:50:02 bbdd-04   %l4-7: 0000060013572480 00000000709d1c50 000000007b2b5d20 0000000000000003
Oct 25 15:50:02 bbdd-04 genunix: [ID 723222 kern.notice] 000002a10640d860 emlxs:emlxs_dump_user_event+1a4 (60012800000, 20, 3, 6001281c320, 60012801030, 60012a0b438)
Oct 25 15:50:02 bbdd-04 genunix: [ID 179002 kern.notice]   %l0-3: 00000000709cbc00 0000000000000000 0000000000000000 0000000000000000
Oct 25 15:50:02 bbdd-04   %l4-7: 000000007b2aa800 000000007b2aa800 0000000000000000 0000000000000000
Oct 25 15:50:02 bbdd-04 genunix: [ID 723222 kern.notice] 000002a10640d930 emlxs:emlxs_dump_user_thread+4 (60012800000, 0, 0, 2a10640dc80, 0, 103)
Oct 25 15:50:02 bbdd-04 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000018 0000000000000003 000014893e9f477f 0000000000000008
Oct 25 15:50:02 bbdd-04   %l4-7: 0000000000000000 0000000000000000 0000000001062a88 0000060013679da8
Oct 25 15:50:02 bbdd-04 genunix: [ID 723222 kern.notice] 000002a10640d9e0 emlxs:emlxs_thread+3c (600233d1988, 0, 18a6940, 18a6940, 60012800000, 19658)
Oct 25 15:50:02 bbdd-04 genunix: [ID 179002 kern.notice]   %l0-3: 000014893e9f447f 0000000000000004 0000000000000016 0000000000000000
Oct 25 15:50:02 bbdd-04   %l4-7: 00000000001d730d 000000007b23b1cc 0000000000019400 000000000009f4b9
Oct 25 15:50:02 bbdd-04 unix: [ID 100000 kern.notice] 
Oct 25 15:50:02 bbdd-04 genunix: [ID 672855 kern.notice] syncing file systems...
Oct 28 22:23:54 bbdd-04 ^Mpanic[cpu12]/thread=2a10139fc80: 
Oct 28 22:23:54 bbdd-04 unix: [ID 198415 kern.notice] Fatal error has occured in: PCIe fabric.(0x2)(0x45)
Oct 28 22:23:54 bbdd-04 unix: [ID 100000 kern.notice] 
Oct 28 22:23:54 bbdd-04 genunix: [ID 723222 kern.notice] 000002a101417d50 px:px_err_panic+1ac (19e1c00, 13adc00, 45, 2a101417e00, 2, 0)
Oct 28 22:23:54 bbdd-04 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000001 00000000019e1c00 0000000000000000 0000000000000001
Oct 28 22:23:54 bbdd-04   %l4-7: 0000000000000000 000000000190d400 0000000000000001 0000000000000000
Oct 28 22:23:54 bbdd-04 genunix: [ID 723222 kern.notice] 000002a101417e60 px:px_err_intr+1a0 (2, 2, 21, 2, 300038ecaa0, 2)
Oct 28 22:23:54 bbdd-04 genunix: [ID 179002 kern.notice]   %l0-3: 00000300038cba90 00000300038ec940 0000000000000001 0000060010885658
Oct 28 22:23:54 bbdd-04   %l4-7: 0000060010885658 0000000000000004 0000000000000001 0000000000000001
Oct 28 22:23:54 bbdd-04 genunix: [ID 723222 kern.notice] 000002a101417f50 unix:current_thread+188 (16, 1, 1, 1000, 101010101000101, 12)
Oct 28 22:23:54 bbdd-04 genunix: [ID 179002 kern.notice]   %l0-3: 0000000001009904 000002a10139efe1 000000000000000e 0000000070010140
Oct 28 22:23:54 bbdd-04   %l4-7: 00000000ffffffff 0000000000000000 0000000000000000 000002a10139f890
Oct 28 22:23:54 bbdd-04 genunix: [ID 723222 kern.notice] 000002a10139f930 unix:cpu_halt+104 (30009290000, c, 1913b18, 19139e8, 30009290000, 0)
Oct 28 22:23:54 bbdd-04 genunix: [ID 179002 kern.notice]   %l0-3: 0000060014020e34 0000000000000001 0000000000000016 0000000000000000
Oct 28 22:23:54 bbdd-04   %l4-7: 0000000000000000 0000000000000002 0000030009290178 0000000000000001
Oct 28 22:23:54 bbdd-04 genunix: [ID 723222 kern.notice] 000002a10139f9e0 unix:idle+128 (183ec00, 0, 30009290000, ffffffffffffffff, d, 183d400)
Oct 28 22:23:54 bbdd-04 genunix: [ID 179002 kern.notice]   %l0-3: 0000060014020e10 000000000000001b 0000000000000000 ffffffffffffffff
Oct 28 22:23:54 bbdd-04   %l4-7: 0000060014020e10 ffffffffffffffff 00000000019139e8 00000000010430b0
Oct 28 22:23:54 bbdd-04 unix: [ID 100000 kern.notice] 

The panic stacks seem to indicate a problem in the emlxs (Emulex) driver on Oct 25 and in the PCIe fabric on Oct 28.

In both cases there are no vx-related commands or calls in the stack, which suggests this is not an issue with VCS or Storage Foundation. If you want to investigate further, I would suggest following up with the operating system vendor (Oracle) to find out what is causing the operating system to panic.
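
If you want to repeat that check on other panics, here is a rough Python sketch of what I did by eye: pull the module name out of each stack frame line and look for anything Veritas-related. The function names and the prefix list (`vx`, `gab`, `llt`) are my own assumptions for illustration, not part of any Veritas or Solaris tool, and the regex only covers frame lines in the format shown above.

```python
import re

# Matches Solaris panic stack frame lines such as:
#   "... 000002a10640d5d0 emlxs:emlxs_dump_table_read+1d0 (60012800000, ..."
# i.e. a 16-digit hex frame address, then module:function+offset, then "(".
FRAME_RE = re.compile(r"\b[0-9a-f]{16} (\w+):(\w+)\+[0-9a-f]+ \(")

# Assumed prefixes for Veritas kernel modules (hypothetical list, adjust as needed).
VERITAS_PREFIXES = ("vx", "gab", "llt")


def stack_modules(log_text):
    """Return kernel module names found in stack frames, in first-seen order."""
    modules = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group(1) not in modules:
            modules.append(match.group(1))
    return modules


def veritas_modules(modules, prefixes=VERITAS_PREFIXES):
    """Return the subset of module names that start with an assumed Veritas prefix."""
    return [name for name in modules if name.startswith(prefixes)]


# Three frame lines copied from the messages above.
SAMPLE = """\
Oct 25 15:50:02 bbdd-04 genunix: [ID 723222 kern.notice] 000002a10640d250 unix:die+9c (31, 2a10640d530, 3007544a000, 0, 2a10640d310, da2b2000)
Oct 25 15:50:02 bbdd-04 genunix: [ID 723222 kern.notice] 000002a10640d7a0 emlxs:emlxs_dump_hba+44 (60012801030, 60012a0b3f0, 60012a0b408, 1030, 60012800000, 1)
Oct 28 22:23:54 bbdd-04 genunix: [ID 723222 kern.notice] 000002a101417d50 px:px_err_panic+1ac (19e1c00, 13adc00, 45, 2a101417e00, 2, 0)
"""

found = stack_modules(SAMPLE)
print(found)                   # ['unix', 'emlxs', 'px']
print(veritas_modules(found))  # []
```

An empty second list only says that no module with those prefixes appears in the panic stack. It does not prove the Veritas stack is uninvolved, so treat it as a first filter before deciding which vendor to call.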

2 REPLIES 2

rmmacias
Level 2
Thanks for this update.