Discussion:
Interpreting SRR1 and OOPS
Bill
2006-10-23 17:31:05 UTC
I am getting the OOPS message that follows and have been having a very
difficult time determining what is causing it. According to "PowerPC
Microprocessor Family: The Programming Environments for 32-Bit
Microprocessors", "When an exception occurs, bits 1-4 and 10-15 of SRR1
are loaded with exception specific information."

SRR1 is 00089032, so bits 1-4 are 0000 and bits 10-15 are 001000.
Unfortunately, I cannot find anywhere what the "exception specific
information" contained in these bits is.
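PowerPC manuals number register bits from the most significant end, so bit 0 is the MSB of a 32-bit register. That convention is easy to get backwards. Here is a minimal Python sketch that extracts an inclusive bit range in that numbering, assuming a 32-bit register. The helper name `ppc_field` is hypothetical. Applied to the SRR1 value above, it reproduces the bit patterns quoted in this post:

```python
def ppc_field(value: int, first: int, last: int, width: int = 32) -> int:
    """Extract bits first..last (inclusive) using PowerPC (MSB = bit 0) numbering."""
    if not 0 <= first <= last < width:
        raise ValueError("bit range out of bounds")
    length = last - first + 1
    shift = width - 1 - last  # distance of the last bit from the LSB
    return (value >> shift) & ((1 << length) - 1)


srr1 = 0x00089032
print(f"bits 1-4:   {ppc_field(srr1, 1, 4):04b}")    # 0000
print(f"bits 10-15: {ppc_field(srr1, 10, 15):06b}")  # 001000
```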

Any information on this exception or interpreting an OOPS message in
general on PPC would be greatly appreciated.



Eclipse # Machine check in kernel mode.
Caused by SRR0=0xC0005D28
Caused by (from SRR1=89032): Machine check signal
Oops: machine check, sig: 7
NIP: C3095218 XER: 00000000 LR: C30951BC SP: C015E240 REGS: c015e190
TRAP: 0200 Not tainted
MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c015c470[0] 'swapper' Last syscall: 120
last math c1db4000 last altivec 00000000
GPR00: 00000000 C015E240 C015C470 C32E6EB8 00001032 000000C6 0000008C
00000000
GPR08: C3110000 C36EF000 C310FA94 C0269600 00000175 1010E944 01FFD000
00000001
GPR16: FFFFFFFF 00000000 00000000 01FF7A0C 00001032 00000002 00000002
C3110000
GPR24: 00000001 C01B0000 C0140000 C0140000 00000002 00000002 00000000
00010000
Call backtrace:
C30951BC C30A81BC C001D25C C001D008 C0006D0C C0005B20 C00071D0
C00071EC C0003948 C01705D8 000035F0
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
d***@dogav.net
2006-10-23 20:29:17 UTC
Please specify what PowerPC processor is involved.
For instance, if it is an MPC603e (or G2), then SRR1 bit 12 indicates
"Machine check signal caused exception" for vector 0x200, which is the
exception in your case.
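A minimal Python sketch of testing that flag, assuming the MPC603e/G2 SRR1 layout for the 0x200 vector described above. Other cores may use SRR1 differently. The constant and function names are hypothetical. PowerPC bit 12 of a 32-bit register corresponds to the mask 1 << (31 - 12):

```python
# SRR1 bit 12 in PowerPC (MSB = bit 0) numbering, as a conventional mask.
SRR1_MCP = 1 << (31 - 12)  # 0x00080000; hypothetical constant name


def machine_check_signal(srr1: int) -> bool:
    """True if SRR1 reports that the machine check signal caused the exception.

    Assumes the MPC603e/G2 SRR1 layout for vector 0x200.
    """
    return bool(srr1 & SRR1_MCP)


print(hex(SRR1_MCP))                      # 0x80000
print(machine_check_signal(0x00089032))  # True
```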

David Gabbay
DoGav Systems
Bill
2006-10-23 22:18:14 UTC
MPC8248.
Rob Windgassen
2006-10-23 20:47:58 UTC
Post by Bill
I am getting the OOPS message that follows and have been having a very
difficult time determining what is causing it. According to "PowerPC
Microprocessor Family: The Programming Environments for 32-Bit
Microprocessors", "When an exception occurs, bits 1-4 and 10-15 of SRR1
are loaded with exception specific information."
SRR1 is 00089032, so bits 1-4 are 0000 and bits 10-15 are 001000.
Unfortunately, I cannot find anywhere what the "exception specific
information" contained in these bits is.
See chapter 6, on exception processing.
Post by Bill
Any information on this exception or interpreting an OOPS message in
general on PPC would be greatly appreciated.
The machine check exception is described in section 6.4.2 of my copy:

<quote>
SRR1 Bit 30 is loaded from MSR[RI] if the processor is in a recoverable
state. Otherwise cleared. The setting of all other SRR1 bits is
implementation-dependent.
</quote>
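The architecture-level part of the quote can be checked directly. Here is a minimal Python sketch, with hypothetical names, that tests SRR1 bit 30, which equals the mask 0x2 in conventional numbering. A set bit only reports that the processor was in a recoverable state. Whether to actually recover is still the operating system's decision:

```python
SRR1_RI = 1 << (31 - 30)  # bit 30 in PowerPC (MSB = bit 0) numbering == 0x2


def machine_check_recoverable(srr1: int) -> bool:
    """SRR1[30] is loaded from MSR[RI] when the processor was in a
    recoverable state and is cleared otherwise (per the quoted text)."""
    return bool(srr1 & SRR1_RI)


print(machine_check_recoverable(0x00089032))  # True
```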

So you may need to look at the user manual of your CPU.
Rob
Bill
2006-10-23 22:26:01 UTC
I looked at section 6.4.2 but did not find it very helpful. My
register settings do not match those listed. I have:

POW 0 FP 0 BE 0 DR 1
ILE 0 ME 1 FE1 0 RI 1
EE 1 FE0 0 IP 0 LE 0
PR 0 SE 0 IR 1
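A table like the one above can be produced mechanically from the raw MSR value. Here is a minimal Python sketch that uses the 32-bit MSR bit positions from the Programming Environments manual (PowerPC numbering, MSB = bit 0). The dictionary and function names are hypothetical. Bit 12, which is set in 0x00089032, is deliberately left out. Elsewhere in this thread that bit is explained as SRR1's machine check signal flag, so it is not treated as an MSR field here:

```python
# Named MSR fields, keyed by PowerPC (MSB = bit 0) bit number.
MSR_FIELDS = {
    "POW": 13, "ILE": 15, "EE": 16, "PR": 17, "FP": 18, "ME": 19,
    "FE0": 20, "SE": 21, "BE": 22, "FE1": 23, "IP": 25, "IR": 26,
    "DR": 27, "RI": 30, "LE": 31,
}


def decode_msr(msr: int) -> dict[str, int]:
    """Return each named MSR bit as 0 or 1 (32-bit register, MSB = bit 0)."""
    return {name: (msr >> (31 - bit)) & 1 for name, bit in MSR_FIELDS.items()}


for name, value in decode_msr(0x00089032).items():
    print(f"{name:<4}{value}")
```

The output agrees with the values listed above: EE, ME, IR, DR and RI are 1, and the rest are 0.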
d***@dogav.net
2006-10-23 22:40:13 UTC
SRR1 bits 0-11: Cleared
SRR1 bit 12: core_mcp - Machine check signal caused exception
Check the SIU's register TESCR1 (offset 0x10040) for the specific
cause.
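To inspect the register from user space on a running board, here is a minimal Python sketch that maps the internal register block through /dev/mem and reads TESCR1 as a 32-bit big-endian value at offset 0x10040. Several details are assumptions: the IMMR base address is board specific, the 0x20000 window size is assumed, and reading /dev/mem requires root. This approach does not help once the machine has crashed. For that case, the value has to be printed from the kernel's machine check path. The function names are hypothetical:

```python
import mmap
import struct

TESCR1_OFFSET = 0x10040  # SIU register offset from the post above
IMMR_SIZE = 0x20000      # assumed size of the register window to map


def read_be32(regs, offset: int) -> int:
    """Read one 32-bit big-endian register from a mapped register block."""
    (value,) = struct.unpack_from(">I", regs, offset)
    return value


def map_immr(immr_base: int) -> mmap.mmap:
    """Map the internal register block through /dev/mem (needs root).

    immr_base is board specific and is an assumption of this sketch.
    """
    with open("/dev/mem", "rb") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), IMMR_SIZE, mmap.MAP_SHARED,
                         mmap.PROT_READ, offset=immr_base)


# Usage on the target (hypothetical base address):
#   regs = map_immr(0xF0000000)
#   print(hex(read_be32(regs, TESCR1_OFFSET)))
```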

David Gabbay
DoGav Systems
Bill
2006-10-24 15:22:01 UTC
Should I add code to print the value of this register in the OOPS message?
Is there a better way to read that register before a crash?
Post by d***@dogav.net
0-11 Cleared
12 core_mcp-Machine check signal caused exception
Check the SIU's register TESCR1 (offset 0x10040) for the specific
cause.
David Gabbay
DoGav Systems
d***@dogav.net
2006-10-24 20:04:12 UTC
I would print it
David
Bill
2006-10-24 20:45:36 UTC
Reading the TESCR1 revealed a PCI machine check. Then, reading the ESR
showed that there was a PCI read data parity error, which had gone
undetected because the parity error response bit in the PCI Bus Command
Register was set to 0. Once this bit was set to 1, the presence of the
parity error was confirmed.
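In a standard PCI configuration header, the Command register is the 16-bit field at offset 0x04, and Parity Error Response is bit 6 (mask 0x40). Linux names this bit `PCI_COMMAND_PARITY` in `linux/pci_regs.h`. Here is a minimal Python sketch of the read-modify-write value. It assumes that the bridge's own command register follows this standard layout, so check the MPC8248 manual before relying on that. The function name is hypothetical:

```python
PCI_COMMAND_OFFSET = 0x04   # Command register in standard PCI config space
PCI_COMMAND_PARITY = 1 << 6  # bit 6: Parity Error Response (0x40)


def enable_parity_response(command: int) -> int:
    """Return the 16-bit Command value with Parity Error Response set,
    leaving all other bits unchanged."""
    return (command | PCI_COMMAND_PARITY) & 0xFFFF


print(hex(enable_parity_response(0x0006)))  # 0x46
```

Preserving the other bits matters here. In this example, 0x0006 has memory space and bus master enabled, and clearing either bit would disable the device.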

Thank you very much. Now we know what is causing the oops and can go
about fixing it.
Post by d***@dogav.net
I would print it
David
CBFalconer
2006-10-24 22:25:39 UTC
Post by d***@dogav.net
I would print it
This is totally meaningless. Google is not usenet - it is only a
poor imitation of an interface to the system. Read the links in my
sig. below.
--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>