Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decoding of IBC blackbox timestamps continues to be weird #473

Open
mkeeter opened this issue Apr 10, 2024 · 0 comments
Open

Decoding of IBC blackbox timestamps continues to be weird #473

mkeeter opened this issue Apr 10, 2024 · 0 comments

Comments

@mkeeter
Copy link
Contributor

mkeeter commented Apr 10, 2024

Looking at the log from https://github.com/oxidecomputer/mfg-quality/issues/140#issuecomment-2046150254, we're still seeing timestamp weirdness.

Here's a condensed version of that log:

FAULT EVENT
  EVENT_INDEX:        0
  TIMESTAMP           0x000002d9 = 1 min, 12.9 sec
  EVENT_ID            0x0000
  STATUS_WORD         0x0010
    0x0010: Output overcurrent fault
  STATUS_IOUT         0x80
    0x80: Iout Overcurrent Fault
FAULT EVENT
  EVENT_INDEX:        1
  TIMESTAMP           0x000002e3 = 1 min, 13.9 sec
  EVENT_ID            0x0001
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x01
    0x01: INPUT_LOW_EVENT
FAULT EVENT
  EVENT_INDEX:        2
  TIMESTAMP           0x27ffb2e3 = 776 day, 16 hr, 48 min, 9.9 sec
  EVENT_ID            0x0002
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x02
    0x02: CANCEL_EVENT
FAULT EVENT
  EVENT_INDEX:        3
  TIMESTAMP           0x4fff62e3 = 1553 day, 9 hr, 35 min, 5.9 sec
  EVENT_ID            0x0003
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x02
    0x02: CANCEL_EVENT

LIFECYCLE EVENT
  EVENT_INDEX:        24
  TIMESTAMP           0xffa9b177 = 4964 day, 11 hr, 31 min, 49.5 sec
  EVENT_ID            0x0194
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x05
    0x05: ERASE_OVFL_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        25
  TIMESTAMP           0xffa9b177 = 4964 day, 11 hr, 31 min, 49.5 sec
  EVENT_ID            0x0195
  STATUS_WORD         0x2000
    0x2000: input voltage, input current, or input power fault or warning
  STATUS_INPUT        0x08
    0x08: Insufficient Vin
LIFECYCLE EVENT
  EVENT_INDEX:        26
  TIMESTAMP           0xffa9b177 = 4964 day, 11 hr, 31 min, 49.5 sec
  EVENT_ID            0x0196
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x01
    0x01: INPUT_LOW_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        27
  TIMESTAMP           0xffa9b17a = +0.0 sec
  EVENT_ID            0x0197
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x00
    0x00: BOOT_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        28
  TIMESTAMP           0xffa9b17e = +0.4 sec
  EVENT_ID            0x0198
  STATUS_WORD         0x0040
    0x0040: Unit is not providing power
  STATUS_MFR          0x05
    0x05: Unit is on due to primary remote control
LIFECYCLE EVENT
  EVENT_INDEX:        29
  TIMESTAMP           0xffa9b17e = +0.4 sec
  EVENT_ID            0x0199
  STATUS_WORD         0x0800
    0x0800: Power-Good signal
  STATUS_MFR          0x01
    0x01: Power is good
LIFECYCLE EVENT
  EVENT_INDEX:        30
  TIMESTAMP           0xffadaffe = +7 hr, 16 min, 16.4 sec
  EVENT_ID            0x019a
  STATUS_WORD         0x2000
    0x2000: input voltage, input current, or input power fault or warning
  STATUS_INPUT        0x08
    0x08: Insufficient Vin
LIFECYCLE EVENT
  EVENT_INDEX:        31
  TIMESTAMP           0xffadb000 = +7 hr, 16 min, 16.6 sec
  EVENT_ID            0x019b
  STATUS_WORD         0x0040
    0x0040: Unit is not providing power
  STATUS_MFR          0x04
    0x04: Unit is on due to internal configuration
LIFECYCLE EVENT
  EVENT_INDEX:        32
  TIMESTAMP           0xffe8eea2 = +0.0 sec
  EVENT_ID            0x019c
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x06
    0x06: TIME_ERASE_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        33
  TIMESTAMP           0xffe52922 = 4968 day, 23 hr, 47 min, 15.4 sec
  EVENT_ID            0x019d
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x05
    0x05: ERASE_OVFL_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        34
  TIMESTAMP           0xffe52927 = 4968 day, 23 hr, 47 min, 15.9 sec
  EVENT_ID            0x019e
  STATUS_WORD         0x0040
    0x0040: Unit is not providing power
  STATUS_MFR          0x05
    0x05: Unit is on due to primary remote control
LIFECYCLE EVENT
  EVENT_INDEX:        35
  TIMESTAMP           0xffe52927 = 4968 day, 23 hr, 47 min, 15.9 sec
  EVENT_ID            0x019f
  STATUS_WORD         0x0800
    0x0800: Power-Good signal
  STATUS_MFR          0x01
    0x01: Power is good
LIFECYCLE EVENT
  EVENT_INDEX:        36
  TIMESTAMP           0xffe9102b = +0.0 sec
  EVENT_ID            0x01a0
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x00
    0x00: BOOT_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        37
  TIMESTAMP           0xffe9102e = +0.3 sec
  EVENT_ID            0x01a1
  STATUS_WORD         0x0040
    0x0040: Unit is not providing power
  STATUS_MFR          0x05
    0x05: Unit is on due to primary remote control
LIFECYCLE EVENT
  EVENT_INDEX:        38
  TIMESTAMP           0xffe9102e = +0.3 sec
  EVENT_ID            0x01a2
  STATUS_WORD         0x0800
    0x0800: Power-Good signal
  STATUS_MFR          0x01
    0x01: Power is good
LIFECYCLE EVENT
  EVENT_INDEX:        39
  TIMESTAMP           0xffe91046 = +2.7 sec
  EVENT_ID            0x01a3
  STATUS_WORD         0x0002
    0x0002: CML fault
  STATUS_CML          0x02
    0x02: Other communication fault
LIFECYCLE EVENT
  EVENT_INDEX:        40
  TIMESTAMP           0x00103ffe = 1 day, 5 hr, 34 min, 55.8 sec
  EVENT_ID            0x01a4
  STATUS_WORD         0x2000
    0x2000: input voltage, input current, or input power fault or warning
  STATUS_INPUT        0x08
    0x08: Insufficient Vin
LIFECYCLE EVENT
  EVENT_INDEX:        41
  TIMESTAMP           0x00103ffe = 1 day, 5 hr, 34 min, 55.8 sec
  EVENT_ID            0x01a5
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x01
    0x01: INPUT_LOW_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        42
  TIMESTAMP           0x00104000 = +0.0 sec
  EVENT_ID            0x01a6
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x00
    0x00: BOOT_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        43
  TIMESTAMP           0x00104003 = +0.3 sec
  EVENT_ID            0x01a7
  STATUS_WORD         0x0040
    0x0040: Unit is not providing power
  STATUS_MFR          0x05
    0x05: Unit is on due to primary remote control
LIFECYCLE EVENT
  EVENT_INDEX:        44
  TIMESTAMP           0x001a1b84 = +17 hr, 56 min, 42.0 sec
  EVENT_ID            0x01a8
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x05
    0x05: ERASE_OVFL_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        45
  TIMESTAMP           0x001a1b84 = +17 hr, 56 min, 42.0 sec
  EVENT_ID            0x01a9
  STATUS_WORD         0x2000
    0x2000: input voltage, input current, or input power fault or warning
  STATUS_INPUT        0x08
    0x08: Insufficient Vin
LIFECYCLE EVENT
  EVENT_INDEX:        46
  TIMESTAMP           0x001a1b84 = +17 hr, 56 min, 42.0 sec
  EVENT_ID            0x01aa
  STATUS_WORD         0x0040
    0x0040: Unit is not providing power
  STATUS_MFR          0x04
    0x04: Unit is on due to internal configuration

Remember, there are two separate ring buffers for fault (0-23) and life-cycle events (24-48).

There are many different weird things happening here:

  • In the fault ring, I see 2x CANCEL_EVENT logs. According to the datasheet, this indicates that a 2.1-year timeout has passed, which is definitely not true.
  • In the life-cycle ring, I see an ERASE_OVFL_EVENT, which "indicates that the event recorder was overloaded while erasing old records but did recover". This... doesn't say much. The acronym of OVFL makes me think it's actually an "overflow", which makes more sense, because the timestamp is 0xffa9b177 (i.e. close to saturating a u32).
    • On the other hand, event is an OVFL at timestamp 0x001a1b84, which is far from overflowing.
  • Event 33 (another ERASE_OVFL_EVENT) actually goes backwards in time (?!). ShouldERASE_OVFL_EVENT be treated as establishing a new relative time-stamp, like BOOT_EVENT and TIME_ERASE_EVENT?
    • Event 40 also goes back in time, and is a normal system event
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant