Exception Handling & Fallback Strategies

This document defines how the system should respond to various failure scenarios. The middleware is responsible for implementing these fallback behaviors to ensure safe, continuous operation even when components fail.

Severity Levels

Critical: Immediate action, safety risk

High: Urgent, degraded operation

Medium: Can operate with workaround

Low: Minor impact, log and monitor

Operating Modes

CLOUD CONNECTED

Normal operation. Enexa cloud sends optimized dispatch commands in real-time. Full telemetry reporting. All features enabled.

LOCAL AUTONOMOUS

Cloud unreachable. Middleware uses cached day-ahead schedule or falls back to reactive BMS rules. Buffers telemetry for later sync.

SAFE STATE

Critical failure or E-stop. All active operations stopped. Equipment held in safe configuration. Requires manual intervention.

CLOUD CONNECTED -> LOCAL AUTONOMOUS -> SAFE STATE

Degradation path: The system progressively falls back to safer modes as failures accumulate.
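
The degradation path can be sketched as a small state machine. This is a minimal illustration, not the middleware's actual design: the `Mode` enum, the `ModeStateMachine` class, and the event names are hypothetical. The choice to leave SAFE STATE into LOCAL AUTONOMOUS after a manual reset is also an assumption, since the document only says that manual intervention is required.

```python
from enum import Enum


class Mode(Enum):
    CLOUD_CONNECTED = 1
    LOCAL_AUTONOMOUS = 2
    SAFE_STATE = 3


class ModeStateMachine:
    """Tracks the operating mode. Event names are illustrative assumptions."""

    def __init__(self) -> None:
        self.mode = Mode.CLOUD_CONNECTED

    def on_event(self, event: str, manual_reset: bool = False) -> Mode:
        if event == "critical_failure":
            # A critical failure or E-stop always forces SAFE STATE.
            self.mode = Mode.SAFE_STATE
        elif self.mode is Mode.SAFE_STATE:
            # SAFE STATE is latched: only a confirmed manual reset leaves it.
            if event == "reset" and manual_reset:
                self.mode = Mode.LOCAL_AUTONOMOUS  # assumed re-entry point
        elif event == "cloud_lost":
            self.mode = Mode.LOCAL_AUTONOMOUS
        elif event == "cloud_restored":
            self.mode = Mode.CLOUD_CONNECTED
        return self.mode
```

Keeping SAFE STATE latched in the transition function means no cloud or network event can pull the system out of it by accident.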

Cloud Connectivity Failures

Loss of communication with Enexa cloud services

Enexa Cloud Unreachable

Complete loss of connectivity between middleware and Enexa cloud services. No dispatch commands can be received.

Critical

Detection

No heartbeat response from cloud for 30 seconds

Response required within: 30 seconds

Recovery

Resume cloud commands immediately upon reconnection. Replay buffered telemetry.

Fallback Actions (in order)

1. Middleware switches to LOCAL AUTONOMOUS MODE
2. Use last known day-ahead schedule if available (valid for up to 24h)
3. If no schedule: fall back to REACTIVE BMS rules
4. Continue pushing telemetry to local buffer (replay when reconnected)
5. Alert site operator via local HMI and SMS if configured
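
The detection rule and the first three fallback steps could be expressed roughly as follows. This is a sketch under assumptions: the function names and return labels are hypothetical, timestamps are taken to be seconds from a monotonic clock, and the 24h validity window is measured from the moment the schedule was received, which the document does not specify.

```python
from typing import Optional

HEARTBEAT_TIMEOUT_S = 30.0        # "No heartbeat response from cloud for 30 seconds"
SCHEDULE_MAX_AGE_S = 24 * 3600.0  # cached schedule "valid for up to 24h"


def cloud_unreachable(now_s: float, last_heartbeat_s: float) -> bool:
    """True once the heartbeat has been missing for the full timeout."""
    return now_s - last_heartbeat_s >= HEARTBEAT_TIMEOUT_S


def pick_control_source(now_s: float,
                        last_heartbeat_s: float,
                        schedule_received_s: Optional[float]) -> str:
    """Return which source should drive dispatch (labels are illustrative)."""
    if not cloud_unreachable(now_s, last_heartbeat_s):
        return "cloud"
    if (schedule_received_s is not None
            and now_s - schedule_received_s <= SCHEDULE_MAX_AGE_S):
        return "cached_schedule"
    return "reactive_bms"
```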

Intermittent Cloud Connectivity

Unstable connection with frequent packet loss or high latency (>2s RTT).

High

Detection

3+ failed requests in 60-second window OR latency >2000ms

Recovery

Gradually return to real-time dispatch as connection stabilizes.

Fallback Actions (in order)

1. Increase command batch size to reduce round-trips
2. Cache commands locally and apply on confirmation
3. Switch to 5-minute dispatch intervals instead of real-time
4. Prioritize critical commands (emergency stops, safety overrides)
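
The detection condition for this scenario (3+ failed requests in a 60-second window, or latency above 2000 ms) maps naturally onto a sliding window. The sketch below is one possible shape; the `ConnectivityMonitor` class and its method names are hypothetical.

```python
from collections import deque


class ConnectivityMonitor:
    """Sliding-window check for an intermittent cloud link (illustrative)."""

    def __init__(self, window_s: float = 60.0, max_failures: int = 3,
                 max_latency_ms: float = 2000.0) -> None:
        self.window_s = window_s
        self.max_failures = max_failures
        self.max_latency_ms = max_latency_ms
        self._failures = deque()  # timestamps of failed requests

    def record(self, now_s: float, ok: bool, latency_ms: float) -> bool:
        """Record one request; return True if the link counts as intermittent."""
        if not ok:
            self._failures.append(now_s)
        # Drop failures that have aged out of the window.
        while self._failures and now_s - self._failures[0] > self.window_s:
            self._failures.popleft()
        too_many_failures = len(self._failures) >= self.max_failures
        too_slow = latency_ms > self.max_latency_ms
        return too_many_failures or too_slow
```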

Authentication/Authorization Failure

API credentials expired, revoked, or certificate mismatch.

High

Detection

HTTP 401/403 responses from cloud API

Recovery

Require manual credential re-provisioning by authorized personnel.

Fallback Actions (in order)

1. Attempt credential refresh using refresh token
2. If refresh fails: switch to LOCAL AUTONOMOUS MODE
3. Log security event for audit
4. Alert system administrator immediately

Middleware Controller Failures

Issues with the local control box software or hardware

Middleware Process Crash

The middleware controller software has crashed or become unresponsive.

Critical

Detection

Watchdog timer expiry (no heartbeat for 10 seconds)

Response required within: 10 seconds

Recovery

Middleware auto-recovers state from persistent storage on restart.

Fallback Actions (in order)

1. Hardware watchdog triggers automatic process restart
2. If restart fails 3x: reboot entire control box
3. During restart: all equipment enters SAFE STATE
4. Battery: hold current SOC, no charge/discharge
5. EV chargers: continue active sessions at current power, no new sessions

Memory/Storage Exhaustion

Control box running low on memory or disk space for telemetry buffering.

Medium

Detection

Memory usage >90% OR disk usage >95%

Recovery

Restore normal operation after resource cleanup or hardware upgrade.

Fallback Actions (in order)

1. Purge oldest telemetry data (FIFO)
2. Reduce telemetry resolution (1s -> 5s intervals)
3. Disable non-critical logging
4. Alert for maintenance
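
The first two steps (FIFO purge and resolution reduction) can be illustrated with a bounded buffer. This is a minimal in-memory sketch; the `TelemetryBuffer` class is hypothetical, and a real buffer would presumably live in persistent storage.

```python
from collections import deque
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp_s, value)


class TelemetryBuffer:
    """Bounded buffer: appending to a full deque drops the oldest sample."""

    def __init__(self, capacity: int) -> None:
        self._samples = deque(maxlen=capacity)

    def push(self, sample: Sample) -> None:
        self._samples.append(sample)

    def downsample(self, factor: int = 5) -> None:
        """Keep every `factor`-th sample, e.g. 1s -> 5s resolution."""
        kept = list(self._samples)[::factor]
        self._samples = deque(kept, maxlen=self._samples.maxlen)

    def snapshot(self) -> List[Sample]:
        return list(self._samples)
```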

Clock Synchronization Loss

System clock has drifted significantly, affecting schedule execution.

High

Detection

NTP sync failure for >1 hour OR time delta >30 seconds

Recovery

Re-sync via NTP and recalibrate schedule offsets.

Fallback Actions (in order)

1. Use relative timestamps for immediate commands
2. Pause schedule-based operations until sync restored
3. Continue real-time reactive control
4. Log all events with local monotonic timestamps

Battery System Failures

Issues with battery, BMS, or energy storage components

Battery BMS Communication Loss

Cannot communicate with battery management system. State of charge unknown.

Critical

Detection

No CAN/Modbus response for 5 seconds

Response required within: 5 seconds

Recovery

Manual inspection required before resuming battery operations.

Fallback Actions (in order)

1. IMMEDIATELY cease all battery operations
2. Open battery contactors if safe to do so
3. Route all loads directly to grid
4. Alert maintenance team

Invalid SOC Reading

SOC value is out of range, stuck, or changing impossibly fast.

High

Detection

SOC <0% OR >100% OR delta >10% per minute without corresponding power flow

Recovery

Recalibrate SOC after full charge cycle or manual verification.

Fallback Actions (in order)

1. Mark SOC as UNTRUSTED
2. Estimate SOC from power flow integration (coulomb counting)
3. Apply conservative limits: assume SOC is 20% lower than estimated
4. Reduce max charge/discharge rates by 50%
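
A rough sketch of the detection rule and the SOC estimate follows. Several points are assumptions rather than part of this specification: positive power means charging, "20% lower" is read as 20 percentage points, `min_power_kw` is a hypothetical threshold for "no corresponding power flow", and the estimate integrates power over a known usable capacity with losses ignored.

```python
def soc_is_invalid(soc_pct: float, prev_soc_pct: float, dt_s: float,
                   power_kw: float, min_power_kw: float = 0.1) -> bool:
    """Out of range, or moving faster than 10%/min with no power flow."""
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    if soc_pct < 0.0 or soc_pct > 100.0:
        return True
    rate_pct_per_min = abs(soc_pct - prev_soc_pct) / dt_s * 60.0
    return rate_pct_per_min > 10.0 and abs(power_kw) < min_power_kw


def integrate_soc(soc_pct: float, power_kw: float, dt_s: float,
                  capacity_kwh: float) -> float:
    """Advance the SOC estimate by integrating power (charging is positive)."""
    delta_pct = power_kw * dt_s / 3600.0 / capacity_kwh * 100.0
    return min(100.0, max(0.0, soc_pct + delta_pct))


def conservative_soc(estimated_pct: float) -> float:
    """Assume the true SOC is 20 percentage points below the estimate."""
    return max(0.0, estimated_pct - 20.0)
```

For example, a 100 kWh battery charging at 10 kW for one hour moves the estimate from 50% to 60%, and the controller would then plan against 40%.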

Battery Thermal Alarm

Battery temperature outside safe operating range.

Critical

Detection

Cell temperature >45 °C OR <0 °C OR delta >5 °C between cells

Response required within: Immediate

Recovery

Automatic resume after thermal stabilization. Log event for analysis.

Fallback Actions (in order)

1. IMMEDIATELY reduce power to 0
2. Activate cooling system if available
3. If >55 °C: emergency disconnect
4. Do not resume until temperature normalizes for 15 minutes
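
The thresholds and the 15-minute hold-off can be combined into one small guard object. The sketch below is illustrative only: `ThermalGuard` and its return labels are hypothetical, and re-closing after an emergency disconnect is out of scope here.

```python
from typing import List, Optional

MAX_CELL_C = 45.0
MIN_CELL_C = 0.0
MAX_SPREAD_C = 5.0
DISCONNECT_C = 55.0
HOLD_S = 15 * 60.0  # temperature must stay normal this long before resuming


class ThermalGuard:
    def __init__(self) -> None:
        self.tripped = False
        self._normal_since: Optional[float] = None

    def update(self, now_s: float, cell_temps_c: List[float]) -> str:
        hottest, coldest = max(cell_temps_c), min(cell_temps_c)
        alarm = (hottest > MAX_CELL_C or coldest < MIN_CELL_C
                 or hottest - coldest > MAX_SPREAD_C)
        if alarm:
            # Any alarm restarts the hold-off timer.
            self.tripped = True
            self._normal_since = None
            if hottest > DISCONNECT_C:
                return "emergency_disconnect"
            return "zero_power"
        if not self.tripped:
            return "normal"
        if self._normal_since is None:
            self._normal_since = now_s
        if now_s - self._normal_since >= HOLD_S:
            self.tripped = False
            return "normal"
        return "zero_power"
```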

Contactor Failure

Battery contactor stuck open or closed, or feedback mismatch.

Critical

Detection

Command vs feedback state mismatch for >2 seconds

Recovery

Physical inspection and contactor replacement required.

Fallback Actions (in order)

1. Attempt 3 close/open cycles
2. If stuck closed: reduce power to 0, alert immediately
3. If stuck open: battery unavailable, switch to grid-only mode
4. Never force contactor - risk of welding

Grid Connection Failures

Issues with utility grid connection or power quality

Grid Power Outage

Complete loss of grid connection. Site is islanded.

Critical

Detection

Grid voltage <180V OR frequency outside 47-53Hz for >100ms

Response required within: 100 milliseconds

Recovery

Wait for stable grid (5 minutes), then soft reconnection with ramp-up.

Fallback Actions (in order)

1. IMMEDIATE transition to island mode
2. Battery becomes grid-forming (if capable)
3. Shed non-critical loads per priority table
4. Limit EV charging to minimum or suspend new sessions
5. Preserve battery for critical loads

Poor Grid Power Quality

Voltage sags, swells, harmonics, or frequency deviations.

Medium

Detection

Voltage outside 207-253V OR THD >8% OR frequency outside 49.5-50.5Hz

Recovery

Resume normal operation when quality metrics return to acceptable range.

Fallback Actions (in order)

1. Reduce grid import/export rates
2. Use battery to buffer power quality issues
3. Delay non-urgent charging operations
4. Log power quality events for utility reporting

Grid Meter Communication Failure

Cannot read grid meter. Import/export values unknown.

High

Detection

No meter response for 10 seconds OR CRC errors

Recovery

Restore meter communication and verify accuracy before resuming.

Fallback Actions (in order)

1. Estimate grid power from: Grid = Load - PV - Battery
2. Mark grid readings as ESTIMATED
3. Apply conservative limits to prevent export violations
4. Reduce battery discharge to avoid accidental export
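
The power balance in step 1 and the discharge cap in step 4 are worked through below. The sign convention is an assumption (battery power positive when discharging, grid power positive when importing), and `margin_kw` is a hypothetical safety margin, not a value from this specification.

```python
def estimate_grid_kw(load_kw: float, pv_kw: float, battery_kw: float) -> float:
    """Grid = Load - PV - Battery. Positive result means import."""
    return load_kw - pv_kw - battery_kw


def max_discharge_kw(load_kw: float, pv_kw: float,
                     margin_kw: float = 0.5) -> float:
    """Largest discharge that keeps estimated import at or above the margin."""
    return max(0.0, load_kw - pv_kw - margin_kw)
```

With a 10 kW load, 3 kW of PV, and 2 kW of battery discharge, the estimated import is 5 kW and discharge could rise to at most 6.5 kW. When PV already exceeds the load, the cap is 0 kW.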

Solar PV Failures

Issues with solar generation or inverters

PV Inverter Fault

Solar inverter has tripped or is not producing power.

Medium

Detection

Inverter fault code OR production <5% of expected for irradiance

Recovery

Automatic retry after inverter self-clears, or manual reset.

Fallback Actions (in order)

1. Remove PV from available sources
2. Increase grid import allowance to compensate
3. Adjust battery charging strategy (more grid, less PV)
4. Alert maintenance

PV Monitoring Communication Loss

Cannot read PV production values. Output unknown.

Medium

Detection

No inverter data for 30 seconds

Recovery

Resume actual readings when communication restored.

Fallback Actions (in order)

1. Estimate PV from irradiance sensor if available
2. Otherwise use time-of-day profile as fallback
3. Mark PV values as ESTIMATED
4. Be conservative with grid export to avoid violations

EV Charger Failures

Issues with charging stations or vehicle communication

EV Charger Fault

Charging station has reported a fault or is non-responsive.

High

Detection

Charger fault code OR no heartbeat for 30 seconds

Recovery

Manual fault clear and test charge before returning to service.

Fallback Actions (in order)

1. Mark charger as UNAVAILABLE
2. If session active: attempt graceful stop
3. Redistribute power to remaining operational chargers
4. Update availability in user-facing systems
5. Alert maintenance

EV Charging Overcurrent

Vehicle drawing more current than allowed by EVSE or cable rating.

High

Detection

Measured current >110% of setpoint for >5 seconds

Recovery

Gradual power increase if vehicle behaves correctly.

Fallback Actions (in order)

1. Immediately reduce current setpoint by 20%
2. If violation continues: pause charging for 30 seconds
3. Log vehicle ID for pattern analysis
4. Resume at reduced power level
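
The detection rule (measured current above 110% of setpoint for more than 5 seconds) needs a timer that resets whenever the current drops back in range. A minimal sketch, with the hypothetical names `OvercurrentMonitor` and `reduced_setpoint`:

```python
from typing import Optional


class OvercurrentMonitor:
    """Flags a violation once current stays above 110% of setpoint for >5 s."""

    def __init__(self, threshold_ratio: float = 1.10, hold_s: float = 5.0) -> None:
        self.threshold_ratio = threshold_ratio
        self.hold_s = hold_s
        self._over_since: Optional[float] = None

    def update(self, now_s: float, measured_a: float, setpoint_a: float) -> bool:
        if measured_a > self.threshold_ratio * setpoint_a:
            if self._over_since is None:
                self._over_since = now_s
            return now_s - self._over_since > self.hold_s
        # Back in range: reset the timer.
        self._over_since = None
        return False


def reduced_setpoint(setpoint_a: float) -> float:
    """First fallback action: cut the current setpoint by 20%."""
    return setpoint_a * 0.8
```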

Stuck Charging Session

Session appears complete but connector still locked or billed.

Medium

Detection

SOC 100% OR power <0.5kW for >10 minutes with connector locked

Recovery

Manual intervention or vehicle departure.

Fallback Actions (in order)

1. Send unlock command to EVSE
2. Stop billing if metered session
3. If unlock fails: alert user via app notification
4. Mark session as REQUIRES_ATTENTION

Safety System Events

Emergency stops and protective device activations

Emergency Stop Activated

Physical E-stop button pressed or safety system triggered.

Critical

Detection

E-stop input active OR safety relay open

Response required within: Immediate

Recovery

Physical E-stop reset + authorized personnel confirmation.

Fallback Actions (in order)

1. IMMEDIATE all-stop: battery, chargers, inverters
2. Open all contactors
3. Maintain only monitoring and communication
4. Alert all registered contacts
5. Do NOT auto-recover - requires physical reset

Ground Fault Detected

Insulation failure or ground fault current detected.

Critical

Detection

RCD trip OR ground fault monitor alarm

Response required within: Immediate

Recovery

Professional inspection and repair required.

Fallback Actions (in order)

1. Trip affected circuit immediately
2. Isolate fault location if sectionalizing available
3. Do not attempt auto-reclose on ground faults
4. Alert electrical maintenance immediately

Arc Fault Detected

Potential arc fault in DC or AC wiring.

Critical

Detection

AFCI trip OR arc signature in current waveform

Response required within: Immediate

Recovery

Professional inspection and repair required.

Fallback Actions (in order)

1. Immediate shutdown of affected circuit
2. Battery disconnect if DC side
3. PV rapid shutdown if applicable
4. No auto-recovery

Data Quality Issues

Sensor failures, stale data, or measurement conflicts

Stale Telemetry Data

Received data has old timestamps, indicating sensor or comm issues.

Medium

Detection

Data timestamp >30 seconds old

Recovery

Resume normal operation when fresh data arrives.

Fallback Actions (in order)

1. Mark affected readings as STALE
2. Use last known good value with decaying confidence
3. Increase polling frequency to detect recovery
4. Apply conservative control limits
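
One way to attach a decaying confidence to a last known good value is an exponential decay that starts once the reading passes the 30-second staleness threshold. The decay shape and the `half_life_s` parameter are assumptions chosen for illustration; the document does not define how confidence should decay.

```python
STALE_AFTER_S = 30.0  # "Data timestamp >30 seconds old"


def is_stale(now_s: float, sample_ts_s: float) -> bool:
    return now_s - sample_ts_s > STALE_AFTER_S


def confidence(now_s: float, sample_ts_s: float,
               half_life_s: float = 60.0) -> float:
    """1.0 while fresh, then halves every `half_life_s` seconds of staleness."""
    age_s = max(0.0, now_s - sample_ts_s)
    if age_s <= STALE_AFTER_S:
        return 1.0
    return 0.5 ** ((age_s - STALE_AFTER_S) / half_life_s)
```

A controller could scale its control limits by this confidence, so that limits tighten gradually instead of in one step.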

Out-of-Range Sensor Values

Sensor reporting physically impossible values.

Medium

Detection

Value outside defined min/max bounds OR NaN/Inf

Recovery

Sensor recalibration or replacement.

Fallback Actions (in order)

1. Reject invalid reading
2. Use redundant sensor if available
3. Otherwise use model-based estimate
4. Flag for calibration check

Conflicting Sensor Readings

Multiple sensors for same measurement show significant disagreement.

High

Detection

Delta between redundant sensors >10% of range

Recovery

Sensor alignment or replacement.

Fallback Actions (in order)

1. Use median/average of non-outlier values
2. Identify and exclude the outlier sensor
3. Reduce control authority until resolved
4. Alert for sensor maintenance
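
Steps 1 and 2 can be sketched as a median vote. The hypothetical `vote` function below treats a sensor as an outlier when it deviates from the group median by more than 10% of the sensor range. This is a simplification of the pairwise detection rule, and it can only single out an outlier when at least three sensors are available.

```python
from statistics import median
from typing import List, Tuple


def vote(readings: List[float], sensor_range: float,
         max_delta_frac: float = 0.10) -> Tuple[float, List[int]]:
    """Return (fused value, indices of sensors flagged as outliers)."""
    mid = median(readings)
    limit = max_delta_frac * sensor_range
    outliers = [i for i, r in enumerate(readings) if abs(r - mid) > limit]
    good = [r for i, r in enumerate(readings) if i not in outliers]
    if not good:
        # No agreement at all: fall back to the raw median, flag everything.
        return mid, outliers
    return median(good), outliers
```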

Cyber Security Incidents

Authentication failures, attacks, and unauthorized access

Invalid or Expired Certificate

TLS certificate validation failure when connecting to Enexa cloud.

High

Detection

SSL handshake failure OR certificate expiry warning

Recovery

Install new valid certificate. Verify chain of trust before resuming.

Fallback Actions (in order)

1. Reject connection immediately - do not proceed
2. Switch to LOCAL AUTONOMOUS MODE
3. Alert administrator for certificate renewal
4. Log security event with certificate details

Replay Attack Detected

Received command with old timestamp or duplicate sequence number.

High

Detection

Command timestamp >60s old OR sequence number already seen

Recovery

Investigate source of replayed commands. May indicate network MITM.

Fallback Actions (in order)

1. Reject command immediately
2. Log security event with full command payload
3. Continue with last valid command
4. Alert security team
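
The detection rule (command timestamp more than 60 seconds old, or a sequence number already seen) could look like this in code. `ReplayGuard` is a hypothetical name, and the sketch keeps every accepted sequence number in memory; a production version would need to bound that set.

```python
class ReplayGuard:
    """Rejects commands that are too old or reuse a sequence number."""

    def __init__(self, max_age_s: float = 60.0) -> None:
        self.max_age_s = max_age_s
        self._seen = set()

    def accept(self, now_s: float, command_ts_s: float, sequence: int) -> bool:
        if now_s - command_ts_s > self.max_age_s:
            return False  # stale timestamp
        if sequence in self._seen:
            return False  # duplicate sequence number
        self._seen.add(sequence)
        return True
```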

Unauthorized Command Source

Command received from unrecognized or unauthorized source.

Critical

Detection

Invalid API key OR command signed with unknown key

Recovery

Security audit required. Re-provision credentials if compromised.

Fallback Actions (in order)

1. Reject command immediately
2. Enter LOCAL AUTONOMOUS MODE
3. Lock out remote commands until manual override
4. Alert security team immediately

Denial of Service / Flooding

Excessive requests overwhelming the middleware.

Medium

Detection

Request rate >10x normal OR memory/CPU exhaustion

Recovery

Block attacking sources. Review firewall rules.

Fallback Actions (in order)

1. Enable rate limiting (drop excess requests)
2. Prioritize local safety functions
3. Reduce telemetry frequency
4. Log source IPs for analysis
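
Rate limiting in step 1 is commonly implemented with a token bucket. The document does not prescribe an algorithm, so the sketch below is just one option; the `TokenBucket` class and its parameters are hypothetical.

```python
from typing import Optional


class TokenBucket:
    """Allows `rate_per_s` requests per second with bursts up to `burst`."""

    def __init__(self, rate_per_s: float, burst: float) -> None:
        self.rate_per_s = rate_per_s
        self.burst = burst
        self.tokens = burst
        self._last_s: Optional[float] = None

    def allow(self, now_s: float) -> bool:
        if self._last_s is not None:
            # Refill in proportion to elapsed time, capped at the burst size.
            elapsed = now_s - self._last_s
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_s)
        self._last_s = now_s
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # drop the excess request
```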

Firmware & Configuration

Update failures, version mismatches, and configuration issues

Firmware Update Failure

OTA firmware update did not complete successfully.

High

Detection

Update process timeout OR checksum mismatch OR boot failure

Recovery

Manual firmware re-installation via local interface.

Fallback Actions (in order)

1. Roll back to previous firmware version
2. If rollback fails: enter SAFE STATE
3. Alert maintenance team
4. Do not attempt another update until diagnosed

Firmware Version Incompatibility

Middleware firmware incompatible with Enexa API version.

Medium

Detection

API returns 426 Upgrade Required OR schema validation failures

Recovery

Update middleware firmware to compatible version.

Fallback Actions (in order)

1. Continue with reduced functionality
2. Use last compatible command format
3. Schedule firmware update
4. Alert administrator

Configuration Corruption

Stored configuration is invalid or corrupted.

High

Detection

Config parse failure OR CRC mismatch

Recovery

Re-provision site configuration from Enexa admin portal.

Fallback Actions (in order)

1. Load factory default configuration
2. Request fresh configuration from Enexa cloud
3. If cloud unavailable: use safe defaults
4. Alert administrator for re-provisioning
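
The detection rule (parse failure or CRC mismatch) and the fall-back to safe defaults might be sketched as below. The storage format is an assumption: the example supposes a JSON document stored alongside a CRC-32 checksum, neither of which is specified here, and `load_config` is a hypothetical helper.

```python
import json
import zlib


def load_config(raw: bytes, expected_crc32: int, safe_defaults: dict) -> dict:
    """Return the parsed config, or a copy of the defaults if it is corrupt."""
    if zlib.crc32(raw) != expected_crc32:
        return dict(safe_defaults)  # CRC mismatch
    try:
        config = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return dict(safe_defaults)  # parse failure
    if not isinstance(config, dict):
        return dict(safe_defaults)  # wrong top-level shape
    return config
```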

Middleware Implementation Checklist

Requirements that the Amperio Team's middleware must meet to achieve robust exception handling

Hardware Requirements

Hardware watchdog timer (10s timeout)
Persistent storage for schedule cache (min 100MB)
Real-time clock with battery backup
Redundant network interfaces (LTE + Ethernet)
UPS for controller (min 5 minute runtime)

Software Requirements

Telemetry ring buffer (min 24h at 1s resolution)
Reactive BMS algorithm implementation (fallback)
State machine for operating mode transitions
Structured logging with severity levels
Alert notification system (local HMI + remote)

Required Failure Injection Tests

All scenarios must be tested before production deployment

Before deploying the middleware, Amperio Team must demonstrate successful handling of:

Cloud disconnect for 1 hour (simulate network drop)
Middleware process kill and auto-restart
Battery BMS communication interruption
Grid power outage and island transition
E-stop activation and recovery sequence
Simultaneous multi-charger faults
Clock drift simulation (NTP block)
Memory exhaustion under load