Challenge 4: Anomaly detection deployment

Objective: Deploy behavioural anomaly detection to identify abnormal turbine behaviour that looks normal at the protocol level.

Category: Detection & Threat Hunting

Difficulty: Intermediate

Time Required: 40-50 minutes

Learning outcomes

By completing this challenge, you will:

  1. Establish statistical baselines for normal system behaviour

  2. Configure range limits for safety-critical parameters

  3. Set rate-of-change limits to detect sudden attacks

  4. Detect attacks that bypass protocol-level controls

  5. Tune detection sensitivity to balance false positives vs false negatives

  6. Understand when detection is more important than prevention

Background: Why behavioural detection?

The Problem: Attacks look like normal operations at the protocol level.

A Modbus write is just a Modbus write. You can’t tell whether speed_setpoint = 1850 is:

  • A legitimate operation by an authorised engineer

  • An attack using compromised credentials

  • Malware manipulating process values

Solution: Behavioural anomaly detection looks at what is being written, not just who is writing it.

Detection Methods

  1. Statistical Baselines:

  • Learn normal behaviour over time (mean, standard deviation)

  • Detect values that deviate significantly (e.g., 3 sigma from mean)

  • Good for: Detecting unusual but valid values

  2. Range Limits:

  • Hard min/max values for safety-critical parameters

  • Any value outside range is anomalous

  • Good for: Enforcing physical safety limits

  3. Rate-of-Change:

  • Maximum allowed rate of change per second

  • Detects sudden jumps or rapid increases

  • Good for: Detecting abrupt attacks or sensor failures (all three methods are combined in the sketch below)
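The three methods complement each other. Here is a minimal sketch of how they might be combined; the names (Baseline, check_sample) and thresholds are illustrative assumptions, not the simulator's API:

# Minimal sketch combining the three detection methods (illustrative only)
from dataclasses import dataclass, field
from statistics import mean, stdev

@dataclass
class Baseline:
    samples: list = field(default_factory=list)
    learning_window: int = 1000

def check_sample(bl, value, prev, dt, min_v=800.0, max_v=1800.0,
                 max_rate=10.0, sigma=3.0):
    alerts = []
    if not (min_v <= value <= max_v):          # 1. range limit (always on)
        alerts.append("range_violation")
    if abs(value - prev) / dt > max_rate:      # 2. rate-of-change per second
        alerts.append("rate_violation")
    if len(bl.samples) >= bl.learning_window:  # 3. statistical (after learning)
        mu, sd = mean(bl.samples), stdev(bl.samples)
        if sd > 0 and abs(value - mu) > sigma * sd:
            alerts.append("statistical_anomaly")
    bl.samples.append(value)                   # every sample feeds the baseline
    return alerts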

Current state (no detection)

Before hardening, the simulation has:

  • ✅ Turbine operating at normal speed (1500 RPM)

  • ❌ No anomaly detection: Abnormal behaviour undetected

  • ❌ No baseline learning

  • ❌ No statistical analysis

Result: Gradual overspeed attacks go unnoticed until physical damage occurs.

Part 1: Configuration changes (require restart)

Configuration changes establish persistent detection baselines.

Step 1.1: Enable anomaly detection

Edit config/anomaly_detection.yml:

# Enable detection globally
enabled: true

# Detection thresholds
sigma_threshold: 3.0  # 3 standard deviations (99.7% of values)
learning_window: 1000  # Samples needed to establish baseline

Sigma Threshold Trade-offs:

  • 2.0 = 95% coverage (more sensitive, more false positives)

  • 3.0 = 99.7% coverage (balanced - recommended)

  • 4.0 = 99.99% coverage (less sensitive, may miss attacks; derivation sketched below)
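The coverage figures follow from the normal distribution: the fraction of samples within k standard deviations of the mean is erf(k/√2). A quick standard-library check (no simulator code involved):

import math

for k in (2.0, 3.0, 4.0):
    inside = math.erf(k / math.sqrt(2))   # fraction within k sigma
    print(f"sigma_threshold={k}: {inside:.4%} inside, {1 - inside:.4%} flagged")

# sigma_threshold=2.0: ~95.45% inside, ~4.55% flagged as false positives
# sigma_threshold=3.0: ~99.73% inside, ~0.27% flagged
# sigma_threshold=4.0: ~99.99% inside, ~0.006% flagged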

Step 1.2: Configure baselines

Add device parameters to monitor:

baselines:
  # Turbine speed monitoring
  - device: turbine_plc_1
    parameter: speed
    learning_window: 1000
    enabled: true

  # Turbine temperature
  - device: turbine_plc_1
    parameter: temperature
    learning_window: 1000
    enabled: true

  # Reactor core temperature
  - device: reactor_plc_1
    parameter: core_temperature
    learning_window: 1000
    enabled: true

Step 1.3: Set range limits

Define hard safety limits:

range_limits:
  # Turbine speed: 800-1800 RPM
  - device: turbine_plc_1
    parameter: speed
    min_value: 800.0
    max_value: 1800.0
    severity: high  # Overspeed is critical safety issue

  # Reactor temperature: 250-350°C
  - device: reactor_plc_1
    parameter: core_temperature
    min_value: 250.0
    max_value: 350.0
    severity: critical

Step 1.4: Set rate-of-change limits

Prevent sudden attacks:

rate_limits:
  # Turbine speed: max 10 RPM/second
  - device: turbine_plc_1
    parameter: speed
    max_rate: 10.0
    severity: high

  # Reactor temperature: max 5°C/second
  - device: reactor_plc_1
    parameter: core_temperature
    max_rate: 5.0
    severity: high

Step 1.5: Restart simulation

# Restart to apply config changes
python tools/simulator_manager.py

Configuration changes are now persistent and active on every startup.

Part 2: Runtime operations (immediate, temporary)

Runtime changes take effect immediately but are lost on restart.

Step 2.1: Enable detection (runtime)

python tools/blue_team.py anomaly enable

Step 2.2: Add baseline monitoring (runtime)

# Monitor turbine speed
python tools/blue_team.py anomaly add-baseline \
  --device turbine_plc_1 \
  --parameter speed \
  --learning-window 1000

# Monitor turbine temperature
python tools/blue_team.py anomaly add-baseline \
  --device turbine_plc_1 \
  --parameter temperature \
  --learning-window 1000

Learning Period: The detector needs learning_window samples before statistical detection activates.

  • During learning: Only range/rate limits active

  • After learning: Statistical anomaly detection also active (the gate is sketched below)
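Conceptually, the learning gate looks like this (an illustrative sketch, not the simulator's internals):

def active_checks(sample_count, learning_window):
    checks = ["range_limit", "rate_limit"]     # always on, even while learning
    if sample_count >= learning_window:
        checks.append("statistical_baseline")  # enabled once baseline exists
    return checks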

Step 2.3: Set range limits (runtime)

# Turbine speed limits
python tools/blue_team.py anomaly set-range \
  --device turbine_plc_1 \
  --parameter speed \
  --min 800 \
  --max 1800

# Reactor temperature limits
python tools/blue_team.py anomaly set-range \
  --device reactor_plc_1 \
  --parameter core_temperature \
  --min 250 \
  --max 350

Step 2.4: Set rate limits (runtime)

# Turbine speed rate limit
python tools/blue_team.py anomaly set-rate \
  --device turbine_plc_1 \
  --parameter speed \
  --max-rate 10.0

# Reactor temperature rate limit
python tools/blue_team.py anomaly set-rate \
  --device reactor_plc_1 \
  --parameter core_temperature \
  --max-rate 5.0

Part 3: Testing detection capabilities

Step 3.1: Run the demonstration

python examples/anomaly_detection_demo.py

This demonstrates:

  1. Baseline establishment (learning normal behaviour)

  2. Normal operations (no anomalies)

  3. Gradual attack (rate limit violation)

  4. Sudden attack (range limit violation)

Watch the Output:

  • Which attacks are detected?

  • Which detection method caught each attack?

  • What’s the deviation magnitude?

Step 3.2: Test overspeed attack detection

# Terminal 1: Start simulation
python simulation.py

# Wait for baseline to establish (1000 samples)
# Monitor: python tools/blue_team.py anomaly stats

# Terminal 2: Run attack
python scripts/exploitation/turbine_overspeed_attack.py --target-speed 1850

Check detection:

# View detected anomalies
python tools/blue_team.py anomaly list

# Check audit logs
python tools/blue_team.py audit search "anomaly|ANOMALY"

# View statistics
python tools/blue_team.py anomaly stats

Questions:

  • Was the attack detected?

  • Which detection method triggered?

  • At what speed did detection occur?

  • How quickly was it detected?

Step 3.3: Test gradual attack

# Gradual attack (slow increase to avoid rate limit)
python scripts/exploitation/turbine_overspeed_attack.py \
  --target-speed 1850 \
  --step-size 2 \
  --delay 1.0

Questions:

  • Does gradual attack evade rate-of-change detection?

  • Does it eventually trigger range limit?

  • Does statistical baseline detect it?

  • How long until detection?

Part 4: Tuning detection sensitivity

Step 4.1: Understanding false positives

Run normal operations and count anomalies:

# Run simulation for 1 hour (simulation time)
# Let system operate normally

# Check anomaly count
python tools/blue_team.py anomaly stats

Calculate False Positive Rate:

False Positive Rate = (Anomalies / Total Samples) * 100

Acceptable rates:

  • < 1%: Excellent (operators won’t ignore alerts)

  • 1-5%: Good (manageable)

  • 5-10%: Marginal (alarm fatigue risk)

  • > 10%: Poor (operators will ignore alerts; a worked example follows)
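For example, with hypothetical counts read from the stats output:

anomalies, total_samples = 37, 5000         # hypothetical counts
fpr = anomalies / total_samples * 100
print(f"False positive rate: {fpr:.2f}%")   # 0.74% -> "Excellent" band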

Step 4.2: Adjusting sigma threshold

Edit config/anomaly_detection.yml:

# More sensitive (more false positives)
sigma_threshold: 2.0  # Detects 95% deviations

# Less sensitive (fewer false positives, may miss attacks)
sigma_threshold: 4.0  # Detects 99.99% deviations

# Balanced (recommended)
sigma_threshold: 3.0  # Detects 99.7% deviations

Test impact:

  1. Set threshold to 2.0, run normal operations, count anomalies

  2. Set threshold to 4.0, run attack, check if detected

  3. Find optimal balance for your environment

Step 4.3: Adjusting learning window

# Faster baseline, less stable
learning_window: 500

# Slower baseline, more stable
learning_window: 2000

# Balanced
learning_window: 1000

Trade-offs:

  • Large window: Stable baseline, but slow to adapt to operational changes

  • Small window: Fast adaptation, but noisy and less reliable

Step 4.4: Adjusting rate limits

# Strict (may catch legitimate fast changes)
rate_limits:
  - device: turbine_plc_1
    parameter: speed
    max_rate: 5.0  # Very slow changes only

# Lenient (may miss gradual attacks)
rate_limits:
  - device: turbine_plc_1
    parameter: speed
    max_rate: 20.0  # Allows faster changes

# Balanced
rate_limits:
  - device: turbine_plc_1
    parameter: speed
    max_rate: 10.0  # Moderate rate

Finding the Right Rate:

  1. Observe normal operations during setpoint changes

  2. Measure the actual rate of change during normal ops (a sketch of steps 2-3 follows this list)

  3. Set limit slightly above normal maximum

  4. Test with attacks to ensure detection
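A sketch of steps 2-3, assuming you have captured speed_log as (timestamp_seconds, rpm) pairs while watching normal setpoint changes:

speed_log = [(0.0, 1500.0), (1.0, 1503.5), (2.0, 1508.0), (3.0, 1511.0)]

rates = [abs(v2 - v1) / (t2 - t1)
         for (t1, v1), (t2, v2) in zip(speed_log, speed_log[1:])]
normal_max = max(rates)                 # fastest legitimate change observed
suggested = round(normal_max * 1.5, 1)  # ~50% headroom above the normal peak
print(f"Observed max {normal_max:.1f} RPM/s -> suggested max_rate {suggested}")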

Part 5: Attack detection scenarios

Scenario 1: Sudden overspeed attack

Attack: Attacker suddenly sets turbine to 1900 RPM.

Detection:

python tools/blue_team.py anomaly list --limit 10

Expected Results:

  • ✓ Range limit violation (1900 > 1800 max)

  • ✓ Rate-of-change violation (sudden jump)

  • ✓ Statistical anomaly (far from baseline mean)

  • Severity: HIGH or CRITICAL

Response:

  • Investigate source of command

  • Check authentication logs

  • Verify if authorized operation

  • Initiate emergency shutdown if needed

Scenario 2: Gradual attack (boiling frog)

Attack: Slowly increase speed by 2 RPM every 10 seconds.

Detection Challenge:

  • ✗ Rate limit not violated (2 RPM per 10 s = 0.2 RPM/s, well below the 10 RPM/s limit)

  • ? Statistical baseline may detect (if increase continues)

  • ✓ Range limit eventually violated (at 1800 RPM)

Detection Timeline:

  • t=0: Speed 1500 RPM (normal)

  • t=1500s: Speed 1800 RPM (range limit reached: 300 RPM gained at 0.2 RPM/s = 1500 s)

  • Result: Attack detected after 25 minutes

Mitigation:

  • Tighter rate limits (catches gradual changes)

  • Alarm on sustained one-direction trends (see the CUSUM sketch below)

  • Require authorisation for setpoint changes
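One way to alarm on sustained one-direction trends is a one-sided CUSUM chart (see Further reading). A hedged sketch with illustrative parameters:

def cusum_upward(values, target, slack=1.0, threshold=15.0):
    # Accumulate drift above `target`; drift below `slack` per sample is tolerated.
    s = 0.0
    for i, v in enumerate(values):
        s = max(0.0, s + (v - target - slack))
        if s > threshold:
            return i                 # index of the first alarm
    return None

# A 2 RPM/step creep from 1500 RPM alarms within a handful of samples,
# long before the 1800 RPM range limit (150 steps away) is reached:
creep = [1500 + 2 * i for i in range(60)]
print(cusum_upward(creep, target=1500.0))   # 4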

Scenario 3: Reconnaissance (parameter reading)

Attack: Attacker reads all parameters to map system.

Detection Challenge:

  • ✗ Read operations don’t affect process values

  • ✗ Anomaly detection doesn’t see reads

  • Need: IDS/IPS or protocol analysis

Lesson: Anomaly detection complements but doesn’t replace other controls.

Scenario 4: Sensor manipulation

Attack: Attacker falsifies sensor readings instead of control outputs.

Detection:

  • Statistical anomaly if sensor value deviates

  • But actual process may be unaffected

  • Physical sensors may show different values

Response:

  • Compare sensor readings to expected physics

  • Cross-check multiple sensors

  • Verify sensor calibration

Part 6: Operational challenges

1: Startup anomalies

Problem: System startup creates anomalies (not attacks).

Solution:

  • Disable detection during startup

  • Accept a higher false positive rate during startup

  • Maintain a separate startup baseline

Best Practice:

  • Suppress anomalies during known maintenance windows

  • Log as INFO instead of WARNING during startup

  • Resume normal detection after stabilisation period

2: Mode changes

Problem: Operating modes have different normal ranges.

Example:

  • Startup mode: 0-1000 RPM (ramping up)

  • Normal mode: 1400-1600 RPM (steady state)

  • Peak demand: 1600-1800 RPM (high output)

Solution:

  • Mode-aware baselines (sketched below)

  • Switch detection parameters based on current mode

  • Or wider limits that cover all modes (less sensitive)
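A sketch of mode-aware limits; the mode names and structure are assumptions, not the simulator's config schema:

MODE_LIMITS = {
    "startup":     {"min": 0.0,    "max": 1000.0},
    "normal":      {"min": 1400.0, "max": 1600.0},
    "peak_demand": {"min": 1600.0, "max": 1800.0},
}

def speed_in_range(mode, rpm):
    limits = MODE_LIMITS[mode]
    return limits["min"] <= rpm <= limits["max"]

print(speed_in_range("normal", 1750))        # False: anomalous at steady state
print(speed_in_range("peak_demand", 1750))   # True: expected at peak demand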

3: Seasonal variations

Problem: Load patterns change with seasons (winter vs summer demand).

Solution:

  • Periodic baseline retraining

  • Seasonal adjustment factors

  • Longer learning windows to capture variations

4: Maintenance operations

Problem: Legitimate maintenance violates baselines.

Solution:

  • Maintenance mode flag that disables or relaxes detection (see the sketch below)

  • Require authorisation + justification

  • Enhanced logging during maintenance

  • Resume normal detection after maintenance
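A sketch of the flag's effect on alerting; the maintenance and authorised_by fields are illustrative:

import logging

def report_anomaly(log, message, maintenance=False, authorised_by=None):
    # During an authorised maintenance window, keep the record for
    # forensics but downgrade severity so operators aren't paged.
    if maintenance and authorised_by:
        log.info("MAINT[%s]: %s", authorised_by, message)
    else:
        log.warning("ANOMALY: %s", message)

logging.basicConfig(level=logging.INFO)
report_anomaly(logging.getLogger("anomaly"),
               "turbine_plc_1 speed outside baseline",
               maintenance=True, authorised_by="op-42")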

Part 7: Combining detection methods

Defence in depth: All detection layers

Challenge 4 (Anomaly Detection) works with other challenges:

Layer 1: Protocol Filtering (Challenge 5)

  • Blocks dangerous function codes

  • Prevents batch writes (FC 15/16)

  • Blocks diagnostics (FC 08)

Layer 2: RBAC (Challenge 2)

  • Verifies user permissions

  • Blocks unauthorised writes

  • Enforces role separation

Layer 3: Anomaly Detection (Challenge 4) ← You are here

  • Detects abnormal values (statistical)

  • Enforces safety limits (range)

  • Detects rapid changes (rate)

Layer 4: Audit Logging (Challenge 3)

  • Records all operations

  • Enables forensics

  • Detects patterns over time

Attack detection matrix

Attack Type            Protocol   RBAC   Anomaly   Audit   Result
External Overspeed     ✓          ✓      ✓         ✓       Blocked
Insider Overspeed      ✗          ?      ✓         ✓       Detected
Gradual Attack         ✗          ?      ?         ✓       Delayed detection
Authorised Abuse       ✗          ✗      ✓         ✓       Detected
Reconnaissance         ✗          ✗      ✗         ✓       Logged
Sensor Manipulation    ✗          ?      ✓         ✓       Detected

Key:

  • ✓ = Detected/Blocked

  • ✗ = Not detected

  • ? = Depends on permissions

Lesson: No single layer catches everything; defence in depth is essential.

Part 8: Advanced topics

Time-series forecasting

Predict expected values based on historical patterns:

# Predict the next value from the recent trend (illustrative API;
# detector, actual and threshold come from the surrounding context)
expected = detector.forecast_next_value(
    device="turbine_plc_1",
    parameter="speed",
    window=100,  # samples used to fit the trend
)

# Compare actual vs predicted
if abs(actual - expected) > threshold:
    print("Anomaly: value does not match the forecast trend")

Pattern recognition

Detect specific attack patterns:

# Detect sawtooth pattern (repeated increases then sudden drops)
# Typical of attacker testing limits

# Detect sustained one-direction trend
# Gradual attack or sensor drift

# Detect oscillation increase
# System instability or control loop attack

Correlation across systems

Detect coordinated attacks:

# If turbine speed anomalous AND reactor power anomalous
# => Coordinated attack on multiple systems

# If multiple devices show anomalies at same time
# => Potential widespread attack or infrastructure failure
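A sketch of time-window correlation, assuming anomaly_events is a list of (timestamp_seconds, device) pairs pulled from the anomaly log:

from collections import defaultdict

def correlated_devices(anomaly_events, window=30.0):
    # Bucket anomalies into `window`-second slots; flag multi-device buckets.
    buckets = defaultdict(set)
    for ts, device in anomaly_events:
        buckets[int(ts // window)].add(device)
    return [devs for devs in buckets.values() if len(devs) > 1]

events = [(100.0, "turbine_plc_1"), (112.0, "reactor_plc_1"),
          (400.0, "turbine_plc_1")]
print(correlated_devices(events))   # [{'turbine_plc_1', 'reactor_plc_1'}]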

Machine learning integration

Beyond statistical methods:

  • Neural networks for pattern recognition

  • Clustering for operational mode detection

  • Reinforcement learning for adaptive thresholds

  • Ensemble methods combining multiple detectors

Learning reflection

What?

  1. Detection Methods:

    • Statistical baselines catch deviations from normal

    • Range limits enforce physical safety bounds

    • Rate limits detect sudden or rapid changes

    • Each method has strengths and weaknesses

  2. Tuning Trade-offs:

    • Sensitivity vs false positives

    • Learning speed vs baseline stability

    • Coverage vs alarm fatigue

    • No perfect threshold exists

  3. Operational Realities:

    • Startups and shutdowns create anomalies

    • Maintenance operations violate baselines

    • Mode changes complicate detection

    • Context matters (not just the value)

  4. Defence in Depth:

    • Anomaly detection complements other controls

    • Catches attacks that bypass protocol filtering

    • Detects insider threats with valid credentials

    • Enables detection when prevention fails

Discussion

  1. False Positives:

    • How many false alarms are acceptable?

    • What happens when operators ignore alerts?

    • How to reduce false positives without missing attacks?

  2. Attack evasion:

    • Can sophisticated attackers evade anomaly detection?

    • How slow must a gradual attack be to evade rate limits?

    • Can attackers learn the baseline and stay within it?

  3. Operational impact:

    • Does anomaly detection slow operations?

    • How to handle detection during emergencies?

    • When to disable detection for maintenance?

  4. Detection vs Prevention:

    • When is detection more important than prevention?

    • Can all attacks be prevented?

    • How fast must detection be to be useful?

Challenge success criteria

  • You can establish statistical baselines

  • You can set range limits for safety-critical parameters

  • You can configure rate-of-change detection

  • You can detect sudden overspeed attacks

  • You understand the challenge of detecting gradual attacks

  • You can tune detection sensitivity

  • You understand defence in depth with other challenges

Next steps

Combine with Other Challenges: Revisit Part 7's layered defence and pair anomaly detection with protocol filtering (Challenge 5), RBAC (Challenge 2), and audit logging (Challenge 3).

Advanced integration:

  • Connect anomaly detection to IDS/SIEM

  • Create automated response playbooks

  • Implement machine learning detectors

  • Deploy behaviour analytics for insiders

References

Standards:

  • ISA/IEC 62443-3-1: Security technologies for IACS

  • NIST SP 800-82: Guide to Industrial Control Systems (ICS) Security

  • IEC 62443-3-3: System security requirements and security levels

Tools:

  • Blue Team CLI: python tools/blue_team.py anomaly --help

  • Demo Script: python examples/anomaly_detection_demo.py

  • AnomalyDetector API: components/security/anomaly_detector.py

Further reading:

  • Statistical Process Control (SPC) for manufacturing

  • CUSUM (Cumulative Sum) control charts

  • Behaviour-based intrusion detection

  • Time-series anomaly detection