Challenge 4: Anomaly detection deployment¶
Objective: Deploy behavioural anomaly detection to identify abnormal turbine behaviour that looks normal at the protocol level.
Category: Detection & Threat Hunting
Difficulty: Intermediate
Time Required: 40-50 minutes
Learning outcomes¶
By completing this challenge, you will:
Establish statistical baselines for normal system behaviour
Configure range limits for safety-critical parameters
Set rate-of-change limits to detect sudden attacks
Detect attacks that bypass protocol-level controls
Tune detection sensitivity to balance false positives vs false negatives
Understand when detection is more important than prevention
Background: Why behavioural detection?¶
The Problem: Attacks look like normal operations at the protocol level.
A Modbus write is just a Modbus write. You can’t tell if speed_setpoint = 1850 is:
Legitimate operation by authorised engineer
Attack by compromised credentials
Malware manipulating process values
Solution: Behavioural anomaly detection looks at what is being written, not just who is writing it.
Detection Methods¶
Statistical Baselines:
Learn normal behaviour over time (mean, standard deviation)
Detect values that deviate significantly (e.g., 3 sigma from mean)
Good for: Detecting unusual but valid values
Range Limits:
Hard min/max values for safety-critical parameters
Any value outside range is anomalous
Good for: Enforcing physical safety limits
Rate-of-Change:
Maximum allowed rate of change per second
Detects sudden jumps or rapid increases
Good for: Detecting abrupt attacks or sensor failures (see the combined sketch below)
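To make the comparison concrete, here is a minimal, self-contained sketch that applies all three checks to a single parameter. The class and its fields are illustrative only; they are not the simulator's actual `AnomalyDetector` API, and the thresholds mirror the example values used later in this challenge.

```python
import statistics

class SimpleDetector:
    """Illustrative detector combining baseline, range, and rate checks."""

    def __init__(self, min_value, max_value, max_rate, sigma=3.0, learning_window=1000):
        self.min_value = min_value          # hard range limits
        self.max_value = max_value
        self.max_rate = max_rate            # max allowed change per second
        self.sigma = sigma                  # statistical threshold
        self.learning_window = learning_window
        self.samples = []
        self.last = None                    # (timestamp, value) of previous sample

    def check(self, value, timestamp):
        alerts = []

        # Range limit: enforce physical safety bounds
        if not (self.min_value <= value <= self.max_value):
            alerts.append("range_violation")

        # Rate-of-change limit: detect sudden jumps
        if self.last is not None:
            prev_t, prev_v = self.last
            dt = max(timestamp - prev_t, 1e-6)
            if abs(value - prev_v) / dt > self.max_rate:
                alerts.append("rate_violation")
        self.last = (timestamp, value)

        # Statistical baseline: only active once enough samples are learned
        self.samples.append(value)
        if len(self.samples) >= self.learning_window:
            mean = statistics.fmean(self.samples)
            stdev = statistics.pstdev(self.samples)
            if stdev > 0 and abs(value - mean) > self.sigma * stdev:
                alerts.append("statistical_anomaly")

        return alerts

# Example: turbine speed limits of 800-1800 RPM and 10 RPM/s
detector = SimpleDetector(min_value=800, max_value=1800, max_rate=10.0)
print(detector.check(1900, timestamp=0.0))  # ['range_violation']
```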
Current state (no detection)¶
Before hardening, the simulation has:
✅ Turbine operating at normal speed (1500 RPM)
❌ No anomaly detection: Abnormal behaviour undetected
❌ No baseline learning
❌ No statistical analysis
Result: Gradual overspeed attacks go unnoticed until physical damage occurs.
Part 1: Configuration changes (require restart)¶
Configuration changes establish persistent detection baselines.
Step 1.1: Enable anomaly detection¶
Edit `config/anomaly_detection.yml`:

```yaml
# Enable detection globally
enabled: true

# Detection thresholds
sigma_threshold: 3.0   # 3 standard deviations (99.7% of values)
learning_window: 1000  # Samples needed to establish baseline
```
Sigma Threshold Trade-offs:
`2.0` = 95% coverage (more sensitive, more false positives)
`3.0` = 99.7% coverage (balanced, recommended)
`4.0` = 99.99% coverage (less sensitive, may miss attacks)
Step 1.2: Configure baselines¶
Add device parameters to monitor:
```yaml
baselines:
  # Turbine speed monitoring
  - device: turbine_plc_1
    parameter: speed
    learning_window: 1000
    enabled: true

  # Turbine temperature
  - device: turbine_plc_1
    parameter: temperature
    learning_window: 1000
    enabled: true

  # Reactor core temperature
  - device: reactor_plc_1
    parameter: core_temperature
    learning_window: 1000
    enabled: true
```
Step 1.3: Set range limits¶
Define hard safety limits:
```yaml
range_limits:
  # Turbine speed: 800-1800 RPM
  - device: turbine_plc_1
    parameter: speed
    min_value: 800.0
    max_value: 1800.0
    severity: high      # Overspeed is a critical safety issue

  # Reactor temperature: 250-350°C
  - device: reactor_plc_1
    parameter: core_temperature
    min_value: 250.0
    max_value: 350.0
    severity: critical
```
Step 1.4: Set rate-of-change limits¶
Prevent sudden attacks:
```yaml
rate_limits:
  # Turbine speed: max 10 RPM/second
  - device: turbine_plc_1
    parameter: speed
    max_rate: 10.0
    severity: high

  # Reactor temperature: max 5°C/second
  - device: reactor_plc_1
    parameter: core_temperature
    max_rate: 5.0
    severity: high
```
Step 1.5: Restart simulation¶
```bash
# Restart to apply config changes
python tools/simulator_manager.py
```
Configuration changes are now persistent and active on every startup.
Part 2: Runtime operations (immediate, temporary)¶
Runtime changes take effect immediately but are lost on restart.
Step 2.1: Enable detection (runtime)¶
```bash
python tools/blue_team.py anomaly enable
```
Step 2.2: Add baseline monitoring (runtime)¶
```bash
# Monitor turbine speed
python tools/blue_team.py anomaly add-baseline \
  --device turbine_plc_1 \
  --parameter speed \
  --learning-window 1000

# Monitor turbine temperature
python tools/blue_team.py anomaly add-baseline \
  --device turbine_plc_1 \
  --parameter temperature \
  --learning-window 1000
```
Learning Period:
The detector needs `learning_window` samples before statistical detection activates; how long that takes depends on the sampling rate (see the estimate below).
During learning: Only range and rate limits are active
After learning: Statistical anomaly detection is also active
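A rough estimate of the learning period, assuming one sample per second (the simulator's actual polling interval may differ):

```python
learning_window = 1000   # samples required before statistics activate
sample_rate_hz = 1.0     # assumption: one sample per second

minutes_to_baseline = learning_window / sample_rate_hz / 60
print(f"Statistical detection active after ~{minutes_to_baseline:.0f} minutes")  # ~17 minutes
```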
Step 2.3: Set range limits (runtime)¶
```bash
# Turbine speed limits
python tools/blue_team.py anomaly set-range \
  --device turbine_plc_1 \
  --parameter speed \
  --min 800 \
  --max 1800

# Reactor temperature limits
python tools/blue_team.py anomaly set-range \
  --device reactor_plc_1 \
  --parameter core_temperature \
  --min 250 \
  --max 350
```
Step 2.4: Set rate limits (runtime)¶
```bash
# Turbine speed rate limit
python tools/blue_team.py anomaly set-rate \
  --device turbine_plc_1 \
  --parameter speed \
  --max-rate 10.0

# Reactor temperature rate limit
python tools/blue_team.py anomaly set-rate \
  --device reactor_plc_1 \
  --parameter core_temperature \
  --max-rate 5.0
```
Part 3: Testing detection capabilities¶
Step 3.1: Run the demonstration¶
```bash
python examples/anomaly_detection_demo.py
```
This demonstrates:
Baseline establishment (learning normal behaviour)
Normal operations (no anomalies)
Gradual attack (rate limit violation)
Sudden attack (range limit violation)
Watch the Output:
Which attacks are detected?
Which detection method caught each attack?
What’s the deviation magnitude?
Step 3.2: Test overspeed attack detection¶
```bash
# Terminal 1: Start simulation
python simulation.py

# Wait for baseline to establish (1000 samples)
# Monitor: python tools/blue_team.py anomaly stats

# Terminal 2: Run attack
python scripts/exploitation/turbine_overspeed_attack.py --target-speed 1850
```
Check detection:
```bash
# View detected anomalies
python tools/blue_team.py anomaly list

# Check audit logs
python tools/blue_team.py audit search "anomaly|ANOMALY"

# View statistics
python tools/blue_team.py anomaly stats
```
Questions:
Was the attack detected?
Which detection method triggered?
At what speed did detection occur?
How quickly was it detected?
Step 3.3: Test gradual attack¶
```bash
# Gradual attack (slow increase to avoid rate limit)
python scripts/exploitation/turbine_overspeed_attack.py \
  --target-speed 1850 \
  --step-size 2 \
  --delay 1.0
```
Questions:
Does gradual attack evade rate-of-change detection?
Does it eventually trigger range limit?
Does statistical baseline detect it?
How long until detection?
Part 4: Tuning detection sensitivity¶
Step 4.1: Understanding false positives¶
Run normal operations and count anomalies:
```bash
# Run simulation for 1 hour (simulation time)
# Let system operate normally
# Check anomaly count
python tools/blue_team.py anomaly stats
```
Calculate False Positive Rate:
False Positive Rate = (Anomalies / Total Samples) * 100
Acceptable rates:
< 1%: Excellent (operators won’t ignore alerts)
1-5%: Good (manageable)
5-10%: Marginal (alarm fatigue risk)
> 10%: Poor (operators will ignore alerts)
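Worked example with illustrative numbers (take the anomaly and sample counts from your own `anomaly stats` output):

```python
anomalies = 45            # anomalies flagged during normal operation
total_samples = 3600      # e.g. one sample per second for an hour

false_positive_rate = anomalies / total_samples * 100
print(f"False positive rate: {false_positive_rate:.2f}%")  # 1.25% -> "Good (manageable)"
```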
Step 4.2: Adjusting sigma threshold¶
Edit config/anomaly_detection.yml:
```yaml
# More sensitive (more false positives)
sigma_threshold: 2.0  # Flags values outside ~95% of the baseline distribution

# Less sensitive (fewer false positives, may miss attacks)
sigma_threshold: 4.0  # Flags values outside ~99.99% of the baseline distribution

# Balanced (recommended)
sigma_threshold: 3.0  # Flags values outside ~99.7% of the baseline distribution
```
Test impact:
Set threshold to 2.0, run normal operations, count anomalies
Set threshold to 4.0, run attack, check if detected
Find optimal balance for your environment
Step 4.3: Adjusting learning window¶
```yaml
# Faster baseline, less stable
learning_window: 500

# Slower baseline, more stable
learning_window: 2000

# Balanced
learning_window: 1000
```
Trade-offs:
Large window: Stable baseline, but slow to adapt to operational changes
Small window: Fast adaptation, but noisy and less reliable
Step 4.4: Adjusting rate limits¶
```yaml
# Strict (may catch legitimate fast changes)
rate_limits:
  - device: turbine_plc_1
    parameter: speed
    max_rate: 5.0   # Very slow changes only

# Lenient (may miss gradual attacks)
rate_limits:
  - device: turbine_plc_1
    parameter: speed
    max_rate: 20.0  # Allows faster changes

# Balanced
rate_limits:
  - device: turbine_plc_1
    parameter: speed
    max_rate: 10.0  # Moderate rate
```
Finding the Right Rate:
Observe normal operations during setpoint changes
Measure actual rate of change during normal ops
Set the limit slightly above the normal maximum (see the sketch after this list)
Test with attacks to ensure detection
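A sketch of steps 1-3: compute the fastest change observed in a capture of normal operation and add headroom. The `(timestamp, value)` samples and the 1.5x headroom factor are illustrative; there is no built-in helper that does this for you.

```python
def suggest_rate_limit(samples, headroom=1.5):
    """Suggest a max_rate from observed (timestamp_seconds, value) pairs."""
    max_rate = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt > 0:
            max_rate = max(max_rate, abs(v1 - v0) / dt)
    return max_rate * headroom

# Illustrative capture: a normal setpoint ramp peaking at 5 RPM/s
normal_ops = [(0, 1500), (1, 1505), (2, 1510), (3, 1512), (4, 1513)]
print(f"Suggested max_rate: {suggest_rate_limit(normal_ops):.1f} RPM/s")  # 7.5 RPM/s
```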
Part 5: Attack detection scenarios¶
Scenario 1: Sudden overspeed attack¶
Attack: Attacker suddenly sets turbine to 1900 RPM.
Detection:
```bash
python tools/blue_team.py anomaly list --limit 10
```
Expected Results:
✓ Range limit violation (1900 > 1800 max)
✓ Rate-of-change violation (sudden jump)
✓ Statistical anomaly (far from baseline mean)
Severity: HIGH or CRITICAL
Response:
Investigate source of command
Check authentication logs
Verify whether the operation was authorised
Initiate emergency shutdown if needed
Scenario 2: Gradual attack (boiling frog)¶
Attack: Slowly increase the speed by 2 RPM every 10 seconds.
Detection Challenge:
✗ Rate limit not violated (2 RPM / 10 s = 0.2 RPM/s, below the 10 RPM/s limit)
? Statistical baseline may detect (if increase continues)
✓ Range limit eventually violated (at 1800 RPM)
Detection Timeline:
t=0: Speed 1500 RPM (normal)
t=1500s: Speed 1800 RPM (range limit reached)
Result: Attack detected after 25 minutes (the arithmetic is sketched below)
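The 25-minute figure comes straight from the attack rate and the configured range limit; the same arithmetic estimates detection delay for any step size and interval:

```python
start_rpm = 1500      # normal operating speed
range_limit = 1800    # configured max_value for turbine speed
step_rpm = 2          # attack increment
interval_s = 10       # seconds between increments

attack_rate = step_rpm / interval_s                        # 0.2 RPM/s, well under the 10 RPM/s rate limit
seconds_to_detection = (range_limit - start_rpm) / attack_rate
print(f"Range limit reached after {seconds_to_detection / 60:.0f} minutes")  # 25 minutes
```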
Mitigation:
Tighter rate limits (catch slower, more gradual changes)
Alarm on sustained one-direction trends
Require authorisation for setpoint changes
Scenario 3: Reconnaissance (parameter reading)¶
Attack: Attacker reads all parameters to map system.
Detection Challenge:
✗ Read operations don’t affect process values
✗ Anomaly detection doesn’t see reads
Need: IDS/IPS or protocol analysis
Lesson: Anomaly detection complements but doesn’t replace other controls.
Scenario 4: Sensor manipulation¶
Attack: Attacker falsifies sensor readings instead of control outputs.
Detection:
Statistical anomaly if sensor value deviates
But actual process may be unaffected
Physical sensors may show different values
Response:
Compare sensor readings to expected physics
Cross-check multiple sensors (see the sketch below)
Verify sensor calibration
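A simple illustration of the cross-check idea: compare redundant readings of the same quantity and flag any sensor that diverges from the rest. The sensor names and tolerance below are made up for the example.

```python
import statistics

def cross_check(readings, tolerance=15.0):
    """Return sensors whose reading deviates from the median of redundant readings."""
    median = statistics.median(readings.values())
    return {name: value for name, value in readings.items()
            if abs(value - median) > tolerance}

# Illustrative: three speed sensors, one of which has been falsified
suspect = cross_check({"speed_a": 1510, "speed_b": 1505, "speed_c": 1740})
print(suspect)  # {'speed_c': 1740} -> investigate this sensor
```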
Part 6: Operational challenges¶
1: Startup anomalies¶
Problem: System startup creates anomalies (not attacks).
Solution:
```text
# Disable detection during startup
# Or accept high false positive rate
# Or have separate startup baseline
```
Best Practice:
Suppress anomalies during known maintenance windows
Log as INFO instead of WARNING during startup
Resume normal detection after stabilisation period
2: Mode changes¶
Problem: Operating modes have different normal ranges.
Example:
Startup mode: 0-1000 RPM (ramping up)
Normal mode: 1400-1600 RPM (steady state)
Peak demand: 1600-1800 RPM (high output)
Solution:
Mode-aware baselines (sketched below)
Switch detection parameters based on current mode
Or wider limits that cover all modes (less sensitive)
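A minimal sketch of mode-aware limits, assuming the monitoring code is told the current operating mode. The mode names and ranges mirror the example above; the dictionary itself is illustrative, not a simulator feature.

```python
# Per-mode range limits for turbine speed (RPM)
MODE_LIMITS = {
    "startup":     (0, 1000),
    "normal":      (1400, 1600),
    "peak_demand": (1600, 1800),
}

def in_range(value, mode):
    """Check a reading against the range limits for the current operating mode."""
    low, high = MODE_LIMITS[mode]
    return low <= value <= high

print(in_range(900, "startup"))  # True  - normal while ramping up
print(in_range(900, "normal"))   # False - anomalous at steady state
```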
3: Seasonal variations¶
Problem: Load patterns change with seasons (winter vs summer demand).
Solution:
Periodic baseline retraining
Seasonal adjustment factors
Longer learning windows to capture variations
4: Maintenance operations¶
Problem: Legitimate maintenance violates baselines.
Solution:
Maintenance mode flag (disables/relaxes detection)
Require authorisation + justification
Enhanced logging during maintenance
Resume normal detection after maintenance
Part 7: Combining detection methods¶
Defence in depth: All detection layers¶
Challenge 4 (Anomaly Detection) works with other challenges:
Layer 1: Protocol Filtering (Challenge 5)
Blocks dangerous function codes
Prevents batch writes (FC 15/16)
Blocks diagnostics (FC 08)
Layer 2: RBAC (Challenge 2)
Verifies user permissions
Blocks unauthorised writes
Enforces role separation
Layer 3: Anomaly Detection (Challenge 4) ← You are here
Detects abnormal values (statistical)
Enforces safety limits (range)
Detects rapid changes (rate)
Layer 4: Audit Logging (Challenge 3)
Records all operations
Enables forensics
Detects patterns over time
Attack detection matrix¶
| Attack Type | Protocol | RBAC | Anomaly | Audit | Result |
|---|---|---|---|---|---|
| External Overspeed | ✓ | ✗ | ✓ | ✓ | Blocked |
| Insider Overspeed | ✗ | ? | ✓ | ✓ | Detected |
| Gradual Attack | ✗ | ? | ? | ✓ | Delayed detection |
| Authorised Abuse | ✗ | ✗ | ✓ | ✓ | Detected |
| Reconnaissance | ✗ | ? | ✗ | ✓ | Logged |
| Sensor Manipulation | ✗ | ? | ✓ | ✓ | Detected |
Key:
✓ = Detected/Blocked
✗ = Not detected
? = Depends on permissions
Lesson: No single layer catches everything; defence in depth is essential.
Part 8: Advanced topics¶
Time-series forecasting¶
Predict expected values based on historical patterns:
```python
# Predict the next value based on the recent trend
expected = detector.forecast_next_value(
    device="turbine_plc_1",
    parameter="speed",
    window=100,
)

# Compare actual vs predicted
if abs(actual - expected) > threshold:
    # Anomaly: value doesn't match the trend
    print(f"Anomaly: actual {actual} deviates from forecast {expected:.1f}")
```
Pattern recognition¶
Detect specific attack patterns:
```python
# Detect sawtooth pattern (repeated increases then sudden drops)
# Typical of attacker testing limits

# Detect sustained one-direction trend
# Gradual attack or sensor drift

# Detect oscillation increase
# System instability or control loop attack
```
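Of these, the sustained one-direction trend is the easiest to sketch: count how many consecutive samples move in the same direction and alert when the run gets long. This is an illustration, not the simulator's pattern-recognition code; the run length of 20 is an arbitrary example.

```python
def sustained_trend(values, min_run=20):
    """True if the series currently ends with at least `min_run` consecutive
    moves in the same direction (possible gradual attack or sensor drift)."""
    run = 0
    for prev, curr in zip(values, values[1:]):
        if curr > prev:
            run = run + 1 if run > 0 else 1    # extend or start an upward run
        elif curr < prev:
            run = run - 1 if run < 0 else -1   # extend or start a downward run
        # equal readings neither extend nor break the run
    return abs(run) >= min_run

# A slow, steady climb of 2 RPM per sample trips the check
print(sustained_trend([1500 + 2 * i for i in range(30)]))  # True
```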
Correlation across systems¶
Detect coordinated attacks:
```python
# If turbine speed anomalous AND reactor power anomalous
# => Coordinated attack on multiple systems

# If multiple devices show anomalies at same time
# => Potential widespread attack or infrastructure failure
```
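A sketch of the co-occurrence idea: bucket recent anomaly records by time window and flag windows in which more than one device alerted. The `(timestamp, device)` record format is assumed for the illustration.

```python
from collections import defaultdict

def correlated_devices(anomalies, window_s=30):
    """Return sets of devices that alerted within the same time window.

    anomalies: list of (timestamp_seconds, device_name) tuples.
    """
    buckets = defaultdict(set)
    for ts, device in anomalies:
        buckets[int(ts // window_s)].add(device)
    return [devices for devices in buckets.values() if len(devices) > 1]

events = [(100, "turbine_plc_1"), (112, "reactor_plc_1"), (400, "turbine_plc_1")]
print(correlated_devices(events))  # [{'turbine_plc_1', 'reactor_plc_1'}] (order may vary)
```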
Machine learning integration¶
Beyond statistical methods:
Neural networks for pattern recognition
Clustering for operational mode detection
Reinforcement learning for adaptive thresholds
Ensemble methods combining multiple detectors
Learning reflection¶
What?¶
Detection Methods:
Statistical baselines catch deviations from normal
Range limits enforce physical safety bounds
Rate limits detect sudden or rapid changes
Each method has strengths and weaknesses
Tuning Trade-offs:
Sensitivity vs false positives
Learning speed vs baseline stability
Coverage vs alarm fatigue
No perfect threshold exists
Operational Realities:
Startups and shutdowns create anomalies
Maintenance operations violate baselines
Mode changes complicate detection
Context matters (not just the value)
Defence in Depth:
Anomaly detection complements other controls
Catches attacks that bypass protocol filtering
Detects insider threats with valid credentials
Enables detection when prevention fails
Discussion¶
False Positives:
How many false alarms are acceptable?
What happens when operators ignore alerts?
How to reduce false positives without missing attacks?
Attack evasion:
Can sophisticated attackers evade anomaly detection?
How slow must a gradual attack be to evade rate limits?
Can attackers learn the baseline and stay within it?
Operational impact:
Does anomaly detection slow operations?
How to handle detection during emergencies?
When to disable detection for maintenance?
Detection vs Prevention:
When is detection more important than prevention?
Can all attacks be prevented?
How fast must detection be to be useful?
Challenge success criteria¶
You can establish statistical baselines
You can set range limits for safety-critical parameters
You can configure rate-of-change detection
You can detect sudden overspeed attacks
You understand the challenge of detecting gradual attacks
You can tune detection sensitivity
You understand defence in depth with other challenges
Next steps¶
Combine with Other Challenges:
Challenge 2 (RBAC): Block unauthorised operations before anomalies occur
Challenge 3 (Logging): Analyse anomaly patterns over time
Challenge 5 (Protocol Filtering): Block dangerous operations at protocol level
Advanced integration:
Connect anomaly detection to IDS/SIEM
Create automated response playbooks
Implement machine learning detectors
Deploy behaviour analytics for insiders
References¶
Standards:
ISA/IEC 62443-3-1: Security technologies for IACS
NIST SP 800-82: Guide to Industrial Control Systems (ICS) Security
IEC 62443-3-3: System security requirements and security levels
Tools:
Blue Team CLI: `python tools/blue_team.py anomaly --help`
Demo Script: `python examples/anomaly_detection_demo.py`
AnomalyDetector API: `components/security/anomaly_detector.py`
Further reading:
Statistical Process Control (SPC) for manufacturing
CUSUM (Cumulative Sum) control charts
Behaviour-based intrusion detection
Time-series anomaly detection