Challenge 4: Anomaly detection deployment¶
Objective: Deploy behavioural anomaly detection to identify abnormal turbine behaviour that looks normal at the protocol level.
Category: Detection & Threat Hunting
Difficulty: Intermediate
Time Required: 40-50 minutes
Learning outcomes¶
By completing this challenge, you will:
Establish statistical baselines for normal system behaviour
Configure range limits for safety-critical parameters
Set rate-of-change limits to detect sudden attacks
Detect attacks that bypass protocol-level controls
Tune detection sensitivity to balance false positives vs false negatives
Understand when detection is more important than prevention
Background: Why behavioural detection?¶
The Problem: Attacks look like normal operations at the protocol level.
A Modbus write is just a Modbus write. You can’t tell if speed_setpoint = 1850 is:
Legitimate operation by authorised engineer
Attack by compromised credentials
Malware manipulating process values
Solution: Behavioural anomaly detection looks at what is being written, not just who is writing it.
Detection Methods¶
Statistical Baselines:
Learn normal behaviour over time (mean, standard deviation)
Detect values that deviate significantly (e.g., 3 sigma from mean)
Good for: Detecting unusual but valid values
Range Limits:
Hard min/max values for safety-critical parameters
Any value outside range is anomalous
Good for: Enforcing physical safety limits
Rate-of-Change:
Maximum allowed rate of change per second
Detects sudden jumps or rapid increases
Good for: Detecting abrupt attacks or sensor failures (see the combined sketch below)
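To make the comparison concrete, here is a minimal, self-contained sketch that applies all three checks to a single parameter. The class and its fields are illustrative only; they are not the simulator's actual `AnomalyDetector` API, and the thresholds mirror the example values used later in this challenge.

```python
import statistics

class SimpleDetector:
    """Illustrative detector combining baseline, range, and rate checks."""

    def __init__(self, min_value, max_value, max_rate, sigma=3.0, learning_window=1000):
        self.min_value = min_value          # hard range limits
        self.max_value = max_value
        self.max_rate = max_rate            # max allowed change per second
        self.sigma = sigma                  # statistical threshold
        self.learning_window = learning_window
        self.samples = []
        self.last = None                    # (timestamp, value) of previous sample

    def check(self, value, timestamp):
        alerts = []

        # Range limit: enforce physical safety bounds
        if not (self.min_value <= value <= self.max_value):
            alerts.append("range_violation")

        # Rate-of-change limit: detect sudden jumps
        if self.last is not None:
            prev_t, prev_v = self.last
            dt = max(timestamp - prev_t, 1e-6)
            if abs(value - prev_v) / dt > self.max_rate:
                alerts.append("rate_violation")
        self.last = (timestamp, value)

        # Statistical baseline: only active once enough samples are learned
        self.samples.append(value)
        if len(self.samples) >= self.learning_window:
            mean = statistics.fmean(self.samples)
            stdev = statistics.pstdev(self.samples)
            if stdev > 0 and abs(value - mean) > self.sigma * stdev:
                alerts.append("statistical_anomaly")

        return alerts

# Example: turbine speed limits of 800-1800 RPM and 10 RPM/s
detector = SimpleDetector(min_value=800, max_value=1800, max_rate=10.0)
print(detector.check(1900, timestamp=0.0))  # ['range_violation']
```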
Current state (no detection)¶
Before hardening, the simulation has:
✅ Turbine operating at normal speed (1500 RPM)
❌ No anomaly detection: Abnormal behaviour undetected
❌ No baseline learning
❌ No statistical analysis
Result: Gradual overspeed attacks go unnoticed until physical damage occurs.
Part 1: Configuration changes (require restart)¶
Configuration changes establish persistent detection baselines.
Step 1.1: Enable anomaly detection¶
Edit `config/anomaly_detection.yml`:

```yaml
# Enable detection globally
enabled: true

# Detection thresholds
sigma_threshold: 3.0   # 3 standard deviations (99.7% of values)
learning_window: 1000  # Samples needed to establish baseline
```
Sigma Threshold Trade-offs:
`2.0` = 95% coverage (more sensitive, more false positives)
`3.0` = 99.7% coverage (balanced, recommended)
`4.0` = 99.99% coverage (less sensitive, may miss attacks)
Step 1.2: Configure baselines¶
Add device parameters to monitor:
```yaml
baselines:
  # Turbine speed monitoring
  - device: turbine_plc_1
    parameter: speed
    learning_window: 1000
    enabled: true

  # Turbine temperature
  - device: turbine_plc_1
    parameter: temperature
    learning_window: 1000
    enabled: true

  # Reactor core temperature
  - device: reactor_plc_1
    parameter: core_temperature
    learning_window: 1000
    enabled: true
```
Step 1.3: Set range limits¶
Define hard safety limits:
```yaml
range_limits:
  # Turbine speed: 800-1800 RPM
  - device: turbine_plc_1
    parameter: speed
    min_value: 800.0
    max_value: 1800.0
    severity: high      # Overspeed is a critical safety issue

  # Reactor temperature: 250-350°C
  - device: reactor_plc_1
    parameter: core_temperature
    min_value: 250.0
    max_value: 350.0
    severity: critical
```
Step 1.4: Set rate-of-change limits¶
Prevent sudden attacks:
```yaml
rate_limits:
  # Turbine speed: max 10 RPM/second
  - device: turbine_plc_1
    parameter: speed
    max_rate: 10.0
    severity: high

  # Reactor temperature: max 5°C/second
  - device: reactor_plc_1
    parameter: core_temperature
    max_rate: 5.0
    severity: high
```
Step 1.5: Restart simulation¶
```bash
# Restart to apply config changes
python tools/simulator_manager.py
```
Configuration changes are now persistent and active on every startup.
Part 2: Runtime operations (immediate, temporary)¶
Runtime changes take effect immediately but are lost on restart.
Step 2.1: Enable detection (runtime)¶
```bash
python tools/blue_team.py anomaly enable
```
Step 2.2: Add baseline monitoring (runtime)¶
```bash
# Monitor turbine speed
python tools/blue_team.py anomaly add-baseline \
  --device turbine_plc_1 \
  --parameter speed \
  --learning-window 1000

# Monitor turbine temperature
python tools/blue_team.py anomaly add-baseline \
  --device turbine_plc_1 \
  --parameter temperature \
  --learning-window 1000
```
Learning Period:
The detector needs `learning_window` samples before statistical detection activates; how long that takes depends on the sampling rate (see the estimate below).
During learning: Only range and rate limits are active
After learning: Statistical anomaly detection is also active
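A rough estimate of the learning period, assuming one sample per second (the simulator's actual polling interval may differ):

```python
learning_window = 1000   # samples required before statistics activate
sample_rate_hz = 1.0     # assumption: one sample per second

minutes_to_baseline = learning_window / sample_rate_hz / 60
print(f"Statistical detection active after ~{minutes_to_baseline:.0f} minutes")  # ~17 minutes
```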
Step 2.3: Set range limits (runtime)¶
```bash
# Turbine speed limits
python tools/blue_team.py anomaly set-range \
  --device turbine_plc_1 \
  --parameter speed \
  --min 800 \
  --max 1800

# Reactor temperature limits
python tools/blue_team.py anomaly set-range \
  --device reactor_plc_1 \
  --parameter core_temperature \
  --min 250 \
  --max 350
```
Step 2.4: Set rate limits (runtime)¶
```bash
# Turbine speed rate limit
python tools/blue_team.py anomaly set-rate \
  --device turbine_plc_1 \
  --parameter speed \
  --max-rate 10.0

# Reactor temperature rate limit
python tools/blue_team.py anomaly set-rate \
  --device reactor_plc_1 \
  --parameter core_temperature \
  --max-rate 5.0
```
Part 3: Testing detection capabilities¶
Step 3.1: Run the demonstration¶
```bash
python examples/anomaly_detection_demo.py
```
This demonstrates:
Baseline establishment (learning normal behaviour)
Normal operations (no anomalies)
Gradual attack (rate limit violation)
Sudden attack (range limit violation)
Watch the Output:
Which attacks are detected?
Which detection method caught each attack?
What’s the deviation magnitude?
Step 3.2: Test overspeed attack detection¶
```bash
# Terminal 1: Start simulation
python simulation.py

# Wait for baseline to establish (1000 samples)
# Monitor: python tools/blue_team.py anomaly stats

# Terminal 2: Run attack
python scripts/exploitation/turbine_overspeed_attack.py --target-speed 1850
```
Check detection:
```bash
# View detected anomalies
python tools/blue_team.py anomaly list

# Check audit logs
python tools/blue_team.py audit search "anomaly|ANOMALY"

# View statistics
python tools/blue_team.py anomaly stats
```
Questions:
Was the attack detected?
Which detection method triggered?
At what speed did detection occur?
How quickly was it detected?
Step 3.3: Test gradual attack¶
```bash
# Gradual attack (slow increase to avoid rate limit)
python scripts/exploitation/turbine_overspeed_attack.py \
  --target-speed 1850 \
  --step-size 2 \
  --delay 1.0
```
Questions:
Does gradual attack evade rate-of-change detection?
Does it eventually trigger range limit?
Does statistical baseline detect it?
How long until detection?
Part 4: Tuning detection sensitivity¶
Step 4.1: Understanding false positives¶
Run normal operations and count anomalies:
```bash
# Run simulation for 1 hour (simulation time)
# Let system operate normally
# Check anomaly count
python tools/blue_team.py anomaly stats
```
Calculate False Positive Rate:
False Positive Rate = (Anomalies / Total Samples) * 100
Acceptable rates:
< 1%: Excellent (operators won’t ignore alerts)
1-5%: Good (manageable)
5-10%: Marginal (alarm fatigue risk)
> 10%: Poor (operators will ignore alerts)
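Worked example with illustrative numbers (take the anomaly and sample counts from your own `anomaly stats` output):

```python
anomalies = 45            # anomalies flagged during normal operation
total_samples = 3600      # e.g. one sample per second for an hour

false_positive_rate = anomalies / total_samples * 100
print(f"False positive rate: {false_positive_rate:.2f}%")  # 1.25% -> "Good (manageable)"
```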
Step 4.2: Adjusting sigma threshold¶
Edit config/anomaly_detection.yml:
```yaml
# More sensitive (more false positives)
sigma_threshold: 2.0  # Flags values outside ~95% of the baseline distribution

# Less sensitive (fewer false positives, may miss attacks)
sigma_threshold: 4.0  # Flags values outside ~99.99% of the baseline distribution

# Balanced (recommended)
sigma_threshold: 3.0  # Flags values outside ~99.7% of the baseline distribution
```
Test impact:
Set threshold to 2.0, run normal operations, count anomalies
Set threshold to 4.0, run attack, check if detected
Find optimal balance for your environment
Step 4.3: Adjusting learning window¶
```yaml
# Faster baseline, less stable
learning_window: 500

# Slower baseline, more stable
learning_window: 2000

# Balanced
learning_window: 1000
```
Trade-offs:
Large window: Stable baseline, but slow to adapt to operational changes
Small window: Fast adaptation, but noisy and less reliable
Step 4.4: Adjusting rate limits¶
```yaml
# Strict (may catch legitimate fast changes)
rate_limits:
  - device: turbine_plc_1
    parameter: speed
    max_rate: 5.0   # Very slow changes only

# Lenient (may miss gradual attacks)
rate_limits:
  - device: turbine_plc_1
    parameter: speed
    max_rate: 20.0  # Allows faster changes

# Balanced
rate_limits:
  - device: turbine_plc_1
    parameter: speed
    max_rate: 10.0  # Moderate rate
```
Finding the Right Rate:
Observe normal operations during setpoint changes
Measure actual rate of change during normal ops
Set the limit slightly above the normal maximum (see the sketch after this list)
Test with attacks to ensure detection
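A sketch of steps 1-3: compute the fastest change observed in a capture of normal operation and add headroom. The `(timestamp, value)` samples and the 1.5x headroom factor are illustrative; there is no built-in helper that does this for you.

```python
def suggest_rate_limit(samples, headroom=1.5):
    """Suggest a max_rate from observed (timestamp_seconds, value) pairs."""
    max_rate = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt > 0:
            max_rate = max(max_rate, abs(v1 - v0) / dt)
    return max_rate * headroom

# Illustrative capture: a normal setpoint ramp peaking at 5 RPM/s
normal_ops = [(0, 1500), (1, 1505), (2, 1510), (3, 1512), (4, 1513)]
print(f"Suggested max_rate: {suggest_rate_limit(normal_ops):.1f} RPM/s")  # 7.5 RPM/s
```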
Part 5: Attack detection scenarios¶
Scenario 1: Sudden overspeed attack¶
Attack: Attacker suddenly sets turbine to 1900 RPM.
Detection:
```bash
python tools/blue_team.py anomaly list --limit 10
```
Expected Results:
✓ Range limit violation (1900 > 1800 max)
✓ Rate-of-change violation (sudden jump)
✓ Statistical anomaly (far from baseline mean)
Severity: HIGH or CRITICAL
Response:
Investigate source of command
Check authentication logs
Verify whether the operation was authorised
Initiate emergency shutdown if needed
Scenario 2: Gradual attack (boiling frog)¶
Attack: Slowly increase the speed by 2 RPM every 10 seconds.
Detection Challenge:
✗ Rate limit not violated (2 RPM / 10 s = 0.2 RPM/s, below the 10 RPM/s limit)
? Statistical baseline may detect (if increase continues)
✓ Range limit eventually violated (at 1800 RPM)
Detection Timeline:
t=0: Speed 1500 RPM (normal)
t=1500s: Speed 1800 RPM (range limit reached)
Result: Attack detected after 25 minutes (the arithmetic is sketched below)
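The 25-minute figure comes straight from the attack rate and the configured range limit; the same arithmetic estimates detection delay for any step size and interval:

```python
start_rpm = 1500      # normal operating speed
range_limit = 1800    # configured max_value for turbine speed
step_rpm = 2          # attack increment
interval_s = 10       # seconds between increments

attack_rate = step_rpm / interval_s                        # 0.2 RPM/s, well under the 10 RPM/s rate limit
seconds_to_detection = (range_limit - start_rpm) / attack_rate
print(f"Range limit reached after {seconds_to_detection / 60:.0f} minutes")  # 25 minutes
```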
Mitigation:
Tighter rate limits (catch slower, more gradual changes)
Alarm on sustained one-direction trends
Require authorisation for setpoint changes
Scenario 3: Reconnaissance (parameter reading)¶
Attack: Attacker reads all parameters to map system.
Detection Challenge:
✗ Read operations don’t affect process values
✗ Anomaly detection doesn’t see reads
Need: IDS/IPS or protocol analysis
Lesson: Anomaly detection complements but doesn’t replace other controls.
Scenario 4: Sensor manipulation¶
Attack: Attacker falsifies sensor readings instead of control outputs.
Detection:
Statistical anomaly if sensor value deviates
But actual process may be unaffected
Physical sensors may show different values
Response:
Compare sensor readings to expected physics
Cross-check multiple sensors (see the sketch below)
Verify sensor calibration
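A simple illustration of the cross-check idea: compare redundant readings of the same quantity and flag any sensor that diverges from the rest. The sensor names and tolerance below are made up for the example.

```python
import statistics

def cross_check(readings, tolerance=15.0):
    """Return sensors whose reading deviates from the median of redundant readings."""
    median = statistics.median(readings.values())
    return {name: value for name, value in readings.items()
            if abs(value - median) > tolerance}

# Illustrative: three speed sensors, one of which has been falsified
suspect = cross_check({"speed_a": 1510, "speed_b": 1505, "speed_c": 1740})
print(suspect)  # {'speed_c': 1740} -> investigate this sensor
```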
Part 6: Operational challenges¶
1: Startup anomalies¶
Problem: System startup creates anomalies (not attacks).
Solution:
```text
# Disable detection during startup
# Or accept high false positive rate
# Or have separate startup baseline
```
Best Practice:
Suppress anomalies during known maintenance windows
Log as INFO instead of WARNING during startup
Resume normal detection after stabilisation period
2: Mode changes¶
Problem: Operating modes have different normal ranges.
Example:
Startup mode: 0-1000 RPM (ramping up)
Normal mode: 1400-1600 RPM (steady state)
Peak demand: 1600-1800 RPM (high output)
Solution:
Mode-aware baselines (sketched below)
Switch detection parameters based on current mode
Or wider limits that cover all modes (less sensitive)
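A minimal sketch of mode-aware limits, assuming the monitoring code is told the current operating mode. The mode names and ranges mirror the example above; the dictionary itself is illustrative, not a simulator feature.

```python
# Per-mode range limits for turbine speed (RPM)
MODE_LIMITS = {
    "startup":     (0, 1000),
    "normal":      (1400, 1600),
    "peak_demand": (1600, 1800),
}

def in_range(value, mode):
    """Check a reading against the range limits for the current operating mode."""
    low, high = MODE_LIMITS[mode]
    return low <= value <= high

print(in_range(900, "startup"))  # True  - normal while ramping up
print(in_range(900, "normal"))   # False - anomalous at steady state
```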
3: Seasonal variations¶
Problem: Load patterns change with seasons (winter vs summer demand).
Solution:
Periodic baseline retraining
Seasonal adjustment factors
Longer learning windows to capture variations
4: Maintenance operations¶
Problem: Legitimate maintenance violates baselines.
Solution:
Maintenance mode flag (disables/relaxes detection)
Require authorisation + justification
Enhanced logging during maintenance
Resume normal detection after maintenance
Part 7: Combining detection methods¶
Defence in depth: All detection layers¶
Challenge 4 (Anomaly Detection) works with other challenges:
Layer 1: Protocol Filtering (Challenge 5)
Blocks dangerous function codes
Prevents batch writes (FC 15/16)
Blocks diagnostics (FC 08)
Layer 2: RBAC (Challenge 2)
Verifies user permissions
Blocks unauthorised writes
Enforces role separation
Layer 3: Anomaly Detection (Challenge 4) ← You are here
Detects abnormal values (statistical)
Enforces safety limits (range)
Detects rapid changes (rate)
Layer 4: Audit Logging (Challenge 3)
Records all operations
Enables forensics
Detects patterns over time
Attack detection matrix¶
| Attack Type | Protocol | RBAC | Anomaly | Audit | Result |
|---|---|---|---|---|---|
| External Overspeed | ✓ | ✗ | ✓ | ✓ | Blocked |
| Insider Overspeed | ✗ | ? | ✓ | ✓ | Detected |
| Gradual Attack | ✗ | ? | ? | ✓ | Delayed detection |
| Authorised Abuse | ✗ | ✗ | ✓ | ✓ | Detected |
| Reconnaissance | ✗ | ? | ✗ | ✓ | Logged |
| Sensor Manipulation | ✗ | ? | ✓ | ✓ | Detected |
Key:
✓ = Detected/Blocked
✗ = Not detected
? = Depends on permissions
Lesson: No single layer catches everything; defence in depth is essential.
Part 8: Advanced topics¶
Time-series forecasting¶
Predict expected values based on historical patterns:
```python
# Predict the next value based on the recent trend
expected = detector.forecast_next_value(
    device="turbine_plc_1",
    parameter="speed",
    window=100,
)

# Compare actual vs predicted
if abs(actual - expected) > threshold:
    # Anomaly: value doesn't match the trend
    print(f"Anomaly: actual {actual} deviates from forecast {expected:.1f}")
```
Pattern recognition¶
Detect specific attack patterns:
```python
# Detect sawtooth pattern (repeated increases then sudden drops)
# Typical of attacker testing limits

# Detect sustained one-direction trend
# Gradual attack or sensor drift

# Detect oscillation increase
# System instability or control loop attack
```
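Of these, the sustained one-direction trend is the easiest to sketch: count how many consecutive samples move in the same direction and alert when the run gets long. This is an illustration, not the simulator's pattern-recognition code; the run length of 20 is an arbitrary example.

```python
def sustained_trend(values, min_run=20):
    """True if the series currently ends with at least `min_run` consecutive
    moves in the same direction (possible gradual attack or sensor drift)."""
    run = 0
    for prev, curr in zip(values, values[1:]):
        if curr > prev:
            run = run + 1 if run > 0 else 1    # extend or start an upward run
        elif curr < prev:
            run = run - 1 if run < 0 else -1   # extend or start a downward run
        # equal readings neither extend nor break the run
    return abs(run) >= min_run

# A slow, steady climb of 2 RPM per sample trips the check
print(sustained_trend([1500 + 2 * i for i in range(30)]))  # True
```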
Correlation across systems¶
Detect coordinated attacks:
```python
# If turbine speed anomalous AND reactor power anomalous
# => Coordinated attack on multiple systems

# If multiple devices show anomalies at same time
# => Potential widespread attack or infrastructure failure
```
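A sketch of the co-occurrence idea: bucket recent anomaly records by time window and flag windows in which more than one device alerted. The `(timestamp, device)` record format is assumed for the illustration.

```python
from collections import defaultdict

def correlated_devices(anomalies, window_s=30):
    """Return sets of devices that alerted within the same time window.

    anomalies: list of (timestamp_seconds, device_name) tuples.
    """
    buckets = defaultdict(set)
    for ts, device in anomalies:
        buckets[int(ts // window_s)].add(device)
    return [devices for devices in buckets.values() if len(devices) > 1]

events = [(100, "turbine_plc_1"), (112, "reactor_plc_1"), (400, "turbine_plc_1")]
print(correlated_devices(events))  # [{'turbine_plc_1', 'reactor_plc_1'}] (order may vary)
```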
Machine learning integration¶
Beyond statistical methods:
Neural networks for pattern recognition
Clustering for operational mode detection
Reinforcement learning for adaptive thresholds
Ensemble methods combining multiple detectors
Learning reflection¶
What?¶
Detection Methods:
Statistical baselines catch deviations from normal
Range limits enforce physical safety bounds
Rate limits detect sudden or rapid changes
Each method has strengths and weaknesses
Tuning Trade-offs:
Sensitivity vs false positives
Learning speed vs baseline stability
Coverage vs alarm fatigue
No perfect threshold exists
Operational Realities:
Startups and shutdowns create anomalies
Maintenance operations violate baselines
Mode changes complicate detection
Context matters (not just the value)
Defence in Depth:
Anomaly detection complements other controls
Catches attacks that bypass protocol filtering
Detects insider threats with valid credentials
Enables detection when prevention fails
Discussion¶
False Positives:
How many false alarms are acceptable?
What happens when operators ignore alerts?
How to reduce false positives without missing attacks?
Attack evasion:
Can sophisticated attackers evade anomaly detection?
How slow must a gradual attack be to evade rate limits?
Can attackers learn the baseline and stay within it?
Operational impact:
Does anomaly detection slow operations?
How to handle detection during emergencies?
When to disable detection for maintenance?
Detection vs Prevention:
When is detection more important than prevention?
Can all attacks be prevented?
How fast must detection be to be useful?
Challenge success criteria¶
You can establish statistical baselines
You can set range limits for safety-critical parameters
You can configure rate-of-change detection
You can detect sudden overspeed attacks
You understand the challenge of detecting gradual attacks
You can tune detection sensitivity
You understand defence in depth with other challenges
Next steps¶
Combine with Other Challenges:
Challenge 2 (RBAC): Block unauthorised operations before anomalies occur
Challenge 3 (Logging): Analyse anomaly patterns over time
Challenge 5 (Protocol Filtering): Block dangerous operations at protocol level
Advanced integration:
Connect anomaly detection to IDS/SIEM
Create automated response playbooks
Implement machine learning detectors
Deploy behaviour analytics for insiders
References¶
Standards:
ISA/IEC 62443-3-1: Security technologies for IACS
NIST SP 800-82: Guide to Industrial Control Systems (ICS) Security
IEC 62443-3-3: System security requirements and security levels
Tools:
Blue Team CLI: `python tools/blue_team.py anomaly --help`
Demo Script: `python examples/anomaly_detection_demo.py`
AnomalyDetector API: `components/security/anomaly_detector.py`
Further reading:
Statistical Process Control (SPC) for manufacturing
CUSUM (Cumulative Sum) control charts
Behaviour-based intrusion detection
Time-series anomaly detection