Detection capability testing: Finding out if anyone’s watching

Or: How Ponder tested whether security monitoring actually worked

The appearance of security

The City Watch of Ankh-Morpork has, over the years, developed from an organisation primarily concerned with preventing crime into one primarily concerned with creating the appearance of preventing crime. The distinction is subtle but important. The Watch patrols the streets, maintains a visible presence, and responds to incidents with impressive speed. Whether they actually prevent crime is another matter entirely, but they certainly make everyone feel safer, which is worth something.

Many OT security monitoring systems work on similar principles, Ponder discovered. There are intrusion detection systems faithfully logging every packet. There are SIEMs (Security Information and Event Management systems) collecting logs from dozens of sources. There are alarms configured to trigger on suspicious activity. The question is whether anyone is actually watching these systems, whether anyone responds when they alert, and whether the alerts are tuned well enough to represent actual problems rather than operational noise.

Detection capability testing is the process of determining whether security monitoring actually works. Not whether it could work in theory, or whether it worked when first commissioned, but whether it works right now, today, in the actual operational environment with actual people responsible for actually responding to actual alerts.

This is distinctly different from penetration testing. In penetration testing, you’re trying to compromise systems whilst avoiding detection. In detection capability testing, you’re trying to be detected, specifically to verify that detection works. It’s like testing a burglar alarm by actually breaking in and seeing if anyone shows up.

The simulator’s detection testing scripts

The UU P&L simulator includes scripts specifically designed to test detection capabilities:

IDS detection testing

ids_detection_test.py

This script generates various types of network traffic to test whether intrusion detection systems are functioning:

What it tests:

  • Protocol scanning detection (rapid connection attempts)

  • Function code anomalies (unusual Modbus commands)

  • High-frequency polling (abnormal traffic rates)

  • Out-of-specification protocol behaviour

Expected detections:

  • IDS alerts on port scanning

  • Alerts on unusual protocol patterns

  • Rate-based anomaly detection

  • Protocol validation failures

The script saves results to reports/ids_test_<timestamp>.json, documenting which attack patterns were detected and which weren’t.
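
The chapter doesn't reproduce the script itself, so the following is only a minimal sketch of the kind of traffic it generates: a rapid scan of common OT ports followed by a Modbus request carrying an out-of-spec function code, with the results written to a timestamped JSON report. The target address, port list, and report layout are assumptions, not the real script's interface.

```python
"""Minimal sketch in the spirit of ids_detection_test.py.
Target address, port list, and report layout are assumptions;
only the behaviours being exercised come from the chapter."""
import json
import os
import socket
import struct
from datetime import datetime, timezone

TARGET = "192.168.95.10"                      # assumed simulator address
SCAN_PORTS = [102, 502, 4840, 20000, 44818]   # common OT protocol ports


def rapid_port_scan(host, ports, timeout=0.5):
    """Rapid connection attempts: should trip scan and rate-based detection."""
    results = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            try:
                s.connect((host, port))
                results[port] = "open"
            except OSError:
                results[port] = "closed/filtered"
    return results


def unusual_modbus_function_code(host, port=502, function_code=0x5A, unit_id=1):
    """Send a Modbus/TCP request with an out-of-spec function code:
    should trip protocol-validation alerts if the IDS parses Modbus."""
    pdu = struct.pack(">B", function_code)
    # MBAP header: transaction id, protocol id (always 0), length, unit id
    mbap = struct.pack(">HHHB", 1, 0, len(pdu) + 1, unit_id)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(2)
        s.connect((host, port))
        s.sendall(mbap + pdu)
        try:
            return s.recv(256).hex()   # an exception response, if the device replies
        except socket.timeout:
            return None


if __name__ == "__main__":
    report = {"started": datetime.now(timezone.utc).isoformat(),
              "port_scan": rapid_port_scan(TARGET, SCAN_PORTS)}
    try:
        report["unusual_function_code_reply"] = unusual_modbus_function_code(TARGET)
    except OSError as exc:
        report["unusual_function_code_reply"] = f"error: {exc}"

    os.makedirs("reports", exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    with open(f"reports/ids_test_{stamp}.json", "w") as fh:
        json.dump(report, fh, indent=2)
```

Generating the traffic is the easy part; the actual test is comparing the saved report against what the IDS alerted on, and how long that took.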

SIEM correlation testing

siem_correlation_test.py

This script generates correlated events across multiple systems to test whether SIEM systems can identify multi-stage attacks:

What it tests:

  • Sequential authentication attempts (reconnaissance followed by access)

  • Multi-protocol attacks (Modbus scan followed by S7 connection)

  • Time-based correlation (events that should be suspicious when seen together)

  • Cross-device patterns (accessing multiple PLCs in sequence)

Expected correlations:

  • SIEM rules detecting attack sequences

  • Anomaly detection identifying unusual patterns

  • Alert escalation for related events

  • Incident creation for confirmed attacks

The goal is to determine whether security monitoring can connect the dots between individual events to identify actual attack campaigns.
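
A minimal sketch of that idea, assuming a handful of simulator PLC addresses: stage one touches every device over Modbus, stage two follows up with S7 connections, and the script keeps its own timeline as ground truth to compare against whatever incidents the SIEM creates (or doesn't). The addresses and the timeline format are assumptions for illustration.

```python
"""Minimal sketch in the spirit of siem_correlation_test.py.
PLC addresses and the timeline format are assumptions."""
import json
import socket
import time
from datetime import datetime, timezone

# Assumed simulator PLC addresses: a cross-device sequence that a SIEM rule
# ought to correlate into a single incident.
PLCS = ["192.168.95.10", "192.168.95.11", "192.168.95.12"]
STAGES = [("modbus_probe", 502), ("s7_probe", 102)]


def probe(host, port, timeout=1.0):
    """Open and immediately close a TCP connection: enough to leave a
    log entry without sending any protocol payload."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            s.connect((host, port))
            return "connected"
        except OSError as exc:
            return f"failed: {exc}"


timeline = []
for stage_name, port in STAGES:        # stage one across every PLC, then stage two
    for host in PLCS:
        timeline.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "stage": stage_name,
            "target": f"{host}:{port}",
            "result": probe(host, port),
        })
        time.sleep(5)   # spaced out, but well inside a typical correlation window

# The timeline is the ground truth to compare against the SIEM's incidents.
print(json.dumps(timeline, indent=2))
```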

Anomaly bypass testing

anomaly_bypass_test.py

This script tests whether attacks can evade anomaly-based detection by staying within normal operational parameters:

What it tests:

  • Slow reconnaissance (spread over hours/days)

  • Commands that mimic normal operations

  • Gradual parameter changes (avoiding rate-of-change detection)

  • Timing attacks during expected maintenance windows

Detection challenges:

  • Differentiating malicious from legitimate slow changes

  • Identifying attacks during high-activity periods

  • Detecting patient, methodical reconnaissance

This demonstrates that sophisticated attackers can evade many detection systems by simply being patient.
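
As a sketch of the "gradual parameter change" case, assume a setpoint lives in an arbitrary holding register on the simulator. Each individual write is small and widely spaced, so no single event looks abnormal; only the cumulative drift is. The register number, step size, and timing below are assumptions chosen to sit under typical rate-of-change thresholds.

```python
"""Minimal sketch in the spirit of anomaly_bypass_test.py.
Register number, step size, and timing are assumptions."""
import socket
import struct
import time

TARGET = ("192.168.95.10", 502)   # assumed simulator PLC
REGISTER = 40                     # assumed setpoint holding register
UNIT_ID = 1


def write_single_register(addr, register, value, unit_id=1, transaction_id=1):
    """Modbus/TCP function code 06 (write single register), built as a raw frame."""
    pdu = struct.pack(">BHH", 0x06, register, value)
    # MBAP header: transaction id, protocol id (always 0), length, unit id
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(2)
        s.connect(addr)
        s.sendall(mbap + pdu)
        return s.recv(256)


# Move a setpoint from 500 to 600 in twenty steps of 5, ten minutes apart.
# Each write looks like a routine operator adjustment; only the cumulative
# drift is abnormal, which rate-of-change detection often misses.
value = 500
for step in range(20):
    value += 5
    write_single_register(TARGET, REGISTER, value, UNIT_ID, transaction_id=step + 1)
    time.sleep(600)
```

The same patience applied to reads rather than writes is the slow-reconnaissance case: nothing a rate-based rule will ever notice.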

Logging gap identification

logging_gap_test.py

This script identifies what events are not being logged or monitored:

What it discovers:

  • Protocols without logging (which industrial protocols generate no logs?)

  • Commands without audit trails (which operations leave no evidence?)

  • Time periods without monitoring (when is nobody watching?)

  • Systems excluded from SIEM (what devices aren’t monitored?)

Impact of gaps:

  • Attacks through unmonitored channels go undetected

  • Forensic analysis has blind spots

  • Compliance requirements may not be met

This is often the most valuable detection test: showing what isn’t being watched.
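
One way to make the gaps concrete is to compare a manifest of everything the tests sent against an export of everything the SIEM recorded: anything in the first file but absent from the second is a blind spot. The file names, formats, and matching rule below are assumptions for illustration.

```python
"""Minimal sketch of gap analysis in the spirit of logging_gap_test.py.
Both input file formats are assumptions."""
import json
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)   # how long after an event a log entry may appear


def parse(ts):
    return datetime.fromisoformat(ts)


with open("reports/test_traffic_manifest.json") as fh:   # ground truth: what the tests sent
    sent_events = json.load(fh)
with open("reports/siem_export.json") as fh:              # what the SIEM actually recorded
    logged_events = json.load(fh)

gaps = []
for event in sent_events:
    sent_at = parse(event["timestamp"])
    matched = any(
        entry.get("target") == event["target"]
        and sent_at <= parse(entry["timestamp"]) <= sent_at + WINDOW
        for entry in logged_events
    )
    if not matched:
        gaps.append(event)

# Every entry in `gaps` is test traffic that left no trace in the SIEM:
# an unmonitored protocol, an unlogged command, or an excluded device.
print(f"{len(gaps)} of {len(sent_events)} test events were never logged")
for event in gaps:
    print(f"  {event['timestamp']}  {event['stage']}  {event['target']}")
```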

What Ponder’s testing revealed

Testing the simulator’s detection capabilities revealed several uncomfortable patterns:

Detection exists but nobody’s watching

Many facilities have excellent detection technology that nobody monitors. The IDS generates alerts. The SIEM correlates events. The logs are comprehensive. But the alerts go to an unread mailbox, the SIEM dashboards are never opened, and the logs are only reviewed after incidents (if then).

The simulator’s detection tests generate events that should trigger alarms. The question isn’t whether the technology works (it usually does), but whether humans respond (they often don’t).

Alert fatigue makes everything invisible

Detection systems often generate thousands of alerts per day, most of them false positives. After weeks of this, operators stop looking at alerts entirely. A real attack generates the same alert as 47 false positives that day, and all 48 alerts are ignored.

The simulator’s tests help identify this problem by generating both obvious attacks (which should be detected) and normal operations (which shouldn’t). If the detection system can’t distinguish between them, alert fatigue is inevitable.

Slow attacks evade detection

Most detection systems look for rapid, obvious attacks. Port scanning, connection flooding, abnormal traffic spikes. These are easy to detect because they’re obviously different from normal operations.

Patient attackers evade these systems by simply moving slowly. The simulator’s anomaly bypass tests demonstrate that reconnaissance spread over days, parameter changes made gradually, and attacks timed to maintenance windows often go completely unnoticed.

Protocol-level detection is rare

Most OT monitoring focuses on network-level indicators (ports, IP addresses, connection counts). Very few systems actually understand industrial protocols well enough to detect protocol-level attacks.

The simulator demonstrates this: Modbus commands are logged at the packet level (source IP, destination IP, port 502) but not at the protocol level (which function codes, which registers, which unit IDs). An attacker reading holding register 0 generates the same log entry as an attacker writing holding register 0, because the logging doesn't distinguish between Modbus function codes 03 (read holding registers) and 16 (write multiple registers).
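
The difference is easy to see by decoding the frames themselves. The sketch below parses the MBAP header and function code out of raw Modbus/TCP payload bytes (the example frames are constructed for illustration); a packet-level log would record both frames identically as TCP traffic to port 502.

```python
"""A sketch of the protocol-level view that packet-level logging misses:
decoding the Modbus function code and register address from raw TCP payload
bytes. The frame layout follows the Modbus/TCP specification; the example
frames are made up."""
import struct

FUNCTION_NAMES = {
    0x03: "Read Holding Registers",
    0x06: "Write Single Register",
    0x10: "Write Multiple Registers",
}


def decode_modbus_tcp(payload: bytes) -> dict:
    """Decode the MBAP header and the start of the PDU."""
    transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", payload[:7])
    function_code = payload[7]
    start_address = struct.unpack(">H", payload[8:10])[0]
    return {
        "unit_id": unit_id,
        "function_code": function_code,
        "function": FUNCTION_NAMES.get(function_code, "other"),
        "start_address": start_address,
    }


# Both of these reach port 502 on the same PLC and look identical in a
# packet-level log; only protocol decoding reveals that one reads holding
# register 0 and the other writes it.
read_req = bytes.fromhex("000100000006" "01" "03" "0000" "0001")
write_req = bytes.fromhex("000200000009" "01" "10" "0000" "0001" "02" "01F4")
print(decode_modbus_tcp(read_req))
print(decode_modbus_tcp(write_req))
```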

Testing detection systematically

Ponder’s approach to detection testing:

Generate known-bad traffic

Start with obvious attacks that any competent detection system should catch:

  • Port scans

  • Protocol scans

  • Authentication failures

  • Invalid commands

If these aren’t detected, detection capability is minimal.

Generate subtle attacks

Progress to sophisticated attacks:

  • Slow reconnaissance

  • Legitimate-looking commands

  • Gradual parameter changes

  • Attacks during maintenance windows

If these aren’t detected (and they often aren’t), detection requires improvement.

Measure response times

Detection without response is useless. For each test:

  • When was the event generated?

  • When was the alert created?

  • When was the alert seen by a human?

  • When was the response initiated?

If alerts take hours or days to reach humans, attacks have plenty of time to succeed.
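
A sketch of turning those four questions into numbers, assuming each test event is recorded with four timestamps (the field names are invented for illustration; a missing stage is a finding in its own right):

```python
"""Minimal sketch of response-time measurement for one test event.
The field names are assumptions, not a defined record format."""
from datetime import datetime


def latency_report(event: dict) -> dict:
    """Compute the gap, in minutes, between each stage of the response chain."""
    stages = ["generated", "alert_created", "seen_by_human", "response_started"]
    times = {s: datetime.fromisoformat(event[s]) for s in stages if event.get(s)}
    report = {}
    for earlier, later in zip(stages, stages[1:]):
        if earlier in times and later in times:
            delta = times[later] - times[earlier]
            report[f"{earlier} -> {later}"] = round(delta.total_seconds() / 60, 1)
        else:
            report[f"{earlier} -> {later}"] = None   # stage never happened: a finding in itself
    return report


example = {
    "generated":        "2024-05-14T09:00:00",
    "alert_created":    "2024-05-14T09:00:45",
    "seen_by_human":    "2024-05-14T13:20:00",   # hours later
    "response_started": None,                    # never
}
print(latency_report(example))
```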

Document blind spots

Most valuable: identify what isn’t detected at all:

  • Protocols without monitoring

  • Commands without audit trails

  • Time periods without coverage

  • Systems excluded from SIEM

These blind spots are where sophisticated attackers operate.

Running detection tests safely

Detection testing generates suspicious traffic by design. This requires coordination:

Notify operations: Tell them testing is occurring, when, and what traffic to expect.

Document test traffic: Save packet captures and logs of all test traffic for comparison with detection logs.

Verify non-disruption: Ensure test traffic doesn’t affect production operations.

Schedule appropriately: Test during low-activity periods if possible.

Have rollback plans: Know how to stop test traffic if it causes problems.

The simulator makes this safer by allowing detection testing without risking production systems. Test traffic goes to the simulator, not to real PLCs.

The educational value

Detection testing teaches several lessons:

Detection technology exists: Most facilities have some monitoring capability.

Human factors matter more: the technology usually works; the human processes around it are where detection fails.

Sophistication defeats detection: Patient, methodical attackers evade most monitoring.

Protocol awareness is rare: Network monitoring doesn’t catch protocol-level attacks.

The simulator's detection tests demonstrate these realities in a safe environment, preparing security teams to improve detection in production environments.

Ponder’s conclusions

Ponder’s testing journal concluded:

“Detection capability testing reveals an uncomfortable truth: most OT security monitoring is optimised for compliance, not security.

“The technology exists. The IDS is running. The SIEM is collecting logs. The auditors are satisfied. But nobody’s actually watching, nobody responds to alerts in useful timeframes, and sophisticated attacks sail through undetected because the monitoring is tuned to catch only obvious, clumsy attacks.

“The simulator demonstrates this gap. Generate obvious attacks and watch them get detected (eventually, maybe). Generate patient, methodical attacks and watch them succeed unnoticed.

“This isn’t a criticism of security teams. They’re overwhelmed, under-resourced, and drowning in false positives. But it’s a reality that needs to be acknowledged and addressed.

“Detection without response is just expensive logging.”

Further reading:

The detection testing scripts help organisations understand what their security monitoring actually catches and what it misses, informing investments in improved detection capabilities.