Detection capability testing

Finding out if anyone’s watching the watchers.

The City Watch of Ankh-Morpork has, over the years, developed from an organisation primarily concerned with preventing crime into one primarily concerned with creating the appearance of preventing crime. The distinction is subtle but important. The Watch patrols the streets, maintains a visible presence, and responds to incidents with impressive speed. Whether they actually prevent crime is another matter entirely, but they certainly make everyone feel safer, which is worth something.

Many OT security monitoring systems work on similar principles. There are intrusion detection systems faithfully logging every packet. There are SIEMs (Security Information and Event Management systems) collecting logs from dozens of sources. There are alarms configured to trigger on suspicious activity. The question is whether anyone is actually watching these systems, whether anyone responds when they alert, and whether the alerts are tuned well enough to represent actual problems rather than operational noise.

Detection capability testing is the process of determining whether security monitoring actually works. Not whether it could work in theory, or whether it worked when first commissioned, but whether it works right now, today, in the actual operational environment with actual people responsible for actually responding to actual alerts.

This is distinctly different from penetration testing. In penetration testing, we are trying to compromise systems whilst avoiding detection. In detection capability testing, we are trying to be detected, specifically to verify that detection works. It’s like testing a burglar alarm by actually breaking in and seeing if anyone shows up.

At UU P&L, the security architecture diagram showed an impressive defence-in-depth strategy. Network IDS monitored all traffic between security zones. A SIEM correlated events from 47 different sources. Alarms were configured to trigger on suspicious activity and page the security operations centre. The SOC operated 24/7 with trained analysts. On paper, it was impenetrable.

In practice, the IDS had been generating so many false positives that someone had adjusted the thresholds until it stopped alerting on anything. The SIEM was forwarding all alerts to a shared mailbox that accumulated approximately 3,000 emails per day, none of which anyone read. The SOC was primarily focused on physical security (gates and cameras), and the one analyst who’d been trained on the OT security systems had left the company eight months ago. The defence-in-depth strategy had, through entropy and human nature, become defence-in-breadth-but-not-much-depth.

Understanding current monitoring capabilities

Before testing detection, document what monitoring exists:

Layered capability matrix (what exists vs what actually works)

Layer        Control              Present  Maintained  Monitored  Actually useful
-----------  -------------------  -------  ----------  ---------  ---------------
Network      IDS (Snort)          Yes      No          No         Marginal
Network      Network tap          Yes      Yes         Offline    Limited
Application  SCADA audit logging  Yes      Yes         No         Untapped
Application  PLC audit logs       No*      N/A         N/A        None
System       Windows event logs   Yes      Yes         Yes        Partial
System       Linux auth logs      Yes      No          No         Weak
SIEM         Splunk               Yes      Unclear     Rarely     Questionable

Signal flow diagram (who sees what, and who does not)

[PLC Network]
     |
     | (Mirror)
     v
[Network Tap]
     |
     v
[Wireshark Server]
     |
     | (Manual, offline analysis)
     | (Used after incidents, not during)
     v
   [Human]


[Corporate Network]
     |
     v
[Snort IDS]
     |
     | Alerts
     v
[security@uupl.edu]
     |
     v
[Shared mailbox]
     |
     x  (Nobody reads it)
     x  (ET Open rules last updated 2021)


[SCADA / HMI]
     |
     | SCADA alarms
     v
[HMI Display]
     |
     v
[Operator console]
     |
     | (Audible alarm)
     | (Acknowledged to make it stop)
     v
[Local SCADA log files]
     |
     x  (No forwarding)
     x  (No analysis)


[PLC Alarms]
     |
     v
[HMI Display]
     |
     v
[Operator console]
     |
     | (Operational response only)
     v
[Logged locally]
     |
     x  (Never analysed)


[Windows Systems]
     |
     | Security events
     v
[SIEM (Splunk)]
     |
     | "Critical" alerts
     v
[SOC]
     |
     x  (Focus on physical security)
     x  (OT context missing)


[SIEM Alerting]
     |
     v
[splunk-alerts@uupl.edu]
     |
     v
[Ticket system]
     |
     x  (Auto-closed after 30 days)

Time-decay table (what rots when left untouched)

Control             Last meaningful update  Age       Risk implication
------------------  ----------------------  --------  -----------------------
Snort ruleset       2021                    3+ years  Blind to modern attacks
IDS deployment      2019                    5+ years  Likely misaligned
Wireshark captures  Ongoing                 N/A       Forensics only
SCADA logs          Never reviewed          Always    Incidents missed
PLC audit logging   Disabled                Forever   Zero visibility
SIEM dashboards     Unknown                 ?         Likely stale
SIEM analyst login  2023-08-15              4 months  Tool abandonment

Detection vs response gap analysis (who would notice?)

Scenario                        Detected?  By what  Action taken
------------------------------  ---------  -------  ---------------
PLC logic modification          No         None     None
Unauthorised HMI login          Logged     SCADA    None
Suspicious Modbus traffic       Maybe      Snort    Email only
Lateral movement from IT to OT  Unlikely   Snort    Email only
Brute-force on jump server      Yes        Logs     Not centralised
Malware on Windows HMI          Possibly   SIEM     Nobody looking

Ownership and attention map (the real problem)

Tool ownership and log review:

Component        Nominal owner   Log review happening  Review frequency
---------------  --------------  --------------------  ----------------
Snort IDS        Network team    No                    Never
Network tap      OT engineering  Rarely                Ad hoc
SCADA logs       Operations      No                    Never
PLC audit logs   OT engineering  Disabled              N/A
Windows logs     IT Security     Sometimes             Unclear
Linux auth logs  Nobody          No                    Never
SIEM             “Security”      No                    Tool exists

Translation: ownership exists on paper; attention does not.

Alert response responsibility (theoretical vs real):

Alert type               Primary (org chart)                     Secondary (org chart)  Actual responder
-----------------------  --------------------------------------  ---------------------  -------------------------------
Network security alerts  IT Security Team (3 people, part-time)  IT Manager (→ CIO)     Nobody (mailbox ignored)
SCADA alarms             Control room operators                  Engineering on-call    Operators only (alert fatigue)
System security events   IT Operations                           IT Security            Inconsistent (depends on noise)

Organisational reality (the gap)

What the org chart says:

  • IT Security monitors security

  • Operations monitors OT

  • SIEM provides oversight

What actually happens:

  • IT Security assumes Operations is watching OT

  • Operations assumes IT Security is handling security

  • Everyone assumes the SIEM is watching everything

Observed fact:

  • Nobody logged into the SIEM in four months

At UU P&L, the gap between those two lists was the real finding. Security monitoring had gradually become “someone else’s problem”: each team assumed another team was watching, everyone trusted the SIEM to provide oversight, and nobody had logged into the SIEM in four months.

IDS and IPS effectiveness testing

Intrusion Detection Systems look for attack patterns in network traffic. Intrusion Prevention Systems do the same but also attempt to block attacks. Testing their effectiveness means intentionally triggering them and observing the response.

Test signature-based detection

Most IDS platforms rely on signatures, patterns that match known attacks:

🐙 IDS Detection Test: Security Monitoring Validation
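In practice the test is just deliberately noisy activity plus a precise record of when it happened, so the IDS logs can be compared against reality afterwards. Here is a minimal sketch in Python: a crude, intentionally aggressive TCP connect scan that writes a timestamped evidence file as it goes. The target address, port range and output file are placeholders to be replaced with whatever the test scope actually permits; this illustrates the approach rather than being a finished tool.

import csv
import socket
from datetime import datetime, timezone

# Placeholder target agreed in the test scope (TEST-NET address used purely as an example).
TARGET = "192.0.2.10"
PORTS = range(1, 1025)   # deliberately broad and noisy: we WANT the IDS to fire
TIMEOUT = 0.5            # seconds per connection attempt

def timestamp() -> str:
    """UTC timestamp, precise enough to line up with IDS log entries."""
    return datetime.now(timezone.utc).isoformat()

def scan(target: str, ports) -> list[dict]:
    results = []
    for port in ports:
        started = timestamp()
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(TIMEOUT)
            state = "open" if s.connect_ex((target, port)) == 0 else "closed/filtered"
        results.append({"time": started, "target": target, "port": port, "state": state})
    return results

if __name__ == "__main__":
    rows = scan(TARGET, PORTS)
    # Keep evidence of exactly what was sent and when, for comparison with IDS/SIEM logs.
    with open("ids_test_evidence.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["time", "target", "port", "state"])
        writer.writeheader()
        writer.writerows(rows)
    print(f"Scan of {TARGET} finished at {timestamp()}; {len(rows)} probes recorded.")

The evidence file matters as much as the scan itself. Without exact timestamps, the later argument about whether the IDS “should have seen it” degenerates into guesswork.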

Perform the test, document the exact time, then verify:

  1. Does the IDS log the activity? (Check IDS console or logs)

  2. Does an alert fire? (Check alert destinations)

  3. Does anyone respond? (Check with SOC or security team)

At UU P&L, we performed three detection tests:

Test 1: Aggressive port scan

  • Performed: 2024-12-10 14:35:22

  • IDS Detection: YES (logged in Snort)

  • Alert Generated: YES (email sent)

  • Human Response: NO (mailbox unmonitored)

  • Result: Detection works, response doesn’t

Test 2: SMB exploit attempt

  • Performed: 2024-12-10 15:12:44

  • IDS Detection: NO (signature out of date)

  • Alert Generated: NO

  • Human Response: NO

  • Result: Detection failed (outdated signatures)

Test 3: Modbus write command

  • Performed: 2024-12-10 16:03:11

  • IDS Detection: NO (no OT-specific rules)

  • Alert Generated: NO

  • Human Response: NO

  • Result: Detection failed (OT protocols not monitored)

SIEM correlation testing

SIEMs are supposed to correlate events from multiple sources to identify complex attacks that wouldn’t be obvious from single events.

Test multi-stage attack detection

Perform an attack that involves multiple steps and see if the SIEM correlates them.

Attack chain:

  1. Login to VPN from unusual location (UK to US)

  2. Multiple failed login attempts on jump server

  3. Successful login after failures

  4. RDP from jump server to engineering workstation

  5. Download of PLC configuration files

  6. Connection to PLC network

  7. Modbus write commands to multiple PLCs

Expected SIEM correlation:

  • VPN login from new location → Flag as suspicious

  • Failed logins → Flag as brute force attempt

  • Successful login after failures → Escalate priority

  • Unusual RDP activity → Add to event chain

  • File downloads + PLC connections + Write commands → Critical Alert: Potential OT Attack Chain
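To make “correlation” concrete, the sketch below shows the sort of logic such a rule encodes, written as plain Python rather than in any particular SIEM’s rule language: group events by source, keep only those within a time window, and escalate when the chain covers the critical combination. The field names, event categories and two-hour window are illustrative assumptions, not UU P&L’s configuration.

from datetime import datetime, timedelta

# Event categories we expect to see together in the chain described above.
CRITICAL_COMBINATION = {"file_download", "plc_connection", "modbus_write"}
WINDOW = timedelta(hours=2)

def correlate(events: list[dict]) -> list[dict]:
    """Return one finding per source whose events, within WINDOW, cover the critical combination."""
    findings = []
    by_source = {}
    for event in sorted(events, key=lambda e: e["time"]):
        by_source.setdefault(event["source"], []).append(event)
    for source, chain in by_source.items():
        # Slide a window along the chain and check which categories it covers.
        for i, first in enumerate(chain):
            window = [e for e in chain[i:] if e["time"] - first["time"] <= WINDOW]
            categories = {e["category"] for e in window}
            if CRITICAL_COMBINATION <= categories:
                findings.append({
                    "source": source,
                    "severity": "critical",
                    "reason": "OT attack chain: " + ", ".join(sorted(categories)),
                    "first_event": first["time"],
                })
                break
    return findings

if __name__ == "__main__":
    t0 = datetime(2024, 12, 10, 15, 0)
    demo = [
        {"source": "10.0.5.23", "category": "failed_login", "time": t0},
        {"source": "10.0.5.23", "category": "file_download", "time": t0 + timedelta(minutes=40)},
        {"source": "10.0.5.23", "category": "plc_connection", "time": t0 + timedelta(minutes=55)},
        {"source": "10.0.5.23", "category": "modbus_write", "time": t0 + timedelta(minutes=58)},
    ]
    for finding in correlate(demo):
        print(finding)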

Test whether this correlation actually happens:

🐙 SIEM Correlation Test: Automated Multi-Stage Attack
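The attack stages themselves were performed by hand (VPN login, RDP, file transfer and so on), so the supporting script only needs to do one thing well: record exactly when each stage started, in a form that can later be compared with what the SIEM ingested. A minimal sketch, with the stage names taken from the chain above:

import json
from datetime import datetime, timezone

# Stages mirror the attack chain described above; adjust to the agreed test plan.
STAGES = [
    "VPN login from unusual location",
    "Failed logins on jump server",
    "Successful login after failures",
    "RDP to engineering workstation",
    "Download of PLC configuration files",
    "Connection to PLC network",
    "Modbus write commands",
]

def record_stages(output_path: str = "siem_correlation_test.json") -> None:
    """Prompt the tester at each stage and record a UTC timestamp for it."""
    evidence = []
    for stage in STAGES:
        input(f"Press Enter when starting: {stage}")
        evidence.append({
            "stage": stage,
            "started_utc": datetime.now(timezone.utc).isoformat(),
        })
    with open(output_path, "w") as f:
        json.dump(evidence, f, indent=2)
    print(f"Recorded {len(evidence)} stages to {output_path}")

if __name__ == "__main__":
    record_stages()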

At UU P&L, we performed this test and discovered:

  • Event 1 (VPN login): Logged in SIEM ✓

  • Event 2 (Failed logins): Logged in SIEM ✓

  • Event 3 (RDP): Logged in SIEM ✓

  • Event 4 (File access): NOT logged (file auditing not enabled) ✗

  • Event 5 (Modbus): NOT logged in SIEM (IDS not integrated with SIEM) ✗

  • Correlation: NO correlation rules configured for this attack pattern ✗

  • Alert: NO alert generated ✗

  • Human Response: NO response ✗

The SIEM was collecting logs, but it wasn’t correlating them into meaningful patterns. Each event was visible if we specifically searched for it, but there was no automatic detection of the attack chain.

Anomaly detection bypass

Some monitoring systems use anomaly detection: they learn what “normal” looks like and alert on deviations. Test whether attacks can be performed whilst staying within those “normal” parameters:

🐙 Anomaly Detection Bypass Test
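The principle is straightforward: if the learned baseline is “the HMI polls the PLCs every few seconds from known addresses”, then reconnaissance that mimics that rhythm tends to slide underneath the thresholds. Below is a minimal sketch of the idea, pacing raw Modbus/TCP read requests to match an assumed polling interval. The target address, register blocks and timings are placeholders; in a real test they would be taken from the observed baseline, and the whole exercise stays within the agreed scope.

import random
import socket
import struct
import time

# Placeholder values; in a real test these come from the observed baseline traffic.
PLC_ADDRESS = ("192.0.2.50", 502)   # TEST-NET address used purely as an example
UNIT_ID = 1
POLL_INTERVAL = 5.0                 # seconds; match the HMI's normal polling rhythm
JITTER = 0.5                        # small random variation so the pattern isn't robotic

def read_holding_registers(sock: socket.socket, transaction_id: int, start: int, count: int) -> bytes:
    """Send one Modbus/TCP 'read holding registers' (function code 3) request and return the reply."""
    # MBAP header: transaction id, protocol id (0), remaining length (6), unit id,
    # then PDU: function code 3, starting address, register count.
    request = struct.pack(">HHHBBHH", transaction_id, 0, 6, UNIT_ID, 3, start, count)
    sock.sendall(request)
    return sock.recv(256)

def paced_survey(register_blocks: list[tuple[int, int]]) -> None:
    """Walk through register blocks at the same pace as normal HMI polling."""
    with socket.create_connection(PLC_ADDRESS, timeout=3) as sock:
        for tid, (start, count) in enumerate(register_blocks, start=1):
            reply = read_holding_registers(sock, tid, start, count)
            print(f"registers {start}-{start + count - 1}: {len(reply)} bytes in reply")
            # Stay inside the baseline: roughly one request per polling interval.
            time.sleep(POLL_INTERVAL + random.uniform(-JITTER, JITTER))

if __name__ == "__main__":
    # Survey a few blocks that (in this sketch) the HMI is assumed to poll anyway.
    paced_survey([(0, 10), (100, 10), (200, 10)])

Whether this actually evades detection depends entirely on how the baseline was learned, which is precisely what the test is meant to find out.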

Alert fatigue exploitation

Alert fatigue occurs when monitoring systems generate so many alerts that humans stop paying attention. This is remarkably common in OT environments where normal operations can trigger hundreds of “security” alerts per day.

Test alert fatigue

Methodology:

  1. Review current alert volume

  2. Identify the most common alerts (see the sketch after this list)

  3. Perform attack that generates similar alerts

  4. Verify attack alerts are ignored along with false positives
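For steps 1 and 2, the sketch below shows one way of getting the numbers from an alert export, assuming the SIEM or IDS can dump alerts to CSV with a column holding the rule or signature name. The file name and column name are assumptions about that export rather than any standard format.

import csv
from collections import Counter

ALERT_EXPORT = "alert_export.csv"   # assumed CSV export from the SIEM or IDS
SIGNATURE_COLUMN = "signature"      # assumed column holding the rule/alert name

def summarise_alerts(path: str) -> None:
    """Count alerts per signature to find the noise an attack could hide inside."""
    counts = Counter()
    total = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row[SIGNATURE_COLUMN]] += 1
            total += 1
    print(f"{total} alerts in export")
    print("Most common alerts (candidates for tuning, or for hiding inside):")
    for signature, count in counts.most_common(10):
        print(f"  {count:6d}  {signature}")

if __name__ == "__main__":
    summarise_alerts(ALERT_EXPORT)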

Example at UU P&L:

Current alert volume:

  • ~300 alerts/day

  • 95% false positives

  • Common alerts:

    • “Unusual Modbus traffic” (fires on normal HMI polling)

    • “Multiple login attempts” (fires on normal password typos)

    • “Unusual network connection” (fires on legitimate engineering activity)

Attack using alert fatigue:

  • Perform actual brute force attack

  • Generates “Multiple login attempts” alert

  • Alert looks identical to 50 other daily false positives

  • Lost in the noise, ignored by SOC

Result:

  • Attack successful

  • Alert generated correctly

  • Alert completely ignored due to alert fatigue

Document alert tuning needs

Current alert configuration:

  • 37 alert rules configured

  • 300+ daily alerts

  • 95% false positive rate

  • Result: Alert fatigue, real attacks ignored

Recommended tuning:

  • Reduce false positives by tuning thresholds

  • Whitelist known-good activities

  • Prioritise high-confidence alerts

  • Target: <10 alerts/day, >50% true positive rate

Example:

  • Current: “Unusual Modbus traffic” → Fires on ANY Modbus from unknown source

  • Tuned: “Unauthorised Modbus write” → Fires only on WRITE commands from non-engineering sources

Impact:

  • Reduces daily alerts from 300 to ~15

  • Increases true positive rate from 5% to ~60%

  • Increases likelihood of human response to alerts
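Stripped of any particular IDS rule language, the tuned rule’s logic amounts to something like the sketch below: parse the Modbus function code, and alert only when it is a write and the source is not on the engineering allow-list. The addresses and the allow-list are illustrative assumptions.

import struct

# Illustrative allow-list: engineering workstations permitted to write to PLCs.
ENGINEERING_SOURCES = {"10.0.20.11", "10.0.20.12"}

# Modbus function codes that change state (writes), rather than reads.
WRITE_FUNCTION_CODES = {5, 6, 15, 16}  # write coil/register, write multiple coils/registers

def should_alert(source_ip: str, modbus_frame: bytes) -> bool:
    """Return True only for write commands from sources outside the engineering allow-list."""
    if len(modbus_frame) < 8:
        return False  # too short to contain a full MBAP header plus function code
    # MBAP header is 7 bytes (transaction id, protocol id, length, unit id); function code follows.
    function_code = modbus_frame[7]
    return function_code in WRITE_FUNCTION_CODES and source_ip not in ENGINEERING_SOURCES

if __name__ == "__main__":
    # A read (function code 3) from an unknown source: no alert under the tuned rule.
    read_frame = struct.pack(">HHHBBHH", 1, 0, 6, 1, 3, 0, 10)
    # A write-single-register (function code 6) from an unknown source: alert.
    write_frame = struct.pack(">HHHBBHH", 2, 0, 6, 1, 6, 100, 1234)
    print(should_alert("10.0.99.7", read_frame))    # False
    print(should_alert("10.0.99.7", write_frame))   # True

The point is not the code; it is that the tuned condition encodes operational knowledge (who is allowed to write, and what a write actually looks like on the wire) that the generic “unusual Modbus traffic” rule never had.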

At UU P&L, we demonstrated that the monitoring systems were technically functional but operationally useless due to poor tuning. The IDS could detect attacks, but it also detected everything else, so nobody was listening when it screamed.

Logging gaps

Monitoring is only effective if relevant events are actually logged. Test for gaps:

🐙 Logging Gap Test: PLC Logic Modification
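The test itself is simple: make a change that ought to be logged, at a recorded time, then go looking for any trace of it in each log source. A minimal sketch of the checking half is below, assuming plain-text log files with a timestamp at the start of each line; the paths, timestamp formats and search terms are placeholders for the real environment.

from datetime import datetime, timedelta
from pathlib import Path

# When the test change was made, and how wide a window to search around it.
CHANGE_TIME = datetime(2024, 12, 11, 10, 15, 0)
WINDOW = timedelta(minutes=15)

# Placeholder log sources and the timestamp format each uses.
LOG_SOURCES = {
    "SCADA application log": ("scada_app.log", "%Y-%m-%d %H:%M:%S"),
    "Firewall log": ("firewall.log", "%Y-%m-%d %H:%M:%S"),
}

# Terms a record of the change might contain (engineering workstation IP, PLC IP, keywords).
SEARCH_TERMS = ["10.0.20.11", "192.0.2.50", "download", "logic"]

def entries_near_change(path: str, time_format: str) -> list[str]:
    """Return log lines whose leading timestamp falls inside the window and mention a search term."""
    hits = []
    timestamp_length = len(datetime.now().strftime(time_format))
    for line in Path(path).read_text(errors="replace").splitlines():
        try:
            when = datetime.strptime(line[:timestamp_length], time_format)
        except ValueError:
            continue  # line does not start with a timestamp in this format
        if abs(when - CHANGE_TIME) <= WINDOW and any(term in line for term in SEARCH_TERMS):
            hits.append(line)
    return hits

if __name__ == "__main__":
    for name, (path, time_format) in LOG_SOURCES.items():
        try:
            hits = entries_near_change(path, time_format)
        except FileNotFoundError:
            print(f"{name}: log source not reachable")
            continue
        verdict = f"{len(hits)} candidate entries" if hits else "NOTHING logged in the window"
        print(f"{name}: {verdict}")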

Critical events that should be logged

Actual logging:

Category            Event                                                         Actually logged
------------------  ------------------------------------------------------------  -------------------------------------------------------
PLC operations      Logic uploads/downloads                                       ✗ No (PLC auditing disabled)
PLC operations      Configuration changes                                         ✗ No (PLC auditing disabled)
PLC operations      Write commands to critical registers                          ✗ Sometimes (if IDS catches them)
Network operations  Connections to PLC network                                    ✗ Partially (only at firewall, not at switches)
Network operations  Protocol-specific operations (Modbus writes, DNP3 commands)   ✗ No (IDS has no OT protocol awareness)
Authentication      VPN logins                                                    ✓ Yes
Authentication      Failed login attempts                                         ✓ Yes
Authentication      Privilege escalations                                         ✗ Partially (Windows events logged, but not correlated)

Response time measurement

Even if detection works perfectly, effectiveness depends on response time. How quickly can the organisation respond to an alert?

Test response workflow

Alert response test:

  1. Generate test alert (inform SOC this is a test)

  2. Document alert generation time

  3. Measure time to:

    • Alert reaches monitoring station

    • Analyst acknowledges alert

    • Analyst begins investigation

    • Analyst escalates to engineering

    • Engineering responds

    • Issue is resolved

Example Timeline at UU P&L:

14:35:22 - Alert generated (port scan detected)
14:35:24 - Email sent to security@uupl.edu
14:35:28 - Email delivered to mailbox
[6 hours pass - mailbox not monitored]
20:41:15 - IT security checks mailbox
20:45:33 - Analyst reviews alert
20:52:11 - Analyst determines it's a penetration test
21:03:44 - Analyst contacts OT engineering
[Engineering after hours, no immediate response]
09:15:22 - Engineering reviews alert (next business day)
09:32:18 - Engineering confirms test, no action needed

Total time from alert to response: 18 hours 57 minutes

During an actual attack, an attacker would have completed their objectives and cleaned up their traces several times over.
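Turning a timeline like the one above into stage-by-stage numbers takes five minutes and is worth doing, because “18 hours 57 minutes, of which six were the mailbox and twelve were overnight” lands rather better in a report than “too slow”. A minimal sketch, assuming each milestone is recorded as a label and a timestamp:

from datetime import datetime

# Milestones from the example timeline above (dates included so the overnight gap computes correctly).
MILESTONES = [
    ("Alert generated", datetime(2024, 12, 10, 14, 35, 22)),
    ("Email delivered to mailbox", datetime(2024, 12, 10, 14, 35, 28)),
    ("IT security checks mailbox", datetime(2024, 12, 10, 20, 41, 15)),
    ("Analyst contacts OT engineering", datetime(2024, 12, 10, 21, 3, 44)),
    ("Engineering reviews alert", datetime(2024, 12, 11, 9, 15, 22)),
    ("Engineering confirms test", datetime(2024, 12, 11, 9, 32, 18)),
]

def report(milestones) -> None:
    """Print the delay contributed by each stage and the running total."""
    start = milestones[0][1]
    previous = start
    for label, when in milestones:
        print(f"{label:35s} +{when - previous}  (total {when - start})")
        previous = when

if __name__ == "__main__":
    report(MILESTONES)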

Establish baseline response times

Critical Alert (PLC manipulation detected):

  • Target response time: < 15 minutes

  • Actual response time: 18+ hours

  • Gap: Monitoring mailbox not checked in real-time

High Alert (Brute force attempt):

  • Target response time: < 1 hour

  • Actual response time: 20+ hours

  • Gap: No after-hours monitoring

Medium Alert (Unusual traffic pattern):

  • Target response time: < 4 hours

  • Actual response time: Never (lost in alert noise)

  • Gap: Alert fatigue means medium alerts ignored

The reality of detection in OT

The harsh truth about detection in OT environments is that most organisations have invested in monitoring technology but not in monitoring operations. They have IDS platforms that nobody watches, SIEM platforms that nobody logs into, and alerts that nobody responds to. The security controls exist; they’re just not actually controlling anything.

This isn’t entirely the fault of the security team. OT environments are difficult to monitor because:

  • Normal operations can look like attacks (lots of Modbus polling, frequent network scans, etc.)

  • Attack patterns aren’t well-defined (OT-specific attack signatures are rare)

  • Alert tuning requires deep understanding of both security and operations

  • 24/7 monitoring is expensive and difficult to justify

  • Response requires coordination between IT security and OT engineering

  • False positives are common and create alert fatigue

At UU P&L, we found that detection capability existed in theory but not in practice. The technical controls were mostly functional, but the operational processes around them had gradually degraded until the entire monitoring infrastructure was essentially decorative.

Our recommendations focused less on technology and more on process:

  1. Reduce false positives: Tune alerts so they’re actionable

  2. Integrate monitoring: Connect IDS to SIEM, enable PLC auditing

  3. Define clear ownership: Who monitors what, who responds to what

  4. Establish realistic SLAs: Response times based on actual capability

  5. Regular testing: Quarterly detection tests to verify monitoring works

  6. Training: Ensure SOC understands OT-specific attacks

The goal isn’t perfect detection (which doesn’t exist). The goal is detection that’s good enough to catch most attacks, tuned well enough that alerts are taken seriously, and backed by processes that ensure someone actually responds when alerts fire.

Detection without response is just expensive logging. Response without detection is impossible. We need both, and they need to work together, and someone needs to be responsible for making sure they continue to work together as the environment evolves. That is the hard part, and it is not a technology problem.