Your own architecture¶

Apply everything you’ve learned

Overview¶

These challenges aren’t prescriptive and have no step-by-step instructions. They are open-ended design challenges in which you apply everything you’ve learned to build a comprehensive security architecture.

No hand-holding. No specific instructions. Just the goal and your judgement.

Challenge 10: Build a complete security architecture¶

The challenge: You conducted a pentest. You found 10-15 vulnerabilities. Pick your top 3-5 critical findings. Implement comprehensive remediations addressing all of them.

Your approach¶

Prioritise findings:

Which findings matter most?
Safety impact?
Operational impact?
Likelihood of exploitation?
Business impact?

Use a prioritisation framework:

Priority = [(Safety × 2) + (Operational × 1.5) + (Likelihood × 1.5) + (Business × 1)] / Remediation Feasibility
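As a rough illustration, the formula can be turned into a small scoring helper. This sketch assumes the division applies to the whole weighted sum. The `Finding` class and the scores below are made up for the example, not taken from a real assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    name: str
    safety: int        # 1-5
    operational: int   # 1-5
    likelihood: int    # 1-5
    business: int      # 1-5
    feasibility: int   # 1-5, used as the divisor


def priority(f: Finding) -> float:
    """Weighted sum of the four impact scores, divided by feasibility."""
    weighted = f.safety * 2 + f.operational * 1.5 + f.likelihood * 1.5 + f.business * 1
    return weighted / f.feasibility


# Made-up scores, for illustration only
findings = [
    Finding("Unauthenticated Modbus access", 5, 4, 4, 3, 2),
    Finding("Lack of audit logging", 2, 2, 3, 3, 1),
    Finding("No network segmentation", 4, 4, 3, 4, 4),
]

ranked = sorted(findings, key=priority, reverse=True)
for f in ranked:
    print(f"{priority(f):6.2f}  {f.name}")
```

With these scores the logging finding ranks first (14.5) ahead of the Modbus finding (12.5), purely because of its feasibility score. The divisor has a large effect, so decide what 1 and 5 mean on your feasibility scale before comparing results.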

Choose 3-5 critical findings: Don’t try to fix everything. Focus on what matters most.

Examples:

Unauthenticated Modbus access to turbines
Anonymous SCADA access
No network segmentation
Missing safety system isolation
Lack of audit logging

Design comprehensive remediations: For each finding, design a complete fix:

Technical controls (what to implement)
Process controls (procedures, change management)
Monitoring and detection (how to know if it’s working)
Incident response (what to do when it fails)

Implement defence in depth: Don’t rely on a single control. Layer defences:

Authentication (who are you?)
Authorisation (what can you do?)
Network segmentation (limit blast radius)
Monitoring (detect attacks)
Logging (forensics and audit)

Document trade-offs: Every remediation has costs. Document:

Implementation cost (time, money, effort)
Operational impact (what becomes harder?)
Performance impact
Maintenance burden
Limitations (what doesn’t this fix?)
Residual risk (what’s still vulnerable?)
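One way to keep this documentation honest is to give every remediation the same record structure, so that empty sections stand out. The sketch below is only a suggestion: the `TradeOffRecord` class and its field names are invented here, and the sample values are borrowed from the Modbus example later on this page.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TradeOffRecord:
    finding: str
    implementation_cost: str
    operational_impact: str
    performance_impact: str
    maintenance_burden: str
    limitations: list[str] = field(default_factory=list)
    residual_risk: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty, so gaps show up at review time."""
        return [name for name, value in asdict(self).items() if not value]


record = TradeOffRecord(
    finding="Unauthenticated Modbus to turbines",
    implementation_cost="16 hours implementation, minimal financial cost",
    operational_impact="Vendor remote access requires firewall change",
    performance_impact="None (firewall rules are fast)",
    maintenance_burden="Need to update whitelist when systems change",
    limitations=["Restricts source IPs but does not authenticate"],
)

print(json.dumps(asdict(record), indent=2))
print("Still to document:", record.missing_fields())
```

Here `residual_risk` was left empty on purpose, so it is reported as still to document.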

Test comprehensively¶

Security testing:

Run your original attack scripts
Do they still work?
Can you find new bypasses?
Test each layer independently
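For network-level controls, "do they still work?" can be reduced to a repeatable reachability check that you run from the attacker's position before and after each change. This is a minimal sketch using only the standard library. The function names are invented here, and the check only covers TCP reachability, not what an attacker can do once connected.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_expectations(expectations, probe=tcp_port_open) -> list[str]:
    """Compare observed reachability with what the remediation should produce.

    expectations maps (host, port) -> True if the port should be reachable
    from where this script runs, False if it should be blocked.
    """
    failures = []
    for (host, port), should_be_open in expectations.items():
        observed = probe(host, port)
        if observed != should_be_open:
            failures.append(
                f"{host}:{port} open={observed}, expected open={should_be_open}"
            )
    return failures
```

A call such as `check_expectations({("10.0.3.21", 502): False})` (a hypothetical turbine address) returns an empty list when the port is blocked as intended. Run the same expectations from an authorised host with the values flipped to confirm that legitimate access still works.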

Operational testing:

Can operators do their jobs?
What workflows changed?
What’s harder now?
What’s impossible now?

Failure scenario testing:

What happens when controls fail?
Authentication server down?
Firewall misconfigured?
Certificates expired?
Can operations continue?
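Writing the failure policy down as code makes it testable. The sketch below shows one possible policy for an unreachable authentication server: most operations fail closed, while a short list is allowed to proceed in a degraded mode that must be reviewed afterwards. The operation names, the `AuthBackendDown` exception and the choice of which operations fail open are all assumptions for illustration. Your own answer may differ.

```python
from enum import Enum


class AuthBackendDown(Exception):
    """Raised by the (hypothetical) auth client when the server is unreachable."""


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ALLOW_DEGRADED = "allow_degraded"  # proceeded without authentication, review later


# Assumed policy: operations that may proceed while the auth server is down
FAIL_OPEN_OPERATIONS = {"emergency_stop", "read_status"}


def decide(operation: str, check_auth) -> Decision:
    """Ask the auth backend; if it is down, fall back to the failure policy."""
    try:
        return Decision.ALLOW if check_auth(operation) else Decision.DENY
    except AuthBackendDown:
        if operation in FAIL_OPEN_OPERATIONS:
            return Decision.ALLOW_DEGRADED
        return Decision.DENY  # fail closed for everything else


def broken_backend(operation: str) -> bool:
    """Stand-in for an auth client whose server cannot be reached."""
    raise AuthBackendDown("auth server unreachable")


for op in ("emergency_stop", "write_setpoint"):
    print(op, "->", decide(op, broken_backend).value)
```

Whichever policy you choose, every degraded decision should be logged, because this path is exactly what an attacker will try to trigger.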

Red team vs blue team: If you have others working on this:

Swap simulators
Try to break each other’s defences
Learn from what works
Learn from what fails

What you can learn¶

Prioritisation is hard:

Can’t fix everything
Resources are limited
Some things are unfixable
Risk acceptance is reality

Defence in depth works:

No single control is sufficient
Multiple layers catch what one misses
But complexity increases
More maintenance burden

Trade-offs are everywhere:

Security vs usability
Security vs operational flexibility
Security vs performance
Security vs cost
Every choice is a trade-off

Documentation matters:

Why did you choose these remediations?
What trade-offs did you accept?
What residual risks remain?
Future you will need this information

Perfection is impossible:

There’s always residual risk
Accept it, document it, monitor it
Focus on what matters most

Where to start¶

# Review your pentest findings
# List all vulnerabilities found

# For each vulnerability, assess:
# - Safety impact (1-5)
# - Operational impact (1-5)
# - Likelihood (1-5)
# - Business impact (1-5)
# - Remediation feasibility (1-5)

# Calculate priority scores
# Choose top 3-5

# For each chosen finding:
# - Design remediation (what controls?)
# - Estimate cost (time, money, effort)
# - Document trade-offs
# - Implement
# - Test
# - Document results

Example approach¶

Finding 1: Unauthenticated Modbus to turbines

Remediations:

Technical: IP whitelisting at firewall (only SCADA and engineering)
Technical: Deploy anomaly detection for abnormal Modbus traffic
Technical: Integrate logging for all Modbus writes
Process: Change management for firewall rules
Monitoring: Alert on Modbus connections from non-whitelisted IPs
Incident response: Procedure for investigating unauthorised access attempts

Trade-offs:

Cost: 16 hours implementation, minimal financial cost
Operational impact: Vendor remote access requires firewall change
Performance: None (firewall rules are fast)
Maintenance: Need to update whitelist when systems change
Limitations: Doesn’t authenticate, just restricts source IPs (can be spoofed on local network)
Residual risk: Insider or compromised HMI still has full access

Test:

From unauthorised IP: Access blocked ✓
From SCADA: Access works ✓
From compromised HMI: Still works (residual risk) ✗
Anomaly detection catches unusual writes ✓
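The monitoring rule for this finding can be prototyped offline against connection records before it goes anywhere near a firewall. The sketch below assumes connections are available as (source IP, destination port) pairs, for example parsed from firewall logs. The addresses are placeholders, and port 502 is the standard Modbus TCP port, which your simulator may not use.

```python
import ipaddress

# Placeholder addresses: replace with your own SCADA server and engineering subnet
ALLOWED_SOURCES = [
    ipaddress.ip_network("10.0.1.10/32"),  # SCADA server (assumed)
    ipaddress.ip_network("10.0.2.0/28"),   # engineering workstations (assumed)
]
MODBUS_PORT = 502  # standard Modbus TCP port


def is_allowed(source_ip: str) -> bool:
    """True if the source address falls inside any allowed network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_SOURCES)


def alerts_for(connections: list[tuple[str, int]]) -> list[str]:
    """Return one alert per Modbus connection from a non-whitelisted source."""
    return [
        f"ALERT: Modbus connection from non-whitelisted IP {src}"
        for src, port in connections
        if port == MODBUS_PORT and not is_allowed(src)
    ]


observed = [("10.0.1.10", 502), ("192.168.5.7", 502), ("192.168.5.7", 443)]
for alert in alerts_for(observed):
    print(alert)
```

As the limitations above note, this checks only the claimed source address. It does nothing about a spoofed address on the local network or a compromised host that is already on the list.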

Finding 2: Anonymous SCADA access

Remediations:

Technical: Enable OPC UA authentication (see Challenge 1)
Technical: Generate and deploy certificates
Technical: Integrate authorisation checks
Process: Certificate lifecycle management procedures
Monitoring: Log all authentication attempts
Incident response: Procedure for handling compromised certificates

Trade-offs:

Cost: 40 hours implementation, €5,000 for certificate management
Operational impact: All clients need certificates, HMI reconfiguration
Performance: ~10ms latency increase per connection
Maintenance: Certificate renewal every year
Limitations: Doesn’t encrypt data, just authenticates (see Challenge 7 for encryption)
Residual risk: Compromised client certificate still grants access
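Yearly renewal is easy to forget, and an expired client certificate looks like an outage to operators. A small check like the one below could feed the certificate lifecycle procedure. It assumes you already have each certificate's expiry date from your inventory. The client names, dates and the 30-day warning window are invented for the example.

```python
from datetime import datetime, timedelta, timezone


def expiring_soon(
    not_after_by_client: dict[str, datetime],
    now: datetime,
    warn_days: int = 30,
) -> dict[str, int]:
    """Map client -> whole days left, for certificates inside the warning window.

    A negative value means the certificate has already expired.
    """
    window = timedelta(days=warn_days)
    return {
        client: (not_after - now).days
        for client, not_after in not_after_by_client.items()
        if not_after - now <= window
    }


# Invented inventory, for illustration only
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
inventory = {
    "hmi_1": now + timedelta(days=200),
    "historian": now + timedelta(days=12),
    "engineering_ws": now - timedelta(days=3),
}

print(expiring_soon(inventory, now))
```

Run from a scheduled job, the output gives you a renewal to-do list well before clients start failing to connect.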

Continue this for each priority finding…

Challenge 11: Design and defend a critical operation¶

The challenge: Choose one critical operation. Implement complete security controls for that one operation end-to-end. Make it both secure and usable.

Choose your operation¶

Pick one:

Option 1: Reactor startup

Complex multi-step procedure
Safety-critical
Requires coordination between multiple systems
Takes 30-60 minutes
Errors can be dangerous

Option 2: Turbine emergency stop

Must be fast (seconds matter)
Safety-critical
Can’t have delays
But must prevent unauthorised stops
Balance security and speed

Option 3: Safety system bypass

Extremely dangerous if abused
Legitimate need during maintenance
Must be temporary and monitored
Must require multiple approvals
Must revert automatically

Design comprehensive controls¶

Pre-operation:

Who can initiate?
What permissions required?
Any approvals needed?
Preconditions (system state checks)?

Authentication:

Single person or dual authorisation?
What role is required?
Certificate-based? Password? MFA?

Authorisation:

What permissions grant access?
Time-based (only during maintenance windows)?
Location-based (only from control room)?
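These three questions combine naturally into a single check that returns a reason as well as a decision, which makes denied attempts much easier to audit. The sketch below is one possible shape. The maintenance window, the control room subnet and the permission name are assumptions, and the window deliberately crosses midnight because that case is easy to get wrong.

```python
import ipaddress
from datetime import datetime, time

MAINTENANCE_WINDOW = (time(22, 0), time(4, 0))          # assumed: 22:00 to 04:00
CONTROL_ROOM_NET = ipaddress.ip_network("10.0.1.0/24")  # assumed control room subnet


def in_window(now: time, window: tuple[time, time]) -> bool:
    """True if now falls in [start, end), including windows that wrap past midnight."""
    start, end = window
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def authorise(
    permissions: set[str], required: str, now: datetime, source_ip: str
) -> tuple[bool, str]:
    """Check permission, time and location in turn; return (allowed, reason)."""
    if required not in permissions:
        return False, "missing permission"
    if not in_window(now.time(), MAINTENANCE_WINDOW):
        return False, "outside maintenance window"
    if ipaddress.ip_address(source_ip) not in CONTROL_ROOM_NET:
        return False, "not from control room"
    return True, "ok"


print(authorise({"safety_bypass"}, "safety_bypass", datetime(2025, 1, 1, 23, 30), "10.0.1.20"))
print(authorise({"safety_bypass"}, "safety_bypass", datetime(2025, 1, 1, 12, 0), "10.0.1.20"))
```

Location checks based on IP address inherit the spoofing weakness of any address-based control, so treat them as one layer rather than as proof of physical presence.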

Initiation:

How is operation triggered?
Any confirmation required?
Any wait period (cooling-off)?

During operation:

Monitoring and logging
Progress tracking
Anomaly detection
Ability to abort?
Who can abort?

Safety interlocks:

What safety checks during operation?
Automatic abort conditions?
Override procedures?

Completion:

Success criteria
Validation checks
Automatic revert (for bypass operations)
Notification

Post-operation:

Logging and audit trail
Who did what when?
Success or failure?
Any anomalies detected?

Emergency scenarios:

What if authentication fails?
What if safety interlock triggers?
What if operation hangs?
Break-glass procedures?

Implement and test¶

Normal operation testing:

Authorised user performs operation
Everything works smoothly
Logging captures all steps
Operation completes successfully

Authorisation testing:

Unauthorised user attempts operation - blocked
Wrong role attempts operation - blocked
Dual auth with only one person - blocked
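The dual authorisation cases are worth pinning down as explicit checks, because "two sessions" is not the same as "two people". The sketch below assumes a supervisor plus engineer pairing. The `Session` class is a stand-in for whatever your simulator's session object looks like.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    user: str
    role: str


REQUIRED_ROLES = {"supervisor", "engineer"}  # assumed pairing


def dual_auth_ok(first: Optional[Session], second: Optional[Session]) -> bool:
    """Require two different people who together hold both required roles."""
    if first is None or second is None:
        return False  # only one person present
    if first.user == second.user:
        return False  # same person logged in twice
    return {first.role, second.role} == REQUIRED_ROLES


alice = Session("alice", "supervisor")
bob = Session("bob", "engineer")
carol = Session("carol", "engineer")

print(dual_auth_ok(alice, bob))    # supervisor + engineer
print(dual_auth_ok(alice, None))   # one person only
print(dual_auth_ok(bob, carol))    # two engineers, no supervisor
```

The same-user check matters most: without it, one person holding two accounts or two open sessions satisfies the rule alone.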

Safety testing:

Trigger safety interlock during operation
Operation should abort safely
System returns to safe state

Failure testing:

Authentication server down during operation
What happens?
Can operation proceed?
Can operation complete?

Emergency testing:

Real emergency requiring immediate action
Can you bypass procedures?
Is it audited?
Can you justify it later?

Usability testing:

How long does the secure operation take vs unsecured?
Is the delay acceptable?
Do operators find it reasonable?
Or will they work around it?

What you can learn¶

Security vs safety:

Sometimes they conflict
Security can delay safety responses
Need emergency overrides
But overrides can be abused
No perfect answer

Usability vs security:

Most secure: lock it down completely
Most usable: no controls
Reality: somewhere in between
Finding balance requires iteration

Operational realities:

Procedures look good on paper
Reality is messier
Emergencies don’t follow procedures
Edge cases multiply
Need flexibility

Defence in depth for operations:

Authentication (who)
Authorisation (permission)
Dual authorisation (two-person rule)
Safety interlocks (prevent physical danger)
Monitoring (detect anomalies)
Logging (audit trail)
Emergency procedures (break-glass)

Where to start¶

# Choose your operation
# (Recommend reactor startup or safety bypass)

# Map the operation:
# 1. What are the steps?
# 2. What systems are involved?
# 3. What can go wrong?
# 4. What are the risks?

# Design security controls:
# For each phase (pre, during, post):
# - What checks?
# - What approvals?
# - What monitoring?
# - What logging?

# Consider emergency scenarios:
# - Authentication failure
# - Safety interlock triggers
# - Operation hangs
# - Real emergency requiring immediate action

# Implement incrementally:
# - Start with authentication
# - Add authorization
# - Add dual auth if needed
# - Add monitoring
# - Add logging
# - Test each addition

# Test thoroughly:
# - Normal operation
# - Unauthorized attempts
# - Failure scenarios
# - Emergency scenarios
# - Usability (is it practical?)

Example: Safety system bypass¶

Chosen operation: Bypass reactor safety interlock during maintenance

Why it’s critical:

Allows maintenance while reactor is hot
Removes safety protection
Dangerous if abused or forgotten
Must be temporary and monitored

Pre-operation controls:

Dual authorisation required (supervisor + engineer)
Justification required (text field: why are you bypassing?)
Maintenance window validation (only allowed during scheduled maintenance)
Safety system status check (ensure other interlocks still active)
Automatic expiry configured (1 hour default, max 4 hours)

During operation controls:

Alarm displayed on all HMIs: “SAFETY BYPASS ACTIVE”
Monitoring for any safety parameter violations
Logging all operations performed during bypass
Ability to abort maintenance and restore safety
Countdown timer showing time until automatic revert

Post-operation controls:

Automatic revert after time limit
Manual restore option (before time limit)
Validation that safety system restored
Test that safety interlock is functional
Audit log entry with: who, when, duration, justification, what was done
Report to safety officer
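The audit entry is easier to review, and to feed into pattern detection later, if it is written as one structured record per bypass. A minimal sketch, assuming JSON lines as the log format. The field names and sample values are invented for illustration.

```python
import json
from datetime import datetime, timedelta, timezone


def bypass_audit_entry(users, started, ended, justification, actions) -> str:
    """Build one JSON line covering who, when, duration, justification and actions."""
    entry = {
        "event": "safety_bypass",
        "who": sorted(users),
        "when": started.isoformat(),
        "duration_minutes": round((ended - started).total_seconds() / 60, 1),
        "justification": justification,
        "what_was_done": list(actions),
    }
    return json.dumps(entry, sort_keys=True)


# Invented example values
started = datetime(2025, 1, 1, 22, 0, tzinfo=timezone.utc)
line = bypass_audit_entry(
    users=["supervisor_a", "engineer_b"],
    started=started,
    ended=started + timedelta(minutes=45),
    justification="Replace temperature sensor",
    actions=["interlock bypassed", "sensor replaced", "interlock restored and tested"],
)
print(line)
```

Recording the actual duration, rather than the requested one, lets a reviewer spot bypasses that routinely run to the automatic revert instead of being restored manually.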

Emergency procedures:

If a safety parameter exceeds its threshold during bypass, revert automatically
If the reactor enters an unsafe state, force a shutdown
Emergency button overrides bypass immediately

Implementation:

# Simplified pseudocode: auth, reactor, hmi and the logging helpers are assumed to exist elsewhere
async def request_safety_bypass(user1_session, user2_session, justification, duration_minutes):
    # Dual authorisation check
    if not await auth.authorize_with_dual_auth(
        user1_session, user2_session,
        PermissionType.SAFETY_BYPASS, "reactor_1"
    ):
        log_security("Safety bypass denied - insufficient authorization")
        return False

    # Justification check (empty or whitespace-only text is rejected)
    if not justification or not justification.strip():
        log_security("Safety bypass denied - justification missing")
        return False

    # Maintenance window check
    if not in_maintenance_window():
        log_security("Safety bypass denied - not in maintenance window")
        return False

    # Duration limit check
    if not 0 < duration_minutes <= 240:  # Must be positive, max 4 hours
        log_security("Safety bypass denied - duration outside allowed range")
        return False

    # Record justification before anything is bypassed
    await log_audit(
        "Safety bypass requested",
        user1=get_user(user1_session),
        user2=get_user(user2_session),
        justification=justification,
        duration=duration_minutes
    )

    # Activate bypass
    await reactor.bypass_safety_interlock("temperature_high", duration_minutes)

    # Start monitoring
    await start_bypass_monitoring("reactor_1", duration_minutes)

    # Display alarm on all HMIs
    await hmi.show_alarm("SAFETY BYPASS ACTIVE", AlarmPriority.HIGH)

    return True
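The pseudocode leaves `start_bypass_monitoring` undefined. One way to approach the automatic revert is a task that waits for either the timer or a violation signal, whichever comes first, and then restores the interlock. The sketch below is not the simulator's implementation: `monitor_bypass`, the `restore` callback and the use of an `asyncio.Event` for violations are assumptions, and the duration is in seconds so the demo finishes quickly.

```python
import asyncio


async def monitor_bypass(duration_seconds: float, violation: asyncio.Event, restore) -> str:
    """Wait for the time limit or a violation, then restore the interlock."""
    try:
        await asyncio.wait_for(violation.wait(), timeout=duration_seconds)
        reason = "safety parameter violation"
    except asyncio.TimeoutError:
        reason = "time limit reached"
    await restore(reason)
    return reason


async def demo() -> list[str]:
    restored: list[str] = []

    async def restore(reason: str) -> None:
        restored.append(reason)  # stand-in for re-enabling the real interlock

    # Case 1: nothing goes wrong, so the timer expires
    await monitor_bypass(0.05, asyncio.Event(), restore)

    # Case 2: a violation is signalled long before the timer expires
    violation = asyncio.Event()
    task = asyncio.create_task(monitor_bypass(5, violation, restore))
    await asyncio.sleep(0.01)
    violation.set()
    await task

    return restored


print(asyncio.run(demo()))
```

Note what this sketch does not handle: if the monitoring task itself is cancelled or crashes, nothing restores the interlock. A real design needs the revert to be enforced somewhere that does not depend on this process staying alive.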

Testing results:

✓ Dual auth required (single user attempt blocked)
✓ Justification required (empty justification rejected)
✓ Maintenance window enforced (attempt during production blocked)
✓ Duration limited (5-hour request rejected)
✓ Automatic revert after time limit
✓ Emergency revert on safety parameter violation
✓ Complete audit trail

Trade-offs accepted:

Adds ~2 minutes to bypass procedure (dual auth, justification)
Acceptable for maintenance operations (not emergencies)
Manual restore required after maintenance (can’t auto-detect “maintenance complete”)
False alarms possible (parameter violations during normal maintenance)

Residual risks:

Two colluding insiders can still abuse bypass
Mitigation: Audit review, pattern detection
Operator fatigue could lead to expired bypass not being noticed
Mitigation: Countdown timer, alarm, automatic revert

Combining challenges 10 and 11¶

For the ultimate challenge:

Build complete security architecture (Challenge 10)
Design secure critical operation (Challenge 11)
Integrate them: your critical operation uses your security architecture
Test: Does it work? Is it secure? Is it usable?

This is what real OT security programs do.

“Security isn’t a feature you add. It’s an architecture you build. And it’s never finished.” - Ponder Stibbons