Architecture and encryption

Fundamental changes to how systems operate

Overview

These three challenges aren’t quick fixes. They’re architectural changes that affect how the entire system operates. They take planning, testing, and careful rollout. They also have the biggest security impact.

Expect these to be complex. Expect things to break. Expect to learn why OT security projects take months.

Challenge 7: Encrypt SCADA communications

The problem: SCADA data travels in cleartext. Anyone with network access can intercept operational data, understand system behaviour, and plan attacks.

Your goal: Deploy OPC UA with signing and encryption. Make SCADA communications confidential (encryption) and tamper-evident (signing).

What you can do

Configure OPC UA security policy:

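from components.security.encryption import OPCUACrypto, OPCUASecurityPolicy
# Assumption: MessageSecurityMode is taken from the asyncua library here;
# import it from wherever your OPC UA stack defines it.
from asyncua.ua import MessageSecurityMode

# Change from None to encrypted
security_policy = OPCUASecurityPolicy.AES256_SHA256_RSAPSS
policy_uri = OPCUACrypto.get_security_policy_uri(security_policy)

# Enable signing and encryption
message_security = MessageSecurityMode.SignAndEncrypt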

Generate and distribute certificates:

from pathlib import Path

from components.security.encryption import CertificateManager

cert_mgr = CertificateManager(cert_dir=Path("./certs"))

# Generate certificate for SCADA server
server_cert, server_key = cert_mgr.generate_self_signed_cert("scada.uu-power.local")
cert_mgr.save_certificate(server_cert, server_key, "scada_server")

# Generate certificates for each client (HMIs, engineering stations)
for client in ["hmi_1", "hmi_2", "engineering_1"]:
    cert, key = cert_mgr.generate_self_signed_cert(f"{client}.uu-power.local")
    cert_mgr.save_certificate(cert, key, client)

Implement certificate validation:

Server validates client certificates
Client validates server certificate
Reject connections with invalid/expired certificates
Handle trust chain
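The checks above can be sketched without any OPC UA library. This is a minimal illustration, assuming a hypothetical `CertInfo` record and a trust list of fingerprints. It is not the API of `CertificateManager`, and a real deployment would parse X.509 certificates and verify signatures up the trust chain.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CertInfo:
    """Hypothetical, simplified view of a certificate (not an X.509 parser)."""
    subject: str
    fingerprint: str        # e.g. a hex-encoded SHA-256 thumbprint
    not_before: datetime
    not_after: datetime


def validate_cert(cert, trusted, revoked, now=None):
    """Return (ok, reason). Rejects revoked, untrusted, not-yet-valid and expired certs."""
    now = now or datetime.now(timezone.utc)
    if cert.fingerprint in revoked:
        return False, "revoked"
    if cert.fingerprint not in trusted:
        return False, "untrusted"
    if now < cert.not_before:
        return False, "not yet valid"
    if now > cert.not_after:
        return False, "expired"
    return True, "ok"


if __name__ == "__main__":
    check_time = datetime(2025, 1, 1, tzinfo=timezone.utc)
    hmi = CertInfo(
        subject="hmi_1.uu-power.local",
        fingerprint="ab12",  # placeholder value for illustration
        not_before=datetime(2024, 1, 1, tzinfo=timezone.utc),
        not_after=datetime(2026, 1, 1, tzinfo=timezone.utc),
    )
    print(validate_cert(hmi, trusted={"ab12"}, revoked=set(), now=check_time))
```

Both sides run the same check: the server against its list of trusted client certificates, each client against the server certificate it expects.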

Measure performance impact:

Baseline: measure connection time, read latency, write latency without encryption
Encrypted: measure same operations with SignAndEncrypt
Calculate overhead
Is it acceptable for real-time operations?
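A small timing harness is enough to compare the two configurations. The sketch below assumes you wrap your own read or write call in a function; the `time.sleep` calls are stand-ins, not real OPC UA operations, and the function names are hypothetical.

```python
import statistics
import time


def measure_latency(operation, runs=50):
    """Call `operation` repeatedly and return per-call latencies in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def overhead_percent(baseline_ms, encrypted_ms):
    """Relative increase of the encrypted median over the baseline median."""
    base = statistics.median(baseline_ms)
    enc = statistics.median(encrypted_ms)
    return (enc - base) / base * 100.0


if __name__ == "__main__":
    # Stand-ins for real reads; replace with calls against your own client.
    baseline = measure_latency(lambda: time.sleep(0.001), runs=20)
    encrypted = measure_latency(lambda: time.sleep(0.002), runs=20)
    print(f"median baseline:  {statistics.median(baseline):.2f} ms")
    print(f"median encrypted: {statistics.median(encrypted):.2f} ms")
    print(f"overhead:         {overhead_percent(baseline, encrypted):.0f} %")
```

Medians are used rather than means so that a single slow call does not dominate the result. For real-time questions, also look at the worst-case sample.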

Test it

Security testing:

# Try to intercept traffic
tcpdump -i any -w capture.pcap port 4840

# Open in Wireshark - can you read data?
wireshark capture.pcap

# Should be encrypted, unreadable
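If you want a rough numeric check alongside the Wireshark inspection, byte entropy is one heuristic: encrypted payloads look close to random, cleartext protocol data usually does not. The sketch below is an assumption-laden illustration. It works on payload bytes you have already extracted from the capture, and high entropy alone does not prove encryption (compressed data also scores high).

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Illustrative payloads: repetitive cleartext versus random bytes
    # standing in for an encrypted message body.
    cleartext = b"temperature=451.2;pressure=12.1;" * 50
    random_like = os.urandom(4096)
    print(f"cleartext entropy:   {shannon_entropy(cleartext):.2f} bits/byte")
    print(f"random-like entropy: {shannon_entropy(random_like):.2f} bits/byte")
```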

Connection testing:

# Connect without certificate - should fail
python scripts/vulns/opcua_readonly_probe.py --endpoint opc.tcp://127.0.0.1:4840

# Connect with valid certificate - should succeed
python scripts/vulns/opcua_readonly_probe.py --endpoint opc.tcp://127.0.0.1:4840 --cert client.pem --key client_key.pem

# Connect with expired certificate - should fail

Performance testing:

Measure read/write latency
Measure connection establishment time
Test under load (many simultaneous connections)
Is real-time performance still acceptable?

Operational testing:

What happens when certificate expires?
Can you renew without downtime?
What’s the emergency procedure when PKI fails?

What you can learn

Encryption overhead:

CPU cost of encryption/decryption
Memory overhead
Latency increase
May not be acceptable for hard real-time systems

Certificate lifecycle management:

Generation, distribution, installation
Renewal before expiry
Revocation when compromised
Backup and recovery
This is a full-time job in large deployments

PKI infrastructure requirements:

Certificate Authority (even if self-signed)
Certificate storage and backup
Certificate distribution mechanism
Monitoring for expiring certificates
Revocation checking (CRL or OCSP)
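Monitoring for expiring certificates can start as a simple scheduled check. This sketch assumes you already have each certificate's expiry date in a dictionary; the names, dates, and the 30-day warning window are illustrative choices, not project defaults.

```python
from datetime import datetime, timedelta, timezone


def expiring_soon(expiry_by_name, now, warn_days=30):
    """Return names whose certificate expires within `warn_days` (or already has), soonest first."""
    deadline = now + timedelta(days=warn_days)
    due = [(expires, name) for name, expires in expiry_by_name.items() if expires <= deadline]
    return [name for _, name in sorted(due)]


if __name__ == "__main__":
    today = datetime(2025, 1, 1, tzinfo=timezone.utc)
    inventory = {
        "scada_server": datetime(2025, 1, 20, tzinfo=timezone.utc),
        "hmi_1": datetime(2025, 6, 1, tzinfo=timezone.utc),
        "engineering_1": datetime(2024, 12, 15, tzinfo=timezone.utc),  # already expired
    }
    for name in expiring_soon(inventory, today):
        print(f"renew soon: {name}")
```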

Operational complexity:

More moving parts
More points of failure
More things to monitor
More maintenance burden

Trade-offs:

Confidentiality vs performance
Security vs complexity
Protection vs operational risk

Where to start

# Understand OPC UA encryption
sed -n '/### encryption.py/,/### anomaly_detector.py/p' components/security/README.md

# Look at OPC UA security classes
grep -A 30 "class OPCUACrypto\|class OPCUASecurityPolicy" components/security/encryption.py

# Find OPC UA server implementation
find components/ -name "*opcua*" -type f

# Look at certificate management
grep -A 30 "class CertificateManager" components/security/encryption.py

Going deeper

Questions to explore:

How do you handle certificate expiry without downtime?
What’s your CA strategy (commercial, internal, self-signed)?
How do you revoke compromised certificates?
How do you handle legacy clients that don’t support encryption?

Advanced options:

Deploy internal PKI with proper CA
Implement automated certificate renewal
Deploy certificate monitoring and alerting
Implement certificate pinning for additional security
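Test different security policies (Basic128Rsa15 vs Basic256Sha256); Basic128Rsa15 is deprecated in current OPC UA specifications, so treat it as a legacy baseline only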
Measure and optimise performance
Implement hardware security modules (HSMs) for key storage

Challenge 8: Implement jump host architecture

The problem: Administrative access comes from anywhere on the corporate network. Compromised workstation = compromised OT. No centralised access control or monitoring.

Your goal: Deploy jump host (bastion) architecture. All administrative OT access flows through one controlled point.

What you can do

Design the architecture:

Before:
Corporate Network ──→ Turbine PLC
                  ├─→ Reactor PLC
                  ├─→ SCADA
                  └─→ Safety PLC

After:
Corporate Network ──→ Jump Host ──→ Turbine PLC
                                 ├─→ Reactor PLC
                                 ├─→ SCADA
                                 └─→ Safety PLC

Direct access: BLOCKED by firewall

Deploy jump host:

Hardened Linux server or Windows bastion
Minimal software installed
Strong authentication (certificates or MFA)
Session recording enabled
Logging all access

Configure firewall rules:

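# Rules are evaluated in order and the first match wins,
# so the jump host exception must come before the broad DROP.

# Allow jump host to OT
iptables -A FORWARD -s 192.168.1.50 -d 192.168.100.0/24 -j ACCEPT

# Block all other direct access from corporate to OT
iptables -A FORWARD -s 192.168.1.0/24 -d 192.168.100.0/24 -j DROP

# Allow corporate to jump host
# (only takes effect if this traffic is routed through the firewall)
iptables -A FORWARD -s 192.168.1.0/24 -d 192.168.1.50 -j ACCEPT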
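Rule order matters because a netfilter chain stops at the first matching rule. The sketch below is a toy model of that behaviour using the addresses from the example; it is not a real iptables parser and only matches on source and destination.

```python
import ipaddress


def first_match(rules, src, dst, default="DROP"):
    """Return the target of the first rule matching src and dst, like a netfilter chain."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for rule_src, rule_dst, target in rules:
        if src_ip in ipaddress.ip_network(rule_src) and dst_ip in ipaddress.ip_network(rule_dst):
            return target
    return default


CORPORATE = "192.168.1.0/24"
OT = "192.168.100.0/24"
JUMP = "192.168.1.50/32"

# Same two rules, different order.
drop_first = [(CORPORATE, OT, "DROP"), (JUMP, OT, "ACCEPT")]
accept_first = [(JUMP, OT, "ACCEPT"), (CORPORATE, OT, "DROP")]

if __name__ == "__main__":
    plc = "192.168.100.10"
    print(first_match(drop_first, "192.168.1.50", plc))    # jump host shadowed by the broad DROP
    print(first_match(accept_first, "192.168.1.50", plc))  # jump host allowed
    print(first_match(accept_first, "192.168.1.77", plc))  # ordinary workstation blocked
```

With the DROP first, the jump host sits inside the corporate subnet and never reaches its ACCEPT rule, so the specific exception has to be listed before the general block.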

Integrate authentication:

# Jump host authenticates using AuthenticationManager
# Records all sessions
# Enforces authorisation before allowing connections

Create break-glass procedure:

What happens when jump host fails?
Emergency bypass procedure
Documented, audited, infrequent
Temporary firewall rule modification
Automatic revert after emergency
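The core of a break-glass procedure is a time-boxed grant with an audit trail. The sketch below is a hypothetical `BreakGlass` class, not part of the simulator; in practice, `activate` and the automatic revert would add and remove the temporary firewall rule.

```python
from datetime import datetime, timedelta, timezone


class BreakGlass:
    """Hypothetical time-boxed emergency access grant with an audit trail."""

    def __init__(self, max_duration=timedelta(hours=1)):
        self.max_duration = max_duration
        self.audit_log = []          # (timestamp, event, detail) tuples
        self._expires_at = None

    def activate(self, who, reason, now):
        """Open emergency access; a documented reason is mandatory."""
        if not reason.strip():
            raise ValueError("break-glass requires a documented reason")
        self._expires_at = now + self.max_duration
        self.audit_log.append((now, "ACTIVATED", f"{who}: {reason}"))

    def is_active(self, now):
        """True while the grant is unexpired; logs the automatic revert once."""
        if self._expires_at is None:
            return False
        if now >= self._expires_at:
            self.audit_log.append((now, "AUTO_REVERTED", "time limit reached"))
            self._expires_at = None
            return False
        return True


if __name__ == "__main__":
    start = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    grant = BreakGlass(max_duration=timedelta(minutes=30))
    grant.activate("operator_1", "jump host unreachable", start)
    print(grant.is_active(start + timedelta(minutes=10)))
    print(grant.is_active(start + timedelta(minutes=31)))
    for entry in grant.audit_log:
        print(entry)
```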

Test it

Access control testing:

# Try direct access to PLC - should be blocked
telnet 192.168.100.10 502

# Try via jump host - should succeed
ssh jump-host
# Then, from the jump host shell:
telnet 192.168.100.10 502

Bypass testing:

Can you bypass jump host?
Spoof source IP?
Use different protocol?
Find misconfigured firewall rule?

Failure scenario testing:

Stop jump host service
Can you still access OT? (should not, except emergency)
Activate break-glass procedure
Verify emergency access works
Verify automatic revert

Usability testing:

How does this affect operator workflow?
How long does it take to connect?
Is it practical for frequent access?
Do people try to work around it?

What you can learn

Single point of failure:

Jump host down = no administrative access
Need high availability (redundant jump hosts)
Need emergency procedures
But emergency procedures can be abused

Centralised control benefits:

All access logged in one place
Consistent authentication and authorisation
Session recording for audit
Easier to monitor for abuse

Usability impact:

Extra hop for every connection
More complex for users
Resistance from operators
Training required

Break-glass procedures:

Need emergency access mechanism
But emergency access can be abused
Need monitoring and audit
Difficult balance

Where to start

# This challenge requires architectural planning
# No single component to use - you're building infrastructure

# Consider jump host software options:
# - SSH bastion with session recording
# - RDP gateway
# - PAM solution (Privileged Access Management)

# Plan firewall rules
# Map current access patterns
# Design new access patterns through jump host
# Test in lab before production

# Read about jump host patterns
# Search: "bastion host OT" "jump server ICS"

Going deeper

Questions to explore:

How do you make jump host highly available?
What’s the monitoring strategy for jump host?
How do you handle vendor remote access?
How do you grant third-party vendors temporary access, and revoke it afterwards?

Advanced options:

Deploy redundant jump hosts for HA
Implement PAM solution with full session recording
Deploy jump host in DMZ for vendor access
Implement just-in-time access (request approval, get temporary access)
Deploy different jump hosts for different privilege levels
Implement geofencing (only allow access from specific locations)

Challenge 9: Network segmentation (IEC 62443 zones)

The problem: Everything is on one flat network. Compromised corporate workstation = access to safety systems. No network-level isolation. One breach compromises everything.

Your goal: Design and implement zone-based architecture following IEC 62443. Separate safety from production. Isolate corporate IT from OT.

What you can do

Design zone architecture:

IEC 62443 zones (simplified Purdue-style levels; the full Purdue model places enterprise IT at Levels 4-5):

Level 3 (Enterprise): Corporate IT, ERP, Email
    ↓ (Conduit: DMZ with firewalls)
Level 2 (Supervision): SCADA, HMI, Historian
    ↓ (Conduit: Industrial firewall)
Level 1 (Control): PLCs, Controllers
    ↓ (Conduit: Process network)
Level 0 (Process): Sensors, Actuators, Field devices

Safety Zone (parallel): Safety PLC, Safety I/O
    ↓ (Isolated, minimal conduits)

Map systems to zones:

Level 0: Turbine sensors and actuators, reactor instrumentation
Level 1: Turbine PLCs, Reactor PLC
Level 2: SCADA servers (primary and backup), HMIs
Level 3: Engineering workstations, management systems
Safety: Safety PLC (separate zone, minimal connectivity)

Define conduits (allowed communications):

Allowed:
- Level 2 → Level 1: SCADA reads PLC data, HMI writes setpoints
- Level 3 → Level 2: Engineering access to SCADA (via jump host)
- Level 1 → Level 0: PLC controls field devices

Blocked:
- Level 3 → Level 1: No direct engineering to PLC
- Level 3 → Level 0: No direct corporate to field devices
- Any → Safety Zone: Minimal, tightly controlled

Exception: Safety Zone → Level 1: Safety interlocks
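The conduit list above is effectively a default-deny allow-list, which can be written down as data before it becomes firewall rules. This is a minimal sketch with hypothetical zone names; treating traffic inside a zone as allowed is an assumption made here, not something the list states.

```python
# Directed conduits from the allowed list: (source zone, destination zone).
ALLOWED_CONDUITS = {
    ("level2", "level1"),   # SCADA reads PLC data, HMI writes setpoints
    ("level3", "level2"),   # engineering access to SCADA (via jump host)
    ("level1", "level0"),   # PLC controls field devices
    ("safety", "level1"),   # safety interlocks (the stated exception)
}


def is_allowed(src_zone, dst_zone):
    """Default deny: cross-zone traffic needs an explicit conduit.

    Assumption: traffic that stays inside one zone is allowed.
    """
    return src_zone == dst_zone or (src_zone, dst_zone) in ALLOWED_CONDUITS


if __name__ == "__main__":
    for src, dst in [("level2", "level1"), ("level3", "level1"), ("level3", "safety")]:
        verdict = "ALLOW" if is_allowed(src, dst) else "DENY"
        print(f"{src} -> {dst}: {verdict}")
```

Keeping the policy as a small table like this makes exceptions visible: every new conduit is one more line someone has to justify.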

Implement segmentation:

Option 1: VLANs with Layer 3 routing and firewall
Option 2: Physical network separation
Option 3: Industrial firewalls between zones

Configure firewall rules:

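# Example rules (simplified)
# Addressing in this example: Level 1 = 192.168.1.0/24, Level 2 = 192.168.2.0/24,
# Level 3 = 192.168.3.0/24, safety zone = 192.168.99.0/24
# (this differs from the addressing used in the Challenge 8 example)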
# Level 2 to Level 1: Allow Modbus, S7
iptables -A FORWARD -s 192.168.2.0/24 -d 192.168.1.0/24 -p tcp --dport 502 -j ACCEPT
iptables -A FORWARD -s 192.168.2.0/24 -d 192.168.1.0/24 -p tcp --dport 102 -j ACCEPT

# Level 3 to Level 2: Allow OPC UA via jump host only
iptables -A FORWARD -s 192.168.3.50 -d 192.168.2.0/24 -p tcp --dport 4840 -j ACCEPT
iptables -A FORWARD -s 192.168.3.0/24 -d 192.168.2.0/24 -j DROP

# Safety zone: tightly restricted
iptables -A FORWARD -d 192.168.99.0/24 -j DROP  # Default deny
iptables -A FORWARD -s 192.168.99.10 -d 192.168.1.0/24 -p tcp --dport 502 -j ACCEPT  # Only safety PLC to control PLCs

Test it

Segmentation testing:

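# If the firewall only permits specific TCP ports, ICMP is dropped even on
# allowed conduits, so test with a TCP connection rather than ping

# From corporate (Level 3), try to reach PLC (Level 1) - should fail
nc -zv -w 3 192.168.1.10 502

# From SCADA (Level 2), try to reach PLC - should succeed
nc -zv -w 3 192.168.1.10 502

# From anywhere, try to reach safety zone - should fail
nc -zv -w 3 192.168.99.10 502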

Pivot testing:

Compromise corporate workstation (Level 3)
Can you reach Level 2?
Can you reach Level 1?
Can you reach safety zone?
Where does segmentation stop you?
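Before testing on the network, you can reason about pivoting on paper by treating zones as nodes and conduits as directed edges. The sketch below assumes the worst case, where an attacker who compromises a zone can abuse every conduit leaving it; the zone names and the `CONDUITS` table are illustrative.

```python
from collections import deque

# Directed conduits: zone -> zones it is allowed to initiate traffic to.
CONDUITS = {
    "level3": ["level2"],
    "level2": ["level1"],
    "level1": ["level0"],
    "level0": [],
    "safety": ["level1"],
}


def pivot_paths(start):
    """Breadth-first search: shortest chain of zones to compromise to reach each zone."""
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        zone = queue.popleft()
        for nxt in CONDUITS.get(zone, []):
            if nxt not in paths:
                paths[nxt] = paths[zone] + [nxt]
                queue.append(nxt)
    return paths


if __name__ == "__main__":
    for zone, path in pivot_paths("level3").items():
        print(f"{zone}: {' -> '.join(path)}")
    print("safety reachable:", "safety" in pivot_paths("level3"))
```

In this model segmentation does not make Level 0 unreachable from Level 3. It forces the attacker through each intermediate zone, which gives every boundary a chance to detect or stop them.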

Operational testing:

Can operators use HMI?
Can engineers program PLCs?
Can maintenance access systems?
What workflows break?

Legitimate cross-zone requirements:

Historian needs data from all PLCs
Engineering needs to program PLCs
Vendor needs remote access
How do you handle these?

What you can learn

Zone architecture is complex:

Every system needs to be in a zone
Every communication needs to be in a conduit
Exceptions multiply
Change management becomes critical

Operational impact is huge:

Workflows change
Some things become harder
Need new procedures
Training required

Perfect segmentation is impossible:

Always need some cross-zone communication
Historian, engineering access, vendor access
Each conduit is a potential attack path
Defence in depth, not perfect isolation

Implementation challenges:

Existing infrastructure wasn’t designed for zones
Retrofitting is expensive
Switch/firewall replacements
Cable runs
Downtime for cutover

Trade-offs everywhere:

Security vs operational flexibility
Isolation vs necessary communication
Cost vs protection
Complexity vs usability

Where to start

# This is the most complex challenge
# Start with planning, not implementation

# Step 1: Map current network
# - What systems exist?
# - How do they communicate?
# - What protocols?
# - Draw current architecture

# Step 2: Design zones
# - Assign each system to a zone
# - Define security requirements per zone
# - Identify required conduits

# Step 3: Plan implementation
# - What hardware is needed?
# - What changes to systems?
# - Downtime requirements?
# - Testing approach?

# Read IEC 62443 documentation
# Search: "IEC 62443 zones and conduits"

Going deeper

Questions to explore:

How do you handle systems that span zones (historian)?
What’s the firewall change management process?
How do you test firewall rules without breaking production?
How do you handle new systems (which zone? which conduits)?

Advanced options:

Deploy industrial firewalls with deep packet inspection
Implement unidirectional gateways for critical isolation
Deploy DMZ for vendor remote access
Implement micro-segmentation within zones
Deploy IDPS (Intrusion Detection/Prevention) at zone boundaries
Implement protocol whitelisting at firewalls
Deploy application-layer gateways for protocol inspection

Phased implementation: Don’t try to do everything at once. Implement in phases:

Phase 1: Separate corporate (Level 3) from OT (Level 1-2)
Phase 2: Separate SCADA (Level 2) from PLCs (Level 1)
Phase 3: Isolate safety zone
Phase 4: Micro-segmentation within zones

Test each phase thoroughly before proceeding.

Combining architectural challenges

If you’re ambitious, implement all three:

Encryption: SCADA communications are confidential
Jump host: Administrative access is centralised
Segmentation: Zones limit lateral movement

The result: a defence-in-depth architecture

Network segmentation limits attack surface
Jump host controls and monitors administrative access
Encryption protects data in transit
Compromising one layer doesn’t compromise all

Test the combination:

Simulate attack from corporate network
How far can you get?
Which defences stop you?
What’s the attack path?

Understand the costs:

Implementation time (months)
Hardware costs (firewalls, switches, jump hosts)
Operational complexity
Maintenance burden
Training requirements

Is it worth it? Depends on your risk tolerance and asset value.

“Anyone can make a system secure by making it unusable. The skill is making it secure and usable. That requires architecture.” - Ponder Stibbons