Architecture and encryption
Fundamental changes to how systems operate
Overview
These three challenges aren’t quick fixes. They’re architectural changes that affect how the entire system operates. They take planning, testing, and careful rollout. They also have the biggest security impact.
Expect these to be complex. Expect things to break. Expect to learn why OT security projects take months.
Challenge 7: Encrypt SCADA communications
The problem: SCADA data travels in cleartext. Anyone with network access can intercept operational data, understand system behaviour, and plan attacks.
Your goal: Deploy OPC UA with signing and encryption. Make SCADA communications confidential and tamper-evident.
What you can do
Configure OPC UA security policy:
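from components.security.encryption import OPCUACrypto, OPCUASecurityPolicy
# MessageSecurityMode must be imported as well; import it from wherever your
# OPC UA stack defines it (asyncua, for example, provides asyncua.ua.MessageSecurityMode)
# Change the security policy from None to an encrypted one
security_policy = OPCUASecurityPolicy.AES256_SHA256_RSAPSS
policy_uri = OPCUACrypto.get_security_policy_uri(security_policy)
# Require both signing and encryption on every message
message_security = MessageSecurityMode.SignAndEncrypt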
Generate and distribute certificates:
from pathlib import Path
from components.security.encryption import CertificateManager
cert_mgr = CertificateManager(cert_dir=Path("./certs"))
# Generate certificate for SCADA server
server_cert, server_key = cert_mgr.generate_self_signed_cert("scada.uu-power.local")
cert_mgr.save_certificate(server_cert, server_key, "scada_server")
# Generate certificates for each client (HMIs, engineering stations)
for client in ["hmi_1", "hmi_2", "engineering_1"]:
    cert, key = cert_mgr.generate_self_signed_cert(f"{client}.uu-power.local")
    cert_mgr.save_certificate(cert, key, client)
Implement certificate validation:
Server validates client certificates
Client validates server certificate
Reject connections with invalid/expired certificates
Handle trust chain
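The validation steps above can be sketched as a pure function. This is a minimal illustration, not the project's implementation: `CertInfo`, `validation_errors`, and the issuer name `uu-power-ca` are hypothetical, and a real validator must also verify signatures along the trust chain and check revocation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CertInfo:
    """Hypothetical, simplified view of an already parsed certificate."""
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime


def validation_errors(cert, trusted_issuers, now=None):
    """Return the reasons to reject a certificate (an empty list means accept)."""
    now = now or datetime.now(timezone.utc)
    errors = []
    if now < cert.not_before:
        errors.append("not yet valid")
    if now > cert.not_after:
        errors.append("expired")
    if cert.issuer not in trusted_issuers:
        errors.append("untrusted issuer")
    return errors


# Fixed reference time so the example is reproducible
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
good = CertInfo("hmi_1.uu-power.local", "uu-power-ca",
                now - timedelta(days=30), now + timedelta(days=335))
expired = CertInfo("hmi_2.uu-power.local", "uu-power-ca",
                   now - timedelta(days=400), now - timedelta(days=35))

print(validation_errors(good, {"uu-power-ca"}, now))     # []
print(validation_errors(expired, {"uu-power-ca"}, now))  # ['expired']
```

Returning reasons rather than a bare boolean makes rejected connections easier to diagnose from the logs.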
Measure performance impact:
Baseline: measure connection time, read latency, write latency without encryption
Encrypted: measure same operations with SignAndEncrypt
Calculate overhead
Is it acceptable for real-time operations?
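One way to run this comparison is a small timing harness. The sketch below is an assumption-laden starting point: `measure_latency` and `overhead_percent` are hypothetical helpers, and `read_plain` / `read_encrypted` are stand-ins that you would replace with real OPC UA reads against the unencrypted and SignAndEncrypt endpoints.

```python
import statistics
import time


def measure_latency(operation, repeats=100):
    """Call operation() repeatedly and return per-call latencies in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def overhead_percent(baseline_ms, encrypted_ms):
    """Relative increase of the median encrypted latency over the median baseline."""
    base = statistics.median(baseline_ms)
    enc = statistics.median(encrypted_ms)
    return (enc - base) / base * 100.0


# Stand-in workloads only; swap in real reads on each endpoint
def read_plain():
    sum(range(1_000))


def read_encrypted():
    sum(range(5_000))


baseline = measure_latency(read_plain)
encrypted = measure_latency(read_encrypted)
print(f"median baseline:  {statistics.median(baseline):.4f} ms")
print(f"median encrypted: {statistics.median(encrypted):.4f} ms")
print(f"overhead:         {overhead_percent(baseline, encrypted):.1f} %")
```

The median is used instead of the mean so that a few slow outliers do not dominate the result; for real-time decisions you will also want the worst-case sample.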
Test it
Security testing:
# Try to intercept traffic
tcpdump -i any -w capture.pcap port 4840
# Open in Wireshark - can you read data?
wireshark capture.pcap
# Should be encrypted, unreadable
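If eyeballing the capture is not conclusive, byte entropy gives a rough second opinion. This is a heuristic sketch only: `shannon_entropy` is a hypothetical helper, the sample payload is made up, and high entropy can also come from compression, so treat the result as a hint rather than proof of encryption.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for constant data, up to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Made-up cleartext payload; os.urandom stands in for an encrypted payload
cleartext = b"turbine_speed=3000;reactor_temp=285;" * 50
random_like = os.urandom(4096)

print(f"cleartext:   {shannon_entropy(cleartext):.2f} bits/byte")
print(f"random-like: {shannon_entropy(random_like):.2f} bits/byte")
```

Readable tag names and values score well below 8 bits per byte, while encrypted payloads sit close to it.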
Connection testing:
# Connect without certificate - should fail
python scripts/vulns/opcua_readonly_probe.py --endpoint opc.tcp://127.0.0.1:4840
# Connect with valid certificate - should succeed
python scripts/vulns/opcua_readonly_probe.py --endpoint opc.tcp://127.0.0.1:4840 --cert client.pem --key client_key.pem
# Connect with an expired certificate - should fail (rerun the previous command with an expired certificate and key)
Performance testing:
Measure read/write latency
Measure connection establishment time
Test under load (many simultaneous connections)
Is real-time performance still acceptable?
Operational testing:
What happens when certificate expires?
Can you renew without downtime?
What’s the emergency procedure when PKI fails?
What you can learn
Encryption overhead:
CPU cost of encryption/decryption
Memory overhead
Latency increase
May not be acceptable for hard real-time systems
Certificate lifecycle management:
Generation, distribution, installation
Renewal before expiry
Revocation when compromised
Backup and recovery
This is a full-time job in large deployments
PKI infrastructure requirements:
Certificate Authority (even if self-signed)
Certificate storage and backup
Certificate distribution mechanism
Monitoring for expiring certificates
Revocation checking (CRL or OCSP)
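Monitoring for expiring certificates can start very small. The sketch below assumes you already have each certificate's expiry time in a dictionary; `expiring_soon` and the 30-day threshold are hypothetical choices, and the dates are invented for illustration.

```python
from datetime import datetime, timedelta, timezone


def expiring_soon(expiry_by_name, now, warn_days=30):
    """Return (name, days_left) for certificates expiring within warn_days, soonest first.

    Already expired certificates are included with a negative days_left.
    """
    limit = now + timedelta(days=warn_days)
    due = [(name, (not_after - now).days)
           for name, not_after in expiry_by_name.items()
           if not_after <= limit]
    return sorted(due, key=lambda item: item[1])


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
expiries = {
    "scada_server": datetime(2025, 1, 15, tzinfo=timezone.utc),
    "hmi_1": datetime(2025, 6, 1, tzinfo=timezone.utc),
    "engineering_1": datetime(2024, 12, 25, tzinfo=timezone.utc),
}

print(expiring_soon(expiries, now))
# [('engineering_1', -7), ('scada_server', 14)]
```

Run a check like this on a schedule and alert on any non-empty result, well before the renewal window closes.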
Operational complexity:
More moving parts
More points of failure
More things to monitor
More maintenance burden
Trade-offs:
Confidentiality vs performance
Security vs complexity
Protection vs operational risk
Where to start
# Understand OPC UA encryption
sed -n '/### encryption.py/,/### anomaly_detector.py/p' components/security/README.md
# Look at OPC UA security classes
grep -A 30 "class OPCUACrypto\|class OPCUASecurityPolicy" components/security/encryption.py
# Find OPC UA server implementation
find components/ -name "*opcua*" -type f
# Look at certificate management
grep -A 30 "class CertificateManager" components/security/encryption.py
Going deeper
Questions to explore:
How do you handle certificate expiry without downtime?
What’s your CA strategy (commercial, internal, self-signed)?
How do you revoke compromised certificates?
How do you handle legacy clients that don’t support encryption?
Advanced options:
Deploy internal PKI with proper CA
Implement automated certificate renewal
Deploy certificate monitoring and alerting
Implement certificate pinning for additional security
Test different security policies (Basic128Rsa15, now deprecated, vs Basic256Sha256)
Measure and optimise performance
Implement hardware security modules (HSMs) for key storage
Challenge 8: Implement jump host architecture
The problem: Administrative access comes from anywhere on the corporate network. Compromised workstation = compromised OT. No centralised access control or monitoring.
Your goal: Deploy jump host (bastion) architecture. All administrative OT access flows through one controlled point.
What you can do
Design the architecture:
Before:
Corporate Network ──→ Turbine PLC
                  ├─→ Reactor PLC
                  ├─→ SCADA
                  └─→ Safety PLC
After:
Corporate Network ──→ Jump Host ──→ Turbine PLC
                                ├─→ Reactor PLC
                                ├─→ SCADA
                                └─→ Safety PLC
Direct access: BLOCKED by firewall
Deploy jump host:
Hardened Linux server or Windows bastion
Minimal software installed
Strong authentication (certificates or MFA)
Session recording enabled
Logging all access
Configure firewall rules:
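# Order matters: iptables stops at the first matching rule, so the specific
# jump host ACCEPT must come before the broader corporate DROP
# Allow jump host to OT
iptables -A FORWARD -s 192.168.1.50 -d 192.168.100.0/24 -j ACCEPT
# Block all other direct access from corporate to OT
iptables -A FORWARD -s 192.168.1.0/24 -d 192.168.100.0/24 -j DROP
# Allow corporate to jump host
iptables -A FORWARD -s 192.168.1.0/24 -d 192.168.1.50 -j ACCEPT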
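Because iptables walks a chain top to bottom and stops at the first terminating match, the jump host ACCEPT only takes effect if it sits before the broader corporate DROP (192.168.1.50 is itself inside 192.168.1.0/24). The sketch below simulates that first-match behaviour for source and destination addresses only; `first_match` is a hypothetical helper, not a replacement for testing on a real firewall.

```python
import ipaddress


def first_match(rules, src, dst, default="ACCEPT"):
    """Return the action of the first rule matching src and dst, like a single iptables chain."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for src_net, dst_net, action in rules:
        if src_ip in ipaddress.ip_network(src_net) and dst_ip in ipaddress.ip_network(dst_net):
            return action
    return default  # chain policy when no rule matches


JUMP_HOST = "192.168.1.50"
PLC = "192.168.100.10"

# Broad DROP listed first: it also catches the jump host
drop_first = [
    ("192.168.1.0/24", "192.168.100.0/24", "DROP"),
    ("192.168.1.50/32", "192.168.100.0/24", "ACCEPT"),
]
# Specific ACCEPT listed first: the jump host gets through, everyone else is dropped
accept_first = list(reversed(drop_first))

print(first_match(drop_first, JUMP_HOST, PLC))         # DROP
print(first_match(accept_first, JUMP_HOST, PLC))       # ACCEPT
print(first_match(accept_first, "192.168.1.20", PLC))  # DROP
```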
Integrate authentication:
# Jump host authenticates using AuthenticationManager
# Records all sessions
# Enforces authorisation before allowing connections
Create break-glass procedure:
What happens when jump host fails?
Emergency bypass procedure
Documented, audited, infrequent
Temporary firewall rule modification
Automatic revert after emergency
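The break-glass properties listed above (temporary, audited, automatically reverted) can be modelled before touching any firewall. This is a sketch under stated assumptions: `BreakGlass` is a hypothetical class, the user name and reason are invented, and the actual firewall change would be triggered wherever the comments indicate.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class BreakGlass:
    """Hypothetical time-limited emergency bypass with an audit trail."""
    max_duration: timedelta = timedelta(hours=1)
    expires_at: Optional[datetime] = None
    audit_log: list = field(default_factory=list)

    def activate(self, user, reason, now):
        # A real implementation would add the temporary firewall rule here
        self.expires_at = now + self.max_duration
        self.audit_log.append((now, user, "ACTIVATE", reason))

    def is_active(self, now):
        return self.expires_at is not None and now < self.expires_at

    def revert_if_expired(self, now):
        """Call periodically; closes the bypass once the window has passed."""
        if self.expires_at is not None and now >= self.expires_at:
            # A real implementation would remove the temporary firewall rule here
            self.audit_log.append((now, "system", "AUTO_REVERT", "window elapsed"))
            self.expires_at = None
            return True
        return False


now = datetime(2025, 1, 1, 3, 0, tzinfo=timezone.utc)
bypass = BreakGlass(max_duration=timedelta(minutes=30))
bypass.activate("operator_1", "jump host outage", now)

print(bypass.is_active(now + timedelta(minutes=10)))          # True
print(bypass.revert_if_expired(now + timedelta(minutes=31)))  # True
print(bypass.is_active(now + timedelta(minutes=31)))          # False
print(len(bypass.audit_log))                                  # 2
```

Tying the revert to a timer rather than to a person remembering to close the bypass is the point of the design.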
Test it
Access control testing:
# Try direct access to PLC - should be blocked
telnet 192.168.100.10 502
# Try via jump host - should succeed
ssh jump-host
telnet 192.168.100.10 502
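The same check can be scripted so it runs repeatably from both the corporate workstation and the jump host. `can_connect` below is a hypothetical helper built on the standard library; the demo uses a local listener so it runs anywhere, and in the lab you would call it with the PLC address and port from the telnet example.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks
        return False


# Local demonstration: open a listener on a free loopback port, then close it
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

print(can_connect("127.0.0.1", port))  # True while the listener is open
listener.close()
print(can_connect("127.0.0.1", port))  # False once it is closed

# In the lab: can_connect("192.168.100.10", 502) should be False from a
# corporate workstation and True from the jump host
```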
Bypass testing:
Can you bypass jump host?
Spoof source IP?
Use different protocol?
Find misconfigured firewall rule?
Failure scenario testing:
Stop jump host service
Can you still access OT? (should not, except emergency)
Activate break-glass procedure
Verify emergency access works
Verify automatic revert
Usability testing:
How does this affect operator workflow?
How long does it take to connect?
Is it practical for frequent access?
Do people try to work around it?
What you can learn
Single point of failure:
Jump host down = no administrative access
Need high availability (redundant jump hosts)
Need emergency procedures
But emergency procedures can be abused
Centralised control benefits:
All access logged in one place
Consistent authentication and authorisation
Session recording for audit
Easier to monitor for abuse
Usability impact:
Extra hop for every connection
More complex for users
Resistance from operators
Training required
Break-glass procedures:
Need emergency access mechanism
But emergency access can be abused
Need monitoring and audit
Difficult balance
Where to start
# This challenge requires architectural planning
# No single component to use - you're building infrastructure
# Consider jump host software options:
# - SSH bastion with session recording
# - RDP gateway
# - PAM solution (Privileged Access Management)
# Plan firewall rules
# Map current access patterns
# Design new access patterns through jump host
# Test in lab before production
# Read about jump host patterns
# Search: "bastion host OT" "jump server ICS"
Going deeper
Questions to explore:
How do you make jump host highly available?
What’s the monitoring strategy for jump host?
How do you handle standing vendor remote access?
How do you grant and revoke temporary access for third-party vendors?
Advanced options:
Deploy redundant jump hosts for HA
Implement PAM solution with full session recording
Deploy jump host in DMZ for vendor access
Implement just-in-time access (request approval, get temporary access)
Deploy different jump hosts for different privilege levels
Implement geofencing (only allow access from specific locations)
Challenge 9: Network segmentation (IEC 62443 zones)
The problem: Everything is on one flat network. Compromised corporate workstation = access to safety systems. No network-level isolation. One breach compromises everything.
Your goal: Design and implement zone-based architecture following IEC 62443. Separate safety from production. Isolate corporate IT from OT.
What you can do
Design zone architecture:
IEC 62443 zones (named here after Purdue-style levels):
Level 3 (Enterprise): Corporate IT, ERP, Email
↓ (Conduit: DMZ with firewalls)
Level 2 (Supervision): SCADA, HMI, Historian
↓ (Conduit: Industrial firewall)
Level 1 (Control): PLCs, Controllers
↓ (Conduit: Process network)
Level 0 (Process): Sensors, Actuators, Field devices
Safety Zone (parallel): Safety PLC, Safety I/O
↓ (Isolated, minimal conduits)
Map systems to zones:
Level 0: Turbine sensors and actuators, reactor instrumentation
Level 1: Turbine PLCs, Reactor PLC
Level 2: SCADA servers (primary and backup), HMIs
Level 3: Engineering workstations, management systems
Safety: Safety PLC (separate zone, minimal connectivity)
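A zone map is easier to keep honest when it is machine-checkable. The sketch below assumes the subnets used in this challenge's example firewall rules (Level 0 has no subnet there, so it is omitted); `zone_of`, `unzoned`, and the inventory entries other than the addresses that appear in those rules are hypothetical.

```python
import ipaddress

# Subnets taken from the example firewall rules in this challenge
ZONE_SUBNETS = {
    "level_1": ipaddress.ip_network("192.168.1.0/24"),
    "level_2": ipaddress.ip_network("192.168.2.0/24"),
    "level_3": ipaddress.ip_network("192.168.3.0/24"),
    "safety": ipaddress.ip_network("192.168.99.0/24"),
}


def zone_of(ip):
    """Return the zone name for an address, or None if it belongs to no zone."""
    addr = ipaddress.ip_address(ip)
    for zone, subnet in ZONE_SUBNETS.items():
        if addr in subnet:
            return zone
    return None


def unzoned(inventory):
    """Return the names of systems whose address falls outside every zone."""
    return [name for name, ip in inventory.items() if zone_of(ip) is None]


# Illustrative inventory; names and some addresses are made up
inventory = {
    "turbine_plc": "192.168.1.10",
    "scada_primary": "192.168.2.10",
    "jump_host": "192.168.3.50",
    "safety_plc": "192.168.99.10",
    "unknown_laptop": "10.0.0.7",
}

print(zone_of("192.168.99.10"))  # safety
print(unzoned(inventory))        # ['unknown_laptop']
```

Anything that shows up in `unzoned` is a system nobody has made a segmentation decision about yet.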
Define conduits (allowed communications):
Allowed:
- Level 2 → Level 1: SCADA reads PLC data, HMI writes setpoints
- Level 3 → Level 2: Engineering access to SCADA (via jump host)
- Level 1 → Level 0: PLC controls field devices
Blocked:
- Level 3 → Level 1: No direct engineering to PLC
- Level 3 → Level 0: No direct corporate to field devices
- Any → Safety Zone: Minimal, tightly controlled
Exception: Safety Zone → Level 1: Safety interlocks
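The conduit list above amounts to a default-deny allowlist of zone pairs. A minimal sketch, with hypothetical names (`ALLOWED_CONDUITS`, `is_allowed`) and with the "minimal, tightly controlled" access into the safety zone modelled as denied until a specific conduit is defined:

```python
# Directed (source zone, destination zone) pairs that may communicate
ALLOWED_CONDUITS = {
    ("level_3", "level_2"),  # engineering access to SCADA (via jump host)
    ("level_2", "level_1"),  # SCADA reads PLC data, HMI writes setpoints
    ("level_1", "level_0"),  # PLC controls field devices
    ("safety", "level_1"),   # safety interlocks
}


def is_allowed(src_zone, dst_zone):
    """Default deny: cross-zone traffic passes only if the pair is a listed conduit."""
    if src_zone == dst_zone:
        return True  # intra-zone traffic is not governed by conduits in this sketch
    return (src_zone, dst_zone) in ALLOWED_CONDUITS


print(is_allowed("level_2", "level_1"))  # True
print(is_allowed("level_3", "level_1"))  # False
print(is_allowed("level_3", "safety"))   # False
print(is_allowed("safety", "level_1"))   # True
```

Conduits are directional here: allowing Level 2 to reach Level 1 does not allow Level 1 to initiate connections back.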
Implement segmentation:
Option 1: VLANs with Layer 3 routing and firewall
Option 2: Physical network separation
Option 3: Industrial firewalls between zones
Configure firewall rules:
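# Example rules (simplified); order matters because the first match wins
# The ACCEPT rules only restrict traffic if the chain policy is DROP (iptables -P FORWARD DROP)
# Allow replies on connections that an earlier rule already permitted
iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Level 2 to Level 1: Allow Modbus, S7
iptables -A FORWARD -s 192.168.2.0/24 -d 192.168.1.0/24 -p tcp --dport 502 -j ACCEPT
iptables -A FORWARD -s 192.168.2.0/24 -d 192.168.1.0/24 -p tcp --dport 102 -j ACCEPT
# Level 3 to Level 2: Allow OPC UA via jump host only
iptables -A FORWARD -s 192.168.3.50 -d 192.168.2.0/24 -p tcp --dport 4840 -j ACCEPT
iptables -A FORWARD -s 192.168.3.0/24 -d 192.168.2.0/24 -j DROP
# Safety zone: tightly restricted
# Only safety PLC to control PLCs
iptables -A FORWARD -s 192.168.99.10 -d 192.168.1.0/24 -p tcp --dport 502 -j ACCEPT
# Default deny for anything else addressed to the safety zone
iptables -A FORWARD -d 192.168.99.0/24 -j DROP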
Test it
Segmentation testing:
# The conduit rules are written for TCP ports, so test the service port rather than ICMP
# From corporate (Level 3), try to reach PLC (Level 1) - should fail
nc -zv -w 3 192.168.1.10 502
# From SCADA (Level 2), try to reach PLC - should succeed
nc -zv -w 3 192.168.1.10 502
# From anywhere, try to reach safety zone - should fail
ping 192.168.99.10
Pivot testing:
Compromise corporate workstation (Level 3)
Can you reach Level 2?
Can you reach Level 1?
Can you reach safety zone?
Where does segmentation stop you?
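Before running the pivot test for real, you can predict the answer from the conduit design. The sketch below treats zones as nodes and allowed conduits as directed edges, then walks them breadth-first; `CONDUITS` restates this challenge's allowed list and `reachable_zones` is a hypothetical helper. It assumes an attacker can compromise a host in every zone they can reach, which is the pessimistic case.

```python
from collections import deque

# Directed conduits from the design: source zone -> zones it may connect to
CONDUITS = {
    "level_3": ["level_2"],
    "level_2": ["level_1"],
    "level_1": ["level_0"],
    "level_0": [],
    "safety": ["level_1"],
}


def reachable_zones(start, conduits):
    """Zones reachable from start by chaining conduits one hop at a time."""
    seen = {start}
    queue = deque([start])
    while queue:
        zone = queue.popleft()
        for nxt in conduits.get(zone, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


print(sorted(reachable_zones("level_3", CONDUITS)))
# ['level_0', 'level_1', 'level_2']
print("safety" in reachable_zones("level_3", CONDUITS))
# False
```

The result shows what segmentation buys you: an attacker starting in Level 3 can still chain hops down to Level 0, but each hop needs a fresh compromise, and no chain of conduits leads into the safety zone.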
Operational testing:
Can operators use HMI?
Can engineers program PLCs?
Can maintenance access systems?
What workflows break?
Legitimate cross-zone requirements:
Historian needs data from all PLCs
Engineering needs to program PLCs
Vendor needs remote access
How do you handle these?
What you can learn
Zone architecture is complex:
Every system needs to be in a zone
Every communication needs to be in a conduit
Exceptions multiply
Change management becomes critical
Operational impact is huge:
Workflows change
Some things become harder
Need new procedures
Training required
Perfect segmentation is impossible:
Always need some cross-zone communication
Historian, engineering access, vendor access
Each conduit is a potential attack path
Defence in depth, not perfect isolation
Implementation challenges:
Existing infrastructure wasn’t designed for zones
Retrofitting is expensive
Switch/firewall replacements
Cable runs
Downtime for cutover
Trade-offs everywhere:
Security vs operational flexibility
Isolation vs necessary communication
Cost vs protection
Complexity vs usability
Where to start
# This is the most complex challenge
# Start with planning, not implementation
# Step 1: Map current network
# - What systems exist?
# - How do they communicate?
# - What protocols?
# - Draw current architecture
# Step 2: Design zones
# - Assign each system to a zone
# - Define security requirements per zone
# - Identify required conduits
# Step 3: Plan implementation
# - What hardware is needed?
# - What changes to systems?
# - Downtime requirements?
# - Testing approach?
# Read IEC 62443 documentation
# Search: "IEC 62443 zones and conduits"
Going deeper
Questions to explore:
How do you handle systems that span zones (historian)?
What’s the firewall change management process?
How do you test firewall rules without breaking production?
How do you handle new systems (which zone? which conduits)?
Advanced options:
Deploy industrial firewalls with deep packet inspection
Implement unidirectional gateways for critical isolation
Deploy DMZ for vendor remote access
Implement micro-segmentation within zones
Deploy IDPS (Intrusion Detection/Prevention) at zone boundaries
Implement protocol whitelisting at firewalls
Deploy application-layer gateways for protocol inspection
Phased implementation: Don’t try to do everything at once. Implement in phases:
Phase 1: Separate corporate (Level 3) from OT (Level 1-2)
Phase 2: Separate SCADA (Level 2) from PLCs (Level 1)
Phase 3: Isolate safety zone
Phase 4: Micro-segmentation within zones
Test each phase thoroughly before proceeding.
Combining architectural challenges
If you’re ambitious, implement all three:
Encryption: SCADA communications are confidential
Jump host: Administrative access is centralised
Segmentation: Zones limit lateral movement
The result: Defence in depth architecture
Network segmentation limits attack surface
Jump host controls and monitors administrative access
Encryption protects data in transit
Compromising one layer doesn’t compromise all
Test the combination:
Simulate attack from corporate network
How far can you get?
Which defences stop you?
What’s the attack path?
Understand the costs:
Implementation time (months)
Hardware costs (firewalls, switches, jump hosts)
Operational complexity
Maintenance burden
Training requirements
Is it worth it? Depends on your risk tolerance and asset value.
“Anyone can make a system secure by making it unusable. The skill is making it secure and usable. That requires architecture.” - Ponder Stibbons