BCDR (Business Continuity & Disaster Recovery) Specialist
Posted 4 days ago by Investigo Change Solutions
Negotiable
Inside
Hybrid
London, UK
Summary: The BCDR Specialist will provide expert guidance and hands-on support to ensure the resilience and operational continuity of a complex solution platform, including AI-driven components and third-party integrations. This role involves evaluating existing architectures, identifying gaps, and delivering evidence for governance and regulatory review. The position requires a strong focus on business continuity and disaster recovery planning, testing, and documentation. The specialist will work in a hybrid arrangement, primarily based in Reading, UK.
Key Responsibilities:
- Assess the platform's ability to meet defined availability, resilience, and service continuity requirements.
- Validate clear and defensible RTO (Recovery Time Objective) and RPO (Recovery Point Objective) definitions aligned to business criticality.
- Review architectural decisions and operational controls related to high availability, failover, redundancy, and disaster recovery.
- Identify single points of failure across the end-to-end solution, covering AI agents, orchestration, data pipelines, and cloud services.
- Conduct structured failure analysis and propose mitigation strategies.
- Evaluate and enhance BC/DR plans, procedures, and supporting documentation.
- Ensure runbooks are complete, actionable, and test-ready.
- Produce updated BC/DR playbooks suitable for both technical and operational audiences.
- Design and oversee BC/DR testing, including failover, restore, and resilience simulations.
- Validate backup and restore integrity, especially for ML/AI model artefacts and metadata.
- Produce credible resilience evidence to satisfy governance, audit, risk management, and compliance functions.
- Contribute to submission materials for internal approvals prior to production deployment.
- Map controls to applicable frameworks (e.g., ISO 22301, FCA/PRA operational resilience guidelines).
- Deliver resilience assurance reports, single points of failure registers, updated BC/DR plans, and test evidence packs.
- Provide recommendations for architectural or operational improvements.
Key Skills:
- Proven experience delivering BCDR programmes within cloud-native or hybrid cloud environments (Azure/AWS/GCP).
- Strong understanding of high availability architectures, data backup/restore strategies, and distributed systems resilience.
- Knowledge of AI/ML operationalisation considerations and orchestration platforms (Kubernetes/Microservices/Serverless).
- Experience with resilience features of major cloud providers.
- Demonstrable experience building, testing, and assuring BC/DR frameworks.
- Knowledge of industry standards such as ISO 22301, ISO 27031, NIST SP 800-34.
- Track record of delivering resilience artefacts for audit and risk review.
Salary (Rate): undetermined
City: London
Country: UK
Working Arrangements: hybrid
IR35 Status: inside IR35
Seniority Level: undetermined
Industry: IT
Business Continuity and Disaster Recovery Specialist
Inside IR35
Hybrid: Reading, 2 days per week
The BCDR Specialist will provide expert guidance, assessment, and hands-on support to ensure the resilience, recoverability, and operational continuity of a complex solution platform, including AI-driven components, orchestration layers, data services, and third-party integrations. The specialist will evaluate existing architectures, processes, and controls, remediate gaps, and deliver evidence suitable for internal governance, audit, and regulatory review.
Key Responsibilities
Availability & Resilience Assurance
- Assess the platform's ability to meet defined availability, resilience, and service continuity requirements.
- Validate clear and defensible RTO (Recovery Time Objective) and RPO (Recovery Point Objective) definitions aligned to business criticality.
- Review architectural decisions and operational controls related to high availability, failover, redundancy, and disaster recovery.
Risk & Failure Mode Identification
- Identify single points of failure across the end-to-end solution, covering:
  - AI agents and inference services
  - Orchestration and workflow engines
  - Data pipelines, storage, and backup strategies
  - Cloud services, APIs, and third-party dependencies
- Conduct structured failure analysis and propose mitigation strategies.
BC/DR Planning & Documentation
- Evaluate and enhance BC/DR plans, procedures, and supporting documentation.
- Ensure runbooks are complete, actionable, and test-ready, including steps covering:
  - AI-specific failure modes
  - Model or agent recovery
  - Reintegration of data services and dependent workloads
- Produce updated BC/DR playbooks suitable for both technical and operational audiences.
Testing & Validation
- Design and oversee BC/DR testing, including failover, restore, and resilience simulations.
- Validate backup and restore integrity, especially for ML/AI model artefacts and metadata.
- Ensure testing meets organisational policy, industry standards, and regulatory expectations.
Governance, Audit & Regulatory Evidence
- Produce credible resilience evidence to satisfy governance, audit, risk management, and compliance functions.
- Contribute to submission materials for internal approvals prior to production deployment.
- Map controls to applicable frameworks (e.g., ISO 22301, FCA/PRA operational resilience guidelines, NIST, cloud provider best practices).
Deliverables
- Resilience Assurance Report identifying compliance against RTO/RPO and availability requirements.
- Single Points of Failure Register with remediation plan.
- Updated BC/DR Plans and Technical Runbooks (AI-inclusive).
- Test Plan & Test Evidence Pack covering DR scenarios and outcomes.
- Production Readiness Resilience Pack for governance/audit sign-off.
- Recommendations for architectural or operational improvements.
Required Skills & Experience
Technical Expertise
- Proven experience delivering BCDR programmes within cloud-native or hybrid cloud environments (Azure/AWS/GCP).
- Strong understanding of:
  - High availability architectures
  - Data backup/restore strategies
  - Distributed systems resilience
  - AI/ML operationalisation considerations
  - Orchestration platforms (Kubernetes/Microservices/Serverless)
- Experience with resilience features of major cloud providers (regions, zones, failover models, managed services).
BC/DR & Risk Management
- Demonstrable experience building, testing, and assuring BC/DR frameworks.
- Knowledge of industry standards such as ISO 22301, ISO 27031, NIST SP 800-34, and UK regulatory expectations for operational resilience.
- Track record of delivering resilience artefacts for audit and risk review.
At Investigo, we make recruitment feel easy.
Let's keep this simple. We're all about your success, as your success is our business. We are part of The IN Group, a collection of six award-winning specialist brands that supply the globe with end-to-end talent solutions. With recruitment at the core of our business, we've been connecting people since 2003.
Data & Privacy
By applying, you consent to Investigo collecting and processing your data for the purpose of recruitment and placement, in accordance with applicable data protection laws. For more information, please refer to the Privacy Notice on our website.