Platform Resiliency Lead (Disaster Recovery & Business Continuity Resilience Management) - Remote
Posted 1 week ago by Cyber Sphere LLC
Negotiable
Undetermined
Remote
Summary: The Platform Resiliency Lead drives enterprise-wide platform resilience, disaster recovery, and business technology resilience initiatives across critical digital platforms. This leadership role works with Platform Owners, Engineering, Infrastructure, Security, Enterprise Architecture, and Business Continuity teams to put robust resilience controls in place, minimize technology disruption risks, and ensure recovery objectives are achievable and measurable. The ideal candidate will have extensive experience in platform reliability and disaster recovery, particularly within MarTech environments. The position requires strong stakeholder management and the ability to lead high-pressure incident response efforts.
Salary (Rate): undetermined
City: undetermined
Country: undetermined
Working Arrangements: remote
IR35 Status: undetermined
Seniority Level: undetermined
Industry: IT
Detailed Description From Employer:
Title: Platform Resiliency Lead (Disaster Recovery & Business Continuity Resilience Management)
Location: Remote. Must be available to work with global teams, including maintaining 4 to 6 hours of overlap with US Eastern Time (ET) and collaborating regularly with teams across the US, UK, and Europe.
Duration: Contract
Experience: 10+ Years
Domain Preference: MarTech and/or Disaster Recovery (DR) / Business Continuity & Resilience Management (BCRM)
Key Technologies: Salesforce Marketing Cloud, Salesforce Loyalty, Salesforce Service Cloud, Aprimo, Treasure Data, Okta
Job Summary
We are seeking an experienced Platform Resiliency Lead to drive enterprise-wide platform resilience, disaster recovery, and business technology resilience initiatives across critical digital platforms. This leadership role is responsible for ensuring platforms are designed, implemented, tested, and operated with robust resilience controls aligned with business criticality and organizational standards.
The ideal candidate will partner closely with Platform Owners, Engineering, Infrastructure, Security, Enterprise Architecture, and Business Continuity teams to minimize technology disruption risks while ensuring recovery objectives are achievable, measurable, and audit-ready.
Key Responsibilities
Platform Resilience & Governance
- Lead the implementation and continuous improvement of Business Technology Resilience (BTR) and Disaster Recovery (DR) governance across enterprise platforms.
- Embed resilience and governance controls into engineering workflows following a "Guardrails, Not Gates" approach.
- Ensure development pipelines, tooling, and deployment processes align with organizational standards, secure coding practices, and shift-left principles.
- Drive consistent, measurable, audit-ready resilience practices across engineering and platform teams.
Disaster Recovery Planning & Preparedness
- Own the creation, maintenance, and periodic review of Application and Platform Disaster Recovery Plans (A/PDRPs).
- Govern disaster recovery testing, including scenario-based exercises and validation of Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO).
- Coordinate recovery planning with Infrastructure, Cloud, Network, Security, and Platform teams.
- Ensure all DR testing is documented with evidence capture, issue tracking, and remediation plans.
Incident & Recovery Leadership
- Provide resilience leadership during major incidents and disaster recovery events.
- Support command-and-control execution during crisis situations.
- Lead post-incident reviews and ensure lessons learned translate into platform, tooling, and process improvements.
Continuous Improvement
- Track resilience maturity across platforms using standardized KPIs and compliance metrics.
- Partner with Engineering and Site Reliability Engineering (SRE) teams to improve:
  - Platform observability
  - Backup and recovery strategies
  - Failover capabilities
  - Controlled recovery mechanisms
- Promote resilience awareness through documentation, templates, training, and best practices.
Key Stakeholders
- Platform & Product Owners
- Engineering & Site Reliability Engineering (SRE) Teams
- Infrastructure, Cloud, Network & Security Teams
- Business Continuity Management (BCM)
- Enterprise Architecture
- Risk & Compliance Teams
- Internal Audit & External Assurance Partners
Required Qualifications
Experience
- 10+ years of experience in Platform Reliability, Disaster Recovery, Business Continuity, or Resilience Engineering.
- Experience leading resilience or disaster recovery programs across multiple enterprise applications and platforms.
- Experience supporting cloud-based or digital transformation environments.
- Experience working within regulated or audit-intensive organizations.
- At least one of the following is mandatory: MarTech platform experience, or Disaster Recovery / Business Continuity expertise.
Technical Skills
- Strong understanding of:
  - Disaster Recovery planning
  - Recovery Time Objective (RTO)
  - Recovery Point Objective (RPO)
  - Backup and recovery strategies
  - Recovery sequencing
  - Dependency management
- Knowledge of cloud and SaaS recovery patterns.
- Familiarity with enterprise platforms such as:
  - Salesforce Marketing Cloud
  - Salesforce Loyalty
  - Salesforce Service Cloud
  - Aprimo
  - Treasure Data
  - Okta
- Ability to communicate technical resilience topics to business and executive stakeholders.
Leadership Skills
- Strong stakeholder management and cross-functional collaboration.
- Ability to influence teams without direct authority.
- Excellent governance, documentation, and audit support experience.
- Comfortable leading high-pressure incident response and recovery efforts.
Preferred Qualifications
- Experience with Business Technology Resilience (BTR) frameworks.
- Knowledge of Business Continuity Management (BCM) practices.
- Exposure to Site Reliability Engineering (SRE) principles.
- Experience working with globally distributed engineering teams.
Regards,
Sai Srikar