Site Reliability Engineer - Automation Ansible Unix Jenkins Terraform - Investment Bank
Posted Today by Adlam Consulting Ltd on JobServe
£800 per day
Inside IR35
Onsite
London, UK
ESSENTIAL SRE with DevOps tools & automation/monitoring
- Ansible
- Unix commands and Shell Scripts
- Jenkins - CI & CD pipeline experience using groovy scripts
- Terraform
- Minimum 5 years of experience in SRE.
- Proven foundation in Linux administration and troubleshooting.
- Solid knowledge of APM tools, e.g. Dynatrace/AppDynamics
- Good understanding of log aggregators, e.g. Splunk/ELK
- Solid work experience with load balancers (L4 & L7), preferably Apache HTTP Server (httpd)
Nice to have
- Knowledge of Docker, Kubernetes
- Knowledge in OpenStack, Networking, Security or Storage is desirable.
- Solid experience in at least one Scripting language. Python preferred.
- Experience with building, operating, and maintaining scalable distributed systems, and with operations automation.
- Strong knowledge of DevOps methodology and toolsets
- Knowledge of cloud computing fundamentals and cloud-native applications.
- Understanding of service level agreements and objectives (SLAs and SLOs).
Knowledge/Skills/Experience Required:
- Design, develop and implement systems software/scripts that improve the stability, scalability, availability, and latency of the Risk system applications.
- Solve problems occurring with our highly available production systems and build solutions and automation, using a combination of scripting and tooling, to prevent them from happening again.
- Define and drive adoption of a best-in-class monitoring framework to accomplish end-to-end flow monitoring and effective alerting.
- Monitor system performance and capacity levels to ensure high availability of applications with minimal downtime.
- Build and run capacity tests to manage the growth of systems.
- Investigate any service disruptions or other service issues to identify their causes.
- Perform regular audits of servers to check for signs of degradation or malfunction, including infrastructure hygiene and end-of-life checks.
- Conduct post-mortem examinations of failed systems to identify and address root causes.
- Be accountable for maintenance and improvement of IT continuity strategies.
- Be accountable for generation, reporting and improvement of various Production KPIs, SLs and dashboards.
- Be an advocate of release engineering best practices such as zero downtime, canary releases and incremental rollouts.
- Share the on-call rotation and be an escalation contact for incidents.
- Work with Development, DevOps and IT operations teams throughout the Software Development Life Cycle to ensure sustainable software releases.
This role offers 50% hybrid working and is inside IR35 (umbrella).
Adlam Consulting operates as an Employment Agency & an Employment Business. Applicants must be eligible to work in the specified location.
Load balancer (Apache or nginx)