£80 per hour
England, United Kingdom
Summary: The Infrastructure & Machine Learning Engineer will be responsible for designing, implementing, and maintaining a secure and scalable infrastructure for the Digital Imaging System (DIS) and Traffic Management System (TMS). This role requires expertise in provisioning virtualized environments, particularly GPU-enabled systems, while ensuring security and observability. Candidates must possess SC level security clearance or the ability to obtain it. The position involves significant automation and integration with software development processes.
Key Responsibilities:
- Design and implement infrastructure components in line with the DIS Low Level Design (LLD), including:
  - Operating system configuration (Ubuntu, RHEL).
  - User account and permissions management.
  - Virtual GPU (vGPU) provisioning and dynamic scaling.
  - Secure networking, firewall rules, and HTTPS configuration.
  - Centralized logging (e.g., rsyslog) and integration with monitoring platforms like SCOM.
  - Security hardening, including vulnerability mitigation, certificate management, and secure coding validation.
- Author and maintain Ansible playbooks and Terraform scripts for infrastructure provisioning and automation.
- Contribute to and maintain configuration specifications for development, reference, and production environments.
- Support incremental deployments and environment validations.
- Support integration with software development, CI/CD pipelines, and baseline/release management processes.
- Support QA and acceptance testing alongside development teams.
- Document all configuration parameters and deployment steps.
- Ensure progress and deliverables are tracked and reported through the client’s JIRA task management tool.
Key Skills:
- Operating Systems: Ubuntu and Red Hat Enterprise Linux (RHEL).
- Infrastructure Automation: Ansible for configuration management and Terraform for infrastructure-as-code.
- Containerization & Orchestration: Docker, Kubernetes.
- Virtualization: vGPU provisioning and tuning for ML workloads.
- Experience with dynamic virtual resource management.
- Security: OS and application-level security hardening.
- SSL/TLS certificate management and secure endpoints (e.g., HTTPS).
- Code and system-level vulnerability analysis.
- Networking: Virtual switches, firewalls, secure ingress/egress controls.
- Integration of application-level networking with underlying infrastructure.
- Monitoring & Logging: rsyslog, SCOM or similar enterprise tools.
Salary (Rate): £80.00/hr
City: undetermined
Country: United Kingdom
Working Arrangements: undetermined
IR35 Status: undetermined
Seniority Level: undetermined
Industry: IT
Infrastructure & Machine Learning Engineer – DIS & TMS Subsystem Deployment
Project Overview
We are seeking a highly capable and security-minded Infrastructure & Machine Learning Contractor to build and deliver the secure, scalable, and automated infrastructure required to host the Digital Imaging System (DIS) and Traffic Management System (TMS) components. The role involves provisioning and tuning virtualized environments, especially GPU-enabled systems, while ensuring end-to-end security, observability, and maintainability. Due to the nature of the role, we require candidates with SC-level security clearance or the ability to obtain it.
Primary Objective
To design, implement, and maintain a secure and dynamically scalable infrastructure platform that supports the full deployment lifecycle of the DIS subsystem (including TMS components), with specific attention to virtualization, networking, OS hardening, and system automation.
Key Responsibilities
- Design and implement infrastructure components in line with the DIS Low Level Design (LLD), including:
  - Operating system configuration (Ubuntu, RHEL)
  - User account and permissions management
  - Virtual GPU (vGPU) provisioning and dynamic scaling
  - Secure networking, firewall rules, and HTTPS configuration
  - Centralized logging (e.g., rsyslog) and integration with monitoring platforms like SCOM
  - Security hardening, including vulnerability mitigation, certificate management, and secure coding validation
- Author and maintain Ansible playbooks and Terraform scripts for infrastructure provisioning and automation
- Contribute to and maintain configuration specifications for:
  - Development environment
  - Reference environment
  - Production environments (2 servers)
- Support:
  - Incremental deployments and environment validations
  - Integration with software development, CI/CD pipelines, and baseline/release management processes
  - QA and acceptance testing alongside development teams
- Document all configuration parameters and deployment steps
- Ensure progress and deliverables are tracked and reported through the client’s JIRA task management tool
Required Skills & Experience
- Operating Systems: Ubuntu and Red Hat Enterprise Linux (RHEL)
- Infrastructure Automation:
  - Configuration Management: Ansible
  - Infrastructure-as-Code: Terraform
- Containerization & Orchestration: Docker, Kubernetes
- Virtualization:
  - vGPU provisioning and tuning for ML workloads
  - Experience with dynamic virtual resource management
- Security:
  - OS and application-level security hardening
  - SSL/TLS certificate management and secure endpoints (e.g., HTTPS)
  - Code and system-level vulnerability analysis
- Networking:
  - Virtual switches, firewalls, secure ingress/egress controls
  - Integration of application-level networking with underlying infrastructure
- Monitoring & Logging: rsyslog, SCOM or similar enterprise tools
Preferred Qualifications
- Experience supporting machine learning environments in production
- Familiarity with hybrid deployments, including both on-prem and cloud-based setups
- Prior experience with DIS/TMS or similar real-time, safety-critical systems
- Strong communication and documentation skills