HPC Specialist - Lustre Cray Linux InfiniBand

Posted Today by ComTech Europe Ltd

Negotiable
Undetermined
Remote
United Kingdom

Summary: The HPC Specialist role focuses on deploying and optimizing Lustre file system architectures for a large-scale HPC project. The consultant will work closely with end-customers to design and implement file systems while automating processes and managing data across extensive infrastructures. Proficiency in scripting and experience with various databases and storage systems are essential for success in this position. The role is fully remote and offers an initial three-month contract with potential for extension.

Key Responsibilities:

  • Design and optimize Lustre file system architectures.
  • Collaborate with end-customers to deploy file systems and manage directories, including applying ACLs and node maps.
  • Automate mass updates, permission changes, and data movement across large-scale data lakes using scripting.
  • Manage HPC file infrastructures and ensure efficient access control in multi-user environments.
  • Work with high-speed interconnects such as InfiniBand, focusing on topology design and performance tuning.

Key Skills:

  • In-depth knowledge of Lustre file systems and Cray storage systems.
  • Proficient in automation tools and scripting languages such as Ansible, Python, Perl, and Shell.
  • Strong understanding of Linux ACLs, SELinux, and nodemap configuration.
  • Experience with databases like MySQL, PostgreSQL, and MongoDB.
  • Familiarity with data fabric and interconnect technologies, particularly InfiniBand and RDMA.

Salary (Rate): undetermined

City: undetermined

Country: United Kingdom

Working Arrangements: remote

IR35 Status: undetermined

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

HPC Specialist - Lustre, Cray, Linux & InfiniBand

Job Description: My client is looking for an experienced HPC specialist for a large-scale HPC project. The consultant will need an in-depth understanding of, and deployment experience with, Lustre file system architectures, along with experience in the following areas:

  • Lustre File System Design & Optimization: In-depth understanding and deployment of Lustre architectures. Able to work with the end-customer in designing and deploying file systems and directories, including applying ACLs and node maps.
  • Automation & Scripting: Proficient in scripting to automate mass updates, permission changes, and data movement across large-scale data lakes. Advanced scripting and large-scale data management with Ansible, Puppet, Salt, or Chef, plus experience with the following languages: Python, Perl, Shell.
  • Databases: MySQL, PostgreSQL, MongoDB
  • Petabyte-Scale Storage Deployment: Hands-on experience with managing HPC file infrastructures. Knowledge of Cray storage systems would be an advantage. Familiarity with ZFS and parallel file system integration.
  • Linux Permissions & Nodemaps: Strong skills in Linux ACLs, SELinux, and nodemap configuration for secure and efficient access control. Experience in multi-user, multi-tenant environments with complex permission hierarchies.
  • Data Fabric & Interconnects: Working knowledge of high-speed interconnects such as InfiniBand, including topology design and performance tuning. Understanding of RDMA, fabric management tools, and integration with storage and compute nodes.

Skills required:

  • Red Hat OpenShift
  • Lustre, NFS file systems
  • Cray storage systems
  • Automation & Scripting: Ansible, Python, Perl, Shell
  • Linux ACLs, SELinux, nodemap
  • Data fabric: InfiniBand, RDMA

Location: The role can be completed 100% remotely.

Start: The role is to start immediately, and the contract will initially be for 3 months with potential to extend.