Apache Druid

Posted 2 weeks ago

Rate: Negotiable
IR35 Status: Outside
Working Arrangement: Remote
Country: USA

Summary: This Apache Druid role centers on ensuring the high availability and reliability of production systems while managing and optimizing Apache Druid clusters. It requires implementing automation and Infrastructure as Code (IaC) practices, and designing and maintaining data orchestration workflows in Apache Airflow. Close collaboration with development, data, and operations teams, along with clear communication about system status and incidents, is also a key aspect of the role.

Key Responsibilities:

  • Ensure high availability and reliability of production systems.
  • Implement and maintain robust monitoring and alerting systems.
  • Participate in on-call rotations to respond to incidents and outages.
  • Conduct post-incident reviews and implement preventative measures.
  • Automate infrastructure provisioning, configuration, and deployment using IaC tools (e.g., Terraform, Ansible).
  • Develop and maintain CI/CD pipelines to streamline software releases.
  • Optimize and automate data pipelines and workflows.
  • Manage and optimize Apache Druid clusters for high performance and scalability.
  • Troubleshoot Druid performance issues and implement solutions.
  • Design and implement Druid data ingestion and query optimization strategies.
  • Design, develop, and maintain Airflow DAGs for data orchestration and workflow automation.
  • Monitor Airflow performance and troubleshoot issues.
  • Optimize Airflow workflows for efficiency and reliability.
  • Implement and maintain comprehensive monitoring and logging solutions (e.g., Prometheus, Grafana, ELK stack).
  • Analyze metrics and logs to identify performance bottlenecks and potential issues.
  • Create and maintain dashboards for visualizing system health and performance.
  • Collaborate with development, data, and operations teams to ensure smooth operations.
  • Communicate effectively with stakeholders regarding system status and incidents.
  • Document processes and procedures.

Key Skills:

  • Experience with Apache Druid and its management.
  • Proficiency in Infrastructure as Code (IaC) tools such as Terraform and Ansible.
  • Knowledge of CI/CD pipeline development and maintenance.
  • Experience with Apache Airflow for data orchestration.
  • Strong troubleshooting skills for performance issues.
  • Familiarity with monitoring and logging solutions (e.g., Prometheus, Grafana, ELK stack).
  • Ability to analyze metrics and logs for performance optimization.
  • Strong collaboration and communication skills.
  • Experience in documenting processes and procedures.

Salary (Rate): undetermined

City: undetermined

Country: USA

Working Arrangements: remote

IR35 Status: outside IR35

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

Position: Apache Druid

Location: Remote

Duration: Contract (C2C)

Job Description:

  • Ensure high availability and reliability of production systems.
  • Implement and maintain robust monitoring and alerting systems.
  • Participate in on-call rotations to respond to incidents and outages.
  • Conduct post-incident reviews and implement preventative measures.

Automation and Infrastructure as Code (IaC):

  • Automate infrastructure provisioning, configuration, and deployment using IaC tools (e.g., Terraform, Ansible).
  • Develop and maintain CI/CD pipelines to streamline software releases.
  • Optimize and automate data pipelines and workflows.
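
The declarative, idempotent model behind IaC tools like Terraform can be sketched without any particular tool: compare desired state against actual state and compute only the changes needed to converge them. This is a hypothetical, library-free illustration; the resource names are placeholders, not from the posting.

```python
# Sketch of the desired-state reconciliation loop at the heart of IaC tools
# such as Terraform. Resource names and attributes below are hypothetical.

def plan(desired: dict, actual: dict) -> dict:
    """Return the create/update/delete actions that converge actual to desired."""
    return {
        "create": sorted(set(desired) - set(actual)),
        "update": sorted(k for k in desired if k in actual and desired[k] != actual[k]),
        "delete": sorted(set(actual) - set(desired)),
    }

desired = {"web-1": {"size": "m5.large"}, "web-2": {"size": "m5.large"}}
actual = {"web-1": {"size": "m5.xlarge"}, "db-1": {"size": "r5.large"}}

# A second run after applying this plan would produce an empty plan (idempotence).
print(plan(desired, actual))
```

Running the same plan twice yields no further actions, which is what makes IaC safe to apply repeatedly from CI/CD pipelines.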

Apache Druid Management:

  • Manage and optimize Apache Druid clusters for high performance and scalability.
  • Troubleshoot Druid performance issues and implement solutions.
  • Design and implement Druid data ingestion and query optimization strategies.
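
To ground the ingestion-design responsibility above, here is a minimal Druid native batch ingestion spec sketched as a Python dict. The field names follow Druid's `index_parallel` task format; the datasource name and input path are hypothetical placeholders, not from the posting.

```python
import json

# Hypothetical minimal spec for Druid's parallel native batch ingestion task.
ingestion_spec = {
    "type": "index_parallel",
    "spec": {
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {"type": "local", "baseDir": "/tmp/events", "filter": "*.json"},
            "inputFormat": {"type": "json"},
        },
        "dataSchema": {
            "dataSource": "example_events",
            "timestampSpec": {"column": "ts", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["user", "page"]},
            "granularitySpec": {
                "segmentGranularity": "day",   # one segment per day
                "queryGranularity": "minute",  # pre-aggregate rows into minute buckets
                "rollup": True,
            },
        },
        "tuningConfig": {"type": "index_parallel", "maxNumConcurrentSubTasks": 2},
    },
}

# In a real deployment this JSON would be POSTed to the Overlord's task
# endpoint (/druid/indexer/v1/task) to launch the ingestion.
payload = json.dumps(ingestion_spec)
```

Tuning `segmentGranularity`, `queryGranularity`, and rollup is exactly the kind of ingestion/query trade-off this role is asked to design.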

Apache Airflow Orchestration:

  • Design, develop, and maintain Airflow DAGs for data orchestration and workflow automation.
  • Monitor Airflow performance and troubleshoot issues.
  • Optimize Airflow workflows for efficiency and reliability.
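
The core idea behind DAG design can be sketched without Airflow itself: tasks declare their upstream dependencies, and the scheduler resolves a valid execution order. This Airflow-free sketch uses the standard library and hypothetical ETL task names.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL dependencies, written the way an Airflow DAG declares them
# (extract >> transform, extract >> validate, {transform, validate} >> load).
deps = {
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}

# Resolve a valid execution order; Airflow performs the same dependency
# resolution before scheduling task instances.
order = list(TopologicalSorter(deps).static_order())
print(order)  # "extract" runs first, "load" runs last
```

In a real Airflow DAG each name would be an operator (e.g., `PythonOperator`) and the dependencies would be set with the `>>` bitshift syntax.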

Monitoring and Logging:

  • Implement and maintain comprehensive monitoring and logging solutions (e.g., Prometheus, Grafana, ELK stack).
  • Analyze metrics and logs to identify performance bottlenecks and potential issues.
  • Create and maintain dashboards for visualizing system health and performance.
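
The metrics-analysis work above typically reduces to computing percentile aggregates and comparing them against a service-level objective. This is a hypothetical sketch; the latency samples and the 200 ms SLO are assumptions, not from the posting.

```python
# Hypothetical latency analysis: compute a p95 from raw request latencies
# and flag a breach against an assumed SLO (the kind of rule a Prometheus
# alert or Grafana threshold would encode).

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile; sufficient for a monitoring sketch."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [12, 15, 11, 14, 250, 13, 16, 12, 300, 15]  # two slow outliers
p95 = percentile(latencies_ms, 95)
slo_ms = 200  # assumed SLO, not from the posting

if p95 > slo_ms:
    print(f"ALERT: p95 latency {p95}ms exceeds {slo_ms}ms SLO")
```

Percentiles rather than averages surface tail-latency bottlenecks: the mean of the samples above looks healthy while the p95 exposes the outliers.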

Collaboration and Communication:

  • Collaborate with development, data, and operations teams to ensure smooth operations.
  • Communicate effectively with stakeholders regarding system status and incidents.
  • Document processes and procedures.

Thanks,

Nitesh