Negotiable
Outside
Remote
USA
Summary: The Flink Engineer role focuses on developing and maintaining applications using Apache Flink, specifically the DataStream API. The position requires extensive experience in managing Flink deployments, integrating with various data sources, and ensuring high availability and disaster recovery setups. The engineer will also be responsible for architecting Flink clusters in a Kubernetes environment and implementing CI/CD pipelines. This role can be performed remotely, with a preference for candidates based in Dallas, TX.
Key Responsibilities:
- Build and maintain Flink applications using the DataStream API
- Implement Flink process functions, aggregators, and watermarking strategies
- Manage stateful streaming applications using RocksDB and Azure Data Lake Storage (ADLS)
- Integrate Flink jobs with Kafka, Azure Event Hubs, and MongoDB
- Architect and manage Flink clusters in AKS with Kubernetes-based deployment models
- Configure application-mode and session-mode deployments and JobManagers/TaskManagers, and optimize memory usage
- Set up HA/DR, observability, and AutoPilot for self-healing infrastructure
- Implement deployment pipelines using ArgoCD and integrate logging and monitoring agents
- Provide visibility and access through the Flink Dashboard and monitoring platforms such as Dynatrace
Key Skills:
- 3+ years of hands-on experience with Apache Flink, specifically the DataStream API
- Proven track record of production-grade Flink deployments, supported by case studies or documentation
- Currently supporting at least one active client using the Flink DataStream API
- Strong knowledge of state management using checkpoints and savepoints (local storage & ADLS)
- Experience configuring Flink connectors such as Azure Event Hubs, Kafka, and MongoDB
- Expertise in Flink aggregators, watermarks, and handling out-of-order events
- Experience building and deploying private Flink clusters in AKS, including session-mode and application-mode deployments
- Hands-on experience managing JobManagers, TaskManagers, and cluster resources
- Experience configuring RocksDB, heap memory, state recovery, and AutoPilot
- Experience integrating Flink with external tools: ArgoCD (for deployments), Dynatrace, and LTM logging agents
- Familiarity with Flink Dashboard, High Availability (HA), and Disaster Recovery (DR) setups
- 7+ years of experience in backend/distributed systems engineering
- 3-5 years of experience with Kafka, Azure Event Hubs, or similar platforms
- 2+ years managing cloud-native applications on AKS or Kubernetes
- Strong background in CI/CD, infrastructure as code (IaC), and cloud monitoring
- Excellent communication, technical leadership, and documentation skills
Salary (Rate): undetermined
City: undetermined
Country: USA
Working Arrangements: remote
IR35 Status: outside IR35
Seniority Level: undetermined
Industry: IT
Key Deliverables:
- Fully deployed, production-grade Flink applications with logging and monitoring
- Scalable and highly available Flink infrastructure with HA/DR configurations
- Automated deployment processes via ArgoCD, integrated with Dynatrace and LTM
- Clear documentation and ongoing platform support