Summary: Hiring a Data & ML Engineer to support the person matching and identity resolution workflows of the MPI initiative, leveraging Microsoft Fabric, Synapse, and ML capabilities. This role involves creating data pipelines, cleansing and linking records, and operationalizing ML-based entity resolution models.
Key Responsibilities:
- Build data pipelines and ML workflows within Microsoft Fabric for entity matching and deduplication across data domains.
- Implement and optimize MLOps pipelines (training, scoring, and retraining).
- Integrate data from multiple sources: CRM, EHRs, finance, HR, etc.
- Develop reusable modules for fuzzy matching, rule-based, and ML-based identity resolution.
- Collaborate with data scientists and SMEs to operationalize models using SynapseML, PySpark, or Azure ML.
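The responsibilities above mention reusable modules that combine rule-based and fuzzy matching. Below is a minimal Python sketch of how such a module might pair the two approaches. Everything in it is hypothetical and chosen only for illustration: the record fields, the exact-birth-date rule, the 0.85 threshold, and all function names. `difflib.SequenceMatcher` stands in for whatever string-similarity measure a real pipeline would use. A production version would more likely run in PySpark inside Fabric, with blocking in place of the naive all-pairs loop shown here.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class PersonRecord:
    # Hypothetical minimal person record; real sources (CRM, EHR, HR) carry many more fields.
    record_id: str
    first_name: str
    last_name: str
    birth_date: str  # ISO format, e.g. "1980-04-12"


def normalize(value: str) -> str:
    """Lowercase and keep only alphanumeric characters so formatting noise does not block a match."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def name_similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1]; difflib's ratio is a stand-in for e.g. Jaro-Winkler."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_match(a: PersonRecord, b: PersonRecord, threshold: float = 0.85) -> bool:
    # Rule-based gate (assumed rule): birth dates must agree exactly.
    if a.birth_date != b.birth_date:
        return False
    # Fuzzy step: the average first/last-name similarity must reach the assumed threshold.
    score = (name_similarity(a.first_name, b.first_name)
             + name_similarity(a.last_name, b.last_name)) / 2
    return score >= threshold


def find_duplicates(records: list[PersonRecord], threshold: float = 0.85) -> list[tuple[str, str]]:
    """Return pairs of record IDs judged to be the same person (naive O(n^2) comparison)."""
    pairs = []
    for i, left in enumerate(records):
        for right in records[i + 1:]:
            if is_match(left, right, threshold):
                pairs.append((left.record_id, right.record_id))
    return pairs


if __name__ == "__main__":
    records = [
        PersonRecord("crm-1", "Jonathan", "Smith", "1980-04-12"),
        PersonRecord("ehr-7", "Jonathon", "Smith", "1980-04-12"),
        PersonRecord("hr-3", "Maria", "Lopez", "1975-09-30"),
    ]
    print(find_duplicates(records))  # prints [('crm-1', 'ehr-7')]
```

The cheap exact rule runs first so that the more expensive fuzzy comparison only sees plausible candidates. An ML-based matcher would typically replace the fixed threshold with a trained classifier that scores these same similarity features.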
Key Skills:
- 5+ years of experience in data engineering and machine learning in the Azure ecosystem.
- Proficient with Microsoft Fabric (Lakehouse, Pipelines, Notebooks), Synapse, and Azure ML.
- Solid understanding of identity resolution techniques, especially ML-based approaches.
- Strong programming skills in Python and PySpark.
- Familiarity with data privacy, governance, and ethics in ML.
Salary (Rate): undetermined
City: undetermined
Country: USA
Working Arrangements: remote
IR35 Status: outside IR35
Seniority Level: undetermined
Industry: IT
Job Role: MPI Fabric Data & ML Engineer
Location: 100% Remote
Duration: 6+ Months Contract