Python Developer

Posted 1 week ago by Andela

Negotiable

Undetermined

EMEA

Summary: The Python Developer role at Andela focuses on building scalable data ingestion pipelines and extracting structured content from complex documents, particularly PDFs and scanned materials. This backend-oriented position requires significant experience in document processing and aims to support GenAI applications by ensuring high-quality data extraction. The role emphasizes collaboration within a cross-functional team and integration with AWS infrastructure. Candidates should possess strong Python skills and familiarity with OCR tools and document processing libraries.

Key Responsibilities:

Design and implement robust data extraction pipelines to process diverse document types, especially PDFs with both text and scanned content.
Customize extraction logic per data source, including metadata extraction (e.g., machine IDs, customer information).
Work with document processing tools like Tesseract, Unstructured IO, or similar.
Integrate with AWS-based infrastructure, including Lambda and ECS for deployment.
Collaborate with a cross-functional team to onboard and validate new data sources.
Ensure high accuracy and quality of extracted data to support downstream GenAI use.

Key Skills:

5–10 years of professional experience with Python, especially in backend or data engineering roles.
Strong hands-on experience with document content extraction, particularly from PDFs with complex formats (e.g., scanned images, drawings).
Familiarity with OCR tools (e.g., Tesseract) and content extraction libraries (e.g., Unstructured IO, pdfminer).
Proficient in building modular, production-grade Python code with data models and validation (e.g., Pydantic).
Working knowledge of AWS services, especially Lambda, ECS, and containerization with Docker.
Ability to quickly understand new data structures and design custom ingestion strategies.

Salary (Rate): undetermined

City: undetermined

Country: undetermined

Working Arrangements: undetermined

IR35 Status: undetermined

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

About Andela

Andela exists to connect brilliance and opportunity. Since 2014, we have been dedicated to breaking down global barriers and accelerating the future of work for both technologists and organizations around the world. For technologists, Andela offers competitive long-term career opportunities with leading organizations, access to a global community of professionals, and education opportunities with leading technology providers. For companies, Andela provides access to a global network of fully integrated team members that unlock their business innovation and growth potential. At Andela, we are deeply passionate about creating long-lasting and transformative growth opportunities for all and doing it in an E.P.I.C. way. We are excited to continue building our remote-first team with incredible people like you!

About the role

The role focuses on building scalable data ingestion pipelines and extracting structured content from complex, often unstructured documents, especially PDF reports, scanned documents, and technical drawings. You will play a key part in enabling the GenAI application to access and reason over new data sources. This is a backend-focused role, with responsibilities centered on content extraction and processing. While exposure to GenAI technologies is beneficial, the primary requirement is deep hands-on experience with PDF/document processing.

Preferred Qualifications

Prior experience working on GenAI or LLM-powered applications, especially in document understanding or search contexts.
Experience with AWS Textract or Azure Document Intelligence for cloud-based content extraction.
Familiarity with chunking strategies and data preparation for vector databases (e.g., for retrieval-augmented generation).
Experience in fast-paced, deadline-driven projects and ability to deliver with minimal supervision.
Comfortable working in globally distributed teams, with flexibility to align with European time zones.
Overlap Hours: 5–8 hours with Central European Time (CET, UTC+1; CEST, UTC+2 in summer)

At Andela, we outcompete through diversity. We know that our strengths lie in the multiplicity of talents, perspectives, backgrounds, and orientations of residents in our community and we take pride in that. Andela is committed to a work environment in which all individuals are treated with respect and dignity. Each individual has the right to work in a professional atmosphere that promotes equal employment opportunities and prohibits discriminatory practices. Andela provides equal employment opportunities and workplace to all employees and applicants without regard to factors including but not limited to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability, pregnancy (including breastfeeding), genetic information, HIV/AIDS or any other medical status, family or parental status, marital status, amnesty or status as a covered veteran in accordance with applicable federal, state and local laws. This commitment applies to all terms and conditions of employment, including but not limited to hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training. Our policies expressly prohibit any form of harassment and/or discrimination as stated above. Andela is home for all, come as you are.