Rate: Negotiable
IR35 Status: Outside
Working Arrangements: Hybrid
Location: London and home, UK
Summary: The AI Prompt Engineer role focuses on designing and optimizing prompts for advanced language models, developing scalable GenAI workflows, and integrating LLMs into applications. The position requires a strong technical background in AI and machine learning, with responsibilities spanning from prompt engineering to deployment and infrastructure management. Candidates should possess a deep understanding of LLM behavior and experience with various AI tools and frameworks. This role is ideal for technically sharp individuals who are systems-minded and eager to innovate in the GenAI space.
Salary (Rate): undetermined
City: London
Country: UK
Working Arrangements: hybrid
IR35 Status: outside IR35
Seniority Level: undetermined
Industry: IT
AI Prompt Engineer, Technically Sharp & Systems-Minded
You'll design and optimize prompts, architect LLM-powered systems and deploy scalable GenAI workflows that connect people and intelligent systems in new, high-impact ways.
THE ROLE
Prompting & Reasoning Systems
- Design, test and optimize prompts for leading frontier models (GPT-4/5, Claude 3.x, Gemini 2.x, Mistral Large, Llama 3, Cohere Command R+, DeepSeek).
- Apply advanced prompting strategies: Chain-of-Thought, ReAct, Tree-of-Thoughts, Graph-of-Thoughts, Program-of-Thoughts, self-reflection loops, debate prompting and multi-agent orchestration (AutoGen/CrewAI).
- Build agentic workflows with tool calling, memory systems, retrieval pipelines and structured reasoning.
GenAI Application Engineering
- Integrate LLMs into applications using LangChain, LlamaIndex, Haystack, AutoGen and OpenAI's Assistants API patterns.
- Build high-performance RAG pipelines using hybrid search, reranking, embedding optimization, chunking strategies and evaluation harnesses.
- Develop APIs, microservices and serverless workflows for scalable deployment.
ML/LLM Engineering
- Work with AI+ML pipelines through Azure ML, AWS SageMaker, Vertex AI, Databricks, or Modal/Fly.io for lightweight LLM deployment.
- Use vector databases (Pinecone, Weaviate, Milvus, ChromaDB, pgvector) and embedding stores.
- Use AI-powered dev tools (GitHub Copilot, Cursor, Codeium, Aider, Windsurf) to accelerate iteration.
- Implement LLMOps/PromptOps using Weights & Biases, MLflow, LangSmith, Langfuse, PromptLayer, Humanloop, Helicone and Arize Phoenix.
- Benchmark and evaluate LLM systems using Ragas, DeepEval and structured evaluation suites.
Deployment & Infrastructure
- Containerize and deploy workloads with Docker, Kubernetes, Knative and managed inference endpoints.
- Optimize model performance with quantization, distillation, caching, batching and routing strategies.
EXPERIENCE
- Strong Python skills, with experience using Transformers, LangChain, LlamaIndex and the broader GenAI ecosystem.
- Deep understanding of LLM behavior, prompt optimization, embeddings, retrieval and data preparation workflows.
- Experience with vector DBs (FAISS, Pinecone, Milvus, Weaviate, ChromaDB).
- Hands-on knowledge of Linux, Bash/PowerShell, containers and cloud environments.
- Strong communication skills, creativity and a systems-thinking mindset.
- Curiosity, adaptability and a drive to stay ahead of rapid advancements in GenAI.
BENEFICIAL
- Experience with PromptOps & LLM observability tools (PromptLayer, Langfuse, Humanloop, Helicone, LangSmith).
- Understanding of Responsible AI, model safety, bias mitigation, evaluation frameworks and governance.
- Background in Computer Science, AI/ML, Engineering, or related fields.
- Experience deploying or fine-tuning open-source LLMs.
TECH STACK
LLMs: GPT-4/5, Claude 3.x, Gemini 2.x, Mistral Large, Llama 3, Cohere Command R+, DeepSeek
Frameworks: LangChain, LlamaIndex, Haystack, AutoGen, CrewAI
Tools: GitHub Copilot, Cursor, LangSmith, Langfuse, Weights & Biases, MLflow, Humanloop
Cloud: Azure ML, AWS SageMaker, Google Vertex AI, Databricks, Modal
Infra: Python, Docker, Kubernetes, SQL/NoSQL, PyTorch, FastAPI, Redis
Staffworx are a UK-based Talent & Recruiting Partner, supporting the Digital Commerce, Software and Value Add Consulting sectors across the UK & EMEA.