Negotiable
Summary: The Data Analyst role focuses on leveraging Gen AI and Python to prepare datasets for LLM-based applications and chatbot systems, specifically within the insurance domain. The position requires extensive experience in handling structured and semi-structured data, as well as contributing to the development of semantic models. Candidates should possess strong analytical and research skills to collaborate effectively with engineering and domain experts. This role is primarily onsite and emphasizes familiarity with data governance best practices.
Key Responsibilities:
- Prepare datasets for LLM-based applications, Retrieval-Augmented Generation (RAG), or chatbot systems.
- Work with structured and semi-structured insurance data to build curated knowledge inputs.
- Annotate fields, data structures, and documents; create prompt-response structures; and support semantic chunking strategies.
- Contribute to the development of semantic models, including defining taxonomies, ontologies, and synonym sets for insurance concepts.
- Extract and transform content from large data systems using SQL (Snowflake preferred).
- Understand how LLMs retrieve and process contextual information, including embedding logic and metadata tagging.
- Follow data governance best practices in regulated domains such as insurance, banking, or finance.
- Preferred, not required: completion of a quantitative course such as Mathematics, Statistics, or Computer Science/Engineering.
Key Skills:
- Experience with Gen AI and Python.
- Proficiency in SQL, preferably Snowflake.
- Strong analytical thinking and research skills.
- Ability to work with structured and semi-structured data.
- Familiarity with data governance best practices.
- Experience in the insurance domain.
- Collaboration skills to work with engineering and domain SMEs.
- Completion of a quantitative course (preferred).
Salary (Rate): undetermined
City: undetermined
Country: USA
Working Arrangements: on-site
IR35 Status: outside IR35
Seniority Level: undetermined
Industry: Other
Job Title: Data Analyst
Location: Multiple Locations
Mode of work: Onsite
Primary skills: Gen AI, Python
Experience: 10+ years
We are looking for a candidate:
- Who has experience preparing datasets for LLM-based applications, Retrieval-Augmented Generation (RAG), or chatbot systems.
- Who is comfortable working with structured and semi-structured insurance data to build curated knowledge inputs.
- Who can annotate fields, data structures, and documents; create prompt-response structures; and support semantic chunking strategies.
- Who can contribute to the development of our semantic model, including defining taxonomies, ontologies, and synonym sets for insurance concepts.
- Who is proficient in SQL (Snowflake preferred) and able to extract and transform content from large data systems.
- Who understands how LLMs retrieve and process contextual information, including embedding logic and metadata tagging.
- Who is familiar with data governance best practices, especially in regulated domains like insurance, banking, or finance.
- Who has completed a quantitative course such as Mathematics, Statistics, or Computer Science/Engineering (preferred, not required).
Soft Skills:
- Analytical Thinking: Strong ability to identify patterns, gaps, and inconsistencies in large data sets.
- Research Skills: Able to explore domain-specific terminology and translate it into structured knowledge.
- Collaboration: Comfortable working closely with engineering, data, and domain SMEs in a distributed team.