Document Sourcing Specialist

Posted 1 week ago by micro1

Negotiable
Undetermined
Remote
EMEA

Summary: Join a remote team as a Document Sourcing Specialist, focusing on sourcing and verifying open-access documents for AI training. This role requires meticulous attention to detail to ensure compliance with licensing requirements. You will collaborate with data engineering and compliance teams to maintain documentation integrity and support audits. Your expertise in sourcing from reputable repositories will be crucial for the quality of data used in AI projects.

Key Responsibilities:

  • Source publicly available documents from platforms such as government archives, academic repositories, open datasets, and licensed open-source documentation.
  • Verify and document the license type of every sourced document, confirming that each one falls under an approved license such as CC0, CC-BY, MIT, or Apache 2.0 (or an equivalent).
  • Log critical metadata for each submission, including source URLs and full license details, in designated tracking tools.
  • Flag and annotate any issues related to ownership, unclear licensing, paywalled access, or content with non-commercial usage restrictions.
  • Collaborate with data engineering and compliance teams to clarify requirements and resolve sourcing ambiguities.
  • Maintain up-to-date knowledge of open data best practices, licensing changes, and repository navigation strategies.
  • Communicate findings and unresolved issues clearly in both written and verbal form, supporting documentation integrity and compliance audits.
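
As an illustration of the verification and logging duties above, the following is a minimal Python sketch, not the employer's actual tooling. The record fields, the `review` function, and the allowlist are hypothetical; the allowlist simply mirrors the licenses named in this posting, and what counts as "equivalent" would be decided with the compliance team.

```python
from dataclasses import dataclass, field
from datetime import date

# Assumed allowlist, taken from the licenses named in the posting.
# The real project list is not specified here.
ALLOWED_LICENSES = {"CC0", "CC-BY", "MIT", "APACHE-2.0"}


@dataclass
class SourcedDocument:
    """Hypothetical tracking record for one sourced document."""
    source_url: str
    license_id: str           # license as stated by the source, e.g. "CC BY"
    license_url: str          # where the license statement was found
    paywalled: bool = False
    retrieved_on: date = field(default_factory=date.today)


def review(doc: SourcedDocument) -> list[str]:
    """Return the issues to flag; an empty list means nothing to flag."""
    issues = []
    # Normalize case and spacing so "Apache 2.0" matches "APACHE-2.0".
    normalized = doc.license_id.strip().upper().replace(" ", "-")
    if not normalized:
        issues.append("unclear licensing: no license stated")
    elif "-NC" in normalized:
        issues.append("non-commercial restriction")
    elif normalized not in ALLOWED_LICENSES:
        issues.append(f"license not on allowlist: {doc.license_id}")
    if doc.paywalled:
        issues.append("paywalled access")
    return issues
```

A record whose `review` result is non-empty would be annotated and escalated rather than submitted; ownership questions still need human judgment and are not covered by this sketch.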

Key Skills:

  • Exceptional attention to detail and ability to accurately review complex licensing and compliance information.
  • Experience sourcing documents from repositories such as SEC EDGAR, arXiv, Kaggle, and GitHub.
  • Proficiency in academic research, data collection, and public records searching.
  • Strong written and verbal communication skills, with the ability to articulate findings and collaborate remotely.
  • Demonstrated ability to distinguish between open and restricted content, and to identify potential sourcing risks.
  • Comfort working independently in a fast-paced, remote environment with evolving priorities.
  • Highly organized, reliable, and adept at managing and documenting large volumes of information.

Salary (Rate): undetermined

City: undetermined

Country: undetermined

Working Arrangements: remote

IR35 Status: undetermined

Seniority Level: undetermined

Industry: Other

Detailed Description From Employer:

Job Title: Document Sourcing Specialist

Job Type: Contract (full-time or part-time)

Location: Remote

Job Summary: Join our customer's team as a Document Sourcing Specialist, where your keen eye for detail and passion for compliance will directly impact the quality of data used in AI training. In this fully remote role, you will identify, verify, and source open-access documents from a variety of reputable repositories to ensure they meet stringent licensing requirements.

Preferred Qualifications:

  • Prior experience supporting AI or machine learning projects with high-quality data sourcing.
  • Familiarity with open-source licensing and data compliance regulations.
  • Background in academic research, information science, or legal review.