Senior Software Development Engineer, AGI Data Services

Amazon · Big Tech · Boston, MA · Software Development

Senior Software Development Engineer focused on acquiring, creating, and curating high-quality software engineering datasets for training Generative AI models, including Reinforcement Learning. The role involves designing and building automated systems for data mining and quality assessment, as well as developing GenAI-powered workflow tools to streamline data collection and assurance processes.

What you'd actually do

  1. design, build, and scale automated systems that mine and curate high-quality datasets.
  2. architect judge pipelines, develop evaluation rubrics and scoring frameworks, and build calibration and agreement mechanisms to ensure Amazon models improve on key software engineering capabilities.
  3. design and build GenAI-powered workflow tools — such as conversational diagnostic agents, automated quality assessment systems, and guided remediation workflows — that streamline data collection and quality assurance processes, enabling cross-functional teams to rapidly identify issues, reduce resolution time, and continuously improve data throughput.
  4. identify high-quality software engineering data that is valuable for model training (specifically Reinforcement Learning, but also other types of training).

Skills

Required

  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience with at least one software programming language
  • 5+ years of experience leading design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • Experience as a mentor, tech lead, or leader of an engineering team
  • Bachelor's degree in computer science or equivalent

Nice to have

  • 5+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Master's degree
  • Knowledge of Machine Learning and LLM fundamentals, including transformer architecture, training/inference lifecycles, and optimization techniques
  • Knowledge of general AI tools

What the JD emphasized

  • accelerate the creation of rich Software Engineering datasets used to train Amazon's Models
  • design, build, and maintain systems to mine software engineering data
  • assess data quality using a combination of models and humans in the loop
  • design and build GenAI-powered workflow tools
  • streamline data collection and quality assurance processes
  • identify issues, reduce resolution time, and continuously improve data throughput
  • directly improve Amazon Nova models
  • accelerate and scale that momentum
  • identifying high-quality software engineering data that is valuable for model training
  • designing, building, and scaling automated systems that mine and curate high-quality datasets
  • architect judge pipelines
  • develop evaluation rubrics and scoring frameworks
  • build calibration and agreement mechanisms
  • ensure Amazon models improve on key software engineering capabilities
  • collaborate with Applied Scientists, Technical Program Managers, domain experts, and vendor teams
  • bridging technology, process, and operations
  • define their strategy
  • use spec-driven development with coding agents to develop the necessary tooling
  • identify and link high-value SWE data for reinforcement learning
  • delivering working training data to the modeling team
  • dive deep into data quality anecdotes to find patterns and root causes
  • propose improvements
  • communicate impact and roadmaps to cross-functional partners
  • appropriately leverage other SDEs for human-in-the-loop judgment to improve data
  • communicate regularly with senior leadership on the approaches they define

Other signals

  • designing, building, and scaling automated systems that mine and curate high-quality datasets
  • design and build GenAI-powered workflow tools
  • accelerate the creation of rich Software Engineering datasets used to train Amazon's Models