Software Development Engineer, CI/CD, Trainium Manufacturing Test Infrastructure

Amazon · Big Tech · Cupertino, CA · Software Development

Software Development Engineer role focused on building and maintaining CI/CD infrastructure for AWS Trainium chip manufacturing test sites. The role involves designing deployment pipelines, extending the manufacturing platform, building integration test frameworks, and developing automated deployment strategies for edge/hybrid environments.

What you'd actually do

  1. Design, build, and maintain CI/CD pipelines (AWS CDK, Pipelines) that deploy containerized services to AWS Outposts at global manufacturing sites
  2. Extend the manufacturing infrastructure platform (TypeScript CDK, Python microservices) to support new workflows for Trainium accelerator cards, baseboards, and rack-level integration
  3. Build integration test frameworks and canary systems that validate service health across all production sites before and after deployments
  4. Develop automated alarming, rollback mechanisms, and deployment wave strategies to ensure zero-downtime releases to active manufacturing lines
  5. Develop infrastructure-as-code for containerized services, databases, artifact storage, messaging queues, and authentication systems deployed on Outposts
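Items 3 and 4 combine into one pattern: deploy in waves, check each site's health after every wave, and roll back everything already updated when a check fails. The sketch below illustrates that pattern only. It is not Amazon's implementation. The function `rollout_in_waves`, the `RolloutResult` type, and the site names are hypothetical. In a real CDK-based pipeline, the "deploy" and "health check" steps would be pipeline stages and alarms rather than plain Python calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class RolloutResult:
    succeeded: bool
    attempted: List[str] = field(default_factory=list)    # sites updated, in order
    rolled_back: List[str] = field(default_factory=list)  # sites reverted, in order
    failed_site: Optional[str] = None                     # first site whose check failed


def rollout_in_waves(
    waves: List[List[str]],
    new_version: str,
    current_versions: Dict[str, str],
    health_check: Callable[[str, str], bool],
) -> RolloutResult:
    """Hypothetical wave rollout: update one wave at a time, run a canary-style
    health check on every site in the wave, and on the first failure revert
    every site touched so far to its previous version."""
    previous: Dict[str, str] = {}
    attempted: List[str] = []
    for wave in waves:
        # "Deploy" the whole wave (stand-in for a real deployment step).
        for site in wave:
            previous[site] = current_versions[site]
            current_versions[site] = new_version
            attempted.append(site)
        # Validate the wave before moving on to the next one.
        for site in wave:
            if not health_check(site, new_version):
                rolled_back: List[str] = []
                # Revert newest-first so the earliest, longest-running sites go last.
                for s in reversed(attempted):
                    current_versions[s] = previous[s]
                    rolled_back.append(s)
                return RolloutResult(False, attempted, rolled_back, site)
    return RolloutResult(True, attempted)
```

For example, with `waves=[["site-a"], ["site-b", "site-c"]]` and a health check that fails only for `site-c`, the first wave succeeds, the second fails, and all three sites end up back on their previous version. Putting a single low-risk site in the first wave limits the blast radius of a bad release on active manufacturing lines.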

Skills

Required

  • BS degree in computer science or equivalent
  • Experience with at least one general-purpose programming language such as Java, Python, C++, C#, Go, Rust, or TypeScript
  • Experience with CI/CD pipeline design and implementation (AWS Pipelines, CircleCI, GitLab CI, GitHub Actions, Jenkins, or similar)
  • Experience with cloud services (AWS, GCP, or Azure) — particularly IaC tools such as CDK, CloudFormation, Terraform, or Pulumi

Nice to have

  • Experience deploying software to edge/hybrid environments (AWS Outposts, on-premises)
  • Experience with containerized microservice architectures (Docker, ECS/EKS, Kubernetes)
  • Familiarity with hardware test automation or manufacturing systems
  • Experience setting up CI/CD for systems software
  • Familiarity with network configuration in constrained environments (VPN, CIDR management, site connectivity)

What the JD emphasized

  • directly enable the manufacturing ramp of AWS's custom AI training chips
  • directly impacts how fast Trainium servers move from factory floor to customer
  • every hour of pipeline latency is lost customer revenue