Software Engineer, Infrastructure

Sierra · AI Frontier · San Francisco, CA · Engineering

As a Software Engineer, Infrastructure at Sierra, you will design, build, and maintain the core systems that power the AI platform, with a focus on keeping the infrastructure secure, reliable, and scalable. Key responsibilities include ensuring the reliability, scalability, and performance of the platform and LLM inference serving; building and maintaining cloud infrastructure with Terraform; creating a self-serve infrastructure platform; owning CI/CD pipelines; architecting distributed systems; developing data serving abstractions and security features; integrating with enterprise customer environments; enhancing observability tooling; and leading incident management.

What you'd actually do

  1. Ensure the reliability, scalability, and performance of our platform and LLM inference serving as we rapidly grow traffic.
  2. Build and maintain cloud infrastructure using Terraform to ensure scalable, secure, and reproducible environments.
  3. Create and maintain a self-serve infrastructure platform that enables the rest of engineering to deploy and operate services.
  4. Own and evolve CI/CD pipelines and release management, enabling fast, reliable deployments for Sierra’s platform.
  5. Architect and operate distributed systems that leverage distributed databases, retrieval systems, and ML models.

Skills

Required

  • Strong software engineering background with 5–7+ years of hands-on development experience in highly technical products.
  • A strong inclination toward building automation, tooling, and platforms, along with designing maintainable systems.
  • Proven experience with cloud platforms (AWS, GCP, or Azure) and infrastructure as code (Terraform preferred).
  • Hands-on expertise in CI/CD systems, release management, and container orchestration (e.g., Docker, Kubernetes).
  • Experience with observability tools (Prometheus, Grafana, Datadog, OpenTelemetry, etc.).
  • Experience in incident response and operating distributed systems in production.
  • Degree in Computer Science or a related field, or equivalent professional experience.

Nice to have

  • Production experience working with LLMs and machine learning models.
  • Background in distributed systems, running SaaS services at scale, and agentic architecture.
  • Familiarity with security and authentication protocols (OAuth, SSO, mTLS).
  • Previous experience in a fast-paced startup environment or platform/infra-focused team.

What the JD emphasized

  • LLM inference serving
  • cloud infrastructure
  • distributed systems
  • observability tooling