Staff Software Engineer, AI Authoring

Unity · Enterprise · Mountain View, CA · AI & Machine Learning

Staff Software Engineer focused on the reliability, scalability, performance, and maintainability of AI authoring products, including LLM inference and ML-driven tools. Responsibilities include DevOps, CI/CD, Kubernetes, infrastructure-as-code, agentic troubleshooting, an A/B testing framework, and observability on Azure and GCP.

What you'd actually do

  1. Partner closely with product, design, and other engineering teams to influence the product roadmap.
  2. Contribute to technical strategy and architectural decisions for Unity AI and backend platforms, balancing short-term delivery with long-term product health.
  3. Design and implement sustainable, scalable solutions for Unity AI products, pushing the boundaries of the Unity Editor with cutting-edge AI agent technologies.
  4. Improve the reliability, performance, observability, security, and cost-efficiency of our backend systems, and support investigations into issues like performance regressions and network latency through strong observability and monitoring.
  5. Run technical spikes with developers to address application infrastructure needs on Microsoft Azure and Google Cloud.

Skills

Required

  • Experience building and shipping complex, user-facing software products at scale
  • Experience delivering and supporting cloud backend services using Terraform, Kubernetes, Helm, and CI/CD pipelines (e.g., Argo, GitHub Actions)
  • Hands-on experience with Microsoft Azure or Google Cloud Platform
  • Proven track record in observability, including monitoring, logging, alerting, and debugging tools such as Grafana to ensure system reliability and performance
  • Strong grasp of software delivery best practices and network security, paired with a quality-first mindset
  • Ability to translate business and user needs into technical solutions
  • Strong communication and interpersonal skills, with a proven ability to align stakeholders across disciplines and organizations

Nice to have

  • Exposure to ML infrastructure or LLM inference deployment
  • Backend service development experience, including API design
  • Familiarity with networking, caching, real-time data pipelines, and relational databases such as PostgreSQL
  • Familiarity with languages such as C#/.NET, Python, or Go
  • Familiarity with Unity or similar 3D engines

What the JD emphasized

  • reliability, scalability, performance, and maintainability
  • agentic troubleshooting
  • robust disaster recovery strategies
  • trustworthy A/B testing framework
  • client-server compatibility
  • observability across system health and business metrics
  • asset generation, LLM inference, and the ML-driven tools
  • cutting-edge AI agent technologies
  • performance regressions and network latency
  • application infrastructure needs
  • infrastructure-as-code and Helm deployments
  • Terraform Cloud, Helm, and Kubernetes

Other signals

  • AI authoring products
  • LLM inference
  • ML-driven tools
  • agentic troubleshooting
  • infrastructure-as-code
  • Kubernetes management
  • CI/CD automation