Senior Staff Software Engineer, Machine Learning, ML Training

Google · Big Tech · Kirkland, WA +3

Senior Staff Software Engineer focused on building and delivering ML frameworks for training large language models (LLMs) and stable diffusion models for Google Cloud customers. The role involves designing and implementing AI frameworks software for various ML workloads, identifying and resolving software and performance issues, and collaborating with cross-functional teams. Requires extensive experience in software development, ML design, ML infrastructure, and leading technical projects, with a focus on training ML models at scale.

What you'd actually do

  1. Build and deliver ML frameworks for training LLMs and stable diffusion models for Google Cloud customers.
  2. Design and implement AI frameworks software for ML workloads in the cloud that enables models for language modeling, stable diffusion, rankings and recommendations, etc.
  3. Identify and resolve issues in software, performance, and topology.
  4. Work closely with software engineers, product managers and other engineering teams to get high-quality products and features through the software project lifecycle.
  5. Design, develop, test, deploy, maintain, and improve software.

Skills

Required

  • Python
  • ML design
  • ML infrastructure
  • model deployment
  • model evaluation
  • data processing
  • debugging
  • fine-tuning
  • Speech/audio
  • reinforcement learning
  • design and architecture
  • testing/launching software products
  • large-scale distributed systems
  • machine learning systems (Training for LLM, image generation)

Nice to have

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
  • 8 years of experience with data structures and algorithms.
  • 5 years of experience in a technical leadership role leading project teams and setting technical direction.
  • Experience with ML modeling and scaling.
  • Experience building cloud-based services, including with GCP.

What the JD emphasized

  • 8 years of experience in software development (e.g., Python).
  • 7 years of experience leading technical project strategy, ML design, and working with ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging, fine-tuning).
  • 5 years of experience with one or more of the following: Speech/audio (e.g., technology duplicating and responding to the human voice), reinforcement learning (e.g., sequential decision making), ML infrastructure, or specialization in another ML field.
  • 5 years of experience with design and architecture; and testing/launching software products.
  • Experience with large-scale distributed systems or machine learning systems (Training for LLM, image generation).

Other signals

  • ML frameworks for training LLMs and stable diffusion models
  • AI frameworks software for ML workloads
  • ML infrastructure