Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to manage information at a massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.
Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.
Responsibilities
- Enable and optimize foundational models (e.g., LLMs and Diffusion) within key frameworks like vLLM, MaxText, and MaxDiffusion, providing Google Cloud customers with immediate access to AI capabilities.
- Partner with customers to measure Artificial Intelligence/Machine Learning (AI/ML) model performance on Google Cloud infrastructure. Work with Customer Engineer teams to identify and resolve technical bottlenecks and drive customer success.
- Collaborate with internal infrastructure teams to enhance support for demanding AI workloads. Contribute to product improvement by identifying bugs and recommending enhancements.
- Conduct performance profiling, debugging, and troubleshooting of training and inference workloads. Maintain and update documentation and educational content based on product changes and user feedback. Triage, debug, and resolve system issues by analyzing root causes and operational impact.
- Design and implement specialized Machine Learning solutions leveraging advanced ML infrastructure.
Qualifications
Minimum qualifications:
- Bachelor's degree or equivalent practical experience.
- 2 years of experience with software development in one or more programming languages (e.g., Python), or 1 year of experience with an advanced degree.
- 1 year of experience with ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging).
- 1 year of experience with GenAI concepts (Large Language Models, Multi-Modal Models, Large Vision Models) and experience with text, image, video, or audio generation.
Preferred qualifications:
- Master’s degree or PhD in Computer Science or a related technical field.
- Experience with Generative AI, Large Language Models (LLM), or Machine Learning infrastructure, including model deployment, performance optimization, profiling, and debugging large-scale workloads.
- Experience with distributed computing leveraging Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs).
- Ability to collaborate effectively with cross-functional teams.
- Ability to thrive in a changing environment where AI technologies are continuously advancing.