What you'd actually do

Maintain and foster relationships with model providers, becoming their trusted technical advisor and strategic partner

Develop deep knowledge of core foundational services (Compute, Network, Storage) along with ML expertise to build long-term relationships with customer engineering teams

Dive deep to understand the details of each model provider’s environment, business goals, and technical requirements for building and deploying foundation models

Design and implement advanced cloud architectures that enable model providers to scale their AI research and production workloads efficiently

Partner closely with AWS service teams (EC2, Global Networking, EKS, Bedrock, S3) to influence roadmaps and develop custom solutions that meet model providers’ unique requirements

Skills

Required

10+ years of experience in specific technology domain areas (e.g. software development, cloud computing, systems engineering, infrastructure, security, networking, data & analytics)
Bachelor's degree in computer science, engineering, mathematics or equivalent
Experience developing technology solutions and evangelising end-to-end technology roadmaps that guide IT transformations toward cloud computing
Experience communicating across technical and non-technical audiences and at C-level, including training, workshops, publications

Nice to have

Knowledge of large-scale automation and workflow management or equivalent
Presentation and whiteboarding skills, with a high degree of comfort speaking with internal and external executives, IT management, and developers
Experience with training and deploying machine learning systems to solve large-scale optimizations, or experience operating highly available, distributed systems of data extraction, ingestion, and processing of large data sets
Experience with CUDA kernels or ML/low-level kernels, or experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution
Experience with Kubernetes, Docker, or the broader container ecosystem
Knowledge of foundation model architectures, training approaches, and serving infrastructure

What the JD emphasized

exabyte-scale data

millions of interconnected GPUs

custom hardware acceleration requirements

GPU optimization

network throughput

distributed training

cost efficiency at massive scale

training and deploying machine learning systems

highly available, distributed systems

large data sets

CUDA kernels

ML/low-level kernels

Machine Learning and Large Language Model fundamentals

architecture, training/inference lifecycles

optimization of model execution

foundation model architectures

training approaches

serving infrastructure

Other signals

Design and implement advanced cloud architectures that enable model providers to scale their AI research and production workloads efficiently

Drive technical and architectural best practices for GPU optimization, network throughput, distributed training, and cost efficiency at massive scale

Experience with training and deploying machine learning systems to solve large-scale optimizations, or experience operating highly available, distributed systems of data extraction, ingestion, and processing of large data sets

Experience with CUDA kernels or ML/low-level kernels, or experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution

Knowledge of foundation model architectures, training approaches, and serving infrastructure

As a Principal Solutions Architect supporting Foundation Model Providers (FMP) on AWS, you will tackle some of the most challenging and exciting problems in cloud computing today. A model provider’s compute footprint represents one of the most demanding workloads on the AWS platform, pushing the boundaries of what's possible with networking, GPU infrastructure, storage, container orchestration, and distributed computing at extraordinary scale. In this role, you'll design cloud architectures that enable model providers to train, fine-tune, and serve state-of-the-art generative AI models. You'll help solve technical challenges that few organizations in the world face: exabyte-scale data, millions of interconnected GPUs, complex networking topologies, and custom hardware acceleration requirements that define the leading edge of AI infrastructure. You will also help model providers solve business challenges such as rapidly releasing products and services to the market and building elastic, scalable, cost-optimized applications. You will engage with product owners and service teams to set the strategy for AWS services. For this role, we are looking for folks who have technical breadth complemented by technical depth in one or two areas, business aptitude, and the ability to lead in-depth technology discussions that articulate the business value of the AWS platform and services.

Key job responsibilities

• Maintain and foster relationships with model providers, becoming their trusted technical advisor and strategic partner
• Develop deep knowledge of core foundational services (Compute, Network, Storage) along with ML expertise to build long-term relationships with customer engineering teams
• Dive deep to understand the details of each model provider’s environment, business goals, and technical requirements for building and deploying foundation models
• Design and implement advanced cloud architectures that enable model providers to scale their AI research and production workloads efficiently
• Partner closely with AWS service teams (EC2, Global Networking, EKS, Bedrock, S3) to influence roadmaps and develop custom solutions that meet model providers’ unique requirements
• Identify patterns and technical solutions that can be broadly applied across the FMP segment to accelerate innovation
• Lead technical discussions that articulate the business value of the AWS platform and services to both technical architects and executive stakeholders
• Drive technical and architectural best practices for GPU optimization, network throughput, distributed training, and cost efficiency at massive scale

About the team

Diverse Experiences

AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.

Why AWS?

Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.

Inclusive Team Culture

Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (diversity) conferences, inspire us to never stop embracing our uniqueness.

Mentorship & Career Growth

We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.

Work/Life Balance

We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.

Basic Qualifications

10+ years of experience in specific technology domain areas (e.g. software development, cloud computing, systems engineering, infrastructure, security, networking, data & analytics)
Bachelor's degree in computer science, engineering, mathematics or equivalent
Experience developing technology solutions and evangelising end-to-end technology roadmaps that guide IT transformations toward cloud computing
Experience communicating across technical and non-technical audiences and at C-level, including training, workshops, publications

Preferred Qualifications

Knowledge of large-scale automation and workflow management or equivalent
Presentation and whiteboarding skills, with a high degree of comfort speaking with internal and external executives, IT management, and developers
Experience with training and deploying machine learning systems to solve large-scale optimizations, or experience operating highly available, distributed systems of data extraction, ingestion, and processing of large data sets
Experience with CUDA kernels or ML/low-level kernels, or experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution
Experience with Kubernetes, Docker, or the broader container ecosystem
Knowledge of foundation model architectures, training approaches, and serving infrastructure

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.
