MTS Software System Design Engineer at AMD

What you'd actually do

Design, test, and validate reference architectures for large-scale AI training and inference clusters

Develop comprehensive tools for AI training to enable efficient cluster management

Create detailed reference documentation and implementation guides for customers and internal teams

Serve as the primary technical interface with customer engineering teams during deployment planning

Conduct proof-of-concept implementations to validate designs in real-world scenarios

Skills

Required

Designing and implementing large-scale infrastructure solutions
Kubernetes and container orchestration technologies
AI/ML workloads in production environments
Datacenter networking and storage architectures
GPU/AI-accelerated computing environments
Creating technical documentation and reference architectures
Infrastructure automation and orchestration tools
Performance optimization for large-scale inference deployments
Ray, PyTorch, and HPC-optimized schedulers for Kubernetes-based AI training
SLURM or similar HPC schedulers
Infrastructure-as-code tools such as Terraform or Ansible
Performance tuning for GPU/AI-accelerated workloads
Creating automation tools for infrastructure deployment

What the JD emphasized

large-scale AI training and inference clusters

large-scale inference deployments

large-scale infrastructure solutions

Kubernetes and container orchestration technologies

AI/ML workloads in production environments

GPU/AI-accelerated computing environments

Performance optimization for large-scale inference deployments

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. **Together, we advance your career.**

Job Role and Responsibility: AMD, Inc., is hiring an MTS Software System Design Engineer to design, test, and validate reference architectures for large-scale AI training and inference clusters. Develop comprehensive tools for AI training to enable efficient cluster management. Create detailed reference documentation and implementation guides for customers and internal teams. Serve as the primary technical interface with customer engineering teams during deployment planning. Conduct proof-of-concept implementations to validate designs in real-world scenarios. Evaluate and benchmark performance of various infrastructure configurations. Provide expert guidance on optimizing Kubernetes for AI workloads at scale. Collaborate with product management to influence roadmap based on customer requirements. Maintain deep technical expertise in emerging AI infrastructure technologies. Coordinate customer requirements gathering and work with the relevant Technical Program Management counterpart to arrive at a deployment plan. Creation of comprehensive, tested reference architectures that accelerate customer deployments. Drive test and interoperability validation with our HW and SW partners, and lead implementation of reference datacenter solutions at our CSP partners. Development of automation tools that significantly reduce deployment complexity. Establishment as a trusted advisor to customer technical teams. Contribution to increased win rates through technical credibility and expertise. Regular feedback that improves our product roadmap and offering.

Multiple openings. Qualified applicants: click the “APPLY NOW” button to apply online.

Travel required: NO

Qualifications: Degree required

Master’s degree or foreign equivalent in Computer and Information Science, Computer Engineering, Electrical Engineering or related field.

Qualifications: Amount and type of experience required: Three years’ experience in the job offered or a closely related engineering role.

Alternate combination of education and experience: Employer will alternatively accept a bachelor’s degree or foreign equivalent in Computer and Information Science, Computer Engineering, Electrical Engineering or related field and five (5) years of progressive post-baccalaureate experience in the job offered or a closely related engineering role.

Specific skills required: The following skills are required:

Position requires three (3) years of experience in the following:

Designing and implementing large-scale infrastructure solutions;
Kubernetes and container orchestration technologies;
AI/ML workloads in production environments;
Datacenter networking and storage architectures;
GPU/AI-accelerated computing environments;
Creating technical documentation and reference architectures;
Infrastructure automation and orchestration tools;
Performance optimization for large-scale inference deployments;
Ray, PyTorch, and HPC-optimized schedulers for Kubernetes-based AI training;
SLURM or similar HPC schedulers;
Infrastructure-as-code tools such as Terraform or Ansible;
Performance tuning for GPU/AI-accelerated workloads; and
Creating automation tools for infrastructure deployment.

_Benefits offered are described:_ AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.