What you'd actually do

Develop novel approaches for validating AI/ML workloads on GPU infrastructure

Lead the adoption of Agentic AI to transform test strategy, planning, and test solution development, improving validation effectiveness and release confidence.

Drive AI-assisted diagnostics and resolution of complex DevOps challenges across CI/CD, build, deployment, and engineering infrastructure ecosystems.

Architect and implement AI agents that automate critical workflows across software development, testing, and release management, delivering measurable gains in productivity and operational efficiency.

Establish best practices, governance, and scalable patterns for applying AI across engineering workflows while ensuring solution quality, reliability, and maintainability.

Skills

Required

Design, develop, and execute comprehensive test plans, test strategies, and test cases for complex system-level features.
Define the multi-year vision and roadmap for system software testing infrastructure and methodologies
Perform functional, integration, and system-level testing across different Linux distributions (e.g., Ubuntu, RHEL, SLES).
Analyze and debug issues across the software stack using strong knowledge of Linux internals, system services, kernel-level behavior, performance tools, and logs.
Ensure test coverage for the GPU product features
Review requirements and create associated test cases to ensure traceability
Develop and maintain automated tests using gtest, ctest, and other relevant test frameworks.
Collaborate with cross-functional teams to ensure testability and influence design decisions to improve product quality.
Write clean, maintainable C/C++ and Python code for test automation, validation tools, and testing infrastructure.
Drive continuous improvements in test processes, tooling, and coverage.
Investigate failures, perform root-cause analysis, and provide detailed debug information to development teams.
Mentor junior engineers and contribute to building a high-quality engineering culture.
Must be a self-starter who can independently drive tasks to completion
Proven experience in validating complex systems with a focus on performance, scalability, and reliability across integrated hardware–software ecosystems.
Demonstrated ability to leverage Agentic AI to define test strategies, create detailed test plans, and develop effective test solutions that improve quality and release readiness.
Proven ability to apply AI-assisted problem solving to diagnose and resolve DevOps-related issues across CI/CD pipelines, build systems, deployment processes, and engineering environments.
Strong ability to design and build AI agents that automate workflows, reduce manual effort, and improve productivity across software engineering, testing, and release operations.

Nice to have

General Computer Architecture concepts
Windows and Linux Operating Systems
Cloud, Virtualization and Container environments
System level, functional and environmental stress testing
Deep Learning, High Performance Computing or GPU Server Based computing a big plus.
Knowledge of CUDA GPU Computing Languages a plus.
Parallel Computing Skills with MPI Programming experience a plus.
Proven record in large scale data center engineering
Experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI).
Knowledge of container technologies (Docker, Podman, Kubernetes).
Experience with performance testing or hardware–software systems.
Excellent interpersonal, organizational, analytical, planning, and technical leadership skills

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. **Together, we advance your career.**

SMTS SOFTWARE SYSTEMS DESIGN ENGINEER

THE ROLE:

We are seeking an engineer who will thrive in a fast-paced work environment, using effective communication, problem-solving, and prioritization skills. Individuals who are well organized, show great attention to detail, and employ critical thinking are well suited for our team.

THE PERSON:

This AMD (Advanced Micro Devices) team is looking for a senior-level person who can help guide the team, mentor upcoming developers, provide long-range strategy, and is willing to jump in to help resolve issues quickly. You will be involved in all areas that impact the team, including performance, automation, and development. The right candidate will stay informed on the latest trends and be prepared to give consultative direction to senior management.

KEY RESPONSIBILITIES:

Design, develop, and execute comprehensive test plans, test strategies, and test cases for complex system-level features.
Define the multi-year vision and roadmap for system software testing infrastructure and methodologies
Develop novel approaches for validating AI/ML workloads on GPU infrastructure
Perform functional, integration, and system-level testing across different Linux distributions (e.g., Ubuntu, RHEL, SLES).
Lead the adoption of Agentic AI to transform test strategy, planning, and test solution development, improving validation effectiveness and release confidence.
Drive AI-assisted diagnostics and resolution of complex DevOps challenges across CI/CD, build, deployment, and engineering infrastructure ecosystems.
Architect and implement AI agents that automate critical workflows across software development, testing, and release management, delivering measurable gains in productivity and operational efficiency.
Establish best practices, governance, and scalable patterns for applying AI across engineering workflows while ensuring solution quality, reliability, and maintainability.
Partner cross-functionally to identify high-impact automation opportunities and embed AI capabilities into core engineering and release processes.
Analyze and debug issues across the software stack using strong knowledge of Linux internals, system services, kernel-level behavior, performance tools, and logs.
Ensure test coverage for the GPU product features
Review requirements and create associated test cases to ensure traceability
Develop and maintain automated tests using gtest, ctest, and other relevant test frameworks.
Collaborate with cross-functional teams to ensure testability and influence design decisions to improve product quality.
Write clean, maintainable C/C++ and Python code for test automation, validation tools, and testing infrastructure.
Drive continuous improvements in test processes, tooling, and coverage.
Investigate failures, perform root-cause analysis, and provide detailed debug information to development teams.
Mentor junior engineers and contribute to building a high-quality engineering culture.
Must be a self-starter who can independently drive tasks to completion

PREFERRED EXPERIENCE:

Proven experience in validating complex systems with a focus on performance, scalability, and reliability across integrated hardware–software ecosystems.
General Computer Architecture concepts
Windows and Linux Operating Systems
Cloud, Virtualization and Container environments
System level, functional and environmental stress testing
Demonstrated ability to leverage Agentic AI to define test strategies, create detailed test plans, and develop effective test solutions that improve quality and release readiness.
Proven ability to apply AI-assisted problem solving to diagnose and resolve DevOps-related issues across CI/CD pipelines, build systems, deployment processes, and engineering environments.
Strong ability to design and build AI agents that automate workflows, reduce manual effort, and improve productivity across software engineering, testing, and release operations.
Deep Learning, High Performance Computing or GPU Server Based computing a big plus.
Knowledge of CUDA GPU Computing Languages a plus.
Parallel Computing Skills with MPI Programming experience a plus.
Proven record in large scale data center engineering
Experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI).
Knowledge of container technologies (Docker, Podman, Kubernetes).
Experience with performance testing or hardware–software systems.
Excellent interpersonal, organizational, analytical, planning, and technical leadership skills

ACADEMIC CREDENTIALS:

Bachelor’s or Master’s in Electrical Engineering, Computer Engineering, Computer Science, or a closely related field

_Benefits offered are described:_ AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.