Technical Program Manager - AI Cluster E… at AMD

What you'd actually do

Define, plan, and drive program plans for AI infrastructure systems validation and readiness, including server integration, rack bring-up, and cluster-scale deployment readiness.

Own program execution for rack- and cluster-network enablement, including topology decisions, switching/optics/cabling readiness, and validation schedules for scale-out operation.

Lead cross-functional delivery for rack solutions that integrate CPU + GPU + NICs, ensuring end-to-end readiness across hardware, firmware, and management interfaces.

Own program coordination for pod/rack manageability solutions, aligning requirements and milestones for inventory, health monitoring, cluster provisioning, and observability across large-scale deployments.

Drive readiness for rack-level automation and regression workflows (scripts, log mapping, infrastructure automation planning), planning execution to de-risk hardware arrival timing.

Skills

Required

Proven program management experience delivering complex, cross-functional hardware/software infrastructure programs (server/rack/cluster environments).
Strong understanding of datacenter building blocks and lifecycle: servers, racks, clusters, HW/FW/SW integration, and readiness/validation flows.
Demonstrated ability to build and run schedules, manage risks, lead matrix teams, and communicate clearly to engineering and executive audiences.
Strong working knowledge of program tools (e.g., Jira/Confluence/Microsoft Office) and dashboard-based execution management.

Nice to have

AI cluster networking domain experience (NICs, switching/optics/cabling, topologies).
Familiarity with rack/pod management and operations concerns (telemetry, health monitoring, power control, FW provisioning, management networks).
Experience leading programs for integration of servers in OEMs, ODMs, or data centers.
Demonstrated horizontal leadership across large matrix organizations.
Formal PM education/certification (PMP / Scrum Master) preferred.

What the JD emphasized

AI cluster engineering programs

GPU platforms, rack-scale solutions, high-speed networking, and datacenter AI infrastructure

server integration to rack and cluster-level validation

AI networking requirements

GPU / Rack Solution Integration

AI Infrastructure & Manageability

Automation, Tooling, and Regression Readiness

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. **Together, we advance your career.**

THE ROLE

We are seeking an experienced Technical Program Manager to drive end-to-end execution of AI cluster engineering programs spanning GPU platforms, rack-scale solutions, high-speed networking, and datacenter AI infrastructure. You will work cross-functionally to translate customer and internal requirements into executable plans, manage risks and dependencies, and deliver scalable, production-ready solutions across GPU → rack → cluster deployments.

THE PERSON

You are a hands-on TPM who thrives in complex, fast-moving ecosystems and can connect deep technical details to crisp program plans, executive reporting, and customer outcomes. You will partner with cross-functional teams to help drive server integration to rack and cluster-level validation. You bring strong ownership, structured execution, and the ability to lead through influence across engineering, operations, vendors, and customers.

KEY RESPONSIBILITIES

Program Leadership & Execution

Define, plan, and drive program plans for AI infrastructure systems validation and readiness, including server integration, rack bring-up, and cluster-scale deployment readiness.
Create and maintain core PM artifacts: schedules, dependency maps, resource forecasts, risk/issue logs, and program dashboards/status reports.
Identify and drive mitigation plans for issues/risks, including cross-team escalations and corrective actions across multiple engineering areas.

AI Cluster and Networking (Topologies, NICs, Switching, Optics)

Own program execution for rack- and cluster-network enablement, including topology decisions, switching/optics/cabling readiness, and validation schedules for scale-out operation.
Drive alignment on advanced AI networking requirements, such as network architecture and reliability impacts that require mitigation.
Partner with internal/external stakeholders to track and close network blockers.

GPU / Rack Solution Integration

Lead cross-functional delivery for rack solutions that integrate CPU + GPU + NICs, ensuring end-to-end readiness across hardware, firmware, and management interfaces.
Drive requirements capture and execution planning for rack-scale deployments (rack density, rack form factor, power targets, whips, liquid cooling, etc.) and ensure integration plans are validated with engineering and operations.

AI Infrastructure & Manageability (Operations, Telemetry, Provisioning)

Own program coordination for pod/rack manageability solutions, aligning requirements and milestones for inventory, health monitoring, cluster provisioning, and observability across large-scale deployments.
Coordinate with platform/automation teams on cluster provisioning and orchestration.

Automation, Tooling, and Regression Readiness

Drive readiness for rack-level automation and regression workflows (scripts, log mapping, infrastructure automation planning), planning execution to de-risk hardware arrival timing.
Partner with CI/CD and FW automation stakeholders to align on deliverables and validation gates.

REQUIRED EXPERIENCE

Proven program management experience delivering complex, cross-functional hardware/software infrastructure programs (server/rack/cluster environments).
Strong understanding of datacenter building blocks and lifecycle: servers, racks, clusters, HW/FW/SW integration, and readiness/validation flows.
Demonstrated ability to build and run schedules, manage risks, lead matrix teams, and communicate clearly to engineering and executive audiences.
Strong working knowledge of program tools (e.g., Jira/Confluence/Microsoft Office) and dashboard-based execution management.

**PREFERRED EXPERIENCE**

AI cluster networking domain experience (NICs, switching/optics/cabling, topologies).
Familiarity with rack/pod management and operations concerns (telemetry, health monitoring, power control, FW provisioning, management networks).
Experience leading programs for integration of servers in OEMs, ODMs, or data centers.
Demonstrated horizontal leadership across large matrix organizations.

ACADEMIC CREDENTIALS

Bachelor’s or master’s degree in computer/electrical engineering (or equivalent).
Formal PM education/certification (PMP / Scrum Master) preferred.

LOCATION

Austin, TX

This role is not eligible for visa sponsorship.

_Benefits offered are described:_ AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.