Software Development Snr Manager

Oracle · Enterprise · Seattle, WA +1

Senior Manager to lead a team designing, developing, and optimizing AI compute infrastructure components, focusing on GPU control and data planes to enhance customer workload performance and experience on Oracle's AI infrastructure. The role involves people management, setting team goals, driving modern software engineering practices, and ensuring solutions are secure, reliable, and monitored.

What you'd actually do

  1. Own and build solutions to scale and optimize AI compute infrastructure components, such as the GPU control plane and GPU data plane, with the goal of optimizing customer experience and customer workload performance on our AI infrastructure.
  2. Set and communicate individual expectations and team goals that align with the broader organization's goals.
  3. Model and coach team members and drive modern software engineering practices such as using data and telemetry to make decisions, well-defined interfaces across components, design reviews, coding standards, code reviews, and comprehensive coverage through unit tests, integration tests, and active production monitoring.
  4. Prioritize the team’s work with a focus on customer issues and requirements.
  5. Ensure that team solutions are well-defined and modularized, secure, reliable, diagnosable, actively monitored, compliant, and reusable.

Skills

Required

  • C
  • C++
  • C#
  • Java
  • Go
  • Rust
  • people management
  • leadership role
  • cross-functional projects
  • large-scale distributed systems
  • services
  • infrastructure
  • Computer Science
  • Engineering
  • communication
  • collaboration
  • project management
  • adapt to a fast-paced, dynamic environment
  • manage multiple tasks and priorities

Nice to have

  • cloud infrastructure
  • containerization technologies
  • Docker
  • Kubernetes
  • scheduling high-performance workloads
  • Slurm

What the JD emphasized

  • AI compute infrastructure components
  • GPU control plane
  • GPU data plane
  • optimize customer workload performance
  • hundreds of thousands of servers

Other signals

  • AI compute infrastructure
  • manage a team