Senior Software Developer - AI Infra Compute

Oracle · Enterprise · Austin, TX (+1 more location)

Develops and designs fundamental architectural changes for a cutting-edge, ultra-high-performance GPU platform supporting AI/ML/HPC workloads, focusing on GPU delivery, health monitoring, triage automation, and diagnostic services for large-scale distributed systems.

What you'd actually do

  1. designing and developing fundamental architectural changes for GPU delivery, health monitoring, triage automation, and diagnostic services
  2. running distributed AI/ML/HPC workloads across thousands of GPUs
  3. software debugging and low-level systems troubleshooting
  4. building operational tools and dashboards
  5. translating requests into prioritized work items or features

Skills

Required

  • backend software development experience
  • Java or a similar object-oriented language
  • scripting languages (Python, shell)
  • Git/Bitbucket
  • building operational tools and dashboards

Nice to have

  • public cloud platform experience
  • multi-AD/AZ (availability domain/availability zone) and regional data centers
  • large distributed systems
  • continuous integration/deployment pipelines
  • working with internal customers

What the JD emphasized

  • rock-solid developers and distributed systems engineers
  • deep understanding of distributed systems and algorithms
  • diving deep into any part of the stack
  • software debugging and low-level systems troubleshooting

Other signals

  • GPU platform for AI/ML/HPC workloads
  • distributed systems
  • high-performance computing