Software Developer 4

Oracle · Enterprise · United States

Software Developer role focused on building and operating ultra-high-performance GPU platforms for AI/ML/HPC workloads within Oracle Cloud Infrastructure. Responsibilities include designing and developing fundamental architectural changes for GPU delivery, health monitoring, testing, triage automation, and diagnostic services, operating at the intersection of bare metal hardware and full-stack orchestration frameworks. Requires strong distributed systems and Linux engineering skills.

What you'd actually do

  1. own the software design and development for major components of Oracle's Cloud Infrastructure
  2. dive deep into any part of the stack and low-level systems to design broad distributed system interactions
  3. launch, configure, test, and validate server platforms across OCI’s massive fleet of Compute and GPU Infrastructure
  4. partner closely across other teams in Compute, Networking, Security, Data Center Engineering, and Hardware Development to ensure OCI can launch, scale, and maintain new server platforms with minimal operational overhead and high reliability
  5. work directly with cutting-edge GPU hardware and see the direct impact of your work on the business

Skills

Required

  • BS or MS degree in Computer Science or relevant technical field involving coding or equivalent practical experience
  • Deep understanding of operating systems, computer networks, and high-performance applications
  • 4+ years of experience delivering and operating large-scale production systems (thousands of server instances)
  • Proficient in multiple programming languages (Java, Python, C, C++, Go, shell scripting)
  • Systematic problem-solving approach
  • Strong communication skills
  • A sense of ownership and drive
  • Proven ability to deliver products and experience with the full software development lifecycle

Nice to have

  • Strong background in Linux systems
  • Familiarity with system-level architecture, data synchronization, fault tolerance, and state management
  • General enterprise storage, networking, or computing experience
  • Experience with Server/GPU hardware architecture and system management
  • Experience with InfiniBand or RoCE networking technologies
  • Hands-on experience designing, developing, and operating public cloud service data planes

What the JD emphasized

  • ultra-high-performance GPU platforms
  • AI/ML/HPC workloads
  • thousands of GPUs
  • distributed systems
  • Linux engineer
  • Systems triage experience
  • low-level systems
  • high performance, scalable services and tooling
  • cutting edge GPU hardware