Senior Member of Technical Staff (AI Infrastructure)

Oracle · Enterprise · Austin, TX +1

This role is for a Senior Member of Technical Staff on the Oracle Cloud Infrastructure (OCI) AI Infrastructure Engineering team. The team builds services and platforms for a large AI mega-cluster, supporting data center lifecycle management and enabling OCI to scale its AI cloud platform. The role involves designing, developing, and operating highly scalable, resilient cloud services and distributed systems for AI infrastructure, with a focus on performance, reliability, and scale. It requires strong distributed systems expertise and experience with cloud-native services.

What you'd actually do

  1. Design, develop, and operate highly scalable, resilient cloud services
  2. Drive architecture and technical direction for critical infrastructure systems
  3. Build and optimize distributed systems for performance, reliability, and scale
  4. Collaborate across engineering teams in a fast-paced, agile environment
  5. Troubleshoot complex production issues across services and infrastructure layers

Skills

Required

  • 4+ years of experience designing, developing, and operating large-scale, highly available distributed systems
  • Strong programming skills in Java, C, or C++
  • Experience in scripting languages such as Python
  • Solid understanding of distributed systems, operating systems, data structures, and algorithms
  • Experience building cloud-native services and infrastructure platforms
  • Strong troubleshooting, debugging, and performance tuning skills
  • Experience with databases, NoSQL systems, storage platforms, and distributed persistence technologies

Nice to have

  • Familiarity with AI-first software development practices and modern engineering tooling

What the JD emphasized

  • highly technical environments
  • scalable, reliable, and highly available distributed systems
  • massive scale
  • cutting-edge AI infrastructure
  • deep distributed systems expertise
  • large-scale, highly available distributed systems
  • AI-first software development practices
  • modern engineering tooling
  • distributed systems
  • operating systems
  • data structures
  • algorithms
  • cloud-native services
  • infrastructure platforms
  • troubleshooting
  • debugging
  • performance tuning skills
  • databases
  • NoSQL systems
  • storage platforms
  • distributed persistence technologies
  • ambiguous technical problems