Principal Infrastructure Engineer

Oracle · Enterprise · Tokyo, Japan

The Principal Core Infrastructure Engineer leads the design, deployment, and support of large-scale AI, GPU, and HPC infrastructure solutions on Oracle Cloud Infrastructure (OCI). The role partners with customers throughout the engagement lifecycle, from solution architecture and Proof of Concept (POC) through production deployment, optimization, and ongoing operational support. As a trusted technical advisor, the engineer provides guidance on cloud-native architectures, Kubernetes, Slurm, AI platforms, automation, and best practices, working with Product Management, Engineering, Support, Sales, and partners to deliver successful customer outcomes. The role also contributes to Oracle's technical leadership by developing reusable assets, automation, reference architectures, and technical enablement content that accelerate customer adoption and strengthen Oracle's position in AI and cloud infrastructure.

What you'd actually do

  1. Architect and deploy large-scale GPU/HPC infrastructure on OCI using tools like Terraform, Ansible, Slurm, and Kubernetes.
  2. Build automated solutions for cluster provisioning, software deployment, and infrastructure as code.
  3. Collaborate with Oracle’s largest enterprise customers to define and tailor solutions that meet high-performance compute and AI requirements.
  4. Support LLM-based solutions, agentic AI systems, and robotic AI platforms from design through deployment.
  5. Act as a trusted technical advisor, guiding customers on best practices, cloud migration strategies, and deployment patterns.

Skills

Required

  • GPU and HPC architecture in cloud and on-prem environments
  • scripting and automation: Python, Bash, PowerShell, Terraform, Ansible
  • cluster managers (Slurm, PBS, Bright), Kubernetes, and container orchestration
  • RDMA, InfiniBand, MPI, and distributed file systems
  • core cloud-native experience
  • AI/ML platforms, large language models (LLMs), and inference serving stacks
  • pre-sales, technical consulting, or customer-facing solution architecture
  • communication and presentation skills

Nice to have

  • Bachelor’s or Master’s degree in Computer Science, Engineering, Mathematics, or related field
  • demonstrated thought leadership through publications, speaking engagements, or community contributions
  • experience working with Oracle Cloud Infrastructure (OCI) or similar cloud platforms

What the JD emphasized

  • deep expertise in HPC, GPU infrastructure, and AI platform engineering
  • design and deploy large-scale accelerated computing solutions
  • lead customer engagements
  • drive adoption of cutting-edge AI workloads
  • architect and deploy complex HPC and GPU clusters, AI platforms, and intelligent agentic solutions
  • support LLM-based solutions, agentic AI systems, and robotic AI platforms
  • trusted technical advisor
  • high-performance compute and AI requirements

Other signals

  • drive adoption of cutting-edge AI workloads on Oracle Cloud Infrastructure (OCI)