Principal Software Engineer – AI-native Platform Engineering

Oracle · Enterprise · United States

The Principal Software Engineer will design and build AI-native cloud platforms, distributed systems, and intelligent automation solutions for healthcare analytics. The role involves developing highly available services, reliability platforms, observability systems, automation frameworks, and AI-powered operational tooling, with a focus on integrating Generative AI and agent-based technologies.

What you'd actually do

  1. Design and build the next generation of cloud-native platforms, distributed systems, and intelligent automation solutions that power large-scale healthcare analytics
  2. Develop highly available services, reliability platforms, observability systems, automation frameworks, and AI-powered operational tooling that enable mission-critical analytics workloads across Oracle Cloud Infrastructure and multi-cloud environments
  3. Partner with product, platform, data, and reliability teams to build scalable software systems that process massive datasets, improve developer productivity, automate operational workflows, and enhance platform resilience
  4. Help drive the adoption of Generative AI and agent-based technologies to build intelligent operational platforms, self-service infrastructure solutions, and autonomous reliability capabilities

Skills

Required

  • Strong software development experience in Python, Java, Go (Golang), or similar languages
  • Strong hands-on system design experience with the ability to architect and build large-scale distributed systems
  • Demonstrated expertise writing high-quality, maintainable, testable, and production-grade code
  • Strong understanding of software architecture, design patterns, and engineering best practices
  • Experience developing cloud-native applications, microservices, and platform services
  • Experience leading technical design discussions, architecture reviews, and complex engineering initiatives
  • Experience building highly available, fault-tolerant distributed systems at scale
  • Strong understanding of scalability, concurrency, resiliency, performance optimization, and reliability patterns
  • Experience developing platform services, shared frameworks, developer tooling, and self-service platforms
  • Knowledge of event-driven architectures, service-oriented systems, and asynchronous processing patterns
  • Hands-on experience building solutions using Generative AI, Agentic AI, Large Language Models (LLMs), and intelligent automation technologies
  • Experience integrating frameworks such as LangChain, AutoGen, CrewAI, Semantic Kernel, OpenAI, or equivalent AI platforms
  • Experience building AI-powered automation for incident investigation and root cause analysis, operational intelligence and observability, infrastructure lifecycle management, and engineering productivity and developer experience
  • Experience designing APIs, services, and platforms that incorporate AI capabilities
  • Strong experience with OCI, AWS, Azure, or multi-cloud environments
  • Experience building cloud-native services using Kubernetes, Docker, and container orchestration platforms
  • Strong understanding of cloud architecture, networking, security, compliance, and cost optimization
  • Deep experience with Infrastructure as Code (IaC) using Terraform, Ansible, and related automation frameworks
  • Experience building infrastructure automation, deployment tooling, and platform engineering solutions
  • Experience building data-intensive applications and analytics platforms
  • Knowledge of ETL pipelines and large-scale data processing frameworks
  • Understanding of distributed storage systems, columnar databases, and large-scale analytics architectures
  • Strong understanding of SRE principles and operational excellence practices
  • Experience implementing observability solutions using Prometheus, Grafana, OpenTelemetry, or similar technologies
  • Experience analyzing production issues and implementing durable engineering solutions
  • Knowledge of monitoring, alerting, reliability engineering, performance tuning, and self-healing systems
  • 10+ years of hands-on software engineering experience designing, building, and operating large-scale distributed systems
  • Proven experience delivering production software in cloud-native environments
  • Strong track record of leading complex technical initiatives from architecture and design through deployment and operations
  • Experience building platform services, developer tooling, infrastructure automation frameworks, or large-scale analytics platforms
  • Large-scale distributed systems architecture and hands-on system design
  • Software engineering with strong coding proficiency in Python, Java, and/or Go
  • Cloud-native application development and microservices architecture
  • Infrastructure as Code (Terraform, Ansible) and automation engineering
  • Platform engineering and developer productivity tooling
  • Large-scale data processing and analytics systems
  • Performance optimization, scalability, resiliency, and reliability engineering
  • AI-powered platforms, intelligent automation, and agent-based system development
  • Experience building AI-powered software products, engineering platforms, or operational tooling
  • Experience integrating LLMs, agent frameworks, RAG architectures, and intelligent automation systems into production environments
  • Understanding of emerging AI engineering patterns and practical applications within software engineering, infrastructure, and operations

Nice to have

  • Experience building AI-assisted operational tooling, autonomous remediation systems, or intelligent platform services
  • Familiarity with data warehouse technologies such as Snowflake, Vertica, or equivalent platforms

What the JD emphasized

  • U.S. citizenship is required
  • Must obtain and maintain a U.S. government security clearance after hire

Other signals

  • AI-native infrastructure
  • Generative AI
  • agent-based technologies
  • intelligent operational platforms
  • autonomous reliability capabilities