Principal Software Engineer – AI-Native Platform Engineering

Oracle's Health Data Intelligence (HDI) team is hiring a Principal Software Engineer to design and build AI-native cloud platforms, distributed systems, and intelligent automation solutions for healthcare analytics. The role involves developing highly available services, reliability platforms, observability systems, automation frameworks, and AI-powered operational tooling, with a focus on integrating Generative AI and agent-based technologies.

What you'd actually do

Design and build the next generation of cloud-native platforms, distributed systems, and intelligent automation solutions that power large-scale healthcare analytics
Develop highly available services, reliability platforms, observability systems, automation frameworks, and AI-powered operational tooling that enable mission-critical analytics workloads across Oracle Cloud Infrastructure and multi-cloud environments
Partner with product, platform, data, and reliability teams to build scalable software systems that process massive datasets, improve developer productivity, automate operational workflows, and enhance platform resilience
Help drive the adoption of Generative AI and agent-based technologies to build intelligent operational platforms, self-service infrastructure solutions, and autonomous reliability capabilities

Skills

Required

Strong software development experience in Python, Java, Go (Golang), or similar languages
Strong hands-on system design experience with the ability to architect and build large-scale distributed systems
Demonstrated expertise writing high-quality, maintainable, testable, and production-grade code
Strong understanding of software architecture, design patterns, and engineering best practices
Experience developing cloud-native applications, microservices, and platform services
Experience leading technical design discussions, architecture reviews, and complex engineering initiatives
Experience building highly available, fault-tolerant distributed systems at scale
Strong understanding of scalability, concurrency, resiliency, performance optimization, and reliability patterns
Experience developing platform services, shared frameworks, developer tooling, and self-service platforms
Knowledge of event-driven architectures, service-oriented systems, and asynchronous processing patterns
Hands-on experience building solutions using Generative AI, Agentic AI, Large Language Models (LLMs), and intelligent automation technologies
Experience integrating frameworks such as LangChain, AutoGen, CrewAI, Semantic Kernel, OpenAI, or equivalent AI platforms
Experience building AI-powered automation for incident investigation and root cause analysis, operational intelligence and observability, infrastructure lifecycle management, and engineering productivity and developer experience
Experience designing APIs, services, and platforms that incorporate AI capabilities
Strong experience with OCI, AWS, Azure, or multi-cloud environments
Experience building cloud-native services using Kubernetes, Docker, and container orchestration platforms
Strong understanding of cloud architecture, networking, security, compliance, and cost optimization
Deep experience with Infrastructure as Code (IaC) using Terraform, Ansible, and related automation frameworks
Experience building infrastructure automation, deployment tooling, and platform engineering solutions
Experience building data-intensive applications and analytics platforms
Knowledge of ETL pipelines and large-scale data processing frameworks
Understanding of distributed storage systems, columnar databases, and large-scale analytics architectures
Strong understanding of SRE principles and operational excellence practices
Experience implementing observability solutions using Prometheus, Grafana, OpenTelemetry, or similar technologies
Experience analyzing production issues and implementing durable engineering solutions
Knowledge of monitoring, alerting, reliability engineering, performance tuning, and self-healing systems
10+ years of hands-on software engineering experience designing, building, and operating large-scale distributed systems
Proven experience delivering production software in cloud-native environments
Strong track record of leading complex technical initiatives from architecture and design through deployment and operations
Experience building platform services, developer tooling, infrastructure automation frameworks, or large-scale analytics platforms
Large-scale distributed systems architecture and hands-on system design
Software engineering with strong coding proficiency in Python, Java, and/or Go
Cloud-native application development and microservices architecture
Infrastructure as Code (Terraform, Ansible) and automation engineering
Platform engineering and developer productivity tooling
Large-scale data processing and analytics systems
Performance optimization, scalability, resiliency, and reliability engineering
AI-powered platforms, intelligent automation, and agent-based system development
Experience building AI-powered software products, engineering platforms, or operational tooling
Experience integrating LLMs, agent frameworks, RAG architectures, and intelligent automation systems into production environments
Understanding of emerging AI engineering patterns and practical applications within software engineering, infrastructure, and operations

Nice to have

Experience building AI-assisted operational tooling, autonomous remediation systems, or intelligent platform services is highly desirable
Familiarity with data warehouse technologies such as Snowflake, Vertica, or equivalent platforms

What the JD emphasized

U.S. citizenship is required
The successful candidate must obtain and maintain a U.S. government security clearance after hire

Other signals

AI-native infrastructure
Generative AI
agent-based technologies
intelligent operational platforms
autonomous reliability capabilities

Join Oracle's Health Data Intelligence (HDI) team as a Principal Software Engineer, where you will design and build the next generation of cloud-native platforms, distributed systems, and intelligent automation solutions that power large-scale healthcare analytics. This role is ideal for engineers who enjoy solving complex software engineering challenges at scale. You will develop highly available services, reliability platforms, observability systems, automation frameworks, and AI-powered operational tooling that enable mission-critical analytics workloads across Oracle Cloud Infrastructure and multi-cloud environments. You will partner with product, platform, data, and reliability teams to build scalable software systems that process massive datasets, improve developer productivity, automate operational workflows, and enhance platform resilience. As Oracle continues investing in AI-native infrastructure, you will help drive the adoption of Generative AI and agent-based technologies to build intelligent operational platforms, self-service infrastructure solutions, and autonomous reliability capabilities.

U.S. citizenship is required for this position, as the successful candidate will be required to obtain and maintain a U.S. government security clearance after hire.

Required Skills

Software Engineering

Strong software development experience in Python, Java, Go (Golang), or similar languages
Strong hands-on system design experience with the ability to architect and build large-scale distributed systems
Demonstrated expertise writing high-quality, maintainable, testable, and production-grade code
Strong understanding of software architecture, design patterns, and engineering best practices
Experience developing cloud-native applications, microservices, and platform services
Experience leading technical design discussions, architecture reviews, and complex engineering initiatives

Distributed Systems & Platform Engineering

Experience building highly available, fault-tolerant distributed systems at scale
Strong understanding of scalability, concurrency, resiliency, performance optimization, and reliability patterns
Experience developing platform services, shared frameworks, developer tooling, and self-service platforms
Knowledge of event-driven architectures, service-oriented systems, and asynchronous processing patterns

AI-Native Engineering

Hands-on experience building solutions using Generative AI, Agentic AI, Large Language Models (LLMs), and intelligent automation technologies
Experience integrating frameworks such as LangChain, AutoGen, CrewAI, Semantic Kernel, OpenAI, or equivalent AI platforms
Experience building AI-powered automation for:
- Incident investigation and root cause analysis
- Operational intelligence and observability
- Infrastructure lifecycle management
- Engineering productivity and developer experience
Experience designing APIs, services, and platforms that incorporate AI capabilities
Experience building AI-assisted operational tooling, autonomous remediation systems, or intelligent platform services is highly desirable

Cloud & Infrastructure Engineering

Strong experience with OCI, AWS, Azure, or multi-cloud environments
Experience building cloud-native services using Kubernetes, Docker, and container orchestration platforms
Strong understanding of cloud architecture, networking, security, compliance, and cost optimization
Deep experience with Infrastructure as Code (IaC) using Terraform, Ansible, and related automation frameworks
Experience building infrastructure automation, deployment tooling, and platform engineering solutions

Data Engineering

Experience building data-intensive applications and analytics platforms
Knowledge of ETL pipelines and large-scale data processing frameworks
Familiarity with data warehouse technologies such as Snowflake, Vertica, or equivalent platforms
Understanding of distributed storage systems, columnar databases, and large-scale analytics architectures

Reliability Engineering

Strong understanding of SRE principles and operational excellence practices
Experience implementing observability solutions using Prometheus, Grafana, OpenTelemetry, or similar technologies
Experience analyzing production issues and implementing durable engineering solutions
Knowledge of monitoring, alerting, reliability engineering, performance tuning, and self-healing systems

What You Bring

10+ years of hands-on software engineering experience designing, building, and operating large-scale distributed systems
Proven experience delivering production software in cloud-native environments
Strong track record of leading complex technical initiatives from architecture and design through deployment and operations
Experience building platform services, developer tooling, infrastructure automation frameworks, or large-scale analytics platforms

Core Technical Expertise

Large-scale distributed systems architecture and hands-on system design
Software engineering with strong coding proficiency in Python, Java, and/or Go
Cloud-native application development and microservices architecture
Infrastructure as Code (Terraform, Ansible) and automation engineering
Platform engineering and developer productivity tooling
Large-scale data processing and analytics systems
Performance optimization, scalability, resiliency, and reliability engineering
AI-powered platforms, intelligent automation, and agent-based system development

AI-Native Experience

Experience building AI-powered software products, engineering platforms, or operational tooling
Experience integrating LLMs, agent frameworks, RAG architectures, and intelligent automation systems into production environments
Understanding of emerging AI engineering patterns and practical applications within software engineering, infrastructure, and operations

Technical Skills

Python, Java, Go (Golang)
Terraform, Ansible, Infrastructure as Code (IaC)
Kubernetes, Docker
CI/CD and DevOps platforms
Prometheus, Grafana, OpenTelemetry
Cloud platforms (OCI preferred)
Generative AI, Agentic AI, LLM frameworks, and AI-powered automation platforms

Disclaimer:

Certain U.S.-based or U.S. customer- or client-facing roles may be required to comply with applicable requirements, such as immunization/occupational health mandates and/or drug testing requirements.

Range and benefit information provided in this posting is specific to the stated locations only.

US: Hiring Range in USD from: $99,600 to $234,600 per annum. May be eligible for bonus, equity, and compensation deferral.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business. Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.

Oracle US offers a comprehensive benefits package which includes the following:

Medical, dental, and vision insurance, including expert medical opinion
Short-term disability and long-term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire. The balance refreshes each calendar year. Unused balance carries over each year up to a maximum cap of 112 hours.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner and pet insurance

The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.

Career Level - IC4