What you'd actually do

Leads multiple teams to implement strategies for the architecture and delivery of interdependent, scalable distributed systems that meet organizational and customer demands.

Orchestrates cross-group optimization for high‑throughput, large‑scale data processing; aligns stakeholders on scalability requirements; and oversees elastic designs and effective use of data plane platforms.

Provides strategic oversight for fault‑tolerant, in‑service‑upgradable architectures, sets direction for partition‑aware design choices, and leads initiatives to harden networks via load‑shedding, throttling, and rate‑limiting.

Establishes expectations for formal verification and peer reviews, and sets SLO‑aligned durability and availability standards across the department.

Drives KPI and telemetry strategies; directs creation of complex dashboards and alerting for proactive health assurance; and ensures functional/correctness validation, data replication, and synchronization meet organizational needs.

Skills

Required

Distributed systems architecture
System design and scalability
Reliability engineering
Performance optimization
Data processing
Fault tolerance
Network engineering
Load balancing
Rate limiting
Formal verification
SLO management
KPI definition
Telemetry and monitoring
Incident management
Security architecture
Compliance documentation
Infrastructure as Code (IaC)
Change management

Nice to have

Cloud infrastructure management
Multi-tenant environments
Customer maintenance windows
Standard Operating Procedures (SOPs)

What the JD emphasized

scalable distributed systems

high-throughput

large-scale data processing

fault-tolerant

in-service-upgradable architectures

partition-aware design

load-shedding

throttling

rate-limiting

formal verification

SLO-aligned durability and availability standards

KPI and telemetry strategies

functional/correctness validation

data replication

synchronization

incident management

operational readiness

customer maintenance windows

SOPs

security guidance

encryption

access controls

remediation

compliance documentation

automation (IaC)

change-management alignment

patching

updating

rolling back at scale

Guides organization‑wide incident management and operational readiness, eliminating customer maintenance windows and ensuring consistent SOPs. Provides strategic security guidance (encryption, access controls), oversees remediation and compliance documentation, and sponsors automation (IaC) and change‑management alignment so systems can be safely patched, updated, and rolled back at scale.

Key Responsibilities

System Design & Architecture – System Scalability:

Implements strategies across multiple teams or groups for the architecture and design of interdependent scalable distributed systems, including the use of distributed state management tools, ensuring organizational and system demands are met.
Spearheads code and/or system optimization initiatives for large-scale data processing and high-throughput requirements across multiple areas, driving improvements that support hyper-scale systems.
Facilitates collaborations to define system scalability requirements, ensuring the defined requirements meet customer expectations.
Oversees the design of interdependent systems to scale with elasticity (e.g., effectively scaling both up and down).
Drives the effective use and implementation of data plane platforms for large-scale data operations.

System Design & Architecture – System Reliability Design:

Provides strategic oversight for the architecture of fault-tolerant interdependent systems capable of withstanding in-service updates by overseeing implementation across teams of redundancy, replication, and automatic failover mechanisms.
Influences and sets direction for designing systems to effectively handle service disruptions (e.g., network partitions) by prioritizing consistency, availability, or partition tolerance.
Leads strategic optimization initiatives for handling network unreliability, including directing the design of load-shedding, throttling, and rate-limiting techniques.
Holds teams accountable for using formal verification techniques to verify system designs and for conducting peer reviews across teams.
Drives the design of systems that are durable and adhere to service level objectives (SLOs), developing standards for availability and durability of other computing services across the department.

System Design & Architecture – System Reliability Performance:

Drives strategies for defining key performance indicators (KPIs) and telemetry to identify risks, gaps, or cyclical dependencies in running systems, ensuring alignment with organizational goals.
Directs the creation and customization of complex dashboards, telemetry systems, and alerting mechanisms that proactively monitor and ensure optimal system health across teams.

System Design & Architecture – Correctness / Availability:

Implements strategies to effectively determine if systems are meeting functional and correctness requirements, and encourages teams to identify improvement opportunities.
Provides thought leadership on processes for formally verifying complex features to ensure system design correctness.
Oversees the implementation of data replication and synchronization techniques, ensuring data integrity and availability across the organization.

Operational Troubleshooting & Incident Management:

Provides strategic oversight for diagnosing, debugging, and resolving issues in active systems to support ongoing operation.
Directs strategies within teams to prevent interruptions, ensuring no maintenance windows are required for customers and users when resolving issues.
Drives alignment across teams on operational readiness protocols and standard operating procedures.
Provides expert guidance for complex incident response and root cause investigations.

Compliance & Security:

Provides strategic guidance in architecting robust security measures to protect data and applications in multi-tenant environments, ensuring encryption techniques and access controls are implemented.
Oversees execution of remediation plans to address identified security gaps, promoting significant improvements and continuous advancement of security measures.
Drives documentation efforts and ensures cloud infrastructure compliance with industry standards and regulations.

Automation & Change Management:

Provides strategic guidance across teams on developing and maintaining automation scripts and tools (e.g., Infrastructure as Code (IaC)) to manage cloud infrastructure.
Drives strategic alignment of change management plans for patching, updating, and rolling back applications, and ensures that system designs allow for automation of these processes.

Core Responsibilities

Planning & Execution:

Oversees and guides multiple teams on managing complex projects or initiatives, monitoring timelines, deliverables, and budgets (when applicable) to ensure strategic objectives are met.
Serves as a role model for appropriately delegating work, setting priorities, and ensuring alignment with business needs.
Coaches others on adjusting resources or project timelines in anticipation of business changes.

Collaboration & Partnership:

Models leadership of cross-functional collaborative efforts to ensure alignment of expectations and strategic objectives.
Empowers teams to build and maintain partnerships with business leaders, stakeholders, and/or customers to address barriers and contribute to organizational success.
Drives transparency and inclusivity by modeling actively seeking, listening to, and leveraging diverse perspectives.

Problem Solving:

Shares problem-solving strategies across teams, providing oversight on complex operational and/or technical issues, as needed.
Coaches teams on analyzing highly complex data and/or information to identify solutions to ambiguous issues.
Provides direction on identifying root causes to prevent recurrence of issues.

Continuous Learning:

Pursues strategic learning opportunities to maintain expertise and apply best practices at the organizational level.
Creates opportunities for team members and leaders to build their expertise in new areas, coaching them to build innovative skills.
Identifies skill gap trends across the organization and upholds a culture that places significant emphasis on sharing knowledge and pursuing learning opportunities that advance the organization.
Evaluates the efficiency of learning strategies and recommends adjustments as needed.

Continuous Improvement:

Empowers teams to own the development and implementation of ideas that increase the efficiency and effectiveness of processes, protocols, and workflows across the department.
Coaches teams to gain buy-in for ideas and to seek feedback on approaches and methods for continued improvement.
Prioritizes and reviews the roadmap of improvement initiatives to ensure alignment with strategic direction and maximize return on investments.

Performance and Development:

Serves as a role model for driving performance across teams through tailored feedback and coaching in alignment with performance management processes, guidelines, and expectations.
Drives consistency in the application of talent development procedures and socializes performance expectations across the organization.
Ensures that individual development goals are aligned with organizational strategic initiatives.
Collaborates with HR to implement talent strategy through hiring and promotion processes.

Disclaimer:

Certain U.S. based or U.S. customer or client-facing roles may be required to comply with applicable requirements, such as immunization/occupational health mandates, and/or drug testing requirements.

Range and benefit information provided in this posting is specific to the stated locations only.

US: Hiring Range in USD from: $169,800 to $355,400 per annum. May be eligible for bonus, equity, and compensation deferral.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business. Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.

Oracle US offers a comprehensive benefits package which includes the following:

Medical, dental, and vision insurance, including expert medical opinion
Short-term disability and long-term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner and pet insurance

The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.

Career Level - M4