What you'd actually do

Design and develop large-scale distributed software services and solutions to manage OCI's AI infrastructure.

Write high-quality, maintainable code by leveraging design reviews, code reviews, unit tests, and integration tests.

Develop complete solutions by ensuring that the services and the components are well-defined and modularized, secure, reliable, diagnosable, actively monitored, compliant and reusable.

Focus on customer needs through a data-driven approach.

Collaborate with other team members working on the same project to meet customer requirements.

Skills

Required

3+ years of experience in software development with programming languages including, but not limited to, C, C++, C#, Java, Go, Rust.
1+ year of experience designing and developing distributed systems and services.
Strong problem-solving and troubleshooting skills, with the ability to analyze complex systems and identify areas for improvement.
Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.

Nice to have

Experience in managing cloud infrastructure with hundreds of thousands of servers.
Experience in containerization technologies such as Docker and Kubernetes.
Experience in scheduling high-performance workloads on Kubernetes or Slurm.

Here at OCI, we're building the world's largest AI clusters, and we're the fastest at bringing them to market. This is your chance to be part of the AI revolution by creating systems that allow customers to scale from tens to thousands of GPUs without compromising performance. You will have the opportunity to work with cutting-edge technologies and make a significant impact on our organization's success.

Our team is responsible for delivering trusted, fast health determinations and customer-initiated diagnostics for GPU clusters. This work reduces false positives, prevents unnecessary node returns, increases capacity for customers, protects revenue, and improves uptime by providing an OCI-supported, safe diagnostic experience and by living OCI values.

Why Join Us?

Innovative Projects: Build groundbreaking solutions for our customers from the ground up.
Exciting Times: Be part of a young, fast-growing team working on ambitious new initiatives.
Dynamic Environment: Collaborate in a vibrant, agile environment where learning and adaptability are key.

What We’re Looking For

Adaptable Engineers: Self-motivated individuals with a quick learning ability.
Technical Excellence: Rock-solid developers and distributed systems engineers with a deep understanding of distributed systems and algorithms. Comfortable diving deep into any part of the stack.
Passion for Simplicity and Scale: Value simplicity and scalability in design and implementation.
Collaborative Spirit: Comfortable working in a collaborative, agile environment and eager to learn. Ability to collaborate effectively with various dependencies, including Network and Data Center operations.

Join us and be a part of the team that's pushing the boundaries of AI technology!

Location: Austin, TX

Troubleshoot and optimize automation for reliability, performance, and availability.

Qualifications & Skills

BS (or equivalent experience) in Computer Science, Engineering, or related field.


Disclaimer:

Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.

Range and benefit information provided in this posting is specific to the stated locations only.

US: Hiring Range in USD from: $79,200 to $178,100 per annum. May be eligible for bonus and equity.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business. Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.

Oracle US offers a comprehensive benefits package which includes the following:

Medical, dental, and vision insurance, including expert medical opinion
Short term disability and long term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner and pet insurance

The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.

Career Level - IC3