Principal Systems Software Engineer

Oracle · Enterprise · Albuquerque, NM +1

This Principal Systems Software Engineer role at Oracle focuses on complex system modules and fleet automation while ensuring reliability, performance, and security in a hyperscale environment. The role involves deep-dive investigations, designing improvements for drivers, services, and provisioning pipelines, and guiding platform bring-up and diagnostics. It emphasizes code quality, reviews, patterns, tooling, and adherence to standards for observability, security, and compliance. Collaborating with hardware, firmware, and silicon partners and diagnosing cross-domain issues are both central to the work. The role also includes developing and debugging advanced features across various platforms and implementing secure coding practices.

What you'd actually do

  1. Design, develop, and maintain complex system software modules for managing, monitoring, and provisioning computer servers, storage, networking, and GPU subsystems in a hyperscale environment.
  2. Use advanced knowledge to design, develop, and deploy software layers and tooling that automate hardware provisioning and the onboarding of servers to a large-scale fleet.
  3. Play a significant role in platform bring-up, firmware updates, platform-level diagnostics, and complex cross-layer debugging.
  4. Diagnose and resolve complex issues across software, firmware, and hardware layers, working in DevOps and incident response paradigms to maintain reliability and availability.
  5. Design software solutions, and analyze and identify requirements, to achieve business and operational goals, sharing results with the manager upon completion.

Skills

Required

  • Systems Software Development
  • Fleet Automation
  • DevOps
  • Incident Response
  • Performance Optimization
  • Reliability Engineering
  • Fault Tolerance
  • Observability
  • Security
  • Compliance
  • Firmware Integration
  • Hardware/Software Debugging
  • Platform Bring-up
  • API Design and Integration
  • Secure Coding Practices
  • System Maintenance
  • Technical Leadership
  • Code Reviews

Nice to have

  • ARM, AMD, Intel, and custom ASIC/FPGA platforms
  • Logic Analyzers, JTAG, and Emulators
  • Schematic/Board Reviews

What the JD emphasized

  • complex system modules
  • fleet automation workflows
  • deep-dive investigations
  • hard faults
  • durable fixes
  • preventive controls
  • performance, reliability, and fault-tolerance improvements
  • provisioning pipelines at scale
  • observability, security, and compliance
  • platform bring-up
  • diagnostics
  • cross-domain issues
  • firmware/hardware/OS/bootloaders
  • schematic/board reviews
  • board/device bring-up
  • advanced features
  • DevOps and incident response paradigms
  • complex cross-layer debugging
  • complex software issues
  • complex performance optimization and scalability strategies
  • customer-reported issues
  • customer satisfaction
  • customer issue and/or defect handling and training processes
  • maintenance issues
  • secure coding practices
  • critical vulnerabilities
  • service/product availability, health, support, and reliability