What you'd actually do

Build and improve scalable distributed systems capable of processing massive amounts of data.

Write efficient code, run rigorous performance tests, and create highly reliable features that stay online during network issues and system updates.

Troubleshoot live production problems, participate in on-call support duties, and set up proactive dashboards and alerts to catch errors early.

Maintain platform security and infrastructure automation while independently managing project timelines, collaborating across teams, and continuously improving engineering processes.

Implement and develop core components of horizontally and vertically scalable distributed systems powering a robust Access Governance product.

Skills

Required

distributed systems
data processing
performance testing
reliability engineering
scalability
fault tolerance
incident response
infrastructure automation
cloud infrastructure
security
access governance

Nice to have

recovery-oriented computing
circuit breakers
retries
timeouts
telemetry systems
fault injection
brownouts
Infrastructure as Code
change management protocols

As a member of the software engineering division, implement and develop core components of horizontally and vertically scalable distributed systems powering a robust Access Governance product. Optimize code and system performance for large-scale data processing and hyper-scale environments. Leverage data plane platforms to efficiently handle massive data retrieval, storage, and processing. Execute comprehensive performance and load testing to continuously validate and meet scalability requirements.

Collaborate across engineering teams to build fault-tolerant components that withstand in-service updates through redundancy, replication, and automatic failover. Apply recovery-oriented computing principles, implementing circuit breakers, retries, and timeouts to mitigate network unreliability and service disruptions. Build and customize dashboards, telemetry systems, and alerting mechanisms to proactively monitor system health and detect failures. Evaluate system correctness and maintain data integrity by executing complex test scenarios, such as fault injection and brownouts, and by applying standard data replication techniques.

Diagnose, debug, and resolve complex issues in system components to ensure continuous, uninterrupted operations. Participate in operational support rotations, actively assisting in rapid incident response, root cause investigations, and runbook execution. Design and maintain automation scripts and Infrastructure as Code tooling to troubleshoot operational issues and efficiently manage cloud infrastructure. Apply advanced security measures, including encryption and access controls, to protect multi-tenant environments while strictly adhering to change management protocols for safe patching, updating, and application rollbacks.

Track project timelines independently, prioritizing tasks and adjusting workflows to meet shifting resource and project demands. Collaborate across the organization to achieve shared objectives, listening to diverse perspectives to meet specific stakeholder and customer needs. Troubleshoot standard and non-standard technical issues through rigorous, data-driven analysis. Foster a culture of continuous learning and improvement by building new skills, recommending process improvements, and sharing technical knowledge with team members.

Career Level - IC3