Forward Deployed Reliability Engineer

Palantir · Enterprise · London, United Kingdom · Product Development

This role focuses on ensuring the stability and reliability of mission-critical workflows built on Palantir software. The engineer will be responsible for resolving incidents, driving product changes, improving internal tooling, and refining operational processes to enhance service quality. The approach is hands-on, centred on rapid issue resolution, automation, and advocacy for product enhancements based on field insights. The role also involves writing documentation and sharing best practices to improve overall reliability and efficiency.

What you'd actually do

  1. Go on-call, responding quickly and effectively to mission-critical incidents
  2. Diagnose, resolve, and proactively prevent issues encountered in the field
  3. Collaborate with internal stakeholders to increase the scalability and reliability of Foundry workflows for our customers
  4. Identify recurring pain points and inefficiencies, and take initiative to automate or streamline workflows
  5. Advocate for and implement product enhancements based on insights gleaned from the field

Skills

Required

  • Python
  • Java
  • SQL
  • Parallel data processing
  • Spark job optimisation
  • Root cause analysis
  • Documenting solutions

Nice to have

  • Writing scripts to automate manual tasks
  • Devising creative workarounds
  • Building a case for a product enhancement

What the JD emphasized

  • mission-critical workflows
  • stability and reliability
  • resolve problems before the customer is impacted
  • drive product change
  • shape our internal tooling
  • refine our operational processes
  • increasing quality of service
  • hands-on and pragmatic
  • rapidly address issues
  • quick and effective solutions
  • advocate for workflow or product improvements
  • simplify, automate, and make the entire system more resilient
  • synthesises learnings from support into best practices
  • raise the bar for reliability and efficiency