Forward Deployed Infrastructure Engineer - US Government

Palantir · Enterprise · New York, NY · Delta

This role focuses on building, operating, and maintaining scalable, reliable infrastructure for Palantir's platforms and products, including those that use AI and LLM technology. The engineer is responsible for deploying, debugging, optimizing, and automating services and infrastructure, with a strong emphasis on production systems and troubleshooting.

What you'd actually do

  1. Handle support and operations of Palantir software, including monitoring and alerting, configuration management, and upgrades
  2. Deploy new Palantir products at customer deployments and perform migrations to the latest infrastructure types
  3. Debug, improve, and optimize Palantir’s services and infrastructure with a focus on long-term reliability and scalability
  4. Reduce manual operations and automate workflows, processes, and/or runbooks where possible
  5. Provide technical troubleshooting support for production issues, ensuring timely resolution and minimal impact on operations, and participate in a support on-call schedule

Skills

Required

  • Proficiency with programming languages such as Java, Python, Bash, JavaScript, or Go
  • Ability to work with a high level of autonomy and responsibility
  • Excellent communication and interpersonal skills
  • Strong engineering background, preferably in a field such as Computer Science, Mathematics, Software Engineering, Physics, or Data Science.
  • Strong coder with demonstrated proficiency in programming languages such as Java, Go, Python, or JavaScript.
  • Active US Security clearance, or eligibility and willingness to obtain a US Security clearance.

Nice to have

  • Confidence in troubleshooting complex systems issues independently using observability tools and service logs.
  • Ability to identify and automate highly manual tasks, driving ongoing improvements within and across teams.
  • Comfort with large-scale production systems and technologies, for example load balancing, monitoring, distributed systems, or configuration management.

What the JD emphasized

  • high-performance, scalable, and reliable services
  • industry-leading LLM and AI technology
  • critical to solving our government’s greatest challenges
  • responsible for the operations of their services in production
  • long-term reliability and scalability
  • technical troubleshooting support for production issues
  • complex systems issues independently using observability tools and service logs
  • large scale production systems and technologies
  • Active US Security clearance