What you'd actually do

Design, implement, and maintain scalable observability solutions for cloud-native environments

Own monitoring across AWS and Kubernetes (EKS) environments, covering clusters and workloads

Operate and maintain self-hosted monitoring stacks (e.g., Prometheus, Grafana, Mimir, Loki, Tempo)

Manage and optimize Datadog (metrics, logs, APM, alerts, cost monitoring)

Improve observability architecture to support high availability, scalability, and fault tolerance

Skills

Required

Monitoring/observability engineering
Cloud-native environments
AWS services
Kubernetes (EKS)
Prometheus
Grafana
Mimir
Loki
Tempo
Datadog
High availability architectures
Scalability architectures
Fault-tolerant architectures
Infrastructure as Code (Terraform, Helm)
CI/CD pipelines
Capacity planning
Performance tuning

Nice to have

GitHub Actions

What the JD emphasized

5+ years of hands-on experience in monitoring / observability engineering within cloud-native environments

Strong experience with AWS services

5+ years of hands-on experience working with Kubernetes

Solid knowledge of Kubernetes monitoring, including metrics, logs, and traces for clusters and workloads, as well as alerting, SLOs, SLIs, and dashboards

Proven experience operating and maintaining self-hosted monitoring stacks (Prometheus, Grafana, Mimir, Loki, and Tempo are an advantage)

Experience designing or improving observability architectures at scale

Experience with Datadog (metrics, logs, APM, alerts, and cost monitoring)

Experience with monitoring cost optimization, including log and trace sampling strategies and storage and retention optimization

Who we are is what we do.

Deel is the all-in-one payroll and HR platform for global teams. Our vision is to unlock global opportunity for every person, team, and business. Built for the way the world works today, Deel combines HRIS, payroll, compliance, benefits, performance, and equipment management into one seamless platform. With AI-powered tools and a fully owned payroll infrastructure, Deel supports every worker type in 150+ countries—helping businesses scale smarter, faster, and more compliantly.

Among the largest globally distributed companies in the world, our team of 7,000 spans more than 100 countries, speaks 74 languages, and brings a connected and dynamic culture that drives continuous learning and innovation for our customers.

Why should you be part of our success story?

As the fastest-growing Software as a Service (SaaS) company in history, Deel is transforming how global talent connects with world-class companies – breaking down borders that have traditionally limited both hiring and career opportunities. We're not just building software; we're creating the infrastructure for the future of work, enabling a more diverse and inclusive global economy. In 2024 alone, we paid $11.2 billion to workers in nearly 100 currencies and provided healthcare and benefits to workers in 109 countries—ensuring people get paid and protected, no matter where they are.

Our momentum is reflected in our achievements and customer satisfaction: CNBC Disruptor 50, Forbes Cloud 100, Deloitte Fast 500, and repeated recognition on Y Combinator’s top companies list – all while maintaining a 4.83 average rating from 15,000 reviews across G2, Trustpilot, Capterra, Apple and Google.

Your experience at Deel will be a career accelerator. At the forefront of the global work revolution, you'll tackle complex challenges that impact millions of people's working lives. With our momentum, backed by a $17.3 billion valuation and $1B in Annual Recurring Revenue (ARR) in just over five years, you'll drive meaningful impact while building expertise that makes you a sought-after leader in the transformation of global work.

The Observability Engineer will own the design, implementation, and evolution of our monitoring and observability ecosystem across our cloud-native SaaS platform. This role is responsible for ensuring high system reliability, performance visibility, and cost-efficient monitoring at scale. The position blends hands-on Kubernetes and AWS expertise with deep knowledge of metrics, logs, traces, and SLO-driven observability practices. The ideal candidate is a proactive problem-solver who can design scalable observability architectures, operate self-hosted monitoring stacks, optimize monitoring costs, and integrate observability into CI/CD workflows to enable resilient, high-performing production systems.

**Key Responsibilities**

Design, implement, and maintain scalable observability solutions for cloud-native environments
Own monitoring across AWS and Kubernetes (EKS) environments, covering clusters and workloads
Operate and maintain self-hosted monitoring stacks (e.g., Prometheus, Grafana, Mimir, Loki, Tempo)
Manage and optimize Datadog (metrics, logs, APM, alerts, cost monitoring)
Improve observability architecture to support high availability, scalability, and fault tolerance
Implement monitoring cost optimization strategies (log/trace sampling, retention policies, storage optimization)
Automate observability infrastructure using Infrastructure as Code (Terraform, Helm, etc.)
Integrate monitoring and alerting into CI/CD pipelines (GitHub Actions is an advantage)
Support capacity planning and performance tuning initiatives
Collaborate with DevOps, SRE, and Engineering teams to embed observability best practices
Drive continuous improvement of monitoring standards, tooling, and reliability practices

Required Skills & Experience

5+ years of hands-on experience in monitoring / observability engineering within cloud-native environments
Strong experience with AWS services
5+ years of hands-on experience working with Kubernetes
Solid knowledge of Kubernetes monitoring, including metrics, logs, and traces for clusters and workloads, as well as alerting, SLOs, SLIs, and dashboards
Proven experience operating and maintaining self-hosted monitoring stacks (Prometheus, Grafana, Mimir, Loki, and Tempo are an advantage)
Experience designing or improving observability architectures at scale
Experience with Datadog (metrics, logs, APM, alerts, and cost monitoring)
Strong understanding of high availability, scalability, and fault-tolerant architectures
Experience with monitoring cost optimization, including log and trace sampling strategies and storage and retention optimization
Ability to automate monitoring tasks using Infrastructure as Code and scripting (Terraform, Helm, etc.)
Familiarity with CI/CD pipelines and integrating monitoring into deployment workflows (GitHub Actions is an advantage).
Experience with capacity planning and performance tuning

**Soft Skills**

Strong problem-solving and analytical skills
Ability to work independently and take ownership of complex systems
Good communication skills and the ability to collaborate with DevOps, SRE, and other teams
Proactive mindset with a focus on continuous improvement

Total Rewards

Our workforce deserves fair and competitive pay that meets them where they are. With scalable benefits, rewards, and perks, our total rewards programs reflect our commitment to inclusivity and access for all.

Some things you’ll enjoy

Stock grant opportunities dependent on your role, employment status and location
Additional perks and benefits based on your employment status and country
The flexibility of remote work, including optional WeWork access

At Deel, we’re an equal-opportunity employer that values diversity and positively encourages applications from suitably qualified and eligible candidates regardless of race, religion, sex, national origin, gender, sexual orientation, age, marital status, veteran status, disability status, pregnancy or maternity or other applicable legally protected characteristics.

Unless otherwise agreed, we will communicate with job applicants using Deel-specific emails, which include @deel.com and other acquired company emails like @payspace.com and @paygroup.com. You can view the most up-to-date job listings at Deel by visiting our careers page.

Deel is an equal-opportunity employer and is committed to cultivating a diverse and inclusive workplace that reflects different abilities, backgrounds, beliefs, experiences, identities and perspectives.

Deel will provide accommodations on request throughout the recruitment, selection and assessment process for applicants with disabilities. If you require accommodations, please inform our Talent Acquisition Team via this link and a team member will be in touch to ensure your equal participation. If you have difficulty accessing the form, please email recruiting@deel.com.

As part of our hiring process, we primarily rely on interviews and role-related assessments. In limited cases, we may also consider informal background information relevant to the role, in line with our privacy and fairness obligations.

This application process does utilise Automated Employment Decision Tools (AEDT) and AI systems to assist in evaluating candidates based on experience level, technical skills and qualifications. As a fully remote company, we also utilise AI-powered deepfake and fraud detection technologies to verify the authenticity of candidate identities and interactions during assessments and interviews. This processing is conducted in compliance with applicable Data Protection, AI Governance and Labour Laws. We ensure human oversight is maintained in all final hiring decisions. Your personal data is not used to train AI models. For more information on how we process your personal data, please see our Privacy Policy.

For NYC Residents: In accordance with NYC Local Law 144, an independent bias audit has been conducted on the AEDT; results are available at Ashby and Covey.