Software Engineer 2

Abnormal AI · Vertical AI · Bangalore, India · Hybrid · Platform & Infrastructure

Software Engineer 2 role focused on building and evolving the observability and data infrastructure platforms that power Abnormal AI's growth. Responsibilities include owning the monitoring, metrics, and alerting stack, designing developer tooling, driving SLAs/SLOs for critical shared infrastructure, and participating in on-call rotations. Requires strong backend engineering, distributed systems experience, and proficiency in Python and Golang.

What you'd actually do

  1. Own and evolve the monitoring, metrics, and alerting infrastructure that every engineering team at Abnormal depends on.
  2. Design platforms and developer tooling that remove friction by reducing deployment times, simplifying pipeline authoring, and letting product teams focus on building rather than firefighting.
  3. Drive SLAs and SLOs for critical shared infrastructure, ensuring the systems behind our products are resilient and cost-efficient.
  4. Make architectural decisions on alerting pipelines and cross-environment deployments that define what products we can build and how quickly we deliver them to customers.
  5. Own features end-to-end: scoping, implementation, testing, deployment, and post-launch monitoring across multiple environments (US, EU, GovCloud).

Skills

Required

  • Backend Engineering & Distributed Systems (4+ years)
  • Python
  • Golang
  • Experience building systems that process data at scale
  • Demonstrated experience owning a service or platform end-to-end
  • Comfortable balancing feature development with operational responsibilities
  • Experience writing technical design documents
  • Track record of breaking down ambiguous problems into concrete, deliverable milestones
  • Experience with fault tolerance patterns
  • Proven incident response capability
  • Strong testing discipline
  • Ability to design systems with a forward-looking perspective
  • Ability to contribute to and influence cross-team technical direction
  • Excellent async-first communication
  • Proactive communicator
  • Solid understanding of monitoring, alerting, and observability principles

Nice to have

  • Prometheus
  • Grafana
  • Chronosphere
  • Datadog
  • New Relic
  • Honeycomb

What the JD emphasized

  • Own the observability stack (Prometheus, Chronosphere, Grafana, PagerDuty) that every team relies on to detect, diagnose, and resolve production issues
  • Own features end-to-end
  • Take ownership of 1-3 key services within Observability (Prometheus, Chronosphere, Grafana, PagerDuty pipeline) or Data Infra (Airflow, Spark) and be accountable for their reliability, performance, and evolution
  • Proven incident response capability: you've been on-call, diagnosed production issues under pressure, and driven them to resolution
  • Solid understanding of monitoring, alerting, and observability principles