Senior Software Engineer, Agentic Systems - Moveworks

ServiceNow · Enterprise · Mountain View, CA +1 · Engineering

This is a Senior Software Engineer role focused on building the runtime infrastructure for AI agents: the orchestration engine, distributed session management, event-driven message pipeline, structured concurrency, and observability infrastructure. Although the company's product is an agentic AI assistant platform powered by LLMs, the posting describes this as a distributed systems engineering role, not an ML role. The work centers on the systems that let AI agents plan, execute workflows, and interact with users and tools.

What you'd actually do

  1. Agent orchestration engine — A state machine that manages long-running agent sessions, coordinating planning, execution, and user interaction across multiple LLM calls and tool invocations
  2. Distributed session management — Lease-based ownership using DynamoDB conditional writes, heartbeat protocols, and crash recovery via checkpointing
  3. Event-driven message pipeline — SQS FIFO queues for ordered delivery, Kafka consumers for event processing, and real-time streaming via gRPC and Socket.IO
  4. Structured concurrency — Python asyncio TaskGroups running multiple concurrent tasks per session (message polling, lease heartbeats, output publishing, orchestrator execution) with fail-fast semantics and graceful cancellation
  5. Observability infrastructure — OpenTelemetry instrumentation, distributed trace context propagation across async boundaries, custom span lifecycle management for sessions that span minutes
  6. Caching and state layers — Redis, DynamoDB KV stores with per-org/per-bot scoping, batch read optimization, and hot-reload configuration
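
For readers unfamiliar with the lease-based ownership named in item 2, here is a minimal sketch of the idea. It is not taken from the posting and is not Moveworks' implementation: the `LeaseTable` class, its method names, and the worker and session IDs are all hypothetical, and a plain in-memory dict stands in for a DynamoDB table. Each method checks a condition and writes only if it holds, which is the role a conditional write would play in a real store.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Lease:
    owner: str          # worker that currently owns the session
    expires_at: float   # time after which another worker may take over


class LeaseTable:
    """Hypothetical in-memory stand-in for a table with conditional writes."""

    def __init__(self) -> None:
        self._rows: Dict[str, Lease] = {}

    def try_acquire(self, session_id: str, worker: str, now: float, ttl: float) -> bool:
        # Write only if no lease exists, the lease has expired,
        # or this worker already owns it.
        current = self._rows.get(session_id)
        if current is not None and current.expires_at > now and current.owner != worker:
            return False
        self._rows[session_id] = Lease(owner=worker, expires_at=now + ttl)
        return True

    def heartbeat(self, session_id: str, worker: str, now: float, ttl: float) -> bool:
        # Extend the lease only if this worker still owns an unexpired lease.
        current = self._rows.get(session_id)
        if current is None or current.owner != worker or current.expires_at <= now:
            return False
        current.expires_at = now + ttl
        return True


def demo() -> list:
    table = LeaseTable()
    results = []
    # worker-a takes the session at t=0; the lease runs until t=10.
    results.append(table.try_acquire("s1", "worker-a", now=0.0, ttl=10.0))
    # worker-b is refused while the lease is still live.
    results.append(table.try_acquire("s1", "worker-b", now=5.0, ttl=10.0))
    # worker-a heartbeats at t=8, pushing expiry to t=18.
    results.append(table.heartbeat("s1", "worker-a", now=8.0, ttl=10.0))
    # worker-a stops heartbeating (simulated crash); worker-b takes over at t=20.
    results.append(table.try_acquire("s1", "worker-b", now=20.0, ttl=10.0))
    # A late heartbeat from worker-a fails because it no longer owns the lease.
    results.append(table.heartbeat("s1", "worker-a", now=21.0, ttl=10.0))
    return results


if __name__ == "__main__":
    print(demo())
```

Passing `now` explicitly keeps the sketch deterministic. A production system would also need to handle clock skew between workers and restore session state from a checkpoint after takeover, neither of which is modeled here.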

Skills

Required

  • Distributed systems
  • Concurrent/async programming
  • Event-driven architectures
  • Database systems for infrastructure
  • Observability
  • gRPC/protobuf
  • Python
  • Go
  • 5+ years building production backend/infrastructure systems
  • Experience designing and operating systems that handle real traffic at scale
  • Comfort with ambiguity

What the JD emphasized

  • distributed systems engineering
  • Agent orchestration engine
  • Distributed session management
  • Event-driven message pipeline
  • Structured concurrency
  • Observability infrastructure
  • Caching and state layers
  • Distributed systems
  • Concurrent/async programming
  • Event-driven architectures
  • Database systems for infrastructure
  • Observability
  • gRPC/protobuf
  • 5+ years building production backend/infrastructure systems
  • systems that handle real traffic at scale
  • novel problems without textbook solutions