Staff Machine Learning Engineer, AI Agent Platform

GEICO · Insurance · New York, NY (+3 other locations)

A Staff ML Engineer role building the next-generation enterprise AI Agent OS and SDKs. The role involves designing, implementing, and maintaining scalable backend systems for AI agent workflows, including configuration, evaluation, synthetic data generation, workflow simulation, an MCP server registry, A2A communication infrastructure, and guardrail enforcement. It also covers building an enterprise AI agent skill ecosystem, implementing production-grade AI agent harnesses, and developing context engineering systems and observability frameworks. AI safety, governance, and guardrails are key responsibilities as well.

What you'd actually do

  1. Architect scalable multi-tenant backend systems for AI agent workflows, including AI agent configuration, evaluation, synthetic data generation, workflow simulation and evaluation, an MCP server registry, A2A communication infrastructure, and guardrail enforcement layers, using AKS (Azure Kubernetes Service), FastAPI, and similar technologies.
  2. Build an enterprise AI agent skill ecosystem — a platform for authoring, publishing, discovering, versioning, and governing reusable skill packages that encode domain expertise into portable modules. Implement an internal skill marketplace with search/discovery, quality scoring, security vetting pipelines, approval workflows, and progressive disclosure loading.
  3. Implement production-grade AI agent harnesses: the non-model infrastructure (tool dispatch, context management, error recovery/self-healing, session state, sub-agent coordination) that makes AI agents reliable on long-running tasks. Design feedforward guides (linters, type checkers, architecture constraints) and feedback sensors (test execution, LLM-as-judge, semantic analysis) that mix computational and inferential controls.
  4. Build and optimize context engineering systems — memory hierarchies (short-term, working, long-term), RAG pipelines, scratchpads, context compaction/summarization, and dynamic skill/tool loading — ensuring AI agents receive the right information at the right time while minimizing token waste.
  5. Develop observability frameworks (OpenTelemetry, distributed tracing) with LLM-specific telemetry: token usage, latency profiling, hallucination detection, AI agent behavior auditing, and skill execution monitoring.

Skills

Required

  • Python
  • Java
  • Go
  • Kubernetes
  • Temporal
  • OpenSearch
  • PostgreSQL
  • Redis
  • Neo4j
  • Docker
  • Prometheus
  • OpenTelemetry
  • TensorFlow
  • PyTorch
  • LangGraph
  • CrewAI
  • AutoGen
  • mentoring engineers
  • leading technical initiatives
  • communication across diverse seniority levels and professional backgrounds

Nice to have

  • Cursor
  • Claude Code
  • GitHub Copilot
  • harness engineering concepts and practices
  • AI agent skill systems
  • MCP
  • A2A
  • LLM observability
  • LangSmith
  • Langfuse
  • Arize Phoenix
  • guardrail systems
  • multi-agent orchestration
  • Llama
  • Qwen
  • Mistral
  • GPT
  • Claude
  • no-code/low-code AI agent development environments

What the job description emphasized

  • AI agent skill ecosystem
  • harness engineering
  • context engineering
  • governance-first design
  • MCP
  • A2A

Other signals

  • build the next-generation enterprise AI Agent OS and SDKs
  • design, implement, and maintain scalable backend systems that enable business, product, and engineering teams to build, test, and deploy their own AI agents & workflows
  • AI agent skill ecosystem
  • production-grade AI agent harnesses
  • context engineering systems
  • observability frameworks
  • AI Safety, Governance & Guardrails