Senior Machine Learning Engineer, AI Agent Platform

GEICO · Insurance · New York, NY

GEICO's AI organization is seeking a Senior Machine Learning Engineer to develop a virtual assistant platform that supports both internal associates and customer experience. The role involves building multi-tenant services for developing, testing, deploying, and hosting AI agents, covering agent skill ecosystems, harness and context engineering, interoperability, and guardrail systems.

What you'd actually do

  1. Contribute to building an enterprise AI agent skill ecosystem — developing services that support authoring, publishing, discovering, and versioning reusable skill packages (SKILL.md standard). Implement skill marketplace features including search/discovery, security vetting pipelines, and progressive disclosure loading.
  2. Build and maintain AI agent harness components — the non-model infrastructure (tool dispatch, context management, error recovery, session state) that makes AI agents reliable for long-running workflows. Implement feedforward guides and feedback sensors mixing computational and inferential controls.
  3. Contribute to context engineering systems that manage the LLM context window — memory management, RAG pipelines, context compaction/summarization, scratchpads, and dynamic skill/tool loading — ensuring AI agents receive the right information at the right time.
  4. Implement guardrail components including input validation, prompt injection defense, PII detection, output verification, and skill-level security scanning. Contribute to bounded autonomy systems, human-in-the-loop escalation paths, and audit trail infrastructure.
  5. Collaborate with cross-functional teams including data scientists, ML engineers, software engineers, product managers, and designers to gather requirements, define project scope, and prioritize feature backlogs for AI agent use cases.

Skills

Required

  • 5+ years of professional software development experience with at least two general-purpose programming languages such as Java, C++, Python, or C#.
  • 4+ years of experience designing and building AI/ML platforms and systems utilizing open-source/cloud-agnostic components such as search engines (e.g., OpenSearch, Milvus), data warehouses (e.g., Snowflake), streaming platforms (e.g., Kafka), relational databases (e.g., PostgreSQL), NoSQL (e.g., Cassandra), distributed processing (e.g., Spark, Ray), workflow management (e.g., Airflow, Temporal), memory management (e.g., Redis/Valkey), etc.
  • 3+ years' experience contributing towards end-to-end software development lifecycles (version control, CI/CD pipelines, Kubernetes clusters, testing, monitoring & alerting, production support, etc.).
  • 3+ years' experience building evaluation and observability systems for AI/ML models and LLMs, especially utilizing GPU-powered infrastructure.
  • Familiarity with harness engineering concepts: memory management, RAG, context/tool management, guardrails, etc.
  • Strong communication and problem-solving skills to excel in dynamic, cross-functional decision-making environments.
  • Bachelor's degree or above in Computer Science, Engineering, Statistics, or a related field.

Nice to have

  • 3+ years' experience building conversational experiences and agentic workflows, leveraging open-source and proprietary LLMs.
  • Experience contributing to AI agent harness infrastructure — tool dispatch, error recovery, session state management, or sub-agent coordination using feedforward/feedback control patterns.
  • Experience with AI agent skill systems (building or integrating reusable skill packages, skill registries, MCP servers, etc.), along with governance and control measures.
  • Experience with multi-agent orchestration frameworks (LangGraph, AutoGen, CrewAI).
  • Experience with LLM observability platforms such as LangSmith or Langfuse.
  • Experience building AI agent guardrails.

What the JD emphasized

  • proven track record of building high-performance Generative AI Systems and platforms
  • 3+ years' experience building evaluation and observability systems for AI/ML models and LLMs
  • 3+ years' experience building conversational experiences and agentic workflows

Other signals

  • AI Agent Platform
  • GenAI workflows
  • LLM-based AI agents
  • multi-tenant services
  • enterprise AI agent skill ecosystem