Senior Staff Machine Learning Engineer

GEICO · Insurance · Bethesda, MD and 2 other locations

GEICO is seeking a Senior Staff Machine Learning Engineer to lead the strategy, architecture, and delivery of ML systems for its Claims organization, focusing on predictive models, data pipelines, and MLOps. The role involves designing, building, and integrating AI-powered capabilities, potentially leveraging Generative AI and agentic workflows, while ensuring the reliability, performance, and governance of ML services. It requires extensive experience in ML platform development, the SDLC for ML systems, and cloud environments, with a strong emphasis on end-to-end ML lifecycle management.

What you'd actually do

  1. Own ML platform architecture: data/feature pipelines, experiment tracking, model registries, serving layers, offline/online evaluation, and observability.
  2. Define standards for reliability, performance, cost efficiency, security, governance, and model risk management across ML services.
  3. Lead design and implementation of models across classical ML and deep learning (e.g., gradient boosted trees, sequence models, Transformers for tabular/time-series/NLP where relevant).
  4. Build scalable training and inference pipelines; establish CI/CD for ML, automated evaluations, canary releases, and rollback strategies.
  5. Implement monitoring for data quality, drift, fairness, latency, reliability, and cost; lead incident response and postmortems.

Skills

Required

  • Bachelor’s degree or higher in Computer Science, Engineering, Statistics, or a related field.
  • 10+ years of professional software development experience using at least two general-purpose languages (e.g., Java, C++, Python, C#).
  • 10+ years architecting, designing, and building multi-component ML platforms leveraging open-source/cloud-agnostic components, including:
      ◦ Search/vector: Elasticsearch, Qdrant (as applicable to ML features and retrieval)
      ◦ Data warehouse/lakehouse: Snowflake; familiarity with Parquet/Delta/Iceberg
      ◦ Streaming: Kafka, plus Flink/Spark Streaming experience
      ◦ Datastores: PostgreSQL; NoSQL (MongoDB, Cassandra)
      ◦ Distributed compute: Spark, Ray
      ◦ Workflow orchestration: Airflow, Temporal
  • 6+ years managing end-to-end SDLC for ML systems: version control, CI/CD, Kubernetes, testing (unit/integration/data/ML eval), monitoring/alerting, production support.
  • 6+ years working with cloud providers (Azure and/or AWS) in production ML contexts.

Nice to have

  • Experience leveraging or fine-tuning LLMs (e.g., GPT, Llama, Mistral, Claude) to augment ML workflows, retrieval, or claims-facing tooling.
  • Hands-on with MLOps tooling: MLflow/Kubeflow, model registries, feature stores (e.g., Feast), experiment tracking, A/B testing and online evaluation frameworks.
  • Observability: Prometheus/Grafana, OpenTelemetry; SLO-driven operations and incident management.
  • Model safety, fairness, explainability (e.g., SHAP/LIME), and regulatory compliance; familiarity with model risk management practices.
  • Insurance/financial services domain experience: claims automation, fraud detection, risk modeling, subrogation, severity/triage, and regulatory stewardship.
  • Experience with high-throughput, low-latency inference and real-time feature pipelines.

What the JD emphasized

  • 10+ years of professional software development experience using at least two general-purpose languages (e.g., Java, C++, Python, C#).
  • 10+ years architecting, designing, and building multi-component ML platforms leveraging open-source/cloud-agnostic components
  • 6+ years managing end-to-end SDLC for ML systems
  • 6+ years working with cloud providers (Azure and/or AWS) in production ML contexts.

Other signals

  • end-to-end ML systems
  • ML platform architecture
  • Generative AI
  • predictive models
  • production-grade MLOps