Senior Machine Learning Operations Engineer

Mercury · Fintech · Remote · Software Engineering

This role focuses on building and operating the real-time inference service for ML models in a fintech setting, emphasizing low latency and high availability. It involves owning model deployment infrastructure, observability, and partnering with data science for production operation. The goal is to create a reliable production ML lifecycle platform.

What you'd actually do

  1. Build and operate the real-time inference service that serves model scores to the risk decision engine, with low latency and high availability as first-class requirements
  2. Own model deployment infrastructure — registry and versioning, CI/CD with performance, bias, and consistency checks, shadow mode, and staged rollouts
  3. Build model observability: availability, latency, and error monitoring, plus drift detection as a retraining trigger
  4. Partner with Risk Data Science to take models from a clean development-to-production handoff through to production operation under MLP ownership
  5. Implement experimentation capabilities such as champion/challenger and canary routing, and explainability outputs like SHAP attributions

Skills

Required

  • Python
  • API frameworks (FastAPI or Flask)
  • model deployment and lifecycle tooling
  • model registries
  • CI/CD for models
  • versioning
  • staged rollout patterns
  • observability and alerting for production services
  • SQL
  • key-value/low-latency stores (Redis, DynamoDB, or equivalent)
  • streaming pipelines (Kafka, Kinesis, Redpanda, or equivalent)

Nice to have

  • modern data stack (Snowflake, dbt, Dagster, Airflow, or similar)
  • experience in regulated, audit-sensitive, or compliance-adjacent environments
  • functional languages
  • Haskell
  • React
  • TypeScript

What the JD emphasized

  • low latency
  • high availability
  • model observability
  • real-time inference

Other signals

  • production ML lifecycle