Senior Machine Learning Engineer

Cloudflare · Enterprise · India · Remote · Business Intelligence

The Senior Machine Learning Engineer joins the Data Intelligence & Analytics team to scale AI/ML models and to build and operate the pipelines behind AI-driven applications, Agents, and Chatbots. The role carries end-to-end ownership from requirements to deployment and works with modern AI infrastructure and tools such as vector databases and Workers AI. Responsibilities include deploying ML applications on Kubernetes, understanding the MLOps landscape, leading efficiency improvements across the training-to-deployment cycle, and leveraging Cloudflare products for AI/ML initiatives. Experience with LLMs, frameworks such as LangChain and LangGraph, and deploying ML systems is required.

What you'd actually do

  1. Deploy, manage, and support ML applications and services on Kubernetes.
  2. Understand the MLOps landscape (i.e., tooling, tech stack, source systems, etc.) and introduce new tools and solutions for ML and AI initiatives.
  3. Partner and align with Data Scientists, Data Engineers and internal teams to deliver ML solutions in a globally distributed environment.
  4. Lead efficiency improvements that shorten model training-to-deployment lead times.
  5. Understand the business and product strategy and the high-level roadmap, and align analysis efforts to support them with data insights and help achieve strategic goals.

Skills

Required

  • 5+ years of ML engineering experience, with proven industry experience in a large-scale environment (petabyte scale and globally distributed teams).
  • Strong experience in scientific computing using Python with scikit-learn and PyTorch or TensorFlow.
  • Strong experience working with Docker & Kubernetes to build and deploy applications and systems.
  • Experience working with Data Scientists to deploy Machine Learning applications and systems for training, inference, and observability.
  • Proficiency in large language models and in frameworks such as LangChain and LangGraph that are needed to implement GenAI applications, such as chatbots and related use cases.
  • Demonstrated ability to design scalable, reliable, and observable systems, with experience influencing architecture and improving platform found

Nice to have

  • Experience with ML platform tools (Airflow, Argo Workflows, ArgoCD) preferred.
  • Experience with full-stack web technologies and languages (FastAPI, Streamlit, JavaScript/TypeScript, Cloudflare Workers, etc.) preferred, with the ability to quickly learn and contribute across a multi-language stack.
  • Experience with Terraform and Google Cloud Platform (or an equivalent public cloud).
  • Experience working with CI/CD systems, version control (Git, Bitbucket, etc.), testing (pytest, etc.), and DevOps tools.
  • Experience with databases such as BigQuery, Postgres, and SQLite, and with ETL/ELT practices.
  • Strong cross-functional collaboration experience with data engineering and data analyst teams within the function.

What the JD emphasized

  • build and operate the pipelines behind AI-driven applications, Agents, Chatbots
  • end-to-end — from shaping requirements and designing systems to implementation, deployment, and long-term ownership
  • scalable, reliable services and application backends
  • strong AI components
  • vector databases
  • ML applications systems for training, inference and observability
  • implementing GenAI applications, such as chatbots and related use cases
  • design scalable, reliable, and observable systems

Other signals

  • deploy, manage & support ML Applications & Services on Kubernetes
  • MLOps landscape
  • introduce new tools and solutions for ML & AI initiatives
  • deliver ML solutions
  • boost model training to deployment lead times
  • publish model scores/insights/learnings at scale
  • implementing GenAI applications, such as chatbots