What you'd actually do

Architect and evolve a highly scalable, multi-tenant AI/ML platform that seamlessly unifies traditional ML (classification, regression, forecasting) and Generative AI/LLM orchestration.

Design and implement robust production-grade AI Agents and Advanced Chatbots. Build reliable execution environments for Multi-Agent Systems, including state management, long-term memory architectures, and Model Context Protocol (MCP) server integrations.

Build high-throughput, low-latency application backends and orchestration layers. Partner closely with data, platform, and full-stack engineers to ensure seamless feature delivery and reliable production operations.

Act as a technical anchor for the Data Science team – enforcing rigorous engineering standards, leading design and security reviews, evaluating build-vs-buy decisions, and mapping business requirements to robust technical designs.

Evaluate trade-offs and drive adoption of modern AI infrastructure tools, optimized embedding pipelines, vector databases, and serverless compute paradigms (such as Workers AI).

Skills

Required

Extensive experience as a Senior or Lead ML Engineer
Proven track record of architecting and operating production-grade ML platforms, services and distributed backends
Strong competency in Traditional ML lifecycles (feature stores, training pipelines, model monitoring)
Deep experience in Generative AI patterns (RAG pipelines, context engineering, fine-tuning, guardrailing, and agentic AI systems)
Mastery of Python
Robust experience with modern backend ecosystems
3+ years of dedicated ML Engineering experience within a large-scale, enterprise environment
Proven ability to architect, scale, and secure reliable, highly observable distributed systems
Experience mentoring engineers
Leading by example through high-quality code and rigorous design reviews
Fostering a culture of technical excellence
Strong problem-solving skills
Demonstrated ability to independently drive complex projects through ambiguous spaces
Ability to collaborate cross-functionally with data engineers, full-stack teams, and analysts
Hands-on proficiency in building production-grade GenAI applications and multi-agent systems

Nice to have

Familiarity with (or willingness to collaborate on) full-stack technologies like React and TypeScript

What the JD emphasized

architecting and operating production-grade ML platforms, services and distributed backends

shaping your own technical roadmap

taking extreme ownership of system reliability, costs, and model performance

architect, scale, and secure reliable, highly observable distributed systems

drive complex projects through ambiguous spaces

About Us

At Cloudflare, we are on a mission to help build a better Internet. Today the company runs one of the world’s largest networks that powers millions of websites and other Internet properties for customers ranging from individual bloggers to SMBs to Fortune 500 companies. Cloudflare protects and accelerates any Internet application online without adding hardware, installing software, or changing a line of code. Internet properties powered by Cloudflare all have web traffic routed through its intelligent global network, which gets smarter with every request. As a result, they see significant improvement in performance and a decrease in spam and other attacks. Cloudflare was named to Entrepreneur Magazine’s Top Company Cultures list and ranked among the World’s Most Innovative Companies by Fast Company.

At Cloudflare, we’re not looking for people who wait for a polished roadmap; we’re looking for the builders who see the cracks in the Internet that everyone else has simply learned to live with. We value candidates who have the instinct to spot a "normalized" problem and the AI-native curiosity to create a solution using the latest tools. Our culture is built on iteration, leveraging AI to ship faster today to make it better tomorrow, while ensuring that every improvement, no matter how small, is shared across the team to lift everyone up. If you’re the type of person who values curiosity over bureaucracy and sees AI as a partner in solving tough problems to keep the Internet moving forward, you’ll fit right in.

Available Locations: Austin, TX - Hybrid

About the team

The Data Intelligence & Analytics organization builds the core data platform and internal products that power decision-making across the company. We design and operate large-scale data systems, own the company’s data lake, ingestion infrastructure, and platform tooling, and develop end-to-end applications that transform complex datasets into fast, reliable, business-critical products used daily by go-to-market, product, and engineering teams. Our work sits at the intersection of data platforms, distributed systems, and product development, giving engineers the opportunity to own meaningful problems across the stack and build systems that truly run the business.

About the role

We are looking for a visionary and hands-on Lead Machine Learning Engineer to join our Austin team. In this role, you will be the principal architect behind the next generation of our unified AI/ML platform, designing and building the infrastructure that powers everything from traditional predictive models to generative AI, large language models (LLMs), and autonomous agent frameworks.

You will own the end-to-end technical strategy, blueprint, and execution of scalable backend services and data pipelines that support AI-driven applications across go-to-market, engineering, and product teams. Because our products are initiated and owned entirely within the team, you will drive the vision from initial requirements and system design to global deployment, optimization, and long-term evolutionary ownership.

What you'll do

Architect and evolve a highly scalable, multi-tenant AI/ML platform that seamlessly unifies traditional ML (classification, regression, forecasting) and Generative AI/LLM orchestration.
Design and implement robust production-grade AI Agents and Advanced Chatbots. Build reliable execution environments for Multi-Agent Systems, including state management, long-term memory architectures, and Model Context Protocol (MCP) server integrations.
Build high-throughput, low-latency application backends and orchestration layers. Partner closely with data, platform, and full-stack engineers to ensure seamless feature delivery and reliable production operations.
Act as a technical anchor for the Data Science team – enforcing rigorous engineering standards, leading design and security reviews, evaluating build-vs-buy decisions, and mapping business requirements to robust technical designs.
Evaluate trade-offs and drive adoption of modern AI infrastructure tools, optimized embedding pipelines, vector databases, and serverless compute paradigms (such as Workers AI).

What We Are Looking For

Extensive experience as a Senior or Lead ML Engineer, with a proven track record of architecting and operating production-grade ML platforms, services and distributed backends.
Strong competency in Traditional ML lifecycles (feature stores, training pipelines, model monitoring) alongside deep experience in Generative AI patterns (RAG pipelines, context engineering, fine-tuning, guardrailing, and agentic AI systems).
Mastery of Python and robust experience with modern backend ecosystems. Familiarity with (or willingness to collaborate on) full-stack technologies like React and TypeScript is highly valued.
A builder's mindset. You are comfortable navigating ambiguity, shaping your own technical roadmap, adapting as needed, and taking extreme ownership of system reliability, costs, and model performance.

Examples of desirable skills, knowledge and experience

Technical Leadership & Systems Architecture

3+ years of dedicated ML Engineering experience within a large-scale, enterprise environment (handling petabyte-scale data and working across globally distributed teams).
Proven ability to architect, scale, and secure reliable, highly observable distributed systems, with a track record of leveling up platform foundations.
Experience mentoring engineers, leading by example through high-quality code and rigorous design reviews, and fostering a culture of technical excellence.
Strong problem-solving skills with a demonstrated ability to independently drive complex projects through ambiguous spaces and collaborate cross-functionally with data engineers, full-stack teams, and analysts.

AI, LLMOps & Agentic Engineering

Hands-on proficiency in building production-grade GenAI applications and multi-agent systems using advanced LLM frameworks like LangGraph, LangChain, or Autogen. Deep understanding of agent harness primitives, state management, memory architectures, and tool-calling loop mechanics.
Experience establishing LLMOps foundations, including automated prompt tracking, LLM evaluation pipelines (e.g., Ragas, TruLens), vector database optimization, context/token management, and real-time guardrailing/moderation layers.
Deep experience in scientific computing using Python (Scikit-Learn, PyTorch, or TensorFlow) and deploying traditional systems for end-to-end training, batch/real-time inference, and model observability.

Infrastructure, Cloud & Data Platforms

Strong experience with Docker and Kubernetes for containerization and orchestration, alongside Infrastructure-as-Code tools like Terraform and public cloud ecosystems (GCP, AWS, or Azure).
Hands-on experience with modern MLOps platform tools (e.g., Airflow, Argo Workflows, ArgoCD) and data systems including BigQuery, Postgres, and robust ETL/ELT practices.
Experience with full-stack web technologies and serverless/edge environments (FastAPI, TypeScript/JavaScript, Cloudflare Workers), with the agility to contribute across a multi-language stack.
Strong foundation in continuous integration/continuous deployment (CI/CD), testing frameworks (Pytest), and robust version control practices.

Education & Communication

M.S. or Ph.D. in Computer Science, Statistics, Mathematics, or a related quantitative field.
Exceptional written and verbal communication skills, with the ability to translate complex technical architectures into clear concepts for both engineering peers and business stakeholders.

What Makes Cloudflare Special?

We’re not just a highly ambitious, large-scale technology company. We’re a highly ambitious, large-scale technology company with a soul. Fundamental to our mission to help build a better Internet is protecting the free and open Internet.

**Project Galileo**: Since 2014, we've equipped more than 2,400 journalism and civil society organizations in 111 countries with powerful tools to defend themselves against attacks that would otherwise censor their work. This is technology already used by Cloudflare’s enterprise customers, provided at no cost.

**Athenian Project**: In 2017, we created the Athenian Project to ensure that state and local governments have the highest level of protection and reliability for free, so that their constituents have access to election information and voter registration. Since the project began, we've provided services to more than 425 local government election websites in 33 states.

**1.1.1.1**: We released 1.1.1.1 to help fix the foundation of the Internet by building a faster, more secure and privacy-centric public DNS resolver. This is available publicly for everyone to use - it is the first consumer-focused service Cloudflare has ever released. Here’s the deal - we don’t store client IP addresses never, ever. We will continue to abide by our privacy commitment and ensure that no user data is sold to advertisers or used to target consumers.

Sound like something you’d like to be a part of? We’d love to hear from you!

Please note that applicants who progress to the offer stage of the interview process may be asked to attend an in-person interview within one of the Cloudflare Offices or Cloudflare Hubs. More details about this will be available at that stage of the interview process.

This position may require access to information protected under U.S. export control laws, including the U.S. Export Administration Regulations. Please note that any offer of employment may be conditioned on your authorization to receive software or technology controlled under these U.S. export laws without sponsorship for an export license.

Cloudflare is proud to be an equal opportunity employer. We are committed to providing equal employment opportunity for all people and place great value in both diversity and inclusiveness. All qualified applicants will be considered for employment without regard to their, or any other person's, perceived or actual race, color, religion, sex, gender, gender identity, gender expression, sexual orientation, national origin, ancestry, citizenship, age, physical or mental disability, medical condition, family care status, or any other basis protected by law. We are an AA/Veterans/Disabled Employer.

Cloudflare provides reasonable accommodations to qualified individuals with disabilities. Please tell us if you require a reasonable accommodation to apply for a job. Examples of reasonable accommodations include, but are not limited to, changing the application process, providing documents in an alternate format, using a sign language interpreter, or using specialized equipment. If you require a reasonable accommodation to apply for a job, please contact us via e-mail at hr@cloudflare.com or via mail at 101 Townsend St. San Francisco, CA 94107.
