Member of Technical Staff - Agent Dx Research

Modal · Data AI · New York, NY · Engineering

Research role focused on building an evaluation framework for AI coding agents to improve developer experience on the Modal platform. This involves defining quantitative objectives, measuring performance, and translating insights into product improvements, while staying updated on agent advancements and customer use cases.

What you'd actually do

  1. Build out a framework and process for agent productivity evaluation
  2. Define quantitative objectives
  3. Design systems to measure performance
  4. Translate results into product improvements
  5. Stay on top of new developments in agent tools and workflows, and work with our customers to understand how they're using coding agents with Modal and where we can provide more value

Skills

Required

  • design and implement scalable agent benchmarking workflows
  • experience with experimental design, measurement, and statistical evaluation
  • up-to-date knowledge of the latest advances in coding agents
  • interest in developer tooling and opinions about developer ergonomics
  • familiarity with the use cases that Modal serves (generative AI inference, large-scale batch jobs, multi-node training, etc.)
  • strong communication skills and the ability to convey research insights to decision makers

Nice to have

  • PhD in Computer Science, Human-Computer Interaction, Cognitive Science, Operations Research, or another related field
  • prior experience as a Machine Learning Scientist, Quantitative UX Researcher, or in a similar role on a product team

What the JD emphasized

  • rigorous evaluation
  • quantitative objectives
  • measure performance
  • agent productivity evaluation
  • coding agents

Other signals

  • AI agents
  • developer experience
  • evaluation framework
  • product improvements