Software Engineer, Model Routing & Inference

at Cursor · Coding AI · New York, NY · Engineering


Our mission is to automate coding. The first step in our journey is to build the best tool for professional programmers, using a combination of inventive research, design, and engineering. Our organization is very flat, and our team is small and talent dense. We particularly like people who are truth-seeking, passionate, and creative. We enjoy spirited debate, crazy ideas, and shipping code.

About the Role

As a Software Engineer on the Model Routing & Inference team at Cursor, you'll build the inference platform that powers every AI interaction in the product.

This team owns the full inference path: making Cursor's AI faster, more reliable, and more cost-effective, at a scale few teams in the world get to operate. Every agent session, every tab completion, and every chat message flows through your stack.

Example projects include...

  • Building and evolving our inference gateway, a single abstraction over every provider's API semantics, so model onboarding becomes a config change.
  • Designing intelligent cross-provider failover so no single provider outage causes user-visible degradation.
  • Designing routing backpressure and admission control so traffic spikes don't cascade into providers.
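To make these projects concrete, here is a minimal sketch of the ideas in one place: a gateway that hides provider-specific APIs behind a single interface, fails over across providers, and applies crude admission control. All names and details are hypothetical illustrations, not Cursor's actual architecture.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Provider:
    """One upstream model provider, wrapped behind a uniform interface."""
    name: str
    complete: Callable[[str], str]  # provider-specific API call, normalized
    healthy: bool = True

class Overloaded(Exception):
    """Raised by admission control when the gateway is at capacity."""

class InferenceGateway:
    """Single abstraction over provider APIs, with failover and load shedding."""

    def __init__(self, providers: list[Provider], max_in_flight: int = 64):
        self.providers = providers          # priority order; onboarding a model is a config change
        self.max_in_flight = max_in_flight  # crude admission-control limit
        self.in_flight = 0

    def complete(self, prompt: str) -> str:
        # Admission control: shed load at the edge so a traffic spike
        # doesn't cascade into every upstream provider.
        if self.in_flight >= self.max_in_flight:
            raise Overloaded("gateway at capacity, retry with backoff")
        self.in_flight += 1
        try:
            # Cross-provider failover: walk providers in priority order,
            # tripping a simple circuit breaker on failure, so a single
            # provider outage never becomes user-visible.
            for provider in self.providers:
                if not provider.healthy:
                    continue
                try:
                    return provider.complete(prompt)
                except Exception:
                    provider.healthy = False
            raise RuntimeError("all providers unavailable")
        finally:
            self.in_flight -= 1

# Example: the primary provider is down; the backup serves the request transparently.
def down(prompt: str) -> str:
    raise TimeoutError("provider outage")

gateway = InferenceGateway([Provider("primary", down),
                            Provider("backup", lambda p: f"backup:{p}")])
print(gateway.complete("hello"))  # backup:hello
```

A production gateway would of course add streaming, retries with backoff, health-check-driven circuit-breaker recovery, and per-provider rate limits; this sketch only illustrates the shape of the abstraction the role describes.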

You may be a fit if

  • You have deep experience building high-throughput, low-latency distributed systems, especially in inference serving, traffic routing, or real-time data pipelines.
  • You're comfortable reasoning about cost/performance tradeoffs at scale (GPU utilization, provider economics, capacity planning).
  • You have strong software engineering fundamentals and enjoy shipping production systems that handle millions of requests.
  • You make good calls in the gray area: weighing reliability, cost, latency, and user experience when there isn't a single "right" answer.

Applying

If there appears to be a fit, we'll reach out to schedule 2-3 short technical interviews. Afterward, we'll schedule an onsite in our office, where you'll work on a small project, discuss ideas, and meet the team.
