AI Field Engineer - Microsoft Foundry

Fireworks AI · Data AI · New York, NY +1 · Go To Market

The AI Field Engineer for Microsoft Foundry focuses on making Fireworks the default inference and fine-tuning layer in Azure AI architectures. Responsibilities include building reference architectures, running benchmarks, debugging integrations, co-developing POCs, guiding customers on model selection and fine-tuning, and owning the feedback loop between partners and product development. The role requires strong Python, Kubernetes, LLM inference, fine-tuning, and Azure AI stack experience.

What you'd actually do

  1. Be the technical lead on co-sell motions with Microsoft — joint reference architectures, Azure Foundry integration patterns, and shared POCs for strategic accounts.
  2. Build end-to-end POCs and MVPs alongside partner engineering teams, working inside their codebases, infrastructure, and constraints.
  3. Run load tests and establish latency, throughput, and cost baselines against realistic customer traffic profiles, and tune deployments to hit those targets.
  4. Deploy and validate new model families on inference frameworks (vLLM, SGLang), determining optimal shapes, quantization configs, and serving patterns across workloads.
  5. Guide Microsoft’s customers on model selection, fine-tuning strategy (SFT, DPO, RFT), and evaluation methodology.

Skills

Required

  • 3+ years in a pre-sales, partner engineering, forward-deployed, or technical consulting role.
  • Demonstrated ability to build production software with customers, not just advise on it.
  • Strong Python skills.
  • Comfortable reading, writing, and debugging production code.
  • Familiarity with Kubernetes and infrastructure engineering.
  • Hands-on fluency with LLM inference: latency/throughput tradeoffs, batching strategies, quantization, structured outputs, function calling.
  • Real experience with fine-tuning — LoRA at minimum, RFT a strong plus.
  • Deep familiarity with the Azure AI stack: Azure Foundry, Azure OpenAI Service, Azure ML, AKS, Entra/RBAC for AI workloads.
  • Exceptional communication: able to run a sharp discovery call, present to a VP, and debug a latency issue with an ML engineer in the same afternoon.

Nice to have

  • 5+ years in technical field or engineering roles where you've owned a technical relationship with a hyperscaler or major SI, not just supported one.
  • Experience with inference serving frameworks (vLLM, SGLang, TensorRT-LLM) and tuning deployments for real workloads.
  • Prior role at a hyperscaler, AI-native cloud, or inference provider.
  • Experience with agentic frameworks (LangChain, LlamaIndex, or custom tool-use pipelines), with an understanding of how inference latency and reliability shape agent behavior at scale.
  • Background in model evaluation — you understand why benchmark gaming is rampant and what rigorous evals actually look like.
  • You've written a technical blog post or reference architecture that people actually read.
  • Track record taking GenAI POCs from prototype to production-scale deployments.

What the JD emphasized

  • You have shipped code running in someone else's production environment.
  • Hands-on fluency with LLM inference: latency/throughput tradeoffs, batching strategies, quantization, structured outputs, function calling.
  • Real experience with fine-tuning — LoRA at minimum, RFT a strong plus.
  • Deep familiarity with the Azure AI stack: Azure Foundry, Azure OpenAI Service, Azure ML, AKS, Entra/RBAC for AI workloads.

Other signals

  • building the future of generative AI infrastructure
  • highest-quality models with the fastest and most scalable inference
  • leader in LLM inference speed
  • function calling and multimodal models
  • default inference and fine-tuning layer in every Azure AI architecture
  • build reference architectures, run benchmarks, debug production integrations, and co-develop POCs
  • ship code, run joint POCs with Microsoft field teams, and architect deployments
  • translate field signals into product improvements
  • own the technical relationship between Fireworks and the Microsoft ecosystem
  • scale that pattern across the partner ecosystem
  • deploy and validate new model families on inference frameworks
  • guide Microsoft’s customers on model selection, fine-tuning strategy
  • build and run fine-tuning pipelines directly with customers
  • design and implement evaluation frameworks that measure production-quality metrics
  • own the feedback loop
  • ship external technical content
  • track pipeline health
  • demonstrated ability to build production software with customers
  • hands-on fluency with LLM inference
  • real experience with fine-tuning
  • deep familiarity with the Azure AI stack
  • experience with agentic frameworks
  • background in model evaluation
  • track record taking GenAI POCs from prototype to production-scale deployments