Software Engineer, AI Infrastructure

Fireworks AI · Data AI · San Mateo, CA · Engineering

Software Engineer on the AI Infrastructure team at Fireworks AI, designing and building core systems for the generative AI platform: infrastructure for distributed training, inference, data pipelines, CI/CD, the control plane, and model serving. The role emphasizes the reliability, performance, and quality of the AI system, bridging customer needs and the inference engine.

What you'd actually do

  1. Contribute to the design and development of scalable backend infrastructure that supports distributed training, inference, and data pipelines
  2. Build and maintain core backend services such as LLM CI/CD pipeline, control plane, and model serving systems
  3. Support performance optimization, cost efficiency, and reliability improvements across compute, storage, and networking layers
  4. Build frameworks and safeguards that ensure Fireworks AI has the best model quality in the industry
  5. Collaborate with performance, training, and product teams to translate research and product needs into infrastructure solutions

Skills

Required

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
  • 3 years of experience in software engineering, with a focus on infrastructure or machine learning systems
  • Strong programming skills in Python, Go, or a similar language
  • Proven experience with ML infrastructure and tooling (e.g., PyTorch, MLflow, Vertex AI, SageMaker, Kubernetes)
  • Basic understanding of LLM internals (e.g., context length, disaggregated prefill, KV cache memory estimation)
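
As a quick illustration of the KV cache memory estimation mentioned above, here is a back-of-the-envelope sketch. The model configuration numbers in the example (layer count, KV heads, head dimension) are illustrative, roughly matching a Llama-3-8B-style architecture with grouped-query attention:

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, dtype_bytes: int = 2) -> int:
    """Estimate KV cache size for a decoder-only transformer.

    Each layer stores two tensors (K and V), each of shape
    [batch_size, num_kv_heads, seq_len, head_dim], at dtype_bytes per element.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * dtype_bytes

# Illustrative config: 32 layers, 8 KV heads, head_dim 128,
# one sequence of 8192 tokens in fp16 (2 bytes per element).
gib = kv_cache_bytes(32, 8, 128, 8192, 1) / 2**30  # → 1.0 GiB
```

Estimates like this drive batch sizing and memory planning in serving systems; real engines refine it further (e.g., paged allocation of the cache).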

Nice to have

  • 5 years of experience in software engineering, with a focus on infrastructure or machine learning systems
  • Experience with open-source inference engines such as vLLM, SGLang, or TRT-LLM
  • Contributions to open-source infrastructure or ML projects
  • Experience building large-scale ML/MLOps infrastructure

What the JD emphasized

  • fastest and most scalable inference
  • LLM CI/CD pipeline
  • model serving systems
  • performance optimization
  • low-latency inference
  • scalable model serving

Other signals

  • building the future of generative AI infrastructure
  • highest-quality models with the fastest and most scalable inference
  • leader in LLM inference speed
  • driving cutting-edge innovation through projects like our own function calling and multimodal models