Member of Technical Staff

Fireworks AI · Data AI · New York, NY · Engineering

This role focuses on designing, developing, and maintaining large-scale backend and cloud-native infrastructure for a generative AI platform, specifically supporting distributed machine learning training, inference, and data processing pipelines. The goal is to achieve fast and scalable inference, with a focus on efficiency and low latency.

What you'd actually do

  1. Design, develop, and maintain large-scale backend and cloud-native infrastructure to support distributed machine learning training, inference, and data processing pipelines for a generative AI platform.
  2. Architect and build scalable, resilient backend infrastructure to support distributed training, inference, and data processing pipelines.
  3. Lead technical design discussions, mentor engineers, and establish best practices for large-scale machine learning systems.
  4. Design and implement core backend services with a focus on efficiency and low latency.
  5. Drive infrastructure optimization initiatives for compute cost, storage lifecycle management, and network performance.

Skills

Required

  • designing, building, and optimizing large-scale backend infrastructure and distributed data systems
  • cloud environments (AWS, GCP, Azure, or equivalent), including cloud-native platforms, core infrastructure components, and optimization techniques (caching, indexing, sharding, replication, transactions, ACID)
  • major server-side programming languages and frameworks (e.g., Python, C++, Go, TypeScript)
  • writing technical design documentation, leading cross-functional projects, and collaborating with cross-functional teams to achieve business impact
  • developing and maintaining data processing and API systems, including client-server communication frameworks (e.g., gRPC, Thrift)
  • conducting A/B testing and scientific experimentation (e.g., Statsig, Meta Deltoid, Optimizely) to measure software impact
  • conducting coding interviews and providing systematic feedback for engineering candidates
  • cloud-native tools and infrastructure, such as Docker and Kubernetes
  • defining and implementing data-driven metrics to support company or team goals

What the JD emphasized

  • large-scale backend and cloud-native infrastructure
  • distributed machine learning training, inference, and data processing pipelines
  • scalable, resilient backend infrastructure
  • low latency

Other signals

  • building the future of generative AI infrastructure
  • highest-quality models with the fastest and most scalable inference
  • LLM inference speed
  • function calling and multimodal models