Software Engineer, Ray Serve

Anyscale · Data AI · Bengaluru, KA, India · Engineering

Software Engineer role focused on building and scaling Ray Serve, a production-grade serving framework for machine learning applications. The role involves developing high-performance, distributed systems for ML model deployment, including asynchronous inference, sub-millisecond routing, zero-downtime updates, state management, multi-model orchestration, and observability tools. It requires strong systems programming, distributed systems, and cloud-native infrastructure knowledge, with a focus on production reliability and performance.

What you'd actually do

  1. Asynchronous inference: Let a client submit a request and immediately receive a request handle it can use to check on or await completion, without blocking the client. Especially important for long-running image, video, or audio generation applications.
  2. Sub-millisecond Model Routing: Design and implement intelligent request routing systems that dynamically balance load across thousands of model replicas while maintaining strict latency SLAs
  3. Zero-Downtime Model Updates: Build sophisticated traffic management systems that seamlessly transition between model versions at scale, handling terabytes of inference requests without dropping a single query
  4. State Management at Scale: With many models and many replicas deployed in production, the control loop's state management can become the bottleneck for events such as routing and autoscaling. Identify the architectural improvements that shift the envelope of scale by 10x, from thousands of replicas to tens of thousands and beyond.
  5. Multi-Model Orchestration: Architect frameworks for complex ML pipelines where dozens of models need to communicate, share resources, and maintain end-to-end latency guarantees
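The request-handle pattern from item 1 can be sketched in a few dozen lines of asyncio. This is an illustrative toy, not Ray Serve's actual API: the names `InferenceQueue` and `RequestHandle` are invented for the example.

```python
import asyncio
import uuid

class RequestHandle:
    """Returned immediately on submission; resolves to the result later."""
    def __init__(self, request_id: str, future: "asyncio.Future"):
        self.request_id = request_id
        self._future = future

    def done(self) -> bool:
        return self._future.done()

    async def result(self):
        return await self._future

class InferenceQueue:
    """Accepts requests without blocking the caller; a background worker
    drains the queue and fulfills each handle's future."""
    def __init__(self):
        self._queue: asyncio.Queue = asyncio.Queue()

    def submit(self, payload) -> RequestHandle:
        future = asyncio.get_running_loop().create_future()
        handle = RequestHandle(str(uuid.uuid4()), future)
        self._queue.put_nowait((payload, future))
        return handle  # the client is free to do other work now

    async def worker(self, model):
        while True:
            payload, future = await self._queue.get()
            future.set_result(await model(payload))

async def main():
    async def model(x):              # stand-in for a slow generation model
        await asyncio.sleep(0.01)
        return x * 2

    queue = InferenceQueue()
    worker = asyncio.create_task(queue.worker(model))
    handle = queue.submit(21)        # returns immediately
    assert not handle.done()         # worker has not run yet
    result = await handle.result()   # block only when the result is needed
    worker.cancel()
    return result

print(asyncio.run(main()))
```

The key property is that `submit` is synchronous and cheap; the expensive model call runs later on the worker, and the client decouples submission from result retrieval.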
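For the routing and state-management items (2 and 4), one well-known strategy that keeps per-request work O(1) while balancing load across thousands of replicas is "power of two choices": sample two replicas at random and route to the less loaded one. The sketch below is a simplified stand-in; a real router would also track replica health, locality, and latency SLAs, and `Replica` here is hypothetical.

```python
import random

class Replica:
    def __init__(self, replica_id: int):
        self.replica_id = replica_id
        self.inflight = 0  # requests currently assigned to this replica

def route(replicas: list, rng=random) -> Replica:
    # Power of two choices: pick two replicas uniformly at random,
    # send the request to whichever has fewer in-flight requests.
    a, b = rng.sample(replicas, 2)
    chosen = a if a.inflight <= b.inflight else b
    chosen.inflight += 1
    return chosen

def spread_after(num_replicas: int, num_requests: int) -> int:
    """Gap between the most- and least-loaded replica after routing."""
    replicas = [Replica(i) for i in range(num_replicas)]
    for _ in range(num_requests):
        route(replicas)
    loads = [r.inflight for r in replicas]
    return max(loads) - min(loads)

# 100k requests over 1000 replicas (mean load 100): two random choices
# keep the max/min gap tiny compared with a single random choice.
print(spread_after(1000, 100_000))
```

The design point is that the router never needs a globally consistent view of all replica loads, which is exactly the kind of property that matters when control-plane state would otherwise become the scaling bottleneck.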

Skills

Required

  • Strong Systems Fundamentals: You understand operating systems, networking, concurrency, and distributed systems at a deep level and the trade-offs that different design options imply
  • Production Experience: You've built and maintained systems that serve real users at scale
  • Code Quality: You have good taste in code quality, simplicity, generality, and test coverage. AI agents can write a lot of code in a short time; you should be able to direct them to produce code that meets a gold standard
  • Ownership Mindset: You take responsibility for your code in production—from design to deployment to incident response

Nice to have

  • Experience with distributed systems frameworks (gRPC, Ray)
  • Background in ML/AI systems or serving infrastructure
  • Contributions to major open source projects
  • Experience with performance optimization and profiling
  • Knowledge of cloud-native technologies (Kubernetes, Istio, etc.)

What the JD emphasized

  • production-grade serving framework
  • scale an ML application from their laptop to the cluster
  • high-performance machine learning serving systems
  • seamless deployment of complex ML applications in production
  • fundamental computer science problems that directly impact how the world deploys AI
  • performance-critical code
  • Distributed Systems at Scale
  • Cloud-Native Infrastructure
  • ML/AI Systems
  • Production Reliability
  • Availability and performance are our key objectives as a serving infrastructure
  • Strong Systems Fundamentals
  • Production Experience
  • Ownership Mindset
  • end to end ownership

Other signals

  • building the infrastructure that powers AI applications
  • Ray Serve is the production-grade serving framework
  • deploy complex ML applications in production