Reliability Engineer

Apple · Big Tech · Hyderabad, India · Software and Services

Software Engineer on the Applied Machine Learning team, responsible for architecting and orchestrating high-performance, scalable enterprise platforms for Data, ML, and inferencing. The role focuses on ensuring availability, performance, and low latency for high-throughput applications. It also involves managing diverse workloads across ML/Data/Inference platforms and evaluating new technologies.

What you'd actually do

  1. Architect and orchestrate the high-performance, scalable enterprise platforms that underpin our Data, ML, and inferencing platforms
  2. Ensure high availability, optimal performance, and minimal latency for our high-throughput applications
  3. Manage diverse workloads across ML/Data/Inference platforms
  4. Explore and evaluate the latest open source technologies and innovative solutions

Skills

Required

  • Experience with AWS/GCP or Kubernetes
  • Proficiency in one of the following programming languages: Python, Java, or Go
  • Ability to read and explain open source codebases
  • Understanding of, or exposure to, operating systems or networking and security principles

Nice to have

  • Exposure to data processing and model training or fine-tuning methodologies
  • Exposure to Spark/Flink and other modern cloud native big data technologies
  • Exposure to cloud managed services such as AWS Bedrock or GCP Vertex AI
  • Exposure to LLM infrastructure such as GPUs, TPUs, and Inferentia
  • Understanding of cloud networking concepts such as VPCs, DNS, security groups, and the Kubernetes network model
  • Expertise in performance tuning of JVMs and operating systems such as Linux

What the JD emphasized

  • high-performance, scalable enterprise platforms
  • Data, ML and inferencing platforms
  • minimal latency
  • high-throughput applications
  • ML/Data/Inference platforms

Other signals

  • ML Platforms, Solutions, and Services