Sr. Member of Technical Staff

Cerebras · Semiconductors · Headquarters · Software

This role focuses on developing and maintaining cloud-based deployment workflows for AI inference software, using containerization and orchestration technologies such as Docker and Kubernetes. Responsibilities include ensuring system resiliency and high availability, optimizing performance for low-latency inference, and debugging, monitoring, and documenting inference services, with a strong emphasis on infrastructure-as-code and CI/CD practices.

What you'd actually do

  1. Design and develop software features that support system resiliency and high availability, including automated recovery mechanisms and fault-tolerant architecture across distributed environments.
  2. Develop and maintain cloud-based deployment workflows for AI inference software using AWS tools and services to support low-latency and scalable system performance.
  3. Develop Python-based scripts and APIs to streamline data preprocessing, inference execution, and post-processing for real-time inference tasks.
  4. Develop inference software in Docker containers and define Kubernetes orchestration strategies that ensure software reliability and efficient scaling.
  5. Triage and resolve defects in the software service by analyzing logs, metrics, and distributed traces using tools like AWS CloudWatch, Grafana, or custom Python scripts.
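Duties 3 and 4 above can be pictured as a small HTTP inference service: a minimal sketch in Flask (which appears in the skills list), assuming a hypothetical preprocess → infer → postprocess pipeline and a `/healthz` endpoint for Kubernetes liveness/readiness probes. The model call is a placeholder; all route and field names are illustrative, not from the JD.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def preprocess(payload):
    # Hypothetical: normalize the raw JSON body into a token list.
    return payload.get("text", "").split()

def run_inference(tokens):
    # Placeholder for the real model call; here we just count tokens.
    return {"token_count": len(tokens)}

def postprocess(result):
    # Hypothetical: attach a status field for downstream consumers.
    return {"status": "ok", **result}

@app.route("/healthz")
def healthz():
    # Target for a Kubernetes liveness/readiness probe.
    return jsonify({"healthy": True})

@app.route("/infer", methods=["POST"])
def infer():
    tokens = preprocess(request.get_json(force=True))
    return jsonify(postprocess(run_inference(tokens)))
```

In a containerized deployment like the one the JD describes, this process would run behind a WSGI server inside a Docker image, with the `/healthz` route wired to the pod's probe configuration.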

Skills

Required

  • Terraform
  • AWS CloudFormation
  • AWS CDK
  • Ansible
  • Docker
  • Kubernetes
  • AWS EKS
  • AWS Elastic Container Service (ECS)
  • AWS Fargate
  • Helm
  • AWS EC2
  • AWS Lambda
  • Auto Scaling Groups
  • AWS CloudWatch
  • AWS X-Ray
  • ELK (Elasticsearch, Logstash, Kibana)
  • Prometheus
  • Grafana
  • Python
  • Node.js
  • JavaScript
  • Flask
  • PostgreSQL
  • Redis
  • NFS
  • Jenkins
  • Git

What the JD emphasized

  • low-latency
  • scalable system performance
  • real-time inference tasks
  • efficient scaling
  • distributed traces

Other signals

  • Develop and maintain cloud-based deployment workflows for AI inference software
  • Develop inference software in Docker containers and define Kubernetes orchestration strategies
  • Debug issues related to model deployment, container orchestration, networking configurations
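The JD's triage duty mentions analyzing logs with "custom Python scripts" and watching latency. A minimal sketch of that idea, assuming a hypothetical structured log format (one JSON object per line with `route` and `latency_ms` fields) and a nearest-rank p99 approximation; the threshold and field names are illustrative.

```python
import json
from statistics import mean

def p99(latencies):
    # Nearest-rank approximation of the 99th percentile.
    s = sorted(latencies)
    return s[int(0.99 * (len(s) - 1))]

def triage(log_lines, threshold_ms=200):
    # Group per-request latencies by route, then flag routes whose
    # tail latency exceeds the threshold.
    by_route = {}
    for line in log_lines:
        rec = json.loads(line)
        by_route.setdefault(rec["route"], []).append(rec["latency_ms"])
    return {
        route: {
            "mean_ms": mean(v),
            "p99_ms": p99(v),
            "slow": p99(v) > threshold_ms,
        }
        for route, v in by_route.items()
    }
```

In practice the same aggregation is usually done in CloudWatch Logs Insights or Grafana; a script like this is the fallback when logs are pulled locally for a specific incident.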