AI Infrastructure Engineer, Model Optimization & Deployment, Optimus

Tesla · Auto · Palo Alto, CA · Tesla AI

This role focuses on optimizing and deploying ML models for Tesla's Optimus humanoid robots. The engineer will optimize models for latency, memory usage, and inference speed; quantize, prune, and convert them to deployment formats; benchmark, package, and deploy them as services; and build CI/CD pipelines for ML models, ensuring scalability and reliability in production and ultimately shipping models to thousands of robots.

What you'd actually do

  1. Optimize ML models for latency, memory usage, and inference speed
  2. Quantize, prune, and convert models (e.g., to ONNX, TensorRT, TFLite) for deployment on various platforms (cloud, edge, mobile)
  3. Benchmark and profile model performance across different environments
  4. Package and deploy models as REST APIs, batch jobs, or streaming services using tools like FastAPI, Flask, or gRPC
  5. Implement CI/CD pipelines for automated testing and deployment of ML models

Skills

Required

  • Python
  • PyTorch
  • model optimization tools (e.g., ONNX, TensorRT, TFLite, TVM)
  • model inference optimization and quantization
  • containerization and orchestration (Docker, Kubernetes)
  • cloud platforms (AWS, GCP, Azure)
  • serverless deployments
  • software engineering principles
  • CI/CD pipelines
  • deploying models to edge devices or mobile platforms
  • data serialization formats (e.g., protobuf, Avro)

Nice to have

  • FastAPI
  • Flask
  • gRPC
  • Prometheus
  • Grafana

What the JD emphasized

  • real-time latency constraints
  • models shipped to and used by thousands of humanoid robots in real-world applications

Other signals

  • Deploying models to edge devices
  • Optimizing ML models for latency, memory usage, and inference speed
  • Automating the entire workflow of training, validation, and production deployment
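The automation signal above typically means a CI pipeline that tests, exports, and publishes the model on every change. A hypothetical sketch as a GitHub Actions workflow; the repository layout, script names, and artifact name are all assumptions:

```yaml
# .github/workflows/model-ci.yml (hypothetical)
name: model-ci
on: [push]
jobs:
  test-and-export:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest tests/          # validate model and serving code
      - run: python export_onnx.py  # export + quantize the model
      - uses: actions/upload-artifact@v4
        with:
          name: model-onnx
          path: model.onnx
```

A production pipeline would add stages this sketch omits: accuracy regression checks against a validation set, latency gates per target device, and a staged rollout to the robot fleet.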