Principal Machine Learning & Data Engineer

Twilio · Enterprise · United States · Remote · Engineering

This role focuses on building and operating an internal ML and data platform, including cloud-native pipelines, model-serving infrastructure, and developer tooling. It involves architecting scalable feature stores, streaming/batch pipelines, and low-latency model-serving layers on AWS, implementing MLOps best practices, and leading cross-functional engineering efforts.

What you'd actually do

  1. Architect and evolve Twilio’s end-to-end ML and real-time data platforms for reliability, security, and cost efficiency.
  2. Design scalable feature stores, streaming and batch pipelines, and low-latency model-serving layers on AWS.
  3. Implement MLOps best practices—automated testing, CI/CD, monitoring, and rollback—for hundreds of daily deployments.
  4. Own system design reviews, threat modeling, and performance tuning for high-volume communications workloads.
  5. Lead cross-functional engineering efforts, breaking down complex initiatives into executable roadmaps.

Skills

Required

  • Python
  • Java
  • Scala
  • Go
  • C++
  • Spark
  • Flink
  • SQL
  • NoSQL
  • Kafka
  • Kinesis
  • AWS
  • Terraform
  • Kubernetes
  • EKS
  • MLflow
  • Kubeflow
  • SageMaker
  • Vertex AI
  • feature engineering
  • A/B testing
  • drift detection
  • retraining

Nice to have

  • Graduate degree focused on machine learning, distributed systems, or applied statistics.
  • Contributions to open-source ML or data infrastructure projects.
  • Experience with privacy-enhancing technologies (differential privacy, homomorphic encryption) or on-device inference.
  • Background in conversational AI, real-time communications, or large-language-model deployment at scale.
  • Exposure to compliance-heavy environments (HIPAA, PCI-DSS) and secure multi-tenant design patterns.
  • Published research, patents, or conference talks in ML systems or data engineering.

What the JD emphasized

  • 7+ years building and operating production data or machine-learning systems at scale.
  • Expert fluency in Python and one compiled language (Java, Scala, Go, or C++).
  • Hands-on mastery of distributed data frameworks (Spark/Flink), SQL/NoSQL stores, and streaming platforms (Kafka/Kinesis).
  • Demonstrated success designing cloud-native architectures on AWS, including Terraform-managed infrastructure.
  • Deep knowledge of container orchestration (Kubernetes/EKS), service-mesh networking, and autoscaling strategies.
  • Practical experience implementing MLOps tooling such as MLflow, Kubeflow, SageMaker, or Vertex AI.
  • Strong grasp of model-lifecycle concerns—feature engineering, offline/online parity, A/B testing, drift detection, and retraining.
  • Proven ability to lead technical projects end-to-end and influence without authority across multiple teams.

Other signals

  • MLOps
  • model-serving
  • cloud-native
  • AWS
  • Python
  • Spark
  • Kubernetes