Machine Learning Engineer

Twilio · Enterprise · United States · Remote · Engineering

This role is for a Machine Learning Engineer who will develop ML-based systems for real-time applications, including anomaly detection, recommendation systems, predictive modeling, and agentic AI frameworks. The work involves designing, implementing, and maintaining scalable, low-latency ML solutions in production, building reproducible ML workflows, and implementing monitoring and evaluation frameworks.

What you'd actually do

  1. Partner with product, UX, and technical stakeholders to analyze business problems, clarify requirements, define scope, and translate them into measurable ML problem statements.
  2. Design, implement, and maintain scalable, enterprise-grade ML solutions in production.
  3. Build reproducible ML workflows for data preparation, training, evaluation, and inference using modern orchestration and MLOps tooling.
  4. Implement monitoring and evaluation frameworks to continuously improve data quality, model performance, latency, and cost through feedback loops.
  5. Partner cross-functionally with Product, Data Science/ML, Engineering, and Security to deliver resilient, scalable, and compliant ML-powered services.

Skills

Required

  • ML/AI fundamentals (statistics, probability, optimization)
  • Python
  • Java
  • SQL
  • Software engineering fundamentals (system design, testing, version control, code reviews)
  • Workflow orchestration and data pipelines (e.g., Airflow, Kubeflow)
  • Cloud data platforms/storage (e.g., SageMaker Feature Store, Snowflake, DynamoDB, OpenSearch)
  • ML lifecycle and MLOps tooling (e.g., MLflow, Metaflow, SageMaker)
  • LLM/agent frameworks (e.g., LangChain/LangGraph)
  • Model evaluation/observability tools (e.g., Galileo or similar)
  • Containerization and cloud infrastructure (Docker, Kubernetes)
  • GitOps/CI/CD tools (e.g., Argo CD)
  • Major cloud platform (AWS, GCP, or Azure)
  • Data modeling
  • Scalable systems
  • Distributed computing
  • Streaming frameworks (e.g., Spark/EMR, Flink, Kafka Streams)
  • Written and verbal communication skills
  • Agile environment experience

Nice to have

  • GPU-based implementation
  • Recommendation systems (e.g., graph-based approaches, two-tower models)
  • Time-series modeling (classical and deep learning)
  • Representation learning (e.g., embeddings)
  • Anomaly detection
  • Causal inference
  • LLMs and generative AI workflows
  • Foundation model fine-tuning
  • RAG
  • Vector databases
  • Technical leadership/impact (open-source contributions, publications, presentations)
  • Domain experience in communications, marketing automation, or customer engagement analytics
  • AI-assisted development tools (e.g., Claude, GitHub Copilot/Codex, Cursor)
  • Advanced degree (M.S. or Ph.D.) in a relevant field

What the JD emphasized

  • 5+ years of experience building, deploying, and operating data and ML systems in production.
  • Strong foundation in ML/AI (statistics, probability, optimization) with the ability to apply these concepts to real-world problems.
  • Proficient in Python, Java, and SQL; strong software engineering fundamentals (system design, testing, version control, code reviews).
  • Hands-on experience with workflow orchestration and data pipelines (e.g., Airflow, Kubeflow) and cloud data platforms/storage (e.g., SageMaker Feature Store, Snowflake, DynamoDB, OpenSearch).
  • Experience with the ML lifecycle and MLOps tooling (e.g., MLflow, Metaflow, SageMaker; LLM/agent frameworks such as LangChain/LangGraph; model evaluation/observability tools such as Galileo or similar).
  • Working knowledge of containerization and cloud infrastructure, including Docker and Kubernetes, GitOps/CI/CD tools (e.g., Argo CD), and at least one major cloud platform (AWS, GCP, or Azure).
  • Understanding of data modeling and scalable systems, including distributed computing and streaming frameworks (e.g., Spark/EMR, Flink, Kafka Streams); familiarity with GPU-based implementation is a plus.

Other signals

  • ML-based systems for real-time applications
  • Streaming anomaly detection
  • Recommendation systems
  • Predictive modeling
  • Agentic AI frameworks