SWE Intern - Machine Learning Engineer, Language Models

Apple · Big Tech · Beijing, China · Machine Learning and AI

Internship role focused on building large language models (LLMs) and generative models, including pretraining, LLM architecture, and scientific scaling. The role involves developing algorithms and systems for deep learning research and applying them to Apple products, with opportunities in text, image, speech, and video modalities.

What you'd actually do

  1. We build infrastructure, datasets, and models with fundamental general capabilities, such as understanding and generating text, images, speech, video, and other modalities, and we apply these models to Apple products.
  2. As an intern, you will work with a close-knit and fast-growing team of world-class engineers and scientists to tackle some of the most challenging problems in LLMs and deep learning.
  3. You will focus on areas such as pretraining, LLM architecture, and scientific scaling of LLMs.
  4. Further, you will have opportunities to identify and develop novel applications of deep learning in Apple products.

Skills

Required

  • Solid understanding of deep learning concepts
  • Strong interest in applying large language models to real-world products
  • Proficient programming skills in Python
  • Experience with at least one deep learning toolkit, such as JAX, PyTorch, or TensorFlow
  • Ability to work in a collaborative environment

Nice to have

  • Experience with reinforcement learning
  • Experience with data research
  • Experience with kernel optimization (e.g., Pallas and Triton)
  • Publication record in relevant top-tier conferences (e.g., NeurIPS, ICML, ICLR, COLM, ACL, NAACL, EMNLP)
  • Proven track record in computer science competitions (e.g., ACM-ICPC, NOI/IOI, or Kaggle)
  • Experience coding and training large language models
  • Experience with on-policy distillation
  • Experience with LLM context-lengthening techniques

What the JD emphasized

  • push the frontier of deep learning

Other signals

  • building large language models
  • pretraining
  • scientific scaling of LLMs
  • deep learning research
  • generative models