About the role

We’re looking for seasoned ML Infrastructure engineers with experience designing, building and maintaining training and serving infrastructure for ML research.

Responsibilities:

Provide infrastructure support to our ML research and product
Build tooling to diagnose cluster issues and hardware failures
Monitor deployments, manage experiments, and generally support our research
Maximize GPU allocation and utilization for both serving and training

Requirements:

4+ years of experience supporting infrastructure in an ML environment
Experience developing tools to diagnose ML infrastructure problems and failures
Experience with cloud platforms (e.g., Compute Engine, Kubernetes, Cloud Storage)
Experience working with GPUs

Nice to have:

Experience with large GPU clusters and high-performance computing/networking
Experience supporting large language model training
Experience with ML frameworks like PyTorch/TensorFlow/JAX
Experience with GPU kernel development

About Character.AI

Character.AI empowers people to connect, learn and tell stories through interactive entertainment. Over 20 million people visit Character.AI every month, using our technology to supercharge their creativity and imagination. Our platform lets users engage with tens of millions of characters, enjoy unlimited conversations, and embark on infinite adventures.

In just two years, we achieved unicorn status and were honored as Google Play's AI App of the Year—a testament to our innovative technology and visionary approach.

Join us and be a part of establishing this new entertainment paradigm while shaping the future of Consumer AI!

At Character, we value diversity and welcome applicants from all backgrounds. As an equal opportunity employer, we firmly uphold a policy of non-discrimination on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to our success.

Machine Learning Infrastructure Engineer
