Sr. Machine Learning Operations Engineer at Smartsheet

What you'd actually do

Automate the deployment and retraining of ML models, from training through to production inference, by building and managing complete CI/CD/CT (Continuous Training) pipelines, adhering to MLOps best practices.

Build, fine-tune, or use pre-trained LLMs, deep learning models or traditional machine learning models.

Implement model versioning, lineage tracking, and auditing to ensure compliance with security and ethical standards.

Continuously monitor the health and performance of production machine learning models, proactively identifying and correcting model drift, staleness, and performance degradation.

Provision and manage scalable cloud infrastructure using Infrastructure as Code (IaC).

Skills

Required

5+ years of experience creating, deploying, and scaling machine learning solutions in a cloud environment (e.g., AWS, GCP, Azure)
Ability to use tools such as SageMaker, Glue, Lambda, and Docker to create ML models and data pipelines
7+ years of programming experience in languages used in AI/ML (e.g., Python, Scala)
4+ years of experience developing deep learning and traditional ML models using common frameworks such as PyTorch, TensorFlow, Hugging Face, and scikit-learn
Strong applied data science skills: ability to recognize data patterns, understand how and when to use various machine learning approaches (e.g., supervised/unsupervised learning, deep learning), and evaluate the performance of ML algorithms.
Proven ability to remain up-to-date with the latest advancements in Generative AI approaches (e.g., OpenAI, LangChain, Stable Diffusion APIs).
Experience developing, documenting, and supporting REST APIs
A degree in Computer Science, Engineering, or a related field or equivalent practical experience.

What the JD emphasized

architect our machine learning production lifecycle

maintain and deploy ML models to a scalable, reliable, and secure production environment

design and maintain the infrastructure, automation, and monitoring systems

ensure our AI products are high-performing and cost-effective

Automate the deployment and retraining of ML models

building and managing complete CI/CD/CT (Continuous Training) pipelines

adhering to MLOps best practices

Build, fine-tune, or use pre-trained LLMs

Evaluate and recommend AI or ML solutions

Implement model versioning, lineage tracking, and auditing

ensure compliance with security and ethical standards

Continuously monitor the health and performance of production machine learning models

proactively identifying and correcting model drift, staleness, and performance degradation

manage necessary model retraining cycles

Act as the "glue" between Data Scientists (who build models) and Software Engineers (who consume them).

Partner effectively with software engineers, product managers and business functions to integrate the machine learning solutions

Provision and manage scalable cloud infrastructure using Infrastructure as Code (IaC).

Provide architectural guidance and mentorship

Distill complex ML concepts into easy-to-follow technical documentation.

For over 20 years, Smartsheet has helped people and teams achieve, well, anything. From seamless work management to smart, scalable solutions, we’ve always worked with flow. We’re building tools that empower teams to automate the manual, uncover insights, and scale smarter. But more than that, we’re creating space: space to think big, take action, and unlock the kind of work that truly matters. Because when challenge meets purpose, and passion turns into progress, that’s magic at work, and it’s what we show up for every day.

Smartsheet is hiring a Senior Machine Learning Operations Engineer to architect our machine learning production lifecycle. Your mission is to maintain and deploy ML models to a scalable, reliable, and secure production environment. You will design and maintain the infrastructure, automation, and monitoring systems that ensure our AI products are high-performing and cost-effective.

You will report to our Director, Analytics Engineering & Data Governance and work from our Bangalore, India office.

You Will:

Model and Pipeline Automation

Automate the deployment and retraining of ML models, from training through to production inference, by building and managing complete CI/CD/CT (Continuous Training) pipelines, adhering to MLOps best practices.
Build, fine-tune, or use pre-trained LLMs, deep learning models or traditional machine learning models.
Evaluate and recommend AI or ML solutions for the product using any combination of vendor solutions and/or custom-built models.

Governance & Compliance

Implement model versioning, lineage tracking, and auditing to ensure compliance with security and ethical standards.

Performance Monitoring

Continuously monitor the health and performance of production machine learning models, proactively identifying and correcting model drift, staleness, and performance degradation.
Incorporate user feedback for iterative improvements and manage necessary model retraining cycles.

Cross-Functional Collaboration

Act as the "glue" between Data Scientists (who build models) and Software Engineers (who consume them).
Partner effectively with software engineers, product managers, and business functions to integrate machine learning solutions across Smartsheet.

Architecture and Infrastructure Management

Provision and manage scalable cloud infrastructure using Infrastructure as Code (IaC).
Provide architectural guidance and mentorship to a team consisting of ML engineers, data scientists and analytics engineers.
Distill complex ML concepts into easy-to-follow technical documentation.

You Have:

5+ years of experience creating, deploying, and scaling machine learning solutions in a cloud environment (e.g., AWS, GCP, Azure) and ability to use tools such as SageMaker, Glue, Lambda, and Docker to create ML models and data pipelines.
7+ years of programming experience in languages used in AI/ML (e.g., Python, Scala).
4+ years of experience developing deep learning and traditional ML models using common frameworks such as PyTorch, TensorFlow, Hugging Face, and scikit-learn.
Strong applied data science skills: ability to recognize data patterns, understand how and when to use various machine learning approaches (e.g., supervised/unsupervised learning, deep learning), and evaluate the performance of ML algorithms.
Proven ability to remain up-to-date with the latest advancements in Generative AI approaches (e.g., OpenAI, LangChain, Stable Diffusion APIs).
Experience developing, documenting, and supporting REST APIs
A degree in Computer Science, Engineering, or a related field or equivalent practical experience.

Get to Know Us:

At Smartsheet, your ideas are heard, your potential is supported, and your contributions have real impact. You’ll have the freedom to explore, push boundaries, and grow beyond your role. We welcome diverse perspectives and nontraditional paths—because we know that impact comes from individuals who care deeply and challenge thoughtfully. When you’re doing work that stretches you, excites you, and connects you to something bigger, that’s magic at work. Let’s build what’s next, together.

Equal Opportunity Employer:

Smartsheet is an Equal Opportunity (EEO) employer committed to fostering an inclusive environment with the best employees. It is our policy to provide equal employment opportunities to all qualified applicants in accordance with applicable laws in the US, UK, Australia, Germany, Costa Rica, Japan, Bulgaria, and India. All qualified applicants will receive consideration without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, protected veteran or disabled status, or genetic information.

If there are preparations we can make to help ensure you have a comfortable and positive interview experience, please let us know.
