What you'd actually do

Design, develop, and oversee the strategy and architecture of scalable and reliable AI/ML Ops platforms and pipelines

Model Deployment: Package and deploy AI/ML services to production, ensuring they are reproducible and interpretable

CI/CD Pipeline Development: Design and implement automated CI/CD (Continuous Integration/Continuous Deployment) pipelines to accelerate model deployment using appropriate tooling

Infrastructure Management: Provision and optimize infrastructure for training and serving, utilizing Docker, Kubernetes, or serverless platforms

Monitoring & Observability: Implement post-deployment monitoring for model performance, data drift, and latency using appropriate tooling.

Skills

Required

Experience in building and maintaining AI/ML Ops platform systems
System design, AI/ML frameworks and tools
AI/MLOps workflows on Databricks, MLflow, Mosaic AI Agent Framework, Unity Catalog, Vector Search, Knowledge Graph
AI/ML frameworks such as LangChain and LangGraph
Cloud Platforms: Hands-on experience with at least one major cloud provider (AWS, Azure, or GCP)
Programming languages such as Python and SQL
Modern software engineering practices and tools such as Kubernetes, CI/CD, IaC tools (preferably Terraform), observability, monitoring, and alerting
Solution cost optimisation and design-to-cost

Nice to have

Experience with Monte Carlo is preferable
Experience with an AWS-hosted data platform is preferable
Experience with AWS Bedrock is preferable
Experience with Terraform

What the JD emphasized

AI/ML Ops platform systems that ensure scalability, reliability, efficiency, and security

In-depth experience in system design and in AI/ML frameworks and tools handling petabytes of data within the Databricks Lakehouse ecosystem

AI/MLOps workflows on Databricks, MLflow, Mosaic AI Agent Framework, Unity Catalog, Vector Search, Knowledge Graph

Knowledge of AI/ML frameworks such as LangChain and LangGraph for AI/ML Ops pipeline integration

Other signals

AI/ML Ops platforms / pipelines

Model Deployment

CI/CD Pipeline Development

Infrastructure Management for training and serving

Monitoring & Observability for model performance

Automation of retraining and data pipeline workflows

Deployment of foundation models, fine-tuning workflows, and RAG stacks

Resource Optimization for GPU/CPU utilization

MLOps workflows on Databricks

AI/ML frameworks like LangChain, LangGraph

For over 20 years, Smartsheet has helped people and teams achieve... well, anything. From seamless work management to smart, scalable solutions, we’ve always worked with flow. We’re building tools that empower teams to automate the manual, uncover insights, and scale smarter. But more than that, we’re creating space: space to think big, take action, and unlock the kind of work that truly matters. Because when challenge meets purpose, and passion turns into progress, that’s magic at work, and it’s what we show up for every day.

Our India Global Capability Center isn't just supporting global operations—we’re leading global innovation. After scaling rapidly into a best-in-class hub, we deliver the product innovation and enterprise capabilities that accelerate our global growth, profitability, and scale. As we expand Smartsheet India, we’re searching for Senior AI/ML Ops Engineers who crave variety and ownership. You’ll have the opportunity to work across multiple teams and disciplines, building a versatile skillset while solving the complex challenges of a global platform.

You Will:

Design, develop, and oversee the strategy and architecture of scalable and reliable AI/ML Ops platforms and pipelines
Model Deployment: Package and deploy AI/ML services to production, ensuring they are reproducible and interpretable
CI/CD Pipeline Development: Design and implement automated CI/CD (Continuous Integration/Continuous Deployment) pipelines to accelerate model deployment using appropriate tooling
Infrastructure Management: Provision and optimize infrastructure for training and serving, utilizing Docker, Kubernetes, or serverless platforms
Monitoring & Observability: Implement post-deployment monitoring for model performance, data drift, and latency using appropriate tooling. Experience with Monte Carlo is preferable
Automation: Automate retraining and data pipeline workflows to ensure models stay accurate over time.
Manage the deployment of foundation models, fine-tuning workflows, and Retrieval-Augmented Generation (RAG) stacks (vector DBs, knowledge graphs). Experience with AWS Bedrock is preferable
Resource Optimization: Manage GPU/CPU utilization to minimize cloud costs while maintaining low-latency inference for users
Collaboration: Work closely with data scientists, data engineers, and software engineers to bridge the gap between model development and production.
Version Control & Governance: Manage versioning for data, code, and models using tools like MLflow.
Security & Compliance: Implement data security measures, ensure compliance with data governance policies, and protect sensitive data
Technology Evaluation and Innovation: Stay abreast of emerging data technologies and explore opportunities for innovation to improve the organisation’s data infrastructure
Troubleshooting and Problem Solving: Diagnose and resolve complex data-related issues, ensuring the stability and reliability of the data platform
Perform other duties as assigned

You Have:

Experience with enterprise SaaS software solutions with high availability and scalability
Experience with solutions handling large-scale structured and unstructured data from varied data sources
Experience in building and maintaining AI/ML Ops platform systems that ensure scalability, reliability, efficiency, and security
Experience working with product engineering teams to influence designs with data, AI, and analytics use cases in mind
In-depth experience in system design and in AI/ML frameworks and tools handling petabytes of data within the Databricks Lakehouse ecosystem
Experience with AI/MLOps workflows on Databricks, MLflow, Mosaic AI Agent Framework, Unity Catalog, Vector Search, Knowledge Graph
Knowledge of AI/ML frameworks such as LangChain and LangGraph for AI/ML Ops pipeline integration
Cloud Platforms: Hands-on experience with at least one major cloud provider (AWS, Azure, or GCP). Experience with an AWS-hosted data platform is preferable
Programming languages such as Python and SQL
Modern software engineering practices and tools such as Kubernetes, CI/CD, IaC tools (preferably Terraform), observability, monitoring, and alerting
Solution cost optimisation and design-to-cost
Legally eligible to work in India on an ongoing basis

Get to Know Us:

At Smartsheet, your ideas are heard, your potential is supported, and your contributions have real impact. You’ll have the freedom to explore, push boundaries, and grow beyond your role. We welcome diverse perspectives and nontraditional paths—because we know that impact comes from individuals who care deeply and challenge thoughtfully. When you’re doing work that stretches you, excites you, and connects you to something bigger, that’s magic at work. Let’s build what’s next, together.

Equal Opportunity Employer:

Smartsheet is an Equal Opportunity (EEO) employer committed to fostering an inclusive environment with the best employees. It is our policy to provide equal employment opportunities to all qualified applicants in accordance with applicable laws in the US, UK, Australia, Germany, Costa Rica, Japan, Bulgaria, and India. All qualified applicants will receive consideration without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, protected veteran or disabled status, or genetic information.

If there are preparations we can make to help ensure you have a comfortable and positive interview experience, please let us know.

#LI-Remote