Principal Data Engineer / Architect - Individual Contributor

Smartsheet · Seattle · India · Engineering - Developers

Principal Data Engineer/Architect at Smartsheet focused on building and optimizing data platforms, pipelines, and infrastructure for AI/ML use cases. The role involves designing scalable data architectures, managing petabyte-scale datasets, and implementing AI/MLOps workflows using tools such as Databricks, MLflow, and LangChain, with an emphasis on enterprise SaaS, cloud platforms, and modern software engineering practices.

What you'd actually do

  1. Designing and overseeing the architecture of scalable and reliable data platforms, including data pipelines, storage solutions, and processing systems
  2. Developing and implementing data models, ensuring data quality, and establishing data governance policies
  3. Building and optimizing data pipelines for ingesting, processing, and transforming large datasets from various sources
  4. Identifying and resolving performance bottlenecks in data pipelines and systems, ensuring efficient data retrieval and processing
  5. Staying abreast of emerging data technologies and exploring opportunities for innovation to improve the organization's data infrastructure
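The ingest/transform/load shape of the pipeline work described above can be sketched in a few lines of Python (a minimal stdlib illustration only; the function names, record format, and sample data are hypothetical, not from the posting):

```python
from typing import Iterable, Iterator

# Hypothetical raw input: CSV-like rows arriving from "various sources".
RawRow = str

def ingest(rows: Iterable[RawRow]) -> Iterator[list[str]]:
    """Ingestion stage: parse raw rows, skipping blanks."""
    for row in rows:
        if row.strip():
            yield row.split(",")

def transform(parsed: Iterable[list[str]]) -> Iterator[dict]:
    """Transformation stage: normalize each row into a keyed record."""
    for fields in parsed:
        yield {"id": fields[0].strip(), "value": float(fields[1])}

def load(records: Iterable[dict]) -> list[dict]:
    """Load stage: materialize the stream (stand-in for a real storage sink)."""
    return list(records)

pipeline = load(transform(ingest(["a1, 1.5", "", "b2, 2.0"])))
```

Because each stage is a generator, rows stream through one at a time; at petabyte scale the same staged shape would be expressed in a distributed engine (e.g. Spark on Databricks) rather than in-process iterators.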

Skills

Required

  • Enterprise SaaS software solutions with high availability and scalability
  • Solutions handling large-scale structured and unstructured data from varied sources
  • Experience building and maintaining data platform systems (distributed compute, data orchestration, distributed storage, streaming infrastructure) while ensuring scalability, reliability, efficiency, and security
  • Working with product engineering teams to influence designs with data, AI, and analytics use cases in mind
  • In-depth experience in system design involving petabytes of data on the Databricks Lakehouse
  • Experience with modern AI/data infrastructure patterns, including semantic layers and organizing data for AI agents (metadata, context)
  • AI/MLOps workflows on Databricks: MLflow, Mosaic AI Agent Framework, Unity Catalog, Vector Search, Knowledge Graph
  • Knowledge of AI/ML frameworks such as LangChain and LangGraph for AI/MLOps pipeline integration
  • Hands-on experience with at least one major cloud provider (AWS, Azure, or GCP)
  • Programming languages such as Python and SQL, and potentially Java or Scala
  • Modern software engineering practices: Kubernetes, CI/CD, IaC tools, observability, monitoring, and alerting
  • Solution cost optimization and design-to-cost

Nice to have

  • Experience with an AWS-hosted data platform
  • Exposure to Snowflake and data pipeline frameworks such as Airbyte or Airflow
  • Terraform

What the JD emphasized

  • Experience in modern AI/Data infrastructure patterns
  • Semantic layer: organizing data for AI agents
  • AI/MLOps workflows on Databricks
  • MLflow
  • Mosaic AI Agent Framework
  • Unity Catalog
  • Vector Search
  • Knowledge Graph
  • Knowledge of AI/ML frameworks like LangChain
  • LangGraph for AI/MLOps pipeline integration
  • Petabytes of data with Databricks Lakehouse
