Principal Data Engineer, User Success

Autodesk · Enterprise · Toronto, ON +1

A Principal Data Engineer role driving the design of AI-ready data products that power analytics, machine learning, and emerging agentic experiences and insights. The role focuses on building scalable batch and streaming pipelines for product telemetry to support LLMs and agentic workflows, operationalizing feature engineering and RAG-based systems, and ensuring data quality for AI-driven systems.

What you'd actually do

  1. Architect and implement scalable batch and streaming pipelines for large-scale product telemetry, with low-latency, high-throughput data access, that support LLMs and agentic workflows optimized for:
  2. Partner with AI/ML teams to operationalize:
  3. Ensure data quality and observability meet the needs of AI-driven decision systems
  4. Enable analysts and product teams with trusted, well-modeled datasets
  5. Partner with stakeholders to translate product questions into measurable data signals

Skills

Required

  • Python
  • Spark
  • PySpark
  • advanced SQL
  • scripting
  • LLM ecosystems
  • embeddings
  • vector databases
  • Retrieval-augmented generation (RAG)
  • Agent frameworks or orchestration systems
  • streaming technologies (Kafka, Flink, Spark Streaming)
  • data governance
  • lineage
  • cataloging systems
  • ETL/ELT pipelines across batch and streaming workloads
  • modern data platforms (Iceberg, Hive, Snowflake, Redshift, Athena, or equivalent)
  • AWS services (EMR, Glue, S3, IAM, Lambda, Step Functions)
  • lead cross-functional technical initiatives
  • influence architecture
  • define engineering standards
  • mentor engineers
  • Strong communication skills

Nice to have

  • analytics engineering
  • semantic layer tools (dbt, metrics stores)
  • product analytics
  • experimentation frameworks
  • product telemetry
  • clickstream data
  • behavioral analytics
  • experimentation platforms
  • ingestion, orchestration, and transformation tools (Airflow, dbt, Fivetran, or similar)
  • modernizing data infrastructure

What the JD emphasized

  • AI-native experiences
  • agentic insights platform
  • AI-ready data products
  • LLMs and agentic workflows
  • RAG-based systems
  • agent frameworks or orchestration systems
