Data Science Engineer

Adobe · Enterprise · San Jose, CA

Data Science Engineer role focused on building large-scale cloud-based data and analytics platforms, including LLM agents and end-to-end data pipelines for production ML models. The role involves data engineering for personalization initiatives and requires expertise in distributed data technologies, cloud platforms, and Python/PySpark.

What you'd actually do

  1. Build LLM agents to optimize and automate data pipelines following best engineering practices.
  2. Deliver end-to-end data pipelines to run machine learning models on a production platform.
  3. Help build production-grade ML models and integrate them with operational systems.
  4. Build fault-tolerant, scalable, high-quality data pipelines using multiple cloud-based tools.
  5. Develop innovative solutions that help the broader organization take significant actions quickly and efficiently.

Skills

Required

  • Master’s degree or equivalent experience
  • 8+ years of experience with a consistent track record as a data engineer
  • 5+ years of validated ability in distributed data technologies, e.g., Hadoop, Hive, Presto, Spark
  • 3+ years of experience with cloud-based technologies – Databricks, S3, Azure Blob Storage, notebooks, AWS EMR, Athena, Glue, etc.
  • 2+ years’ experience with streaming data ingestion and transformation using Kafka, Kinesis, etc.
  • Outstanding SQL experience
  • Proven hands-on experience in Python/PySpark/Scala
  • Experience with CI/CD tools, e.g., GitHub, Jenkins
  • Working experience with open-source orchestration tools, e.g., Apache Airflow, Azkaban
  • Teammate with excellent communication/teamwork skills

Nice to have

  • Open-source contributor
  • Data governance tools, e.g., Collibra
  • Collaboration tools, e.g., Jira, Confluence
  • Adobe Experience Platform
  • Adobe Analytics
  • Customer Journey Analytics
  • Adobe Journey Optimizer
  • LLMs and agentic workflows using Copilot, Claude, Llama, Databricks Genie, etc.
  • Building context and prompt engineering solutions, including classical RAG, knowledge graphs, MCP, and agentic frameworks such as n8n

What the JD emphasized

  • 8+ years of experience with a consistent track record as a data engineer
  • 5+ years of validated ability in distributed data technologies
  • 3+ years of experience with cloud-based technologies
  • 2+ years’ experience with streaming data ingestion and transformation
  • Outstanding SQL experience
  • Proven hands-on experience in Python/PySpark/Scala
  • Experience with CI/CD tools
  • Working experience with open-source orchestration tools
  • Teammate with excellent communication/teamwork skills

Other signals

  • Build LLM agents
  • Deliver end-to-end data pipelines to run machine learning models on a production platform
  • Help build production-grade ML models