Software Data Engineer, Data Platform

Augury · Vertical AI · Bengaluru, India · R&D

A Software Data Engineer role building production-grade data services and pipelines for an Industrial AI platform. The role focuses on designing and implementing end-to-end data flows, modeling context and relationships across industrial data, and partnering with AI teams to expose data to AI agents and AI-native experiences. The emphasis is on clean architecture, reliability, scalability, and testing in a streaming lakehouse environment.

What you'd actually do

  1. Design and implement end-to-end data flows, from raw event ingestion through durable storage and modeled datasets that power products, Digital Twin experiences, and AI agents.
  2. Build reliable, incremental pipelines that support deduplication, late-arriving data, watermarking, reprocessing, and reproducible aggregations at scale.
  3. Model context and relationships across machines, lines, factories, sensors, work orders, and tenants to support structured queries and AI-driven experiences.
  4. Partner with platform and AI teams to define how datasets are stored, modeled, and exposed through APIs, Digital Twin services, and context graphs.
  5. Build clean, maintainable Python services with strong separation of concerns across validation, persistence, aggregation, and orchestration layers.

Skills

Required

  • Python
  • SQL
  • data modeling
  • backend platforms
  • distributed systems
  • data-intensive applications
  • production environments
  • cloud platform (AWS, Azure, or GCP)
  • lakehouse architectures
  • streaming or messaging systems
  • observability
  • monitoring
  • production incident response
  • written and verbal communication

Nice to have

  • industrial, manufacturing, IoT, or large-scale data platform environments
  • Digital Twin architectures
  • contextual data models
  • context graphs
  • knowledge graphs
  • relationship-based data modeling
  • supporting AI/LLM-powered products (RAG systems, tools, agents, evaluation frameworks)
  • Databricks

What the JD emphasized

  • production-grade data systems
  • reliable, incremental pipelines
  • duplicate, invalid, and late-arriving events
  • AI agents

Other signals

  • building production-grade data services and pipelines
  • powering AI agents
  • modeling context and relationships across machines, lines, factories, sensors, work orders, and tenants
  • partnering with platform and AI teams to define how datasets are stored, modeled, and exposed