Our mission is to transform how people and machines work together to push the boundaries of human productivity. A leader in Industrial AI, Augury helps the world’s manufacturers leverage real-time production insights to drive new levels of efficiency. By combining predictive and prescriptive AI technology with industry expertise, Augury enables production teams to proactively address alerts, minimize downtime, reduce asset costs, and maximize yield and capacity. Our customers achieve payback in six months or less, enabling global scale. We're looking for team members excited to partner with the world's manufacturers and build the future of production together.

Our Data Intelligence Hub (DIH) is building the next-generation Industrial Data Intelligence platform: a contextual layer that connects machine health, operational, maintenance, engineering, and enterprise data on top of a site Digital Twin backbone. We use this foundation to power agentic, AI-native experiences that help users explore their sites, answer complex questions, and make better decisions in one place.

You will be a core member of DIH, building production-grade data services and pipelines that power our Digital Twin, products, analytics, and AI agents. This is not a traditional ETL or BI-focused Data Engineering role. We are looking for an engineer with experience building data-intensive software systems, with a strong emphasis on clean architecture, reliability, scalability, and testing.

Working closely with peers across India, Israel, and other global locations, you will help transform industrial and operational data into trusted, scalable, and actionable context for users, applications, and AI systems.

A Day In Your Life

Production Data Systems & Pipelines

Design and implement end-to-end data flows, from raw event ingestion through durable storage and modeled datasets that power products, Digital Twin experiences, and AI agents.
Build reliable, incremental pipelines that support deduplication, late-arriving data, watermarking, reprocessing, and reproducible aggregations at scale.
Model context and relationships across machines, lines, factories, sensors, work orders, and tenants to support structured queries and AI-driven experiences.
Partner with platform and AI teams to define how datasets are stored, modeled, and exposed through APIs, Digital Twin services, and context graphs.

Software Engineering & Quality

Build clean, maintainable Python services with strong separation of concerns across validation, persistence, aggregation, and orchestration layers.
Apply strong SQL and data modeling practices, including schema design, indexing, constraints, timestamp semantics, and scalable aggregations.
Drive engineering quality through automated testing, including unit, integration, and data-focused validation for correctness and reliability.
Design for observability through metrics, logging, and tracing that support debugging, data quality monitoring, production incidents, and backfills.

Streaming, Lakehouse & Scalability

Design and evolve streaming-first architectures using lakehouse and messaging technologies, including partitioning, watermarking, replay, reprocessing, and cost-aware scaling.
Work with technologies such as Kafka, Pub/Sub, or similar systems to build reliable event-driven services and data pipelines.
Contribute to multi-tenant architectures and data contracts that enable secure, scalable access to data across products, applications, and AI agents.

Collaboration & AI-Native Experiences

Partner closely with DIH, Smart Canvas, AI, and Product teams to design scalable data models, APIs, and context services that power AI-native experiences.
Translate business and product requirements into technical solutions that balance correctness, performance, cost, and long-term maintainability.
Participate in design reviews, code reviews, and technical discussions that raise the engineering bar across the organization.
Collaborate effectively across distributed teams through clear written and verbal communication.

What You Bring

Bachelor's degree in Computer Science, Computer Engineering, Information Technology, or a related field. Advanced degrees or equivalent practical experience are also valued.
4+ years of professional software development experience building backend platforms, distributed systems, or data-intensive applications in production environments.
Strong software engineering experience in Python, SQL, and data modeling, with a track record of building production-grade data systems and reliable, incremental pipelines.
Experience designing systems that handle duplicate, invalid, and late-arriving events while maintaining correctness and reliability for downstream consumers.
Experience with at least one cloud platform (AWS, Azure, or GCP) and modern data technologies such as Databricks, Delta Lake, Spark, BigQuery, or similar lakehouse architectures.
Experience with streaming or messaging systems such as Kafka, Pub/Sub, NSQ, or similar event-driven technologies.
Strong operational and debugging skills, including observability, monitoring, backfills, schema evolution, and production incident response.
Strong written and verbal communication skills, with experience collaborating across globally distributed teams.

Nice to Have

Experience in industrial, manufacturing, IoT, or large-scale data platform environments.
Exposure to Digital Twin architectures and contextual data models.
Experience with context graphs, knowledge graphs, or relationship-based data modeling.
Experience supporting AI/LLM-powered products, including RAG systems, tools, agents, or evaluation frameworks.
Experience working with Databricks or similar lakehouse platforms.

Augury is a people-first organization. We believe in fostering an inclusive environment in which employees feel encouraged to share their unique perspectives, leverage their strengths, and act authentically. We know that diverse teams are strong teams, and we welcome those from all backgrounds and varying experiences. We are committed to providing employees with a work environment free of discrimination and harassment. We believe that diversity is more than just good intentions, and we are committed to creating an inclusive environment for all employees.

Augury is a proud equal opportunity employer. We strive to create a work environment in which everyone, including all applicants, employees, customers, guests, and vendors, feels safe and comfortable. We commit to maintaining a workplace that is free of any type of harassment, and we do not tolerate anyone intimidating, humiliating, or hurting others. We prohibit willful discrimination based on age, gender, ethnicity, race, color, religion, political opinions, sexual orientation, sexual identity or expression, military or veteran status, disability, or any other characteristic protected by law.