Senior Software Engineer – Data Platform

Caterpillar · Industrial · Brisbane, Queensland

Seeking a Senior Software Engineer to build a scalable data platform for ingesting high-frequency telemetry data from mobile machines. The role involves transforming a legacy monolith into a containerized, cloud-ready architecture supporting data science and machine learning workloads, including stream/batch processing, lakehouse persistence, and low-latency predictive model hosting. Key technologies include Apache Spark, Delta Lake, Kubernetes, and various message brokers and caching systems.

What you'd actually do

  1. Design and implement robust, scalable components for ingesting, processing, and persisting high-frequency telemetry data.
  2. Collaborate with data scientists to host, orchestrate, and optimize workloads in Python, Scala, and Java.
  3. Design and build components using technologies like Apache Spark, Delta Lake, Redis/Valkey, MQTT, and PostgreSQL.
  4. Drive modernization efforts, including containerization and deployment on Kubernetes.
  5. Evaluate and integrate emerging technologies (e.g., Flink, Trino, Kafka, DuckDB, Dask, Daft) to optimize performance and scalability.

Skills

Required

  • Java
  • Scala
  • Python
  • backend development
  • streaming data processing
  • batch data processing
  • containerization (Docker)
  • orchestration (Kubernetes)
  • data lake/lakehouse architectures
  • message brokers
  • caching systems

Nice to have

  • Apache Spark
  • Delta Lake
  • Redis/Valkey
  • MQTT
  • PostgreSQL
  • Flink
  • Trino
  • Kafka
  • DuckDB
  • Dask
  • Daft
  • event sourcing
  • CQRS
  • hybrid cloud deployments
  • machine learning algorithms
  • statistical modelling
  • mining industry knowledge

What the JD emphasized

  • low-latency predictive model hosting

Other signals

  • enabling stream and batch data processing
  • lakehouse persistence
  • containerized, scalable, cloud-ready architecture