Senior Software Engineer - Data Engineering

Caterpillar · Industrial · Chennai, Tamil Nadu (+1 other location)

This role focuses on building and maintaining scalable data pipelines on AWS, with a strong emphasis on Snowflake, graph databases, and vector databases. The engineer will design, develop, and optimize solutions for AI-driven use cases, including retrieval-augmented generation (RAG), and will integrate vector databases with LLM-based applications. Experience with data ingestion from unstructured sources and with knowledge graph pipelines is also crucial.

What you'd actually do

  1. Design, develop, and maintain scalable data pipelines on AWS using services such as S3, Glue, Lambda, Redshift, and EMR.
  2. Build and optimize data warehousing solutions using Snowflake, including performance tuning and data modeling.
  3. Write efficient and reusable code in Python and SQL for data transformation and processing.
  4. Develop and optimize solutions using graph databases (e.g., Neo4j, Amazon Neptune), including query design and performance tuning.
  5. Design, build, and operate vector database solutions (e.g., Milvus, Amazon OpenSearch) to support semantic search, recommendations, RAG, and AI-driven use cases.

Skills

Required

  • AWS cloud stack
  • Snowflake
  • Python
  • SQL
  • graph databases
  • vector databases
  • data modeling
  • performance tuning
  • Git
  • Azure DevOps
  • analytical and problem-solving skills
  • communication and collaboration abilities
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field

Nice to have

  • NVIDIA ecosystem
  • AWS Step Functions
  • data governance and compliance practices
  • real-time data processing frameworks (e.g., Kafka, Spark Streaming)
  • RAPIDS libraries (cuDF, cuML, cuGraph)
  • CUDA-based tooling

What the JD emphasized

  • AWS cloud services, including data and AI workloads
  • Snowflake architecture, performance tuning, and best practices
  • Python and SQL for data pipelines, transformations, and services
  • graph and vector data modeling concepts and their practical applications
  • graph databases (e.g., Neo4j, Neptune)
  • vector databases (e.g., Milvus, Amazon OpenSearch), with Milvus named specifically
  • data ingestion pipelines for unstructured sources
  • embedding generation at scale
  • Knowledge Graph ingestion pipelines
  • pipeline engineering skills in Python
  • orchestrating multi-stage document processing workflows
  • deploying and monitoring these pipelines in production environments

Other signals

  • design, build, and operate vector database solutions
  • integrate vector databases with LLM-based applications and AI workflows
  • building Knowledge Graph ingestion pipelines