Specialist, Data Engineering

Merck · Pharma · Telangana, India

Data and platform engineer responsible for a configuration-driven data pipeline solution that powers ingestion, transformation, and delivery. Owns LLM-integrated AI agents that automate configuration creation, troubleshoot pipelines, and provide technical support. Also manages multi-cloud infrastructure and CI/CD.

What you'd actually do

  1. Own and evolve the product's Configuration and Support Bot — maintain the Microsoft Teams bot adapter, integrate with the company-approved LLM, and extend features such as file attachment handling, channel thread context, and Microsoft Graph integration.
  2. Develop and maintain the core engine — build and extend Python/PySpark loaders, transformers, and writers across 23 source connectors and 19 sink connectors.
  3. Drive CI/CD automation — design, maintain, and improve 28+ GitHub Actions reusable workflows covering dataset build/deploy, framework releases, Docker image promotion, and AWS key rotation.
  4. Manage multi-cloud infrastructure — provision and maintain AWS resources (ECS, IAM, ECR, S3, Secrets Manager) and Azure resources using Terraform.
  5. Ensure data quality — implement and extend the rule engine for schema validation, null checks, regex patterns, and quarantine/alert actions.

Skills

Required

  • Python
  • PySpark
  • Spark SQL
  • DataFrame API
  • data lake patterns
  • data quality
  • schema validation
  • Change Data Capture (CDC)
  • Apache Airflow
  • relational databases
  • REST APIs
  • OAuth2
  • cloud object storage
  • SFTP/SMB
  • streaming systems
  • AWS
  • Azure
  • Terraform
  • multi-environment deployments
  • container management
  • cloud cost management
  • cloud networking
  • IAM policies
  • service principals
  • app registrations
  • role-based access control
  • GitHub Actions
  • Docker
  • artifact repositories
  • JFrog Artifactory

Nice to have

  • Scala
  • Java Spark extensions
  • Microsoft Graph integration
  • Microsoft Teams bot adapter
  • resource right-sizing

What the JD emphasized

  • AI agents
  • Company-approved LLM
  • Python and PySpark
  • AWS
  • Azure
  • Terraform
  • GitHub Actions

Other signals

  • AI agents
  • LLM integration
  • data pipeline automation