DevOps Engineer, GPS

Scale AI · Data AI · Doha, Qatar · GPS Engineering

DevOps Engineer role focused on building and maintaining scalable, secure cloud infrastructure and backend systems for AI applications in the public sector. Responsibilities include infrastructure automation, CI/CD, deployment, scalability, and disaster recovery.

What you'd actually do

  1. Design and implement secure, scalable backend systems for customers using modern, cloud-native AI infrastructure.
  2. Collaborate with cross-functional teams to define and execute backend and infrastructure solutions tailored for secure environments.
  3. Write, maintain, and enhance Infrastructure as Code templates (e.g., Terraform, CloudFormation) for automated provisioning and management.
  4. Design and optimize CI/CD pipelines for efficient testing, building, and deployment processes.
  5. Develop and test disaster recovery plans with robust backups and failover mechanisms.

Skills

Required

  • Python
  • TypeScript
  • JavaScript
  • C++
  • distributed systems
  • public cloud platforms (AWS and Azure preferred)
  • Kubernetes
  • Terraform
  • Docker
  • CI/CD tooling (CircleCI, GitHub Actions)
  • network engineering

Nice to have

  • operations
  • LLMs
  • Gen AI landscape
  • data warehouses (Snowflake, Firebolt)
  • data pipeline/ETL tools (Dagster, dbt)
  • authentication/authorization systems (Zanzibar, Authz, etc.)
  • NoSQL document databases (MongoDB)
  • structured databases (Postgres)
  • hybrid or on-prem systems
  • orchestration platforms, such as Temporal and AWS Step Functions

What the JD emphasized

  • core platforms and software systems
  • orchestration
  • data abstraction
  • data pipelines
  • identity & access management
  • security tools
  • cloud infrastructure
  • secure, scalable backend systems
  • cloud-native AI infrastructure
  • secure environments
  • Infrastructure as Code
  • automated provisioning and management
  • networking architecture
  • CI/CD pipelines
  • containerized applications
  • Kubernetes
  • high availability and reliability
  • disaster recovery
  • hybrid and multi-cloud strategies