Principal Systems Reliability Engineer, Secure Federal Operations

T-Mobile · Telecom · Herndon, VA

This Principal Systems Reliability Engineer role focuses on designing and implementing secure, scalable, and highly reliable technology solutions across cloud platforms (Azure, AWS), networking, and cybersecurity. It requires advanced expertise in system architecture, cloud engineering, and DevSecOps practices. Responsibilities include identity and access management, patch management, automation, and Microsoft 365 administration. The goal is to improve the security posture, operational efficiency, and service reliability of enterprise systems.

What you'd actually do

  1. Develop and implement system designs to improve software delivery speed and operational efficiency.
  2. Lead architecture for cross-domain programs ensuring alignment with enterprise standards.
  3. Deliver solutions that improve service availability, scalability, latency, and efficiency.
  4. Design and deploy solutions on Azure and AWS.
  5. Build and operate cloud-native platforms (Kubernetes, service mesh, ingress, policy engines).

Skills

Required

  • systems architecture
  • platform engineering
  • site reliability engineering
  • Azure
  • AWS
  • Active Directory
  • DNS
  • 802.1X
  • certificate lifecycle management
  • Windows
  • Linux
  • TCP/IP networking
  • network security principles
  • Microsoft 365 (M365) services (Exchange Online, SharePoint, Teams)
  • PowerShell
  • Python
  • Bash
  • public and private cloud environments
  • incident and problem management
  • root cause analysis
  • disaster recovery planning

Nice to have

  • Automation and scripting using PowerShell, Python, or Bash
  • Knowledge of containerization (Docker, Kubernetes)
  • Experience in incident and problem management, root cause analysis, and disaster recovery planning

What the JD emphasized

  • U.S. citizenship