Sr Systems Reliability Engineer - Legal Technology

T-Mobile · Telecom · Frisco, TX +1

This role is for a Senior Systems Reliability Engineer focused on T-Mobile's Legal Technology platforms. The engineer will be responsible for the reliability, availability, scalability, and performance of these mission-critical systems, which operate in high-stakes environments. The role involves applying DevOps automation, managing cloud-native infrastructure, leading incident response, and mentoring other SREs. While the company embraces AI-assisted tools, the core function of this role is not AI/ML development but ensuring the reliability of essential IT services.

What you'd actually do

  1. Apply DevOps automation for CI/CD, configuration management, and environment management (non-prod and prod)
  2. Provision and manage environments; configure pipelines and infrastructure (VMs/containers)
  3. Improve availability, scalability, latency, and efficiency of services, with emphasis on Legal Technology platforms
  4. Own reliability and performance of critical applications (LRS, E-Core, LEEP)
  5. Participate in on-call rotation (~1 week every 2 months); respond to alerts/incidents

Skills

Required

  • DevOps
  • Integration
  • Java
  • Python
  • Go
  • C/C#
  • scripting (Shell/Perl)
  • DBMS (Postgres or Oracle)
  • CI/CD tools (e.g., Jenkins)
  • DevOps tools (GitHub/GitLab, Chef/Puppet)
  • Docker
  • Kubernetes
  • APM/observability tools (e.g., Splunk, Grafana, AppDynamics)
  • troubleshooting distributed systems using logs/metrics/traces

Nice to have

  • cloud environments
  • high-availability or regulated environments
  • leveraging AI-assisted tools (e.g., Copilot, ChatGPT)
  • Cloud Computing
  • Strong troubleshooting in distributed systems
  • Ability to operate in production environments and respond to incidents
  • Ownership mindset with focus on reliability and continuous improvement

What the JD emphasized

  • U.S. citizenship
  • mission-critical systems
  • high-stakes environments
  • reliability, speed, and accuracy truly matter
  • modern, cloud-native platform that is still evolving
  • own production systems end-to-end
  • directly influence how reliability is built into the platform
  • strong adoption of AI-assisted tools
  • ownership, collaboration, and continuous improvement
  • thrives in high-impact environments
  • solving complex reliability challenges
  • make a real difference