R-102945 Senior Principal Application Support Engineering Lead – Operations

Eli Lilly · Pharma · Hyderabad, India

The Senior Principal Application Support Engineering Lead is responsible for end-to-end support operations in a highly regulated, high-availability healthcare environment. The role leads incident response and problem management, drives reliability strategy, and ensures operational readiness and compliance. It is both strategic and hands-on, focused on organizational impact and on scaling reliability through others.

What you'd actually do

  1. Lead end-to-end support operations for a technology team, ensuring consistent execution across shifts and time zones.
  2. Serve as the incident commander for the most complex, high-impact production incidents.
  3. Own Problem Management for recurring and systemic issues across the supported technology landscape.
  4. Define and drive operational reliability strategy for the technology team, aligned with SRE and production engineering principles.
  5. Set direction for observability strategy across logs, metrics, and traces, ensuring actionable insights and high signal quality.

Skills

Required

  • End-to-end support operations leadership
  • Incident response
  • Problem management
  • Root Cause Analysis (RCA)
  • Reliability strategy
  • SRE principles
  • Production engineering principles
  • SLIs/SLOs
  • Observability (logs, metrics, traces)
  • Automation
  • Deployment and release governance
  • Compliance with regulatory requirements
  • Secure operational practices
  • Auditability
  • Mentoring and talent development

Nice to have

  • Experience in highly regulated, high-availability environments
  • Experience in healthcare industry
  • Experience with CI/CD operational safety
  • Experience with validated environments

What the JD emphasized

  • highly regulated
  • high-availability environments
  • operational excellence
  • reliability
  • quality
  • senior-most operational authority
  • lead Support Operations end-to-end
  • owning operational outcomes
  • availability, incident management, readiness, and continuous reliability improvement
  • strategic and hands-on
  • health, stability, and operability of production systems
  • effectiveness and maturity of support operations
  • Shift leadership and execution
  • influencing engineering, product, and platform teams to prevent incidents
  • organizational impact
  • operational predictability
  • scale reliability through others
  • end-to-end support operations
  • primary operational leader during assigned shifts
  • incident response quality, prioritization, and decision-making
  • shift-level operating models
  • escalation paths
  • decision frameworks
  • incident commander
  • complex, high-impact production incidents
  • war-room execution
  • cross-team coordination
  • recovery strategy
  • clear, timely, and confident communication to senior technology and business stakeholders
  • rigor, consistency, and accountability
  • Problem Management
  • recurring and systemic issues
  • Root Cause Analysis (RCA)
  • corrective actions address root causes
  • long-term fixes
  • measurable reliability improvements
  • cross-product failure patterns
  • architectural or platform-level remediation
  • operational reliability strategy
  • SRE and production engineering principles
  • SLIs, SLOs, error budgets
  • reliability reporting
  • availability, performance, scalability, resilience, and recovery capabilities
  • operational readiness standards
  • runbooks, rollback plans, monitoring coverage, post-release validation
  • observability strategy
  • logs, metrics, and traces
  • actionable insights
  • high signal quality
  • automation initiatives
  • reduce manual effort, human error, and MTTR
  • standard tooling
  • reusable runbooks
  • automated remediation patterns
  • scale through systems and automation
  • Deployment, Change & Release Governance
  • senior operational oversight for releases and deployments
  • risk assessment
  • go/no-go decisions
  • CI/CD operational safety
  • post-release validation
  • operational risks identified early
  • Compliance, Security & Regulated Environment Readiness
  • comply with Lilly standards and applicable regulatory requirements
  • secure operational practices
  • auditability
  • proper handling of sensitive data
  • regulated and validated environments
  • Organizational Leadership & Talent Development
  • operational bar
  • standards, expectations, and role modeling
  • Mentor and develop R2/R3 engineers
  • deep operational expertise and leadership capability
  • Influence managers, architects, and engineering leaders