Senior Principal Software Engineer - AI Foundation Services

JPMorgan Chase · Banking · Plano, TX +1 · Corporate Sector

JPMorgan Chase is hiring a Senior Principal Software Engineer to build and optimize AI Foundation Services infrastructure for GenAI and traditional AI/ML platforms. The role involves co-developing reusable services, synthesizing requirements, de-risking delivery, and driving firmwide reuse through shared architectures and baselines. It also covers setting strategy for agentic AI-enabled engineering, applying AI-assisted development tools, and advising leadership on AI strategy and adoption.

What you'd actually do

  1. Leads as a hands-on technical thought leader to build, integrate, and optimize AI Foundation Services infrastructure for GenAI and traditional AI/ML platforms
  2. Co-develops with Lines of Business (LOB) application teams to deliver reusable AI/ML foundational services and managed service patterns
  3. Synthesizes LOB requirements into implementable designs and drives delivery from design through launch and early operational support
  4. De-risks delivery across performance, scale, reliability, and security by defining non-functional requirements, testing strategies, and operational readiness criteria
  5. Drives reuse and standardization through shared reference architectures, playbooks, test harnesses, and GPU training/serving baselines for model hosting platforms

Skills

Required

  • Formal training or certification on software engineering concepts and 10+ years applied experience
  • Proven hands-on experience designing and operating AI/ML platform capabilities (model training, serving, feature/data access patterns, and multi-tenant controls)
  • Demonstrated experience designing and scaling agentic AI-enabled development patterns (using enterprise-authorized tools within the work environment) across teams/functions, including establishing governance for human-in-the-loop validation, traceability/auditability, and secure handling of sensitive inputs/outputs
  • Strong understanding of responsible AI use and control expectations at scale, including security/resiliency implications, data sensitivity, and risk-based governance; ability to advise senior leaders on safe adoption, reuse, and measurable outcomes
  • Demonstrated expertise in performance engineering and production reliability (capacity planning, load testing, Service Level Objectives (SLOs)/Service Level Indicators (SLIs), incident response, and root-cause remediation)
  • Strong experience with cloud-native architecture (Kubernetes, containers, CI/CD, infrastructure-as-code using Terraform) and secure-by-design engineering practices
  • Ability to lead end-to-end technical engagements with senior stakeholders, translating requirements into delivered services with clear milestones and acceptance criteria
  • Practical experience delivering system design, application development, testing, and operational stability
  • Demonstrated experience influencing across functions and teams and delivering value at scale
  • Experience applying expertise and new methods to determine solutions for complex technology problems across various technical disciplines
  • Extensive practical cloud-native experience

Nice to have

  • Experience building GPU-backed model hosting platforms and optimizing inference/training performance (profiling, batching, caching, parallelism, and cost controls)
  • Experience implementing reusable reference architectures and developer enablement assets (golden paths, templates, playbooks, and automated test harnesses)
  • Experience with LLM and model serving stacks (e.g., routing, autoscaling, model gateways, online evaluation, and guardrails) in production environments
  • Experience operating in regulated environments with strong controls (security reviews, threat modeling, audit readiness, and data governance)

What the JD emphasized

  • AI Foundation Services
  • GenAI and traditional AI/ML platforms
  • agentic AI-enabled development patterns
  • responsible AI use and control expectations at scale

Other signals

  • scaled, secure, performance-optimized infrastructure
  • reuse through shared reference architectures
  • agentic AI-enabled engineering