What you'd actually do

Design, build, and maintain scalable, highly available Machine Learning infrastructure on AWS and Visa OnPrem.

Deploy, configure, and manage Kubernetes clusters and Kubeflow to orchestrate complex ML training and deployment pipelines.

Build robust serving infrastructure to productionize machine learning models and Large Language Models (LLMs) using modern serving frameworks (e.g., vLLM, TensorRT-LLM, KServe, Triton).

Design secure platform architectures utilizing AWS IAM (roles, policies, least privilege), VPCs, and security groups to ensure data and model security.

Architect scalable cloud systems and automate infrastructure provisioning using tools like Terraform or AWS CloudFormation.

Skills

Required

MLOps
AWS cloud architecture
Kubernetes
system design
ML platform architecture
Kubeflow
model deployment
LLMOps
serving frameworks (vLLM, TensorRT-LLM, KServe, Triton)
AWS IAM
VPCs
security groups
Terraform or AWS CloudFormation
CI/CD pipelines
observability and monitoring tools (CloudWatch, Prometheus, Grafana)

Nice to have

Visa OnPrem
Cloud-agnostic experience

What the JD emphasized

scalable cloud infrastructure

MLOps

AWS cloud architecture

Kubernetes

system design

ML platform

deployment standards

secure, scalable pipelines

serving infrastructure

Generative AI (LLM) workloads

model deployment

infrastructure automation

platform security

data scientists and AI engineers

model development lifecycle

productionize machine learning models

Large Language Models (LLMs)

modern serving frameworks

secure platform architectures

data and model security

scalable cloud systems

automate infrastructure provisioning

automated CI/CD pipelines

model training, testing, and deployment

continuous integration

compute and tooling needs

logging, monitoring, and alerting

system health

model drift

latency

modernize legacy deployment pipelines

emerging AI infrastructure technologies

ML serving infrastructure

secure architectures

automating infrastructure

CI/CD pipelines for ML model deployment

monitoring tools

About Us

Visa is a world leader in payments technology, facilitating transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories, dedicated to uplifting everyone, everywhere by being the best way to pay and be paid.

At Visa, you'll have the opportunity to create impact at scale — tackling meaningful challenges, growing your skills and seeing your contributions impact lives around the world.

Join Visa and do work that matters – to you, to your community, and to the world. Progress starts with you.

Job Description

The Sr. ML Engineer is responsible for designing, building, and managing the scalable cloud infrastructure that powers our AI and Machine Learning applications. Rather than focusing primarily on model building, this role is suited for a specialist with deep expertise in MLOps, AWS cloud architecture, Kubernetes, and system design. You will own key modules of the ML platform, perform architectural reviews, and implement robust deployment standards. The team is tasked with building secure, scalable pipelines and serving infrastructure for both traditional ML and modern Generative AI (LLM) workloads. The successful candidate will act as a design authority for model deployment, infrastructure automation, and platform security, shaping best practices to enable our data scientists and AI engineers to seamlessly transition models from research to production.

All roles require digital fluency, including the ability to work with emerging technologies and AI-assisted tools - such as AI coding assistants (e.g., GitHub Copilot, ChatGPT, Claude Code, CLine), advanced reasoning GenAI models, and enterprise productivity tools - to enhance engineering productivity and support everyday work.

Key Responsibilities:

ML Platform Architecture: Design, build, and maintain scalable, highly available Machine Learning infrastructure on AWS and Visa OnPrem.
Kubernetes & Kubeflow Management: Deploy, configure, and manage Kubernetes clusters and Kubeflow to orchestrate complex ML training and deployment pipelines.
Model Deployment & LLMOps: Build robust serving infrastructure to productionize machine learning models and Large Language Models (LLMs) using modern serving frameworks (e.g., vLLM, TensorRT-LLM, KServe, Triton).
Cloud Security & Access Management: Design secure platform architectures utilizing AWS IAM (roles, policies, least privilege), VPCs, and security groups to ensure data and model security.
System Design & Infrastructure as Code (IaC): Architect scalable cloud systems and automate infrastructure provisioning using tools like Terraform or AWS CloudFormation.
CI/CD & MLOps: Develop and maintain automated CI/CD pipelines for model training, testing, and deployment, ensuring seamless continuous integration.
Cross-Functional Enablement: Partner closely with Data Scientists and AI Engineers to understand their compute and tooling needs, reducing friction in the model development lifecycle.
Observability & Monitoring: Implement logging, monitoring, and alerting for ML models and underlying infrastructure (e.g., CloudWatch, Prometheus, Grafana) to track system health, model drift, and latency.
Modernization: Act as a technical guide to modernize legacy deployment pipelines and integrate emerging AI infrastructure technologies.

This is a hybrid position. Expectation of days in office will be confirmed by your hiring manager.

Qualifications

Basic Qualifications

2+ years of relevant work experience and a Bachelor's degree, OR 5+ years of relevant work experience.
Experience in developing and implementing scalable AI/ML models and algorithms.
Experience managing Kubernetes clusters and Kubeflow for ML pipelines.

Preferred Qualifications

3 or more years of work experience with a Bachelor's degree, or more than 2 years of work experience with an advanced degree (e.g., Master's, MBA, JD, MD).
Experience in building ML serving infrastructure (vLLM, TensorRT-LLM, KServe, Triton).
Experience in implementing secure architectures (IAM, VPCs, least privilege).
Experience with automating infrastructure with Terraform/CloudFormation.
Experience with developing CI/CD pipelines for ML model deployment.
Experience with implementing monitoring tools (CloudWatch, Prometheus, Grafana).
Cloud-agnostic experience is welcome.

U.S. Applicants Only

The estimated salary range for this position is $130,700.00 to $202,300.00 USD per year, which may include potential sales incentive payments (if applicable). Salary may vary depending on job-related factors which may include knowledge, skills, experience, and location. In addition, this position may be eligible for bonus and equity. Visa has a comprehensive benefits package for which this position may be eligible that includes Medical, Dental, Vision, 401(k), FSA/HSA, Life Insurance, Paid Time Off, and Wellness Program.

Work Hours

Vary depending on the needs of the department.

Travel Requirements

This position requires travel 5-10% of the time.

Mental/Physical Requirements

This position will be performed in an office setting. The position will require the incumbent to sit and stand at a desk, communicate in person and by telephone, and frequently operate standard office equipment, such as telephones and computers.

Visa is an EEO Employer

Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with the EEOC guidelines and applicable local law.