Machine Learning Engineer

Visa · Fintech · Stockholm, Sweden

Machine Learning Engineer at Visa responsible for designing, building, deploying, and operating production-grade ML systems, including agentic AI systems, RAG pipelines, and multi-step agents. The role involves end-to-end ownership of ML models and services, MLOps practices, and engineering scalable APIs and serving layers, with a focus on NLP workloads at global scale.

What you'd actually do

  1. Design, build, and operate production-grade machine learning systems that run at Visa’s global scale for NLP and related workloads with strict latency and throughput targets (e.g. 50k-100k+ tokens/sec @ 100+ RPS).
  2. Develop end-to-end ML pipelines covering data preparation, model training, validation, deployment, monitoring, and retraining.
  3. Build and maintain high-availability, fault-tolerant ML services and APIs, including load balancing and robust autoscaling for GPU inference.
  4. Design and implement advanced agentic AI systems: RAG pipelines, multi-step and branching agents, actor–critic control loops, validation/guardrail stages, and custom tools.
  5. Work closely with product, data, and platform teams to turn requirements into concrete ML system designs and production deployments across multiple Visa technology offerings.

Skills

Required

  • Foundational Python programming skills
  • Experience building and operating ML pipelines and models in production
  • Hands-on with PyTorch
  • GPU inference and optimization
  • Kubernetes and Docker for deploying and operating ML services
  • CI/CD for ML services and pipelines
  • Infrastructure as code with Terraform
  • Experience with agentic AI frameworks and patterns
  • Experience with Kubeflow Pipelines (KFP) or similar systems for model training workflows
  • Experience with at least one major cloud platform for ML (AWS, GCP, or Azure)

Nice to have

  • TensorFlow experience
  • Experience with Triton Inference Server or a similar serving stack
  • Exposure to one or more systems/server programming languages (e.g., C++, Go, Rust, or Java)
  • Curiosity and passion for machine learning and data‑driven systems
  • Comfort challenging existing solutions and learning new tools, frameworks, and platforms
  • Interest in areas such as MLOps, model monitoring, feature engineering, and responsible AI

What the JD emphasized

  • strict requirements for reliability, performance, security, and compliance
  • strict latency and throughput targets
  • high-availability, fault-tolerant ML services
  • production-grade machine learning systems
  • agentic AI systems
  • MLOps practices
  • production deployments

Other signals

  • design, build, and operate production-grade machine learning systems
  • design and implement advanced agentic AI systems
  • apply MLOps practices for safe, repeatable deployment, monitoring, and lifecycle management of models and agents