Data Engineer

Visa · Fintech · Singapore

Data Engineer role focused on building and operationalizing GenAI capabilities for Finance, including RAG, prompt engineering, evaluation frameworks, and agent-based workflows. Responsibilities include architecting data pipelines, semantic models, and BI solutions, while also implementing CI/CD, observability, and governance for AI services.

What you'd actually do

  1. Ship production-grade Gen AI features (retrieval-augmented generation, prompt-chaining/agents) on governed datasets - implement vectorization strategies and chunking that respect PII/SOX controls.
  2. Partner with DS/ML to train/fine-tune and evaluate models; harden prompt templates, guardrails, and content filters; and track hallucination, toxicity, and retrieval metrics (precision/recall, hit@k).
  3. Build reusable components (prompt libraries, evaluation harnesses, vector store abstractions) and integration SDKs/APIs for reuse across Finance use cases.
  4. Engineer high-quality batch/streaming data pipelines (SQL/Hive/PySpark) across Lake/Lakehouse to power curated finance domain marts and a governed semantic layer.
  5. Implement CI/CD for data & AI (Git, Azure DevOps/GitHub Actions), data quality tests (Great Expectations or equivalent), and model/data deployment automation (MLflow/Fabric/Azure ML).
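
For readers unfamiliar with the retrieval metrics named in item 2, the following is a minimal sketch of how hit@k, precision@k, and recall@k are commonly computed for a single query. It is not taken from the posting: the function names, the document ids, and the choice to return 0.0 for empty inputs are all illustrative assumptions.

```python
from typing import Sequence, Set


def hit_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Return 1.0 if any of the top-k retrieved ids is relevant, else 0.0."""
    return 1.0 if any(doc_id in relevant for doc_id in retrieved[:k]) else 0.0


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved ids that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0  # assumption: an empty result list scores 0
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / len(top_k)


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant ids that appear in the top-k results."""
    if not relevant:
        return 0.0  # assumption: no labelled relevant docs scores 0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical example: ranked ids from a retriever and hand-labelled relevant ids.
retrieved_ids = ["doc_a", "doc_b", "doc_c", "doc_d"]
relevant_ids = {"doc_b", "doc_e"}

print(hit_at_k(retrieved_ids, relevant_ids, k=3))
print(precision_at_k(retrieved_ids, relevant_ids, k=3))
print(recall_at_k(retrieved_ids, relevant_ids, k=3))
```

In an evaluation harness these per-query scores would typically be averaged over a labelled query set and tracked across releases.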

Skills

Required

  • SQL/Hive/PySpark
  • GenAI engineering
  • RAG
  • prompt engineering
  • evaluation
  • guardrails
  • LLMs
  • vectorization
  • chunking
  • orchestration frameworks (LangChain)
  • BI solutions
  • ETL strategies
  • data model decisions
  • CI/CD
  • data quality tests
  • model/data deployment automation
  • observability
  • privacy-by-design
  • finance controls (SOX)

Nice to have

  • Machine Learning
  • Deep Learning
  • MLOps
  • Data administration (YARN, Splunk, Profiler, Perfmon, security architecture, user provisioning, audit, etc.)
  • Finance Data Analytics
  • finance domain

What the JD emphasized

  • production-grade Gen AI features
  • RAG
  • prompt engineering
  • evaluation frameworks
  • agent-based workflows
  • PII/SOX controls
  • hallucination
  • toxicity
  • retrieval metrics
  • reusable components
  • integration SDKs/APIs

Other signals

  • operationalizing GenAI capabilities
  • shipping production-grade Gen AI features
  • building reusable components for GenAI