Lead Data Engineer - Data Transformation (Modeling and Architecture)

Capital One · Banking · Richmond, VA +2

Lead Data Engineer focused on data transformation, modeling, and architecture within a financial services company. The role involves building and maintaining data models, designing data ecosystems (Data Lake, Data Warehouse), supporting data pipelines with SQL, Spark, and Python, and enforcing data governance. While the role mentions contributing to AI-ready architectures and collaborating with AI/ML teams, its core function is data engineering, not direct AI/ML model development or research. It also mentions leveraging interactive AI tooling for productivity.

What you'd actually do

  1. Build and maintain comprehensive data models—spanning conceptual, logical, and physical layers—to ensure scalable architecture and high data integrity across enterprise systems.
  2. Lead the design of the organization's data landscape by applying consumer-driven design principles, ensuring that data structures reflect business realities and evolving organizational needs.
  3. Architect and implement robust data ecosystem solutions, including Data Lake and Data Warehouse patterns, to support diverse analytical and operational requirements.
  4. Support high-performance data pipelines and complex transformations that utilize SQL, Spark, and Python to process large-scale datasets efficiently.
  5. Define and enforce rigorous data governance standards while managing metadata frameworks to ensure data compliance and discoverability.

Skills

Required

  • Bachelor's Degree
  • 4 years of experience in application development
  • 2 years of experience in big data technologies
  • 1 year of experience with cloud computing (AWS, Microsoft Azure, Google Cloud)

Nice to have

  • 4+ years of experience in Data Architecture / Data Modeling
  • 7+ years of experience in application development including Python, SQL, Scala, or Java
  • 4+ years of experience with a public cloud (AWS, Microsoft Azure, Google Cloud)
  • 4+ years of experience with distributed data/computing tools (MapReduce, Hadoop, Hive, EMR, Kafka, Spark, Gurobi, or MySQL)
  • 4+ years of experience working on real-time data and streaming applications
  • 4+ years of experience with NoSQL implementation (Mongo, Cassandra)
  • 4+ years of data warehousing experience (Redshift or Snowflake)
  • 4+ years of experience with UNIX/Linux including basic commands and shell scripting
  • 2+ years of experience with Agile engineering practices
  • Experience leveraging interactive AI tooling to accelerate productivity, utilizing capabilities beyond basic code completion

What the JD emphasized

  • AI-ready architectures
  • machine learning, AI, distributed microservices, and full stack systems