Principal Engineer - Data Ingestion & AI Pipeline

Bank of America · Banking · Charlotte, NC

Principal Engineer responsible for designing and leading the engineering of scalable data ingestion pipelines for RAG and AI workloads in a fintech environment. Focuses on preparing, transforming, and validating data for downstream AI consumption, ensuring compliance with security and regulatory requirements.

What you'd actually do

  1. Design and lead enterprise data ingestion pipelines for structured, semi-structured, and unstructured data.
  2. Build scalable ingestion patterns for sources such as databases, APIs, documents, PDFs, SharePoint, file shares, data lakes, wikis, ticketing systems, code repositories, emails, and application logs.
  3. Define transformation patterns for document parsing, text extraction, normalization, deduplication, enrichment, chunking, classification, and metadata generation.
  4. Ensure ingestion pipelines comply with security, privacy, retention, and regulatory requirements.
  5. Provide technical leadership across data engineering, AI platform, cloud, and application teams.
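
To make the transformation steps named above (normalization, deduplication, chunking, metadata generation) concrete, here is a minimal sketch in Python. It is illustrative only and not taken from the posting: the names `Chunk`, `normalize`, `chunk_words`, and `ingest` are hypothetical, and the word-window chunking and SHA-256 exact-match deduplication are assumed choices, not the employer's stated method.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    """One retrievable unit plus the metadata needed to trace it back."""
    source_id: str   # hypothetical identifier of the originating document
    index: int       # position of the chunk within its source
    text: str
    sha256: str      # content hash, used here for exact-duplicate detection


def normalize(text: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def ingest(docs: dict[str, str], size: int = 50, overlap: int = 10) -> list[Chunk]:
    """Normalize, chunk, and deduplicate documents keyed by source id."""
    seen: set[str] = set()
    out: list[Chunk] = []
    for source_id, raw in docs.items():
        for i, piece in enumerate(chunk_words(normalize(raw), size, overlap)):
            digest = hashlib.sha256(piece.encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # skip chunks whose exact text was already emitted
            seen.add(digest)
            out.append(Chunk(source_id, i, piece, digest))
    return out
```

A production pipeline would likely chunk on tokens or document structure and use near-duplicate detection, but the shape is the same: every chunk carries its source and a stable content hash.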

Skills

Required

  • Data Ingestion Pipeline Design
  • ETL/ELT Processes
  • Structured and Unstructured Data Handling
  • RAG Optimization
  • LLM Data Preparation
  • Data Transformation
  • Data Validation
  • Data Classification
  • Metadata Generation
  • Pipeline Orchestration
  • Monitoring and Error Handling
  • Data Quality Checks
  • Security and Privacy Compliance
  • Regulatory Compliance (Fintech)
  • Technical Leadership
  • Cloud Platforms (AWS/Azure/GCP)
  • Document Intelligence/OCR
  • Entity Extraction
  • Content Classification
  • Source Lineage
  • Data Freshness
  • Versioning
  • Access Controls
  • Auditability
  • Incremental Ingestion
  • Change Detection
  • Delta Processing
  • Reprocessing Strategies
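
As a rough illustration of the incremental ingestion, change detection, and delta processing items above, the following Python sketch compares content fingerprints between runs. The function names and the idea of persisting a `source_id -> fingerprint` map are assumptions made for this example, not details from the posting.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Return a stable SHA-256 hex digest of a document's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous: dict[str, str], current: dict[str, str]):
    """Compare the last run's fingerprints with the current source contents.

    previous: source_id -> fingerprint stored after the last run (assumed state)
    current:  source_id -> raw content fetched in this run
    Returns (delta, new_state), where delta lists added, modified, and deleted
    source ids and new_state is the fingerprint map to persist for the next run.
    """
    new_state = {sid: fingerprint(body) for sid, body in current.items()}
    added = sorted(sid for sid in new_state if sid not in previous)
    modified = sorted(
        sid for sid in new_state
        if sid in previous and previous[sid] != new_state[sid]
    )
    deleted = sorted(sid for sid in previous if sid not in new_state)
    delta = {"added": added, "modified": modified, "deleted": deleted}
    return delta, new_state
```

Only the ids in `added` and `modified` need to be re-parsed and re-embedded, while ids in `deleted` should be purged downstream, which also supports retention requirements.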

Nice to have

  • Experience with specific RAG databases
  • Experience with LLM context engineering
  • Familiarity with various data sources (SharePoint, wikis, ticketing systems, code repositories, emails, application logs)

What the JD emphasized

  • design and lead the engineering of scalable ingestion pipelines
  • prepare enterprise data for RAG and AI workloads
  • extracting, transforming, enriching, validating, chunking, classifying, and loading structured and unstructured data into downstream RAG databases
  • clean, traceable, secure, current, and optimized for retrieval and LLM consumption
  • Partner with RAG database engineers to ensure ingested data is optimized for embedding, indexing, and retrieval.
  • Partner with context engineers to ensure data is structured and enriched in ways that support effective LLM reasoning.
  • Ensure ingestion pipelines comply with security, privacy, retention, and regulatory requirements.

Other signals

  • design and lead enterprise data ingestion pipelines
  • prepare enterprise data for RAG and AI workloads
  • optimize data for embedding, indexing, and retrieval
  • support effective LLM reasoning