Data Engineer - Full Stack

Ford · Auto · United States · PD Operations and Quality

Data Engineer role focused on building and scaling end-to-end data and AI pipelines on GCP, integrating Gen AI capabilities such as LLM-powered enrichment, RAG, and intelligent automation. Responsibilities include designing data models, implementing CI/CD and data governance, and mentoring junior talent, with a focus on powering AI/ML workloads and next-generation AI experiences.

What you'd actually do

  1. Design and implement end-to-end data pipelines (ETL/ELT) that ingest, process, and curate large-scale enterprise data, including telemetry/vehicle data and other structured/unstructured sources.
  2. Build and maintain Gen AI pipelines — including embedding generation, vector store indexing, retrieval-augmented generation (RAG), and LLM orchestration — to enable intelligent search, summarization, and conversational analytics over enterprise data.
  3. Migrate and modernize data assets to a centralized data platform (e.g., BigQuery) using principled data lake/warehouse architectures (Bronze/Silver/Gold, also known as Medallion architecture) to power analytics, reporting, and AI/ML workloads.
  4. Architect scalable data models and data warehouses, optimizing for query performance, maintainability, cost efficiency, and downstream AI consumption.
  5. Develop and operate robust orchestration pipelines using Airflow/Astronomer or BigQuery scheduled queries, with secure, reproducible CI/CD workflows (Terraform + Git) for both data and AI artifacts.

Skills

Required

  • Google Cloud Platform (BigQuery, Cloud Storage, Dataflow, Dataproc; BigQuery scheduled queries or equivalent scheduling/orchestration)
  • Generative AI technologies (LLMs, embeddings, vector databases, RAG architectures, AI orchestration frameworks like LangChain)
  • Semantic data layer development
  • Data pipeline design and implementation (ETL/ELT)
  • Data modeling and data warehousing
  • Orchestration tools (Airflow/Astronomer or BigQuery scheduled queries)
  • CI/CD workflows (Terraform, Git)
  • Data governance and security controls
  • Cloud performance optimization

Nice to have

  • Infrastructure-as-code
  • BI tools (Looker, Tableau, Power BI, Grafana)
  • Communication skills
  • Cross-functional team collaboration

What the JD emphasized

  • 1+ years of experience working with Generative AI technologies — including LLMs, embeddings, vector databases, RAG architectures, or AI orchestration frameworks (e.g., LangChain, Semantic Kernel, LlamaIndex).
  • 1+ years of experience building a semantic data layer to serve AI agents.

Other signals

  • Gen AI capabilities
  • LLM-powered data enrichment
  • RAG
  • intelligent automation
  • embedding generation
  • vector store indexing
  • LLM orchestration
  • LangChain
  • Semantic data layer to serve AI agents
  • prompt injection safeguards
  • responsible AI guardrails
  • agents
  • function calling
  • fine-tuning
  • multimodal models