Senior Cybersecurity Data Engineer - AI/ML SME

Workday · Enterprise · USA.VA.Reston

Senior Data Engineer specializing in AI/ML and platform integration for a cybersecurity team. The role focuses on optimizing the data platform for ML/GenAI workloads, managing feature stores and vector databases, and building an integration fabric of APIs and data connectors. Responsibilities include designing ML data infrastructure, enterprise feature store architecture, vector infrastructure for GenAI, platform integration, MLOps collaboration, and compute optimization.

What you'd actually do

  1. Design, provision, and maintain the platform infrastructure required for end-to-end machine learning lifecycles. Optimize the platform for distributed training, model evaluation, and batch/real-time inference.
  2. Design and manage the enterprise Feature Store. Ensure consistent, low-latency feature delivery, preventing data leakage between training pipelines and real-time production inference.
  3. Architect and maintain vector databases and indexing pipelines required to support Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) patterns, and semantic search.
  4. Serve as the SME for how external applications interact with the data lakehouse. Design, build, and secure high-throughput APIs, data connectors, and reverse-ETL patterns to sync data back into business systems (e.g., CRMs, ERPs, marketing automation).
  5. Partner closely with Data Scientists and MLOps teams to establish CI/CD automation for ML. Transition experimental, unoptimized data science notebooks into resilient, production-grade automated workflows.

Skills

Required

  • 5+ years of data engineering experience
  • 2+ years supporting machine learning platforms, MLOps, or complex platform integrations
  • AWS SageMaker, MLflow, or equivalent cloud-native ML platforms
  • Feature store frameworks (e.g., Feast, SageMaker Feature Store)
  • Vector databases (e.g., Pinecone, Milvus, Qdrant, or pgvector)
  • Apache Spark / AWS EMR, Ray, or Dask
  • Building REST APIs
  • Webhooks
  • Streaming tools (e.g., AWS Kinesis, Kafka)
  • Python (pandas, NumPy, scikit-learn)
  • SQL
  • GitHub Actions, GitLab CI, or Jenkins

Nice to have

  • Deploying and fine-tuning open-source LLMs
  • Orchestrating AI agents using frameworks like LangChain or LlamaIndex
  • Reverse-ETL tools (e.g., Census, Hightouch)
  • Enterprise integration platforms

What the JD emphasized

  • AI/ML SME
  • Feature Store
  • Vector databases
  • RAG
  • LLMs
  • Platform Integration
  • APIs
  • MLOps

Other signals

  • design and provision infrastructure for ML lifecycles
  • optimize platform for distributed training, model evaluation, and inference
  • architect and maintain vector databases and indexing pipelines for LLMs and RAG
  • build APIs, data connectors, and reverse-ETL patterns for data synchronization