Data Engineer II - Getting Customers Ready for AI

Microsoft · Big Tech · Redmond, WA +1 · Software Engineering

The Data Engineer II role focuses on building the data foundations for AI-native security systems. This involves designing and developing scalable data platforms, pipelines, and telemetry systems that transform raw security and operational data into high-quality datasets for analytics and AI-driven insights. The role supports downstream AI models, RAG pipelines, and model training and evaluation workflows, and it requires collaboration with AI engineers and data scientists.

What you'd actually do

  1. Design and build scalable data pipelines (batch and streaming) to process large volumes of security and operational data.
  2. Design and evolve data models, schemas, and storage strategies for analytics and AI use cases.
  3. Implement data validation, quality checks, and observability frameworks to ensure data accuracy and reliability.
  4. Enable high-quality datasets for AI/ML teams, including support for feature pipelines and training data preparation.
  5. Partner with engineering, data science, and product teams to deliver end-to-end data solutions.

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field
  • 2+ years of software, data, or related engineering experience
  • Experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • Ability to meet Microsoft, customer, and/or government security screening requirements

Nice to have

  • Hands-on experience building or maintaining data pipelines and distributed data systems
  • Programming experience in Python, Scala, or SQL for data processing and pipeline development
  • Understanding of data modeling, ETL processes, and large-scale data processing concepts

What the JD emphasized

  • security screening requirements

Other signals

  • data foundations that power AI-native security systems
  • transform raw signals ... into high-quality, trusted datasets for analytics and AI-driven insights
  • enabling downstream AI models
  • data is structured for downstream ML pipelines, feature engineering, and analytics workloads
  • Enable high-quality datasets for AI/ML teams
  • Collaborate with AI engineers to ensure data is optimized for RAG pipelines, model training, and evaluation workflows