QA Engineer - GTM Applications (Remote, IND)

CrowdStrike · Enterprise · India · Remote

QA Engineer for an AI Pod focused on building agentic AI systems, including LLMs, autonomous agents, and RAG pipelines. The role requires defining test strategies for non-deterministic systems, designing evaluation frameworks, building automated test suites in Python, and integrating tests into CI/CD pipelines. Experience with AI/ML testing, Python, and test tooling such as pytest, Selenium, and Playwright is essential.

What you'd actually do

  1. Define and own the end-to-end test strategy for agentic AI workstreams, establishing quality standards that account for the probabilistic, non-deterministic nature of LLM-powered systems.
  2. Design and implement AI-specific evaluation frameworks covering hallucination detection, prompt quality scoring, agent task completion rates, and output faithfulness against ground-truth references.
  3. Build and maintain automated test suites in Python using frameworks such as pytest and Robot Framework, covering unit, integration, and system-level scenarios across all workstream components.
  4. Develop RAG pipeline test coverage including retrieval precision and recall, semantic relevance scoring, context faithfulness, and end-to-end query-to-answer accuracy using tools such as RAGAS.
  5. Design and build QA automation tests leveraging industry-standard tools and technologies, encompassing functional, regression, and end-to-end integration testing across connected systems and platforms.
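
For readers unfamiliar with the retrieval metrics named in item 4, the following is a minimal sketch of how precision and recall might be checked in a pytest-style test. It is an illustration only, not taken from the posting: the function name `retrieval_precision_recall`, the document IDs, and the 0.5 and 0.6 thresholds are all hypothetical.

```python
def retrieval_precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = relevant retrieved items / all retrieved items
    recall    = relevant retrieved items / all relevant items
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def test_retrieval_meets_thresholds():
    # Hypothetical document IDs; a real suite would load a labeled query set.
    retrieved = ["doc-1", "doc-4", "doc-7", "doc-9"]
    relevant = ["doc-1", "doc-7", "doc-8"]
    precision, recall = retrieval_precision_recall(retrieved, relevant)
    # Assumed thresholds, chosen only for this example.
    assert precision >= 0.5
    assert recall >= 0.6
```

Here two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were retrieved (recall about 0.67). In practice a tool such as RAGAS would likely supply richer, LLM-assisted variants of these scores.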

Skills

Required

  • Python for test automation
  • pytest
  • Selenium
  • Playwright
  • REST-assured
  • QA or SDET experience
  • building and maintaining automated test frameworks
  • testing AI or ML systems
  • understanding of non-deterministic outputs
  • designing reusable test utilities, fixtures, mocks, and data factories

Nice to have

  • Robot Framework
  • RAGAS
  • GitHub Actions
  • Copado
  • Jenkins
  • Postman
  • Salesforce integrations
  • Slack applications
  • cloud infrastructure

What the JD emphasized

  • non-deterministic AI systems
  • LLMs, autonomous agents, and probabilistic retrieval pipelines
  • AI-specific evaluation frameworks
  • hallucination detection
  • prompt quality scoring
  • agent task completion rates
  • output faithfulness against ground-truth references
  • RAG pipeline test coverage
  • retrieval precision and recall
  • semantic relevance scoring
  • context faithfulness
  • end-to-end query-to-answer accuracy
  • non-deterministic outputs
  • acceptable variance thresholds
  • snapshot-based comparisons
  • statistical scoring methods
  • AI-specific metrics
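
As an illustration of what "acceptable variance thresholds" and "statistical scoring methods" could mean in practice, here is a minimal sketch that scores a pass rate over repeated runs instead of asserting on one exact output. Everything in it is an assumption made for the example: `pass_rate` and `make_stub_model` are hypothetical helpers, the stub cycles through canned answers in place of a real LLM call, and the 0.85 threshold is arbitrary.

```python
import itertools


def pass_rate(generate, is_acceptable, runs):
    """Call generate() `runs` times and return the fraction of acceptable outputs."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for _ in range(runs) if is_acceptable(generate()))
    return passed / runs


def make_stub_model():
    """Hypothetical stand-in for an LLM call: 9 good answers, then 1 bad, repeating."""
    canned = ["The capital of France is Paris."] * 9 + ["I am not sure."]
    outputs = itertools.cycle(canned)
    return lambda: next(outputs)


def test_pass_rate_meets_threshold():
    rate = pass_rate(make_stub_model(), lambda text: "Paris" in text, runs=50)
    # Assumed threshold: tolerate some bad answers, fail on a real regression.
    assert rate >= 0.85
```

With this stub, 45 of 50 runs pass, giving a rate of 0.9. The design point is that the assertion targets an aggregate score, so occasional variation in a non-deterministic system does not produce a flaky test.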

Other signals

  • AI-native platform
  • Agentic AI technologies