What you'd actually do

Define and own the end-to-end test strategy for agentic AI workstreams, establishing quality standards that account for the probabilistic, non-deterministic nature of LLM-powered systems.

Design and implement AI-specific evaluation frameworks covering hallucination detection, prompt quality scoring, agent task completion rates, and output faithfulness against ground-truth references.

Build and maintain automated test suites in Python using frameworks such as pytest and Robot Framework, covering unit, integration, and system-level scenarios across all workstream components.

Develop RAG pipeline test coverage including retrieval precision and recall, semantic relevance scoring, context faithfulness, and end-to-end query-to-answer accuracy using tools such as RAGAS.

Design and build QA automation tests leveraging industry-standard tools and technologies, encompassing functional, regression, and end-to-end integration testing across connected systems and platforms.

Skills

Required

Python for test automation
pytest
Selenium
Playwright
REST-assured
QA or SDET experience
building and maintaining automated test frameworks
testing AI or ML systems
understanding of non-deterministic outputs
designing reusable test utilities, fixtures, mocks, and data factories

Nice to have

Robot Framework
RAGAS
GitHub Actions
Copado
Jenkins
Postman
Salesforce integrations
Slack applications
cloud infrastructure

What the JD emphasized

non-deterministic AI systems

LLMs, autonomous agents, and probabilistic retrieval pipelines

AI-specific evaluation frameworks

hallucination detection

prompt quality scoring

agent task completion rates

output faithfulness against ground-truth references

RAG pipeline test coverage

retrieval precision and recall

semantic relevance scoring

context faithfulness

end-to-end query-to-answer accuracy

non-deterministic outputs

acceptable variance thresholds

snapshot-based comparisons

statistical scoring methods

AI-specific metrics

As a global leader in cybersecurity, CrowdStrike protects the people, processes and technologies that drive modern organizations. Since 2011, our mission hasn’t changed — we’re here to stop breaches, and we’ve redefined modern security with the world’s most advanced AI-native platform. Our customers span all industries, and they count on CrowdStrike to keep their businesses running, their communities safe and their lives moving forward. We’re also a mission-driven company. We cultivate a culture that gives every CrowdStriker both the flexibility and autonomy to own their careers. We’re always looking to add talented CrowdStrikers to the team who have limitless passion, a relentless focus on innovation and a fanatical commitment to our customers, our community and each other. Ready to join a mission that matters? The future of cybersecurity starts with you.

About the Role:

This role is part of CrowdStrike’s Core Tech, Go To Market IT Apps Team, a team of Architects, Engineers, QA, BSAs, and Product Owners delivering highly reliable, scalable, and secure infrastructure and automation services across GTM applications to accelerate business velocity and operational excellence. As part of the GTM AI Pod, every member is expected to embrace Agentic AI technologies, operate with an open-source AI engineering mindset, and actively contribute to building the next generation of intelligent GTM workflows.

As the QA Engineer for the GTM AI Pod, you are the quality guardian for a workstream that builds non-deterministic AI systems — and that distinction matters. Traditional QA playbooks were designed for deterministic software; they break down the moment you introduce LLMs, autonomous agents, and probabilistic retrieval pipelines. This role requires you to rethink quality from first principles: designing evaluation frameworks that account for variable outputs, defining what ‘correct’ means for an agentic workflow, and building repeatable test suites that give engineers and stakeholders genuine confidence across every release. You will be embedded in the delivery team from requirements through production, owning the test strategy, automation framework, and quality bar for all workstream deliverables — spanning Salesforce integrations, Slack applications, RAG pipelines, agentic workflows, and the cloud infrastructure that ties them together.

**What You’ll Do:**

Define and own the end-to-end test strategy for agentic AI workstreams, establishing quality standards that account for the probabilistic, non-deterministic nature of LLM-powered systems.
Design and implement AI-specific evaluation frameworks covering hallucination detection, prompt quality scoring, agent task completion rates, and output faithfulness against ground-truth references.
Build and maintain automated test suites in Python using frameworks such as pytest and Robot Framework, covering unit, integration, and system-level scenarios across all workstream components.
Develop RAG pipeline test coverage including retrieval precision and recall, semantic relevance scoring, context faithfulness, and end-to-end query-to-answer accuracy using tools such as RAGAS.
Design and build QA automation tests leveraging industry-standard tools and technologies, encompassing functional, regression, and end-to-end integration testing across connected systems and platforms.
Build and execute Slack integration test suites validating bot response correctness, Workflow Builder trigger fidelity, agentic Slack bot state management, and error handling under edge-case inputs.
Integrate automated tests into CI/CD pipelines (GitHub Actions, Copado, Jenkins) so every pull request is gated by a defined quality bar before merge.
Design and execute performance and load tests for LLM-powered APIs, measuring latency percentiles, token throughput, and degradation patterns under concurrent load.
Conduct security and adversarial testing including prompt injection attempts, output validation for sensitive data leakage, and collaboration with the DevSecOps team on SAST/DAST pipeline findings.
Develop a regression strategy for non-deterministic outputs, defining acceptable variance thresholds, snapshot-based comparisons, and statistical scoring methods that flag genuine regressions without false positives.
Validate observability stack completeness — confirm that distributed tracing, structured logging, SLOs, and AI-specific metrics (latency, token throughput, hallucination rates) are instrumented correctly and alerting as expected.
Collaborate with engineers and the AI Product Owner from requirements grooming through sprint review, contributing testability requirements, acceptance criteria, and definition-of-done checklists.
Own API contract testing across internal and third-party integrations (Salesforce, Marketo, Snowflake, Gong, Clari, G-Suite) using tools such as Postman or REST-assured.
Drive defect lifecycle ownership: triage, severity classification, root cause analysis, regression prevention, and post-release quality retrospectives that feed back into the test strategy.
Champion a shift-left quality culture, coaching engineers to write testable code, instrument their own unit tests, and treat quality as a shared team responsibility rather than a gate at the end of the sprint.
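
To make the regression strategy for non-deterministic outputs more concrete, here is a minimal sketch of one possible approach: sample the same prompt several times, score each output against a reference, and compare the mean score to a stored baseline with a tolerance. Everything in it is an illustrative assumption rather than CrowdStrike's method: the function names `similarity` and `passes_regression`, the default thresholds, and the use of the standard library's `difflib.SequenceMatcher` as a stand-in for a real semantic scorer.

```python
from difflib import SequenceMatcher
from statistics import mean


def similarity(candidate: str, reference: str) -> float:
    """Hypothetical stand-in scorer: character-level similarity in [0, 1].

    A real suite would more likely use an embedding- or LLM-based score.
    """
    return SequenceMatcher(None, candidate.lower(), reference.lower()).ratio()


def passes_regression(outputs, reference, baseline_mean, tolerance=0.05, floor=0.5):
    """Return True when sampled outputs stay within the accepted variance.

    Assumed policy: the mean score may drop at most `tolerance` below the
    stored baseline, and no single sample may score below `floor`.
    """
    scores = [similarity(output, reference) for output in outputs]
    return mean(scores) >= baseline_mean - tolerance and min(scores) >= floor


# Several samples of the same prompt, compared against one reference answer.
samples = ["The refund was issued.", "The refund was issued"]
print(passes_regression(samples, "The refund was issued.", baseline_mean=0.9))
```

Comparing an aggregate over several samples, rather than asserting on a single output, is what lets a check like this tolerate ordinary wording variation while still flagging a genuine drop in quality.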

**What You’ll Need:**

Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field.
6+ years of QA or SDET experience, with a track record of building and maintaining automated test frameworks in production environments.
Hands-on experience testing AI or ML systems, with a solid understanding of why non-deterministic outputs require different evaluation strategies than conventional software.
Strong proficiency in Python for test automation, including designing reusable test utilities, fixtures, mocks, and data factories.
Experience with test frameworks and tooling such as pytest, Selenium, Playwright, REST-assured, Postman, or equivalent.
Practical understanding of LLM behaviour including temperature effects, token limits, prompt sensitivity, and failure modes that impact test reproducibility.
Hands-on Salesforce QA testing experience encompassing validation of Lightning Web Component (LWC) behaviors, Platform Event flows, and API integrations.
API testing proficiency across REST endpoints, including contract validation, schema conformance, payload verification, and error-path coverage.
Experience integrating automated tests into CI/CD pipelines so quality gates are enforced automatically on every code change.
Familiarity with RAG evaluation metrics — faithfulness, answer relevance, context recall — and tooling such as RAGAS or LangSmith for structured AI output evaluation.
Experience with performance and load testing tools (Locust, k6, JMeter, or similar) to validate LLM-powered API behaviour under realistic and peak load.
Working knowledge of security testing basics: prompt injection, output sanitisation, OWASP top-10 awareness, and coordination with DevSecOps tooling.
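
As a rough illustration of the retrieval side of the RAG metrics named above, the sketch below computes set-based precision and recall for one query from the document IDs a retriever returned and a hand-labelled set of relevant IDs. The function name `retrieval_precision_recall` and the document IDs are hypothetical, and this is a hand-rolled calculation, not the RAGAS or LangSmith API, whose metrics are defined differently.

```python
def retrieval_precision_recall(retrieved_ids, relevant_ids):
    """Set-based precision and recall for a single query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    Empty inputs yield 0.0 instead of raising ZeroDivisionError.
    """
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical labelled example: 2 of the 4 retrieved documents are relevant,
# and 2 of the 3 relevant documents were found.
precision, recall = retrieval_precision_recall(
    ["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"]
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In a pytest suite, a function like this would typically run over a small labelled query set, with the averaged scores asserted against agreed thresholds.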

**Bonus Points:**

ISTQB Advanced Test Analyst certification or equivalent recognised QA certification.
Salesforce certifications such as Platform App Builder, Platform Developer I, or Salesforce Administrator that support deeper integration test design.
Experience with AI evaluation frameworks and observability platforms such as RAGAS, LangSmith, Datadog LLM Observability, or OpenTelemetry for AI workloads.
Hands-on experience testing Slack applications, including event-driven webhook testing, slash command validation, and agentic Slack bot conversation-flow coverage.
Contributions to open-source test tooling, evaluation libraries, or QA frameworks relevant to AI systems or enterprise integrations.
Exposure to chaos engineering or adversarial ML testing methodologies, including fault-injection, boundary testing, and red-teaming LLM-powered agents.
Familiarity with observability and APM tooling (Datadog, Grafana, OpenTelemetry) and the ability to validate that instrumentation is correctly implemented.
Prior experience in a cybersecurity, fintech, or high-compliance software environment where quality standards carry regulatory or contractual weight.

**Benefits of Working at CrowdStrike:**

Market leader in compensation and equity awards
Comprehensive physical and mental wellness programs
Competitive vacation and holidays for recharge
Paid parental and adoption leaves
Professional development opportunities for all employees regardless of level or role
Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
Vibrant office culture with world class amenities
Great Place to Work Certified™ across the globe

CrowdStrike is proud to be an equal opportunity employer. We are committed to fostering a culture of belonging where everyone is valued for who they are and empowered to succeed. We support veterans and individuals with disabilities through our affirmative action program.

CrowdStrike is committed to providing equal employment opportunity for all employees and applicants for employment. The Company does not discriminate in employment opportunities or practices on the basis of race, color, creed, ethnicity, religion, sex (including pregnancy or pregnancy-related medical conditions), sexual orientation, gender identity, marital or family status, veteran status, age, national origin, ancestry, physical disability (including HIV and AIDS), mental disability, medical condition, genetic information, membership or activity in a local human rights commission, status with regard to public assistance, or any other characteristic protected by law. We base all employment decisions--including recruitment, selection, training, compensation, benefits, discipline, promotions, transfers, lay-offs, return from lay-off, terminations and social/recreational programs--on valid job requirements.

If you need assistance accessing or reviewing the information on this website or need help submitting an application for employment or requesting an accommodation, please contact us at recruiting@crowdstrike.com for further assistance.