What you'd actually do

Designs and builds end-to-end evaluation frameworks for AI platforms including LLM-based chat bots and voice bots, covering accuracy, relevance, hallucination detection, latency, and response quality.

Implements AI model evaluation pipelines using RAGAS, DeepEval, and LangChain to benchmark and validate LLM outputs against ground truth.

Architects and builds robust, scalable, and maintainable test automation frameworks for API (REST/SOAP), web, and mobile applications using pytest, Selenium, WebdriverIO, and Appium.

Develops test strategies for conversational AI — validating intent recognition, slot filling, dialogue flow, fallback handling, and multi-turn context retention.

Builds voice bot quality validation covering speech recognition accuracy, TTS quality, call flow logic, and DTMF handling.

At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We’re looking for people who are determined to make life better for people around the world.

We are a global healthcare leader headquartered in Indianapolis, Indiana. Our 42,000+ employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We're looking for people who are determined to make life better for people around the world.

Come build advanced software capabilities to accelerate our digital transformation and support Lilly's evolution to be the leader in Pharma-tech!

THE ROLE

The Software Product Engineering (SPE) organization is actively looking for a motivated Principal Engineer – QE (R3) who is deeply technical and passionate about engineering quality at scale. This is a core individual contributor role for a hands-on quality engineer with deep expertise in building evaluation frameworks for AI platforms, including chat and voice bot applications, alongside robust test automation frameworks for API, web, and mobile testing. Are you a quality-engineering craftsperson who thrives on solving complex technical problems? Do you have a passion for AI quality validation, framework architecture, and continuous innovation? Do you want to shape Lilly's engineering direction as a leader in Pharma-tech? If so, please apply.

WHAT YOU'LL BE DOING

In this role, you will be at the technical forefront of quality engineering: designing and building evaluation frameworks for AI-based platforms including LLM-powered chat and voice bots, and developing scalable automation frameworks for API, web, and mobile applications. You will work hands-on with Python, Java, and JavaScript/TypeScript to build production-grade test infrastructure, integrate with CI/CD pipelines, and apply advanced AI testing methodologies. You will also contribute to performance testing, cloud-based test execution, and the continuous improvement of quality practices across a global cross-functional team.

KEY RESPONSIBILITIES

AI Platform & Evaluation Framework Engineering

▪ Designs and builds end-to-end evaluation frameworks for AI platforms including LLM-based chat bots and voice bots, covering accuracy, relevance, hallucination detection, latency, and response quality.
▪ Implements AI model evaluation pipelines using RAGAS, DeepEval, and LangChain to benchmark and validate LLM outputs against ground truth.
▪ Develops test strategies for conversational AI, validating intent recognition, slot filling, dialogue flow, fallback handling, and multi-turn context retention.
▪ Builds voice bot quality validation covering speech recognition accuracy, TTS quality, call flow logic, and DTMF handling.
▪ Defines quality metrics for AI systems including precision, recall, F1-score, BLEU/ROUGE scores, and semantic similarity thresholds.
▪ Collaborates with AI/ML engineers and data scientists to integrate evaluation frameworks into model training and deployment pipelines.

Test Automation Framework Development

▪ Architects and builds robust, scalable, and maintainable test automation frameworks for API (REST/SOAP), web, and mobile applications using pytest, Selenium, WebdriverIO, and Appium.
▪ Designs framework components including page object models, data-driven layers, reporting modules, parallel execution engines, and retry mechanisms.
▪ Develops API automation suites covering contract testing, schema validation, authentication flows, and end-to-end service integration testing.
▪ Implements mobile test automation for both iOS and Android platforms, covering native, hybrid, and responsive web applications.
▪ Builds reusable test libraries and utility modules in Python, Java, and JavaScript/TypeScript to accelerate framework adoption across teams.
▪ Integrates test frameworks with CI/CD pipelines using GitHub Actions, Jenkins, or Bamboo for continuous test execution and reporting.

Technology Leadership & Engineering Excellence

▪ Serves as a key technical contributor and subject matter expert in quality engineering technologies, automation architectures, and testing methodologies.
▪ Demonstrates deep understanding of software design patterns, SOLID principles, and architectural best practices applied to test automation.
▪ Proactively adopts and evaluates new programming languages and tools including Go, Rust, JavaScript, Python, and TypeScript to improve automation quality and coverage.
▪ Develops and maintains CI/CD pipeline integrations, ensuring test suites are reliable, fast, and part of every deployment gate.

Performance, Cloud & Infrastructure Testing

▪ Designs and executes performance, load, and stress testing strategies using JMeter to validate system reliability, throughput, and scalability under realistic and peak conditions.
▪ Drives cloud-based test execution strategies leveraging AWS, Kubernetes, and Docker for dynamic, scalable, and containerised test environments.
▪ Implements infrastructure-as-code test provisioning to enable repeatable and version-controlled test environment setup.
▪ Conducts API security and reliability testing, including authentication, rate limiting, error handling, and timeout validation.

Process Improvement & Quality Advocacy

▪ Champions shift-left testing practices, embedding quality validation early in the development lifecycle and reducing defect escape rates.
▪ Identifies and implements test optimisation techniques (including parallel execution, smart test selection, and coverage prioritisation) to reduce pipeline duration without sacrificing quality.
▪ Implements best practices in test design, maintenance, and documentation, ensuring automation assets remain production-grade over time.
▪ Advocates for quality-first engineering across development and product teams, providing data-driven insights on defect trends, test coverage, and automation ROI.

Mentorship & Knowledge Sharing

▪ Supports and mentors junior and mid-level QA engineers, providing technical guidance on automation frameworks, AI evaluation, and CI/CD integration.
▪ Conducts framework and code reviews, ensuring best practices and engineering standards are consistently applied.
▪ Shares knowledge through internal talks, documentation, and community-of-practice contributions, advancing the QE capability across SPE.
▪ Encourages a culture of continuous learning, staying current with industry trends in AI testing, automation tooling, and software quality.

Innovation & Thought Leadership

▪ Researches and evaluates emerging testing technologies including AI-based test generation, self-healing automation, and autonomous testing agents.
▪ Promotes innovative approaches to quality engineering, driving adoption of AI-driven test automation and intelligent defect prediction within the team.
▪ Contributes to the SPE QE community of practice, publishing patterns, reusable components, and technical guidance for cross-team benefit.

Agile & DevOps Collaboration

▪ Ensures seamless test automation integration within CI/CD pipelines, supporting continuous delivery and deployment across all supported environments.
▪ Participates actively in Agile ceremonies including sprint planning, backlog refinement, and defect triaging, contributing technical quality perspectives.
▪ Works closely with development, DevOps, and product teams to embed testing within Agile workflows and ensure alignment on release readiness.

REQUIRED TECHNICAL SKILLS & QUALIFICATIONS

▪ Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
▪ 10+ years of Quality Engineering experience in a high-tech, life sciences, or related field.
▪ Hands-on experience building evaluation frameworks for AI platforms including LLM-based chat bots and voice bot applications.
▪ Proficiency in Python, Java, and JavaScript/TypeScript with the ability to build production-grade automation code, not just scripts.
▪ Deep expertise in test automation framework design and development for API, web, and mobile using pytest / Selenium / WebdriverIO / Appium.
▪ Strong experience in AI/ML testing methodologies including LLM evaluation metrics, hallucination detection, and model benchmarking.
▪ Hands-on experience with RAGAS, DeepEval, or equivalent model evaluation frameworks.
▪ Strong experience in test strategy formulation, test planning, and risk-based testing approaches.
▪ Proficiency in performance and load testing using JMeter or equivalent tools.
▪ Strong understanding of cloud computing and containerised testing using AWS, Kubernetes, and Docker.
▪ Experience with CI/CD tools (GitHub Actions, Jenkins, Bamboo) and integration of automated tests into delivery pipelines.
▪ Experience with GitHub, JIRA, Xray, and Agile delivery methodologies.
▪ Excellent written and verbal communication skills, with the ability to convey complex technical concepts to both technical and non-technical audiences.

PREFERRED QUALIFICATIONS

▪ Experience with LangChain, LangGraph, or similar LLM orchestration frameworks.
▪ Hands-on experience testing multi-modal AI applications (text, voice, and image inputs).
▪ Knowledge of speech recognition quality metrics (WER, CER) for voice bot evaluation.
▪ Experience in multi-region software testing and global test infrastructure.
▪ Knowledge of security and compliance testing in regulated environments (e.g., GxP, HIPAA).
▪ Contributions to open-source test automation frameworks or AI evaluation tooling.

Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form (https://careers.lilly.com/us/en/workplace-accommodation) for further assistance. Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.

Lilly does not discriminate on the basis of age, race, color, religion, gender, sexual orientation, gender identity, gender expression, national origin, protected veteran status, disability or any other legally protected status.

#WeAreLilly

Principal Engineer - Quality Engineering
