What you'd actually do

Develop and orchestrate multi-agent AI systems for automated test generation, test execution, and end-to-end development workflow optimization using frameworks like LangGraph, AutoGen, or the Anthropic Agent SDK (Claude Code)

Design and implement agentic workflows that coordinate multiple AI agents to autonomously drive test automation across UI, API, integration, and system levels, from test case synthesis to result evaluation, ensuring seamless integration with existing developer tools and MCP-compatible services

Build evaluation frameworks and custom benchmarks for agentic systems, including comparisons of AI agents against commercial solvers, using tools like AgentBench and Langfuse

Evaluate MCP server and tool performance across agentic pipelines, measuring latency, accuracy, context fidelity, and end-to-end task completion rates

Skills

Required

Python
ML frameworks (PyTorch, Transformers, scikit-learn)
Large Language Models applied to software understanding or test generation
AI evaluation methodologies and metrics for agentic task completion and test quality
Statistical analysis
Experimental design

Nice to have

Software engineering or QA
Test automation frameworks (e.g., Playwright, Selenium, Pytest, Appium)
CI/CD pipelines
Benchmarks that compare AI agents against commercial or domain-specific solvers
MCP (Model Context Protocol)
LangGraph
AutoGen
Anthropic Agent SDK / Claude Code
vision-language models or multi-modal AI
Azure AI Foundry/ML or AWS cloud ML platforms

What the JD emphasized

rigorously evaluate intelligent agentic systems

benchmarking AI agents against commercial solvers

Evaluate MCP server and tool performance across agentic pipelines

Knowledge of AI evaluation methodologies and metrics for agentic task completion and test quality

Experience designing benchmarks that compare AI agents against commercial or domain-specific solvers

Job Requisition ID: 26WD96920

Position Overview

As a Software Developer on the Fusion platform services team within Product Development and Manufacturing Solutions (PDMS), you'll be part of a team of technologists dedicated to creating cutting-edge AI and generative AI solutions that enhance developer productivity and experience. You'll work closely with AI engineers, software architects, and product engineering teams to build and rigorously evaluate intelligent agentic systems — including benchmarking AI agents against commercial solvers — and develop MCP (Model Context Protocol)-based tooling that integrates seamlessly with IDEs such as VS Code and Cursor.

Responsibilities

Develop and orchestrate multi-agent AI systems for automated test generation, test execution, and end-to-end development workflow optimization using frameworks like LangGraph, AutoGen, or the Anthropic Agent SDK (Claude Code)
Design and implement agentic workflows that coordinate multiple AI agents to autonomously drive test automation across UI, API, integration, and system levels, from test case synthesis to result evaluation, ensuring seamless integration with existing developer tools and MCP-compatible services
Build evaluation frameworks and custom benchmarks for agentic systems, including comparisons of AI agents against commercial solvers, using tools like AgentBench and Langfuse
Evaluate MCP server and tool performance across agentic pipelines, measuring latency, accuracy, context fidelity, and end-to-end task completion rates

Minimum Qualifications

BS/MS in Computer Science, Machine Learning, or a related applied AI field
Expertise in Python and ML frameworks (PyTorch, Transformers, scikit-learn)
Experience with Large Language Models applied to software understanding or test generation
Knowledge of AI evaluation methodologies and metrics for agentic task completion and test quality
Strong foundation in statistical analysis and experimental design
Experience with developer workflow and productivity measurement frameworks

Preferred Qualifications

Background in software engineering or QA with close collaboration with development teams
Familiarity with test automation frameworks (e.g., Playwright, Selenium, Pytest, Appium) and CI/CD pipelines
Experience designing benchmarks that compare AI agents against commercial or domain-specific solvers
Hands-on experience with MCP (Model Context Protocol), including building, evaluating, and optimizing MCP servers and tool integrations within agentic pipelines
Experience with agentic AI frameworks including LangGraph, AutoGen, or the Anthropic Agent SDK / Claude Code
Knowledge of vision-language models or multi-modal AI for UI and system-level understanding and evaluation
Experience with Azure AI Foundry/ML or AWS cloud ML platforms

About Autodesk

Welcome to Autodesk! Amazing things are created every day with our software – from the greenest buildings and cleanest cars to the smartest factories and biggest hit movies. We help innovators turn their ideas into reality, transforming not only how things are made, but what can be made.

We take great pride in our culture here at Autodesk – it’s at the core of everything we do. Our culture guides the way we work and treat each other, informs how we connect with customers and partners, and defines how we show up in the world.

When you’re an Autodesker, you can do meaningful work that helps build a better world designed and made for all. Ready to shape the world and your future? Join us!

Salary transparency

Salary is one part of Autodesk’s competitive compensation package. For Canada-based roles, we expect a starting base salary between $88,000 and $128,700. Offers are based on the candidate’s experience and geographic location, and may exceed this range. In addition to base salaries, our compensation package may include annual cash bonuses, commissions for sales roles, stock grants, and a comprehensive benefits package.

Belonging

We take pride in cultivating a culture of belonging where everyone can thrive. Learn more here: https://www.autodesk.com/company/global-belonging

Are you an existing contractor or consultant with Autodesk?

Please search for open jobs and apply internally (not on this external site).