What you'd actually do

Design and implement autonomous multi-agent systems using Reinforcement Learning (RL) loops that can interact with our infrastructure to perform safe, automated remediation actions

Build GenAI agents capable of digesting logs, traces, and metrics to provide "Human-in-the-loop" root cause analysis and conversational debugging for our SREs

Develop and deploy deep learning models (Transformers, LSTMs, etc.) for forecasting and anomaly detection on high-cardinality, high-volume time series data

Optimize inference pipelines to run with low latency on streaming telemetry data (Kafka/Flink), ensuring we catch issues the moment they happen

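Own the lifecycle of your models, from feature engineering on petabyte-scale datasets to training, deployment, and monitoring in production Kubernetes environments

Collaborate with Applied Scientists to translate bleeding-edge research (e.g., causal inference, decision transformers) into production-hardened AIOps tools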

Skills

Required

8+ years of professional experience in Machine Learning Engineering or Data Science
PyTorch or TensorFlow
Time series analysis (forecasting/anomaly detection)
NLP
Building applications using LLMs (RAG pipelines, LangChain, vector databases)
LLM applications for technical domains (code analysis, log parsing)
RL concepts (policies, rewards, agents)
Applying RL to optimization or control problems
Distributed data processing and streaming technologies (Apache Spark, Kafka, Flink)
Software engineering fundamentals (Python, C++, or Go)
CI/CD for ML
Deploying models via APIs (FastAPI, Triton Inference Server)

Nice to have

Familiarity with the "three pillars" (Logs, Metrics, Traces)
Prometheus, Grafana, OpenTelemetry, or Jaeger
AutoGen, CrewAI, or Ray RLlib
Deep experience with AWS/GCP/Azure
Kubernetes (K8s) orchestration
Control theory
Causal inference

What the JD emphasized

moving beyond simple anomaly detection

Multi-Agent Systems

Reinforcement Learning

Large Language Models (LLMs)

detect incidents in real-time but to troubleshoot and resolve them autonomously

massive datasets (billions of telemetry points)

solve real-world reliability challenges

petabyte-scale datasets

low latency on streaming telemetry data

catch issues the moment they happen

Company Overview

Docusign brings agreements to life. Over 1.5 million customers and more than a billion people in over 180 countries use Docusign solutions to accelerate the process of doing business and simplify people’s lives. With intelligent agreement management, Docusign unleashes business-critical data that is trapped inside of documents. Until now, these were disconnected from business systems of record, costing businesses time, money, and opportunity. Using Docusign’s Intelligent Agreement Management platform, companies can create, commit, and manage agreements with solutions created by the #1 company in e-signature and contract lifecycle management (CLM).

What you'll do

We are looking for a Senior Machine Learning Engineer to redefine how we operate our global services. You won't just be building dashboards; you will be building the "brain" of our infrastructure. We are moving beyond simple anomaly detection. We are building a self-healing ecosystem where Multi-Agent Systems and Reinforcement Learning (RL) loops work in tandem with Large Language Models (LLMs) to not only detect incidents in real-time but to troubleshoot and resolve them autonomously. If you are passionate about applying complex AI architectures to massive datasets (billions of telemetry points) to solve real-world reliability challenges, this is the role for you.

This position is an individual contributor role reporting to the Sr. Director, Software Engineering.

Responsibility

Design and implement autonomous multi-agent systems using Reinforcement Learning (RL) loops that can interact with our infrastructure to perform safe, automated remediation actions

Build GenAI agents capable of digesting logs, traces, and metrics to provide "Human-in-the-loop" root cause analysis and conversational debugging for our SREs

Develop and deploy deep learning models (Transformers, LSTMs, etc.) for forecasting and anomaly detection on high-cardinality, high-volume time series data

Optimize inference pipelines to run with low latency on streaming telemetry data (Kafka/Flink), ensuring we catch issues the moment they happen

Own the lifecycle of your models, from feature engineering on petabyte-scale datasets to training, deployment, and monitoring in production Kubernetes environments

Collaborate with Applied Scientists to translate bleeding-edge research (e.g., causal inference, decision transformers) into production-hardened AIOps tools

Job Designation

Hybrid: Employee divides their time between in-office and remote work. Access to an office location is required. (Frequency: Minimum 2 days per week; may vary by team but will be weekly in-office expectation)

Positions at Docusign are assigned a job designation of either In Office, Hybrid or Remote and are specific to the role/job. Preferred job designations are not guaranteed when changing positions within Docusign. Docusign reserves the right to change a position's job designation depending on business needs and as permitted by local law.

What you bring

Basic

8+ years of professional experience in Machine Learning Engineering or Data Science
Experience with PyTorch or TensorFlow, specifically regarding Time Series analysis (forecasting/anomaly detection) and NLP
Experience building applications using LLMs (RAG pipelines, LangChain, vector databases) specifically for technical domains (code analysis, log parsing)
Experience with RL concepts (policies, rewards, agents) and experience applying them to optimization or control problems
Experience with distributed data processing and streaming technologies (Apache Spark, Kafka, Flink)
Experience with software engineering fundamentals (Python, C++, or Go), CI/CD for ML, and experience deploying models via APIs (FastAPI, Triton Inference Server)

Preferred

Familiarity with the "three pillars" (Logs, Metrics, Traces) and tools like Prometheus, Grafana, OpenTelemetry, or Jaeger
Experience with frameworks like AutoGen, CrewAI, or Ray RLlib
Deep experience with AWS/GCP/Azure and Kubernetes (K8s) orchestration
A background in control theory or causal inference

Wage Transparency

Pay for this position is based on a number of factors including geographic location and may vary depending on job-related knowledge, skills, and experience. Based on applicable legislation, the below details pay ranges in the following locations:

California: $186,100.00 - $300,550.00 base salary
Washington, Maryland, New Jersey and New York (including NYC metro area): $178,900.00 - $262,825.00 base salary

This role is also eligible for the following:

Bonus: Sales personnel are eligible for variable incentive pay dependent on their achievement of pre-established sales goals. Non-Sales roles are eligible for a company bonus plan, which is calculated as a percentage of eligible wages and dependent on company performance.
Stock: This role is eligible to receive Restricted Stock Units (RSUs).

Global benefits provide options for the following:

Paid Time Off: earned time off, as well as paid company holidays based on region
Paid Parental Leave: take up to six months off with your child after birth, adoption or foster care placement
Full Health Benefits Plans: options for 100% employer paid and minimum employee contribution health plans from day one of employment
Retirement Plans: select retirement and pension programs with potential for employer contributions
Learning and Development: options for coaching, online courses and education reimbursements
Compassionate Care Leave: paid time off following the loss of a loved one and other life-changing events

Life at Docusign

Working here

Docusign is committed to building trust and making the world more agreeable for our employees, customers and the communities in which we live and work. You can count on us to listen, be honest, and try our best to do what’s right, every day. At Docusign, everything is equal.

We each have a responsibility to ensure every team member has an equal opportunity to succeed, to be heard, to exchange ideas openly, to build lasting relationships, and to do the work of their life. Best of all, you will be able to feel deep pride in the work you do, because your contribution helps us make the world better than we found it. And for that, you’ll be loved by us, our customers, and the world in which we live.

Accommodation

Docusign is committed to providing reasonable accommodations for qualified individuals with disabilities in our job application procedures. If you need such an accommodation, or a religious accommodation, during the application process, please contact us at accommodations@docusign.com.

If you experience any issues, concerns, or technical difficulties during the application process please get in touch with our Talent organization at taops@docusign.com for assistance.

States Not Eligible for Employment

This position is not eligible for employment in the following states: Alaska, Hawaii, Maine, Mississippi, North Dakota, South Dakota, Vermont, West Virginia and Wyoming.

Equal Opportunity Employer

It's important to us that we build a talented team that is as diverse as our customers and where all employees feel a deep sense of belonging and thrive. We encourage great talent who bring a range of perspectives to apply for our open positions.

Docusign is an Equal Opportunity Employer and makes hiring decisions based on experience, skill, aptitude and a can-do approach. We will not discriminate based on race, ethnicity, color, age, sex, religion, national origin, ancestry, pregnancy, sexual orientation, gender identity, gender expression, genetic information, physical or mental disability, registered domestic partner status, caregiver status, marital status, veteran or military status, or any other legally protected category.

Other signals