What you'd actually do

Own the production outcome: Take full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies.

Ensure Full-Stack integrity: Oversee the end-to-end health of the platform, ensuring seamless integration between the AI core and all full-stack components, from APIs to UI, to maintain a responsive and production-ready environment.

Scale the feedback loop: Build automated systems to monitor model performance and data drift across geographically dispersed environments, ensuring the right levels of reliability.

Navigate global compliance: Manage the technical lifecycle within diverse regulatory frameworks.

Incident command: Lead the response for production issues in mission-critical environments, ensuring rapid resolution and building the guardrails to prevent them from happening again.

Skills

Required

6+ years in a high-impact technical role (SRE, FDE or MLOps)
experience in the public sector
familiarity with international government security standards and the complexities of deploying sovereign AI
proven experience maintaining production-grade applications with a deep understanding of the full request lifecycle-connecting frontend/API layers to the backend and AI core
proficiency in coding and the modern AI infrastructure
ownership mentality
understanding of public sector reliability needs
ability to explain technical performance to high-ranking officials

Nice to have

Kubernetes
vector databases
agentic development
LLM observability tools

Role Overview

Scale’s rapidly growing Global Public Sector team is focused on using AI to address critical challenges facing the public sector around the world. Our core work consists of:

Creating custom AI applications that will impact millions of citizens
Generating high-quality training data for national LLMs
Upskilling and advisory services to spread the impact of AI

As a Principal AI Ops Architect, you will design and develop the production lifecycle of full-stack AI applications, while supporting end-to-end system reliability, real-time inference observability, sovereign data orchestration, high-security software integration, and the resilient cloud infrastructure required for our international government partners.

At Scale, we’re not just building AI solutions—we’re enabling the public sector to transform their operations and better serve citizens through cutting-edge technology. If you’re ready to shape the future of AI in the public sector and be a founding member of our team, we’d love to hear from you.

You will:

Own the production outcome: Take full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies.
Ensure Full-Stack integrity: Oversee the end-to-end health of the platform, ensuring seamless integration between the AI core and all full-stack components, from APIs to UI, to maintain a responsive and production-ready environment.
Scale the feedback loop: Build automated systems to monitor model performance and data drift across geographically dispersed environments, ensuring the right levels of reliability.
Navigate global compliance: Manage the technical lifecycle within diverse regulatory frameworks.
Incident command: Lead the response for production issues in mission-critical environments, ensuring rapid resolution and building the guardrails to prevent them from happening again.
Bridge the gap: Translate deep technical performance metrics into clear insights for senior international government officials.
Drive product evolution: Partner with our Engineering and ML teams to ensure the lessons learned in the field directly influence the technical architecture and decisions of future use cases.

Ideally, you have:

Experience: 6+ years in a high-impact technical role (SRE, FDE or MLOps) with experience in the public sector.
Global perspective: Familiarity with international government security standards and the complexities of deploying sovereign AI.
System architecture proficiency: Proven experience maintaining production-grade applications with a deep understanding of the full request lifecycle-connecting frontend/API layers to the backend and AI core.
Modern AI Stack expertise: Proficiency in coding and the modern AI infrastructure, including Kubernetes, vector databases, agentic development, and LLM observability tools.
Ownership: You treat every production deployment as your own. You race toward solving hard problems before the customer even sees them.
Reliability: You understand that in the public sector, a model failure may be a risk to public safety or privacy.
Customer communication: The ability to explain to a high-ranking official why the performance of the system has degraded and how we are fixing it.

_PLEASE NOTE: _Our policy requires a 90-day waiting period before reconsidering candidates for the same role. This allows us to ensure a fair and thorough evaluation of all applicants.

About Us:

At Scale, our mission is to develop reliable AI systems for the world's most important decisions. Our products provide the high-quality data and full-stack technologies that power the world's leading models, and help enterprises and governments build, deploy, and oversee AI applications that deliver real impact. We work closely with industry leaders like Meta, Cisco, DLA Piper, Mayo Clinic, Time Inc., the Government of Qatar, and U.S. government agencies including the Army and Air Force. We are expanding our team to accelerate the development of AI applications.

_We believe that everyone should be able to bring their whole selves to work, which is why we are proud to be an inclusive and equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability status, gender identity or Veteran status. _

We are committed to working with and providing reasonable accommodations to applicants with physical and mental disabilities. If you need assistance and/or a reasonable accommodation in the application or recruiting process due to a disability, please contact us at accommodations@scale.com. Please see the United States Department of Labor's _Know Your Rights poster_ for additional information.

_We comply with the United States Department of Labor's Pay Transparency provision. _

**PLEASE NOTE: **We collect, retain and use personal data for our professional business purposes, including notifying you of job opportunities that may be of interest and sharing with our affiliates. We limit the personal data we collect to that which we believe is appropriate and necessary to manage applicants’ needs, provide our services, and comply with applicable laws. Any information we collect in connection with your application will be treated in accordance with our internal policies and programs designed to protect personal data. Please see our privacy policy for additional information.

Role Overview

Scale’s rapidly growing Global Public Sector team is focused on using AI to address critical challenges facing the public sector around the world. Our core work consists of:

Creating custom AI applications that will impact millions of citizens
Generating high-quality training data for national LLMs
Upskilling and advisory services to spread the impact of AI

You will:

Own the production outcome: Take full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies.
Ensure Full-Stack integrity: Oversee the end-to-end health of the platform, ensuring seamless integration between the AI core and all full-stack components, from APIs to UI, to maintain a responsive and production-ready environment.
Scale the feedback loop: Build automated systems to monitor model performance and data drift across geographically dispersed environments, ensuring the right levels of reliability.
Navigate global compliance: Manage the technical lifecycle within diverse regulatory frameworks.
Incident command: Lead the response for production issues in mission-critical environments, ensuring rapid resolution and building the guardrails to prevent them from happening again.
Bridge the gap: Translate deep technical performance metrics into clear insights for senior international government officials.
Drive product evolution: Partner with our Engineering and ML teams to ensure the lessons learned in the field directly influence the technical architecture and decisions of future use cases.

Ideally, you have:

Experience: 6+ years in a high-impact technical role (SRE, FDE or MLOps) with experience in the public sector.
Global perspective: Familiarity with international government security standards and the complexities of deploying sovereign AI.
System architecture proficiency: Proven experience maintaining production-grade applications with a deep understanding of the full request lifecycle-connecting frontend/API layers to the backend and AI core.
Modern AI Stack expertise: Proficiency in coding and the modern AI infrastructure, including Kubernetes, vector databases, agentic development, and LLM observability tools.
Ownership: You treat every production deployment as your own. You race toward solving hard problems before the customer even sees them.
Reliability: You understand that in the public sector, a model failure may be a risk to public safety or privacy.
Customer communication: The ability to explain to a high-ranking official why the performance of the system has degraded and how we are fixing it.

_PLEASE NOTE: _Our policy requires a 90-day waiting period before reconsidering candidates for the same role. This allows us to ensure a fair and thorough evaluation of all applicants.

About Us:

_We comply with the United States Department of Labor's Pay Transparency provision. _

Principal AI Ops Architect, Gps

What you'd actually do

Skills

Required

Nice to have

What the JD emphasized

Other signals

Role Overview

You will:

Ideally, you have:

Role Overview

You will:

Ideally, you have: