Platform Engineer, Cloud Infrastructure (SMTS)

Salesforce · Enterprise · Redwood City, CA

Salesforce is seeking a Senior Member of Technical Staff (SMTS) for its Platform Engineering team within Cloud Infrastructure. The role applies AI/ML to infrastructure and operations problems, with the goal of building intelligent, self-healing platform tools. The engineer will write core platform services in Go and Python, design multi-agent workflows for automation, build RAG systems over engineering documentation, and act as an AI amplifier for the engineering organization. The primary focus is integrating AI, LLMs, and autonomous agents into multi-cloud platform services to improve reliability, reduce toil, and enhance developer experience.

What you'd actually do

  1. Design, build, and operate platform services and infrastructure automation in Go and Python, embedding AI capabilities directly into the core platform software.
  2. Architect and implement intelligent, closed-loop automation systems (AIOps) that leverage LLMs and autonomous agents to detect anomalies, perform root-cause analysis, and execute self-healing remediation playbooks.
  3. Build and maintain Retrieval-Augmented Generation (RAG) applications over internal platform documentation, runbooks, and historical incident data to drastically reduce engineering MTTR.
  4. Develop custom tools, CLI plugins, and Model Context Protocol (MCP) integrations that connect our cloud infrastructure APIs to agentic coding tools (like Claude Code), turning standard automation into autonomous workflows.
  5. Partner with SRE, security, and platform specialists to identify highly repetitive operational work and build agentic solutions that delegate that toil to AI.

Skills

Required

  • 5+ years of professional experience in software engineering, platform engineering, or DevOps, with a recent, heavy focus on building and implementing AI solutions.
  • Strong understanding of core AI and ML concepts applied practically to software engineering, including LLM context window optimization, embedding models, semantic search, vector databases, and prompt engineering/tuning.
  • Experience building with agentic frameworks and LLM orchestration tooling to execute multi-step, autonomous tasks.
  • Good programming skills in Go and Python, with the ability to build production-grade backend services, APIs, and microservices.
  • Solid fundamental knowledge of cloud-native infrastructure, with hands-on experience in Kubernetes and multi-cloud environments (AWS, Azure, GCP, or OCI).
  • Familiarity with continuous deployment and infrastructure-as-code concepts (GitOps with Flux/Argo CD, Pulumi, or Terraform).
  • Demonstrated agentic and automation mindset: a proven track record of using AI to automate complex workflows, and the ability to speak in depth about how you design AI systems to handle edge cases, tool-calling errors, and non-deterministic outputs.
  • Strong communication and collaboration skills, with a passion for teaching, raising the team’s AI literacy, and evangelizing AI solutions across engineering boundaries.

Nice to have

  • Hands-on experience building custom extensions, plugins, or Model Context Protocol (MCP) servers for agentic developer tools like Claude Code or GitHub Copilot.
  • Experience applying AI specifically to observability data (parsing logs, analyzing metrics, or correlating distributed traces) for predictive scaling or automated alerting.
  • Deep experience working with vector databases (e.g., Pinecone, Qdrant, Milvus, pgvector) inside platform applications.
  • Experience operating AI-driven tools within compliance-driven environments (FedRAMP, SOC 2), ensuring strong data privacy boundaries, LLM guardrails, and secure handling of sensitive cloud credentials.
  • Experience with internal developer platforms (IDPs), platform APIs, or building developer experience

What the JD emphasized

  • strong AI/ML software engineering expertise
  • applying AI solutions directly to infrastructure and operations problems
  • design multi-agent workflows to automate complex operational tasks
  • build RAG systems over engineering documentation
  • act as the core AI amplifier
  • architecting the intelligent systems that multiply the entire engineering organization's output
  • embedding AI capabilities directly into the core platform software
  • leverage LLMs and autonomous agents
  • agentic coding tools
  • build agentic solutions that delegate that toil to AI
  • proven track record of using AI to automate complex workflows
  • design AI systems to handle edge cases, tool-calling errors, and non-deterministic outputs
