AI Engineering Platform Development Engineer

NVIDIA · Semiconductors · Yokneam, Israel

NVIDIA is seeking an AI Engineering Platform Development Engineer to design and build a core platform for autonomous AI agents, enabling agentic workflows to accelerate engineering development. The role involves developing the agent's reasoning engine, tool orchestration, runtime infrastructure, and microservices ecosystem, with a focus on reliability and observability. The position requires strong system-level understanding, experience with Kubernetes and microservices, and familiarity with LLM application development, RAG, and vector databases.

What you'd actually do

  1. Design and build the core platform that powers an autonomous AI agent — including its reasoning engine, tool orchestration, and the runtime infrastructure it operates on
  2. Develop and evolve the microservices ecosystem that gives the agent its capabilities, from knowledge retrieval and log analysis to code execution and workflow automation
  3. Own features end-to-end: from requirements analysis and architecture, through implementation, to production deployment and iteration based on real usage
  4. Instrument, evaluate, and improve the platform's reliability — build observability, track quality, and feed signals back into the system to make the agent more effective over time
  5. Collaborate with engineering teams across the organization to identify high-impact workflows and translate them into AI-assisted automation that boosts developer productivity

Skills

Required

  • B.Sc. in Computer Science, Computer Engineering, or a related field
  • Solid system-level understanding with experience designing and delivering production services
  • Ability to architect solutions, guide AI tools effectively, and reason about system behavior end-to-end
  • Familiarity with containerization and orchestration (Docker, Kubernetes)
  • Understanding of REST APIs, microservice architectures, and distributed systems
  • Ability to learn complex concepts in a fast-paced environment
  • A teammate with a can-do attitude, high energy, and excellent interpersonal skills

Nice to have

  • Familiarity with Kubernetes operators, Helm charts, and cluster management
  • Experience with LLM application development — prompt engineering, agentic frameworks (ReAct, tool-use), or RAG pipelines
  • Hands-on experience with FastAPI, async Python, or similar modern Python web frameworks
  • Experience with vector databases, semantic search, or embedding models
  • Knowledge of the OAS (OpenAPI Specification), MCP (Model Context Protocol), and A2A (Agent-to-Agent) protocol ecosystems

What the JD emphasized

  • core platform that powers an autonomous AI agent
  • reasoning engine
  • tool orchestration
  • runtime infrastructure
  • microservices ecosystem
  • knowledge retrieval
  • log analysis
  • code execution
  • workflow automation
  • reliability
  • observability
  • developer productivity
  • agentic workflows
  • autonomous, long-lived agentic workflows

Other signals

  • AI agentic workflows
  • autonomous AI agent
  • reasoning engine
  • tool orchestration
  • runtime infrastructure
  • microservices ecosystem
  • knowledge retrieval
  • log analysis
  • code execution
  • workflow automation
  • reliability
  • observability
  • developer productivity
  • Python services
  • Kubernetes infrastructure
  • data stores
  • CI/CD pipelines
  • developer-facing tools
  • LLM application development
  • prompt engineering
  • agentic frameworks
  • ReAct
  • tool-use
  • RAG pipelines
  • vector databases
  • semantic search
  • embedding models