Remote - Principal Software Developer - Agentic AI, Healthcare AI

Oracle · Enterprise · United States

The Principal Software Developer focuses on Agentic AI and Healthcare AI and is responsible for architecting, designing, developing, deploying, and operating production-grade AI systems powered by LLMs, agents, retrieval, and enterprise data. The role involves building agentic systems with tool use, memory, planning, orchestration, and workflow automation, and designing scalable RAG, search, retrieval, and ranking systems. Optimizing for quality, latency, reliability, scalability, observability, and cost efficiency is critical, as is establishing evaluation frameworks. The role also includes building cloud-native services, APIs, SDKs, and distributed systems that support asynchronous communication patterns and event-driven architectures. The developer is expected to collaborate with applied scientists, engineers, and product managers, drive architecture decisions, and mentor engineers.

What you'd actually do

  1. Architect, design, develop, deploy, and operate production-grade AI systems powered by LLMs, agents, retrieval, and enterprise data.
  2. Build agentic AI systems that leverage tool use, memory, planning, orchestration, and workflow automation to solve complex business problems.
  3. Design and implement scalable RAG, search, retrieval, ranking, and knowledge systems across structured and unstructured data sources.
  4. Develop LLM-powered applications, including prompt and context engineering, tool integrations, model routing, guardrails, and workflow orchestration.
  5. Optimize AI systems for quality, latency, reliability, scalability, observability, and cost efficiency.

Skills

Required

  • Python
  • Java
  • distributed systems
  • concurrency
  • APIs
  • data structures
  • algorithms
  • testing
  • debugging
  • production operations
  • backend services
  • cloud-native systems
  • distributed applications
  • model-serving platforms
  • LLM ecosystems
  • prompt engineering
  • tool calling
  • retrieval-augmented generation
  • model serving
  • evaluation
  • deployment
  • search, retrieval, ranking, vector database, or enterprise knowledge systems
  • asynchronous communication patterns
  • message queues
  • pub/sub systems
  • data streaming platforms
  • event-driven architectures
  • containerized applications
  • Kubernetes-based deployments
  • technical judgment
  • ownership of complex technical problems

Nice to have

  • agentic AI systems
  • planning
  • tool orchestration
  • memory
  • autonomous workflows
  • multi-agent architectures
  • LangGraph
  • LangChain
  • LlamaIndex
  • Semantic Kernel
  • MCP
  • retrieval and ranking technologies
  • Elasticsearch
  • OpenSearch
  • vector databases
  • hybrid search
  • reranking
  • query understanding systems
  • production LLM serving platforms
  • latency
  • throughput
  • reliability
  • scalability
  • observability
  • cost efficiency
  • fine-tuning
  • LoRA
  • distillation
  • preference optimization
  • domain adaptation
  • evaluation systems
  • A/B testing frameworks
  • human-in-the-loop review processes
  • AI quality measurement platforms
  • OCI
  • AWS
  • Azure
  • GCP
  • Docker
  • Kubernetes
  • Kafka
  • Spark
  • tools, frameworks, APIs, or platforms used by applied scientists, data scientists, or machine learning engineers
  • AI systems in healthcare
  • enterprise SaaS
  • regulated environments
  • privacy-sensitive domains
  • technical leadership
  • architecture ownership
  • mentoring
  • cross-functional collaboration
  • delivery of high-impact initiatives

What the JD emphasized

  • production-grade AI systems
  • agentic AI systems
  • LLMs
  • RAG
  • search, retrieval, ranking
  • tool use
  • memory
  • planning
  • orchestration
  • workflow automation
  • prompt and context engineering
  • tool integrations
  • model routing
  • guardrails
  • production ML, AI, LLM, RAG, search, recommendation, conversational AI, or agentic AI systems at scale
  • distributed systems
  • production operations
  • production software
  • backend services
  • cloud-native systems
  • APIs
  • distributed applications
  • model-serving platforms
  • LLM ecosystems
  • prompt engineering
  • tool calling
  • retrieval-augmented generation
  • model serving
  • evaluation
  • deployment
  • search, retrieval, ranking, vector database, or enterprise knowledge systems
  • asynchronous communication patterns
  • message queues
  • pub/sub systems
  • data streaming platforms
  • event-driven architectures
  • containerized applications
  • Kubernetes-based deployments
  • tool orchestration
  • autonomous workflows
  • multi-agent architectures
  • retrieval and ranking technologies
  • vector databases
  • hybrid search
  • reranking
  • query understanding systems
  • production LLM serving platforms
  • latency
  • throughput
  • reliability
  • scalability
  • observability
  • cost efficiency
  • fine-tuning
  • LoRA
  • distillation
  • preference optimization
  • domain adaptation
  • evaluation systems
  • A/B testing frameworks
  • human-in-the-loop review processes
  • AI quality measurement platforms
  • regulated environments

Other signals

  • LLM
  • agents
  • RAG
  • production systems
  • enterprise data