Architect

Suki AI · Vertical AI · Suki, India · Engineering

Software Architect role focused on maintaining architectural coherence for Suki Assistant, Suki's AI voice solution for healthcare. The role involves owning the load-bearing substrate, driving structural decomposition, bridging AI research and production infrastructure, and mentoring engineers. Key responsibilities include managing the ML inference footprint, operating the GPU fleet, setting agentic framework direction, and keeping the system resilient at scale.

What you'd actually do

  1. Maintain architectural coherence across a wide and complex surface area.
  2. Own the load-bearing substrate through future growth and failure scenarios. The system must survive several multiples of current enterprise scale and every class of infrastructure event we can name.
  3. Drive the structural decomposition of platform and domain boundaries and of service boundaries across the microservice portfolio. Also drive the agentic framework direction, the auth and access-control architecture, and cross-cutting concerns such as observability and rate limiting.
  4. Bridge AI research and production infrastructure. Translate LLM and model requirements into efficient, production-grade services. Lead the agentic framework direction: memory architectures, tool orchestration, agent-to-agent (A2A) patterns, and semantic caching. Operate a GPU fleet at high duty cycle, with a working understanding of dynamic batching, the tradeoffs of cross-session batching, and Horizontal Pod Autoscaler (HPA) calibration for GPU workloads.
  5. Document architectural patterns and deviations. Write Architecture Decision Records (ADRs), RFCs, and breaking-points documents that stand on their own for readers unfamiliar with the context. Record each deviation from established patterns along with its rationale, treating it as a data point rather than a failure.

Skills

Required

  • Go or Python
  • real-time streaming systems
  • offline/batch architectures
  • GPU inference infrastructure
  • LLM APIs
  • agent framework design
  • memory architectures
  • tool orchestration
  • observability
  • rate limiting
  • client-side implications across native mobile, web, SDK, and browser extension

Nice to have

  • healthcare-adjacent technology

What the JD emphasized

  • Proven track record operating at the Architect, Staff, or Principal level
  • Expert-level proficiency in Go or Python
  • Production ownership of a real-time streaming system
  • Experience operating multi-TB storage systems with active OLTP traffic
  • Hands-on experience with offline/batch architectures
  • Experience designing and operating GPU inference infrastructure at scale
  • Experience integrating and scaling systems using LLM APIs or foundation models

Other signals

  • architectural coherence
  • agentic capabilities
  • ML inference footprint
  • GPU fleet operation
  • LLM APIs
  • agent framework design