Principal Engineer I - Cloud Observability

Confluent · Data AI · India · Remote · Engineering

Principal Engineer role focused on advancing Confluent's Cloud Observability features, ensuring quality, reliability, and scalability for customers. The role involves technical leadership, architecture, design, code review, mentoring, and driving the roadmap for observability capabilities in both cloud and hybrid environments. Requires extensive experience in building and operating large-scale distributed systems, with a strong emphasis on operational excellence and customer issue resolution.

What you'd actually do

  1. You will technically drive the advancements in an expanding Confluent Observability charter.
  2. You will design and build the technology behind this charter.
  3. You will architect and design solutions and review design implementations and code.
  4. You will be one of the leading technologists across the Confluent Observability organization.
  5. You will have the opportunity to work across the breadth and depth of technology in Confluent.

Skills

Required

  • 15+ years of hands-on software development experience
  • Experience building and operating large-scale systems
  • Solid understanding of basic systems operations (disk, network, operating systems, etc.)
  • Experience running production services in the cloud
  • Strong fundamentals in distributed systems design and development
  • Solid fundamentals in concurrent and multithreaded programming
  • Ability to work effectively in teams
  • Proactively identifying the symptoms of technical issues, reasoning about their causes, and fixing the root causes
  • Timely shipping of deliverables
  • Ability to trade off short-term technical decisions against long-term ones
  • Ability to influence the team, peers and upper management in technology decisions
  • Degree in Computer Science, Engineering or equivalent experience
  • Ability to be pragmatic and trade off their usage in production
  • Ability to take on intense customer production issues while on call, debugging and mitigating them through patient log and metrics analysis and solid reasoning

Nice to have

  • Experience in designing and developing effective solutions for systems observability problems, including effective enablement of metrics, logging, events, or traces capabilities
  • Experience using and operating Apache Kafka, Apache Flink, Apache Druid, and OpenSearch
  • Interest in evangelism (giving talks at tech conferences, writing blog posts evangelizing Kafka)
  • Experience working on stream processing technology or query processing systems

What the JD emphasized

  • rock-solid reliability
  • 10x scale
  • open-ended strategy
  • explore and experiment with new ideas
  • 15+ years of hands-on software development experience
  • Taking ideas to production
  • ship the product to production
  • large-scale systems
  • running production services in the cloud
  • distributed systems design and development
  • concurrent and multi-threading programming
  • Proactively identifying the symptoms of technical issues and reason about their causes
  • fixing the root causes
  • Timely shipping of deliverables
  • trade-off short term technical decisions with the long term
  • Move fast, build in increments, and iterate
  • sense of urgency
  • mindset towards achieving results
  • excellent prioritization skills
  • intense customer production issues on-call
  • debugging and mitigating them
  • patient log and metrics analysis
  • solid reasoning to nail the issue