Software Development Engineer II, RDS Telemetry Platform

Amazon · Big Tech · Redmond, WA · Software Development

Software Development Engineer II on the RDS Telemetry Platform team at Amazon. This role involves building and operating infrastructure that collects, ingests, and routes database telemetry across the RDS fleet. The team develops automated engines that use traditional and generative-AI techniques to detect performance anomalies, diagnose root causes, and provide tuning recommendations. The role involves developing products that capture and store telemetry, and exploring ways to use this data for database performance optimization and security threat detection. It emphasizes solving complex distributed-systems engineering challenges at massive scale, with a focus on reliability, efficiency, and fault tolerance. Knowledge of ML/LLM fundamentals is preferred.

What you'd actually do

  1. You will be responsible for developing a product that captures and stores telemetry from Aurora and RDS databases and surfaces it to our customers through multiple downstream products.
  2. You will also be responsible for exploring and building innovative solutions that leverage this telemetry to help customers troubleshoot and optimize database performance and detect security threats.
  3. This product operates on millions of instances and helps our customers monitor and tune the performance and security of their databases through timely, relevant metrics and recommendations.
  4. This is an area that requires solving the hardest engineering challenges in distributed systems at massive scale.
  5. As a customer-facing AWS service, our solutions have to be super solid, scalable, efficient, and highly fault tolerant.

Skills

Required

  • 3+ years of non-internship professional software development experience
  • 2+ years of programming using a modern programming language such as Java, C++, or C#, including object-oriented design experience
  • 1+ years of contributing to new and current systems architecture and design (architecture, design patterns, reliability and scaling) experience
  • Bachelor's degree or equivalent
  • Knowledge of professional software engineering & best practices for full software development life cycle, including coding standards, software architectures, code reviews, source control management, continuous deployments, testing, and operational excellence

Nice to have

  • 3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
  • Knowledge of Machine Learning and LLM fundamentals, including transformer architecture, training/inference lifecycles, and optimization techniques
  • Experience in debugging, profiling, and implementing software engineering best practices in large-scale systems

What the JD emphasized

  • highly skilled, experienced, and motivated software engineer
  • hardest engineering challenges in distributed systems at massive scale
  • super solid, scalable, efficient, and highly fault tolerant

Other signals

  • develops automated engines that detect performance anomalies, diagnose root causes, and deliver tuning recommendations using both traditional and generative-AI techniques
  • exploring and building innovative solutions that leverage this telemetry to help customers troubleshoot and optimize database performance and detect security threats
  • Knowledge of Machine Learning and LLM fundamentals, including transformer architecture, training/inference lifecycles, and optimization techniques