Data Scientist, Network Fabric Engineering

Amazon · Big Tech · NSW, Australia +1 · Data Science

The Data Scientist will develop risk and reliability models for network availability using telemetry data, build operational analytics and dashboards, and design experiments to evaluate automation, including agentic systems. The role focuses on providing evidence for decisions, measuring outcomes of operational changes, and improving data quality for availability programs.

What you'd actually do

  1. Develop predictive risk and reliability models for network availability — using historical device failures, alarm telemetry, ticket data, and traffic signals to identify the devices, fabrics, and event types most likely to escalate
  2. Provide the evidence base behind program decisions: surface where availability is at risk, where automation is ready to expand, and where human engineering effort has the highest leverage
  3. Build operational analytics and dashboards (in Amazon QuickSight, Amazon CloudWatch, and Python) that our leaders use to track network health and the impact of the operational changes we are making
  4. Design and run experiments to evaluate the automation we are rolling out — including agentic systems that support engineers on incidents — comparing automated decisions against runbooks and human engineers, and measuring whether each rollout improved availability
  5. Improve the data quality and classification underlying our availability program — from event categorisation to root-cause attribution — so the metrics we report and the decisions we make rest on solid ground

Skills

Required

  • 1+ years of experience with data querying languages (e.g. SQL), scripting languages (e.g. Python), or statistical/mathematical software (e.g. R, SAS, MATLAB)
  • 2+ years of experience as a data or research scientist, statistician, or quantitative analyst at an internet-based company with large, complex data sources

Nice to have

  • Knowledge of statistical packages and business intelligence tools such as SPSS, SAS, S-PLUS, or R
  • Knowledge of machine learning concepts and their application to reasoning and problem-solving
  • Experience with clustered data processing (e.g., Hadoop, Spark, MapReduce, and Hive)
  • Experience working with or evaluating AI systems
  • Experience applying quantitative analysis to solve business problems and making data-driven business decisions
  • Master's degree or equivalent in Science, Technology, Engineering, or Mathematics (STEM)

What the JD emphasized

  • agentic systems
  • evaluate the automation
  • measuring whether either is working
  • measure whether they delivered the outcomes we expected
  • measure whether it worked
  • make sure we measure whether it worked
  • outcome measurement
  • measuring whether each rollout improved availability

Other signals

  • agentic systems
  • automation
  • risk and reliability models
  • evaluations