Ontologist II - Amz20452.4

Amazon · Big Tech · Santa Barbara, CA · Corporate Operations

This role focuses on building and maintaining scalable data pipelines, designing and implementing ontology structures, and enabling query grounding on knowledge graph systems. It involves using ETL software, programming languages, and LLMs for data cleaning, manipulation, and mapping. The role also includes analyzing pipeline performance, creating documentation, and performing quality assurance and root cause analysis on issues.

What you'd actually do

  1. Build and maintain scalable data pipelines using extract, transform, and load (ETL) software including Pentaho Data Integration, Amazon Business Data Technologies Cradle, and Amazon Knowledge Graph Data Lake to perform data cleaning and manipulation on large-scale datasets.
  2. Design and build solutions by leveraging off-the-shelf services like AWS Glue; programming languages including JavaScript, SQL, SparkSQL, and Python; custom-made tools including Graphiq Imports and Data Lake S3 Crawler; and large language models (LLMs) like Cedric Personas and LLM Batch Inference.
  3. Design and implement ontology structures that effectively represent a knowledge domain both conceptually in the real world and based on structured data while maintaining flexibility for future expansion.
  4. Own ontology review documents, host and actively participate in ontology discussions, submit Change Requests (CRs), and merge CRs in the ontology codebase.
  5. Use generative AI tooling, like Rapid Ontology Creation for KEs (ROCK), to automate ontology and data mapping processes, while integrating expertise at critical decision points.

Skills

Required

  • ETL software (Pentaho Data Integration, AWS Glue)
  • Programming languages (JavaScript, SQL, SparkSQL, Python)
  • Data modeling
  • Ontology design and implementation
  • Knowledge graph systems (Neptune Graphs)
  • Cypher queries
  • Root cause analysis
  • Monitoring systems

Nice to have

  • LLMs (Cedric Personas, LLM Batch Inference)
  • Generative AI tooling (ROCK)
  • Jinja templates
  • Graphiq Imports
  • Data Lake S3 Crawler
  • Wikidata
  • FireTV
  • IMDb

What the JD emphasized

  • LLMs
  • generative AI tooling
  • ontology
  • data pipelines
  • knowledge graph
