Applied Scientist, Selection Monitoring

Amazon · Big Tech · IN, KA, Bengaluru · Applied Science

This role focuses on developing and deploying advanced ML/AI technologies for catalog expansion, including information extraction, website comprehension, and agentic systems for multi-step decision-making. It involves working with large-scale data, deep learning, NLP, and image processing to extract and structure information from various document types, with an emphasis on scalable solutions and on leveraging recent advances in RL-based fine-tuning methods such as DPO and GRPO.

What you'd actually do

  1. Use AI, NLP, and recent advances in LLMs/SLMs and agentic systems to build scalable solutions to business problems.
  2. Crawl the web efficiently, automate the extraction of relevant information from large volumes of visually rich documents, and optimize key processes.
  3. Design, develop, evaluate, and deploy innovative, highly scalable ML models, especially by leveraging the latest advances in RL-based fine-tuning methods such as DPO and GRPO.
  4. Work closely with software engineering teams to drive real-time model implementations.
  5. Establish scalable, efficient, automated processes for large-scale model development, validation, and maintenance.

Skills

Required

  • Java
  • C++
  • Python
  • SQL
  • RDBMS
  • Data Warehouse

Nice to have

  • Experience implementing algorithms using both toolkits and self-developed code
  • Publications at top-tier peer-reviewed conferences or journals

What the JD emphasized

  • responsibility for the success of the system
  • agents that make multi-step decisions
  • depth and breadth of knowledge in text mining, information extraction from visually rich documents, semi-structured data (HTML), and advanced machine learning
  • programming and design skills for manipulating semi-structured and unstructured data, and for systems that work at internet scale
  • Scale (build models that handle billions of pages)
  • Accuracy (meet precision and recall requirements)
  • Speed (generate low-latency predictions for millions of new or changed pages)
  • Diversity (models must work across different languages, marketplaces, and data sources)
  • Build a scalable system that can algorithmically extract information from the World Wide Web.
  • Intelligently cluster web pages, segment and classify page regions, extract relevant information, and structure the data available on the semi-structured web.
  • Build systems that use an existing knowledge base to perform open information extraction at scale from visually rich documents.
  • the latest advances in RL-based fine-tuning methods such as DPO and GRPO

Other signals

  • Develop advanced ML/AI technologies
  • Develop ML models for website comprehension and agents that make multi-step decisions