Research Engineer / Scientist - AI for Databases

ByteDance · Big Tech · Seattle, WA · Infrastructure

Research Engineer/Scientist role focusing on applying AI/ML to database management systems, including query optimization, indexing, workload forecasting, and developing self-managing databases. The role involves research and development, integrating AI models into production systems, analyzing large datasets, and publishing findings. Requires a PhD and strong publication record in AI/databases/systems, with experience in database internals and ML frameworks.

What you'd actually do

  1. Conduct research and development in applying AI/ML techniques to database management systems.
  2. Develop intelligent algorithms for tasks such as query planning, indexing, storage management, and workload prediction/scheduling.
  3. Collaborate with data infrastructure and engineering teams to integrate AI models into production systems.
  4. Analyze large-scale datasets from database workloads to uncover optimization opportunities.
  5. Publish findings in top-tier conferences and journals (VLDB, SIGMOD, ICDE, NeurIPS, etc.).

Skills

Required

  • PhD in Computer Science, Data Science or a related field with a focus on databases, systems, or machine learning
  • Strong publication record
  • Database internals
  • Machine learning frameworks

Nice to have

  • Python
  • C++
  • Java
  • cloud database platforms
  • LLMs
  • reinforcement learning
  • neural architecture search
  • automated database tuning

What the JD emphasized

  • Strong publication record in accredited venues (e.g., SIGMOD, VLDB, ICDE, NeurIPS) related to the AI4DB area.
  • Strong background in database internals (e.g., PostgreSQL, MySQL, or any modern cloud-native database or big data platform).
  • Hands-on experience with machine learning frameworks (e.g., XGBoost, LightGBM, TensorFlow, PyTorch, scikit-learn).

Other signals

  • AI for Databases
  • intelligent infrastructure optimization
  • LLM-based developer tools
  • advanced VectorDBs
  • multi-modal databases