Research Engineer / Scientist - AI for Databases

ByteDance · Big Tech · San Jose, CA · Infrastructure

A Research Engineer/Scientist role focused on applying AI/ML to database management systems, including query optimization, indexing, and workload forecasting, with the goal of building AI-native data infrastructure and intelligent optimization. The role spans research and development, integrating models into production systems, and publishing findings.

What you'd actually do

  1. Conduct research and development in applying AI/ML techniques to database management systems.
  2. Develop intelligent algorithms for tasks such as query planning, indexing, storage management, and workload prediction/scheduling.
  3. Collaborate with data infrastructure and engineering teams to integrate AI models into production systems.
  4. Analyze large-scale datasets from database workloads to uncover optimization opportunities.
  5. Publish findings in top-tier conferences and journals (VLDB, SIGMOD, ICDE, NeurIPS, etc.).
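To make item 2 concrete, here is a toy sketch of one well-known AI4DB idea, the learned index: a single linear model predicts where a key sits in a sorted array, and a bounded local search corrects the prediction. This is purely illustrative and not from the posting; the `LearnedIndex` class, its least-squares fit, and its error-bound logic are assumptions for the sketch (production designs, such as recursive model indexes, use hierarchies of models).

```python
import bisect

class LearnedIndex:
    """Toy learned index: a linear model maps key -> approximate position,
    then a search window bounded by the worst-case training error finds
    the exact slot. Illustrative only."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position = a*key + b by least squares over (key, index) pairs.
        xs = self.keys
        mean_x = sum(xs) / n
        mean_y = (n - 1) / 2
        cov = sum((x - mean_x) * (y - mean_y) for y, x in enumerate(xs))
        var = sum((x - mean_x) ** 2 for x in xs)
        self.a = cov / var if var else 0.0
        self.b = mean_y - self.a * mean_x
        # Worst-case prediction error bounds the correction search window.
        self.err = max(abs(self._predict(x) - y) for y, x in enumerate(xs))

    def _predict(self, key):
        pos = int(round(self.a * key + self.b))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the index of key in the sorted array, or -1 if absent."""
        guess = self._predict(key)
        lo = max(0, guess - self.err)
        hi = min(len(self.keys), guess + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return -1
```

On near-linear key distributions the model lands close to the true slot and the correction window stays small, which is the intuition behind trading a B-tree traversal for a model prediction.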

Skills

Required

  • PhD in Computer Science, Data Science, or a related field with a focus on databases, systems, or machine learning
  • Strong publication record in top-tier venues (e.g., SIGMOD, VLDB, ICDE, NeurIPS) related to the AI4DB area
  • Strong background in database internals (e.g., PostgreSQL, MySQL, or a modern cloud-native database or big-data platform)
  • Hands-on experience with machine learning frameworks (e.g., XGBoost, LightGBM, TensorFlow, PyTorch, scikit-learn)

Nice to have

  • Proficiency in Python, C++, or Java
  • Experience with cloud database platforms (AWS, GCP, Azure)
  • Strong analytical, problem-solving, and communication skills
  • Familiarity with LLMs, reinforcement learning, neural architecture search, or automated database tuning

What the JD emphasized

  • Strong publication record in top-tier venues (e.g., SIGMOD, VLDB, ICDE, NeurIPS) related to the AI4DB area.

Other signals

  • AI-native data infrastructure
  • intelligent infrastructure optimization
  • LLM-based developer tools
  • high-performance cache systems for distributed storage and LLM inference
  • applying AI/ML techniques to database management systems
  • intelligent algorithms for query planning, indexing, storage management, and workload prediction/scheduling
  • integrate AI models into production systems