Data Scientist

Intel · Semiconductors · Penang, Malaysia

Data Scientist role focused on accelerating pre- and post-silicon validation using AI/ML. Responsibilities include designing and deploying ML algorithms and generative AI pipelines, architecting end-to-end AI systems (data pipelines, training, inference, MLOps), developing advanced AI models to improve debug efficiency, and applying LLMs/RAG to log summarization and triage automation. The role requires strong Python, ML framework, SQL, and software engineering skills; preferred experience includes validation environments, transformer models, LLM fine-tuning, and RAG.

What you'd actually do

  1. Lead AI/ML strategy for post-silicon (post-Si) validation by defining technical direction, model architectures, and data foundations that scale across products and sites.
  2. Architect and drive end-to-end AI systems: data pipelines, feature stores, training workflows, inference services, and MLOps governance.
  3. Develop and deploy advanced AI models (e.g., transformers for time series/logs, anomaly detection, root cause prediction, clustering) to accelerate debug and reduce TTR.
  4. Apply LLMs and RAG to automate triage, summarize complex logs, and recommend next debug steps using historical knowledge.
  5. Partner with validation, design, FW/BIOS, ATE, and product engineering teams to influence debug methodology and integrate AI insights into execution workflows.

Skills

Required

  • Python
  • ML frameworks (PyTorch/TensorFlow)
  • SQL
  • designing production-grade ML systems
  • statistics
  • time series analysis
  • experiment design
  • algorithmic decision making
  • Git
  • testing
  • CI/CD
  • packaging
  • API design
  • cloud/on-prem data stacks
  • cross-team technical alignment
  • clear communication
  • influencing technical partners

Nice to have

  • pre- and post-silicon validation/lab environments
  • hardware telemetry
  • debug artifacts
  • functional validation workflows and KPIs
  • transformer-based models
  • LLM fine-tuning
  • RAG pipelines
  • domain-specific model adaptation (LoRA/PEFT)
  • anomaly detection
  • root cause modeling
  • graph ML
  • large-scale triage automation
  • distributed compute (Spark/PySpark)
  • MLOps frameworks (MLflow, model registry)
  • containerization (Docker)
  • GenAI methods (search, summarization, triage)
  • dashboards (Power BI/Tableau)
  • scalable cross-product reuse
  • synthetic data generation
  • bias/quality checks
  • model interpretability (e.g., SHAP)

What the JD emphasized

  • production-grade ML systems
  • LLM fine-tuning
  • RAG pipelines

Other signals

  • design and deploy machine learning algorithms
  • generative AI-augmented analytics pipelines
  • architect and drive end-to-end AI systems
  • apply LLMs and RAG
  • partner with validation, design, FW/BIOS, ATE, and product engineering teams