Performance Engineer Intern, Deep Learn… at NVIDIA

What you'd actually do

Benchmark, profile, and analyze the performance of AI workloads specifically tailored for large-scale LLM training and inference, as well as High-Performance Computing (HPC) on NVIDIA supercomputers and distributed systems.

Aggregate testing data and produce written reports for internal sales, marketing, SW, and HW teams.

Develop Python scripts to automate the testing of various applications.

Collaborate with internal teams to debug and improve performance issues.

Assist with the development of tools and processes that improve our ability to perform automated testing.

Skills

Required

Programming and debugging with scripting languages such as Python or Unix shell
Strong data analysis skills
Ability to summarize findings in a written report
Hands-on experience with Linux-based systems
Familiarity with a container platform such as Docker or Singularity
Experience compiling and running software from source code

Nice to have

Experience with CI/CD pipelines and modern DevOps practices
Familiarity with cloud provisioning and scheduling tools (Kubernetes, SLURM)
Curiosity about GPUs, TPUs, cloud, and performance benchmarking
Familiarity with ML/DL techniques, algorithms, and frameworks like TensorFlow or PyTorch
Experience deploying AI models for inference and launching training jobs
Background in system-level problem solving

We are now looking for a Performance Engineer Intern to support our growing investments in performance testing of various company datacenter products and applications. Today, NVIDIA is tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world, all while we strive to deliver the highest possible performance from our products.

You will be part of the global Performance Lab team, improving our capacity to expertly and accurately benchmark state-of-the-art datacenter applications and products. We also develop infrastructure and solutions that enhance the team’s ability to gather data, by automating work and designing efficient processes for testing a wide variety of applications and hardware. The data that we collect drives marketing and sales collateral as well as engineering studies for future products. You will have the opportunity to work with multi-functional teams in a dynamic environment where multiple projects will be active at once and priorities may shift frequently.

What you’ll be doing:

Benchmark, profile, and analyze the performance of AI workloads specifically tailored for large-scale LLM training and inference, as well as High-Performance Computing (HPC) on NVIDIA supercomputers and distributed systems.
Aggregate testing data and produce written reports for internal sales, marketing, SW, and HW teams.
Develop Python scripts to automate the testing of various applications.
Collaborate with internal teams to debug and improve performance issues.
Assist with the development of tools and processes that improve our ability to perform automated testing.
Set up and configure systems with appropriate hardware and software to run benchmarks.

What we need to see:

Currently pursuing a bachelor's degree (or higher) in Computer Science, Electrical Engineering, or a related field.
Experienced in programming and debugging with scripting languages such as Python or Unix shell.
Strong data analysis skills and the ability to summarize findings in a written report.
Hands-on experience with Linux-based systems. Familiarity with a container platform such as Docker or Singularity. Experience compiling and running software from source code.
Good verbal and written English skills for collaborating with coworkers.
Ability to learn quickly and independently.

Ways to stand out from the crowd:

Experience with CI/CD pipelines and modern DevOps practices. Familiarity with cloud provisioning and scheduling tools (Kubernetes, SLURM).
Curiosity about GPUs, TPUs, cloud, and performance benchmarking.
Familiarity with ML/DL techniques, algorithms, and frameworks like TensorFlow or PyTorch. Experience deploying AI models for inference and launching training jobs.
Background in system-level problem solving.

We have some of the most forward-thinking and hardworking people in the world working for us, and our best-in-class engineering teams are growing rapidly. We are building a team that will help shape the future of data center computing. If you are passionate about new technologies, care about improving efficiency and quality, and want to be at the forefront of AI, HPC, and gaming, we would love for you to join us.