Senior AI and MLOps Engineer - Security and Networking Research

NVIDIA · Semiconductors · Tel Aviv, Israel +3

Senior AI/MLOps Engineer responsible for building and maintaining the infrastructure, tools, and processes that support the AI lifecycle in production, specifically for security and networking AI models and agents. The role involves optimizing models, deploying agentic systems and LLMs, designing training and inference pipelines, and collaborating with data scientists, security architects, and software engineers.

What you'd actually do

  1. Developing, improving and optimizing scalable infrastructure for handling and deploying security and networking AI models and agents in production, ensuring high availability, scalability, reproducibility, and performance.
  2. Optimizing AI models and agents for performance, scalability, and resource utilization, considering factors such as latency, efficiency, and cost.
  3. Deploying and monitoring agentic systems, LLMs, and ML models in production.
  4. Designing and implementing frameworks/pipelines for AI training, inference, and experimentation.
  5. Collaborating closely with data scientists, security architects, and software engineers to operationalize and deploy AI models and agents, including packaging and integration with existing systems, and participating in code development and in reviews of code, design documents, use cases, and test plans.

Skills

Required

  • Python
  • Java
  • Scala
  • TensorFlow
  • PyTorch
  • microservices architecture
  • container orchestration
  • cloud platforms
  • scalable infrastructure
  • inference optimization
  • CI/CD tools
  • GitLab
  • GitHub Actions
  • Jenkins

Nice to have

  • network protocols
  • Linux internals
  • security protocols
  • network architectures
  • firewalls
  • intrusion detection systems
  • generative models
  • network security principles

What the JD emphasized

  • At least 5 years of experience
  • Deploying and monitoring AI/ML models, LLMs, and agents to production systems at scale
  • Proficiency in microservices architecture, container orchestration, cloud platforms, and scalable infrastructure for training and inference workloads
  • Knowledge of inference optimization techniques

Other signals

  • MLOps
  • production deployment
  • agent development
  • infrastructure