Sr Research Engineer, Computer Vision

Autodesk · Enterprise · London, United Kingdom +2

Senior Software Engineer role focused on Computer Vision and Multimodal AI, building perception and understanding systems. Develops end-to-end pipelines that combine vision models with multimodal reasoning and contextual signals, blending applied research with production-minded implementation for cloud-scale batch processing and interactive workflows.

What you'd actually do

  1. Design, build, and improve multi-stage computer vision pipelines that may include segmentation, detection, tracking, and VLM-based analysis, producing structured outputs (entities, attributes, actions/events, confidence, provenance)
  2. Build systems that handle real-world variability in visual inputs (for example: low resolution, poor lighting, motion blur, cluttered scenes, inconsistent capture devices)
  3. Fuse visual evidence with contextual inputs such as metadata, documents, and sensor streams to improve recognition quality and reduce ambiguity
  4. Evaluate and integrate state-of-the-art vision and vision-language foundation models, including open-vocabulary recognition, grounded perception, segmentation, and multimodal reasoning
  5. Apply fine-tuning or adaptation approaches when needed; partner with ML teams on training, data strategy, and infrastructure best practices

Skills

Required

  • Python
  • deep learning for computer vision
  • PyTorch
  • taking ML prototypes into reliable pipelines
  • evaluation
  • monitoring
  • failure analysis
  • cloud or backend workflows
  • batch processing

Nice to have

  • vision-language models (VLMs)
  • multimodal systems
  • grounded vision
  • open-vocabulary recognition
  • retrieval-augmented multimodal reasoning
  • multimodal fusion
  • video pipelines
  • tracking
  • temporal aggregation
  • long-video processing
  • real-world datasets
  • data curation
  • labelling strategy
  • augmentation
  • quality control
  • limited data constraints
  • reusable platform components

What the JD emphasized

  • 4+ years of experience building computer vision systems using Python
  • Strong experience with deep learning for computer vision (detection, segmentation, and/or video understanding) using modern frameworks such as PyTorch
  • Experience taking ML prototypes into reliable pipelines, including evaluation, monitoring, and failure analysis
  • Experience building or integrating ML systems into cloud or backend workflows (batch processing and/or services)

Other signals

  • end-to-end pipelines
  • multimodal reasoning
  • production-minded implementation
  • cloud-scale batch processing
  • interactive workflows