Senior Machine Learning Engineer

Cognite · Industrial · India · Engineering

This Senior Machine Learning Engineer role focuses on building and deploying AI/ML models for industrial digitalization. It involves designing, training, testing, and deploying models for document parsing, layout analysis, and entity matching, with a strong emphasis on production-grade code, robust APIs, and scalable infrastructure. Responsibilities include building ML models as software components, implementing CI/CD pipelines, optimizing inference, and monitoring deployed models. Experience with NLP, Vision-Language Models, MLOps, and cloud platforms is required.

What you'd actually do

  1. Take product requirements and independently design, train, test, and deploy ML models (e.g., NLP, Vision-Language Models) for document parsing, layout analysis, and entity matching.
  2. Write high-quality, scalable production code (Python). Wrap your ML models into robust RESTful or gRPC APIs and integrate them seamlessly into existing industrial master data systems and workflows.
  3. Implement and maintain CI/CD pipelines for your models. Navigate complex deployment environments, manage containerized applications (Docker/Kubernetes), and optimize inference bottlenecks.
  4. Work with Principal engineers to translate high-level system architectures into concrete, scalable data pipelines and production-ready microservices.
  5. Design automated testing for ML pipelines. Monitor deployed models in production for data drift, latency, and accuracy, proactively implementing retraining strategies.

Skills

Required

  • Python
  • backend web frameworks (e.g., FastAPI, Flask, Django)
  • PyTorch, TensorFlow, Hugging Face Transformers, or LangChain
  • containerization (Docker, Kubernetes)
  • Linux environments
  • CI/CD tools (GitHub Actions, Jenkins)
  • cloud platforms (AWS, Azure, or GCP)
  • software architecture
  • data structures
  • algorithms
  • distributed data processing frameworks (e.g., Apache Spark, Ray, Dask)
  • orchestrating complex data workflows (e.g., Apache Airflow, Dagster, Prefect)

Nice to have

  • fine-tuning multimodal models
  • deploying multimodal models
  • RAG pipelines
  • Vector/Graph Databases
  • orchestrating multi-step reasoning using frameworks like LangGraph or AutoGen
  • advanced OCR
  • layout parsing
  • structured JSON extraction from complex industrial PDFs
  • Manufacturing/OT dataset experience
  • optimizing LLMs/VLMs for inference latency (quantization, TensorRT, vLLM)
  • on-premises environments
  • modern data lake/lakehouse architectures (Delta Lake, Iceberg, Databricks)

What the JD emphasized

  • engineering-first role
  • production-grade code
  • robust APIs
  • complex infrastructure problems
  • ML models as software components
  • highly scalable architecture
  • maintainability, rigorous testing, and automated pipelines for reliable ML production
  • design, train, test, and deploy ML models
  • document parsing, layout analysis, and entity matching
  • Wrap your ML models into robust RESTful or gRPC APIs
  • integrate them seamlessly into existing industrial master data systems and workflows
  • Implement and maintain CI/CD pipelines for your models
  • optimize inference bottlenecks
  • Design automated testing for ML pipelines
  • Monitor deployed models in production
  • proactively implementing retraining strategies
  • backend web frameworks (e.g., FastAPI, Flask, Django) to serve models
  • frameworks like PyTorch, TensorFlow, Hugging Face Transformers, or LangChain
  • deploying models on cloud platforms (AWS, Azure, or GCP)
  • distributed data processing frameworks (e.g., Apache Spark, Ray, Dask)
  • orchestrating complex data workflows (e.g., Apache Airflow, Dagster, Prefect)
  • Independently define the technical ML/SWE approach for ambiguous feature requests and deliver to production
  • manage aspects like API rate limiting, asynchronous processing, and robust data routing around ML components
  • designing resilient systems against poor data quality
  • fine-tuning and deploying multimodal models
  • Built robust, production-grade RAG pipelines
  • orchestrating multi-step reasoning using frameworks like LangGraph or AutoGen
  • advanced OCR, layout parsing, and techniques for consistent structured JSON extraction
  • optimize LLMs/VLMs for inference latency (quantization, TensorRT, vLLM)

Other signals

  • building AI and data solutions
  • contextual AI for Industrial Operations
  • transforming unstructured, complex industrial data
  • leverage state-of-the-art Deep Learning, Generative AI, and Computer Vision
  • building the models and the surrounding infrastructure
  • parse complex industrial documents, extract multimodal entities, and interpret intricate engineering diagrams
  • define ML capabilities