Senior Machine Learning Engineer

Cognite · Industrial · India · Engineering

Senior Machine Learning Engineer role focused on building and deploying AI/ML models for industrial digitalization. The role involves designing, training, testing, and deploying models for document parsing, layout analysis, and entity matching, with a strong emphasis on production-grade code, robust APIs, and scalable infrastructure. Responsibilities include building ML models as software components, implementing CI/CD pipelines, optimizing inference, and monitoring deployed models. Experience with NLP, Vision-Language Models, MLOps, and cloud platforms is required.

What you'd actually do

Take product requirements and independently design, train, test, and deploy ML models (e.g., NLP, Vision-Language Models) for document parsing, layout analysis, and entity matching.
Write high-quality, scalable production code (Python). Wrap your ML models into robust RESTful or gRPC APIs and integrate them seamlessly into existing industrial master data systems and workflows.
Implement and maintain CI/CD pipelines for your models. Navigate complex deployment environments, manage containerized applications (Docker/Kubernetes), and optimize inference bottlenecks.
Work with Principal engineers to translate high-level system architectures into concrete, scalable data pipelines and production-ready microservices.
Design automated testing for ML pipelines. Monitor deployed models in production for data drift, latency, and accuracy, proactively implementing retraining strategies.

Skills

Required

Python
backend web frameworks (e.g., FastAPI, Flask, Django)
PyTorch, TensorFlow, Hugging Face Transformers, or LangChain
containerization (Docker, Kubernetes)
Linux environments
CI/CD tools (GitHub Actions, Jenkins)
cloud platforms (AWS, Azure, or GCP)
software architecture
data structures
algorithms
distributed data processing frameworks (e.g., Apache Spark, Ray, Dask)
orchestrating complex data workflows (e.g., Apache Airflow, Dagster, Prefect)

Nice to have

fine-tuning multimodal models
deploying multimodal models
RAG pipelines
Vector/Graph Databases
orchestrating multi-step reasoning using frameworks like LangGraph or AutoGen
advanced OCR
layout parsing
structured JSON extraction from complex industrial PDFs
Manufacturing/OT dataset experience
optimize LLMs/VLMs for inference latency (quantization, TensorRT, vLLM)
on-premises environments
modern data lake/lakehouse architectures (Delta Lake, Iceberg, Databricks)

What the JD emphasized

engineering-first role
production-grade code
robust APIs
complex infrastructure problems
ML models as software components
highly scalable architecture
maintainability, rigorous testing, and automated pipelines for reliable ML production
design, train, test, and deploy ML models
document parsing, layout analysis, and entity matching
Wrap your ML models into robust RESTful or gRPC APIs
integrate them seamlessly into existing industrial master data systems and workflows
Implement and maintain CI/CD pipelines for your models
optimize inference bottlenecks
Design automated testing for ML pipelines
Monitor deployed models in production
proactively implementing retraining strategies
backend web frameworks (e.g., FastAPI, Flask, Django) to serve models
frameworks like PyTorch, TensorFlow, Hugging Face Transformers, or LangChain
deploying models on cloud platforms (AWS, Azure, or GCP)
distributed data processing frameworks (e.g., Apache Spark, Ray, Dask)
orchestrating complex data workflows (e.g., Apache Airflow, Dagster, Prefect)
Independently define the technical ML/SWE approach for ambiguous feature requests and deliver to production
manage aspects like API rate limiting, asynchronous processing, and robust data routing around ML components
designing resilient systems against poor data quality
fine-tuning and deploying multimodal models
Built robust, production-grade RAG pipelines
orchestrating multi-step reasoning using frameworks like LangGraph or AutoGen
advanced OCR, layout parsing, and techniques for consistent structured JSON extraction
optimize LLMs/VLMs for inference latency (quantization, TensorRT, vLLM)

Other signals

building AI and data solutions
contextual AI for Industrial Operations
transforming unstructured, complex industrial data
leverage state-of-the-art Deep Learning, Generative AI, and Computer Vision
building the models and the surrounding infrastructure
parse complex industrial documents, extract multimodal entities, and interpret intricate engineering diagrams
define ML capabilities

What Cognite is: Relentless to achieve

Cognite operates at the forefront of industrial digitalization, building AI and data solutions that solve the world’s hardest, highest-impact problems. With unmatched industrial heritage and a comprehensive suite of AI capabilities, including low-code AI agents, Cognite accelerates the digital transformation to drive operational improvements.

We thrive on challenges. We challenge assumptions. We execute with speed and ownership. If you view obstacles as signals to step forward, not backwards, you’ll feel right at home here.

Our Moonshot is bold: Unlock $100B in customer value by 2035, and redefine how global industry works. Join us in this venture where AI and data meet ingenuity, and together, we will forge the path to a smarter, more connected industrial future.

About the Opportunity

We are building the next generation of contextual AI for Industrial Operations. Our team focuses on transforming unstructured, complex industrial data—ranging from technical manuals to complex piping and instrumentation diagrams (P&IDs)—into structured, actionable intelligence. We leverage state-of-the-art Deep Learning, Generative AI, and Computer Vision to drive efficiency, safety, and operational excellence.

About the Role

As a Senior Machine Learning Engineer, you will be the engine of our contextualization initiatives, taking independent ownership of complex ML features from conception to production. You will bridge the gap between data science and software engineering, building the models and the surrounding infrastructure that parse complex industrial documents, extract multimodal entities, and interpret intricate engineering diagrams.

To be clear: this is an engineering-first role. We are not just looking for researchers to build isolated models; we need builders who write production-grade code, build robust APIs, and solve complex infrastructure problems. You will treat ML models as software components integrated into a highly scalable architecture.

How you’ll demonstrate Ownership

System-minded, focused on maintainability, rigorous testing, and automated pipelines for reliable ML production.
Thrives in ambiguity, independently defining the technical path, selecting tools judiciously, and driving solutions to completion.
Elevates team code quality through constructive reviews and informal mentorship, bridging the gap between research and production.

The Impact you bring to Cognite

Key Responsibilities

Take product requirements and independently design, train, test, and deploy ML models (e.g., NLP, Vision-Language Models) for document parsing, layout analysis, and entity matching.
Write high-quality, scalable production code (Python). Wrap your ML models into robust RESTful or gRPC APIs and integrate them seamlessly into existing industrial master data systems and workflows.
Implement and maintain CI/CD pipelines for your models. Navigate complex deployment environments, manage containerized applications (Docker/Kubernetes), and optimize inference bottlenecks.
Work with Principal engineers to translate high-level system architectures into concrete, scalable data pipelines and production-ready microservices.
Design automated testing for ML pipelines. Monitor deployed models in production for data drift, latency, and accuracy, proactively implementing retraining strategies.
Collaborate effectively with product managers to define ML capabilities. Provide code reviews and informal technical mentorship to engineers.

Required Skills and Qualifications

Bachelor’s or Master’s degree in Computer Science, Data Science, Software Engineering, or a related field.
6–10 years of industry experience in software engineering with a strong focus on machine learning, MLOps, and API development.
Strong programming skills in Python with experience using backend web frameworks (e.g., FastAPI, Flask, Django) to serve models.
Solid expertise in frameworks like PyTorch, TensorFlow, Hugging Face Transformers, or LangChain.
Hands-on experience with containerization (Docker, Kubernetes), Linux environments, CI/CD tools (GitHub Actions, Jenkins), and deploying models on cloud platforms (AWS, Azure, or GCP).
Solid understanding of software architecture, data structures, and algorithms to ensure performant code.
Hands-on experience with distributed data processing frameworks (e.g., Apache Spark, Ray, Dask) and orchestrating complex data workflows (e.g., Apache Airflow, Dagster, Prefect).

What Sets your Role Apart

Independently define the technical ML/SWE approach for ambiguous feature requests and deliver to production.
Recognize the model is a small part of the solution; expertly manage aspects like API rate limiting, asynchronous processing, and robust data routing around ML components.
Possess hands-on experience maintaining production ML models and designing resilient systems against poor data quality.

Preferred Qualifications

Proven experience fine-tuning and deploying multimodal models for visual data and engineering diagrams in live environments.
Built robust, production-grade RAG pipelines, expertly utilizing Vector/Graph Databases for high-accuracy entity retrieval.
Familiar with orchestrating multi-step reasoning using frameworks like LangGraph or AutoGen for real-world problem-solving.
Strong understanding of advanced OCR, layout parsing, and techniques for consistent structured JSON extraction from complex industrial PDFs. Manufacturing/OT dataset experience preferred.
Ability to optimize LLMs/VLMs for inference latency (quantization, TensorRT, vLLM), potentially in constrained, on-premises environments.
Experience with modern data lake/lakehouse architectures (Delta Lake, Iceberg, Databricks) for efficient querying and preprocessing of petabyte-scale unstructured data for LLM/VLM training and RAG.

Learn more about us

Impact 2025
Cognite's Industrial AI: Moonshot
We’re globally recognized domain experts with an international presence that spans Phoenix, Houston, Oslo, Tokyo, Bengaluru, and Abu Dhabi.

Equal Opportunity

Cognite is committed to creating a diverse and inclusive environment at work and is proud to be an equal opportunity employer. All qualified applicants will receive the same level of consideration for employment.