Role Summary
We are seeking a highly skilled ML Engineer to design, implement, and operate scalable, secure, production-grade machine learning platforms on Databricks. The role focuses on enabling reliable model development, deployment, monitoring, and lifecycle management across large-scale AI workloads.
The role also includes supervising and reviewing the work of team members to ensure effective execution; managing end-to-end processes and projects for internal and external clients, with responsibility for timely and accurate delivery; giving clear instructions to team members on the tasks to be performed; and mentoring junior colleagues to support their skill development and professional growth.
Key Responsibilities
- Design, implement, and operate a scalable, production-grade machine learning platform on Databricks.
- Enable end-to-end ML lifecycle management including experimentation, model versioning, deployment, and monitoring.
- Build and maintain standardized automation frameworks for ML workflows using CI/CD best practices.
- Implement governed experiment tracking, model registry, and artifact management to ensure reproducibility and auditability.
- Deploy and operate production model inference solutions supporting real-time and batch workloads.
- Establish monitoring and observability for deployed models, including performance, data quality, and drift indicators.
- Enable shared and governed feature management capabilities to support reuse across ML use cases.
- Apply centralized governance, access control, and lineage for data, features, and models.
- Optimize ML workloads for scalability, cost efficiency, reliability, and security.
- Provide operational support, maintenance, and continuous improvement for production ML systems.
Must-Have Skills
- Strong experience in MLOps and ML engineering.
- Hands-on expertise with Azure Databricks for ML training and execution.
- Solid experience with MLflow (experiment tracking, model registry, artifact management).
- Strong understanding of Unity Catalog for data, feature, and model governance.
- Experience deploying and managing model serving / inference endpoints.
- Experience with containerization (Docker) and ML deployment workflows.
- Knowledge of model monitoring, performance tracking, and data drift and concept drift.
- Strong understanding of Databricks architecture.
- Proficiency in Python, SQL, and PySpark.
Good-to-Have Skills
- Experience with Databricks Feature Store or equivalent feature management platforms.
- Experience with governance, compliance, and auditability in regulated environments.
- Familiarity with cost optimization strategies for large-scale ML workloads.
- Knowledge of blue/green, canary, or champion/challenger deployments for ML models.