About the role
The Senior ML Engineer (MLOps) owns the transition from ad-hoc ML deployments to a registered, monitored, and governed ML platform: the standard lifecycle that every data scientist and ML practitioner across the company uses. The role also curates a wrapper layer over open-source classical ML libraries (forecasting, causal inference, recommender, tabular) so that retail algorithms ship on company-standard adapters rather than on per-team reinventions.
Key Responsibilities
- Define and document the end-to-end MLOps lifecycle (experiment → train → register → approve → deploy → monitor → retrain) and enforce it via CI/CD gates.
- Stand up and operate the Model Registry (MLflow / Databricks Unity Catalog Models) as the single source of truth; ensure 100% of production models are registered, versioned, and documented with model cards.
- Implement data drift, prediction drift, and performance-degradation monitoring with appropriate alerting and retraining triggers.
- Lead the build-vs-buy evaluation for a feature store; deploy a proof of concept (POC) and eliminate train / serve skew end to end.
- Audit existing production ML models; register, document, and migrate each into the standard lifecycle; retire or consolidate models whose business value cannot be justified.
- Curate and own a company-standard wrapper layer over open-source classical ML libraries (Prophet, statsmodels, DoWhy / EconML, LightFM, scikit-learn, XGBoost, LightGBM) with standard interfaces, lineage hooks, evaluation-harness integration, and CI/CD templates.
- Partner with Data Governance on the model-governance gate in the deployment pipeline; provide evidence for audits and compliance reviews.
- Mentor data scientists on engineering discipline (reproducibility, lineage, rollback) and lead incident response for degraded production models.
Requirements
- Bachelor's or Master's degree in Computer Science, Statistics, Applied Mathematics, or a related discipline.
- 5+ years building and operating ML systems in production (not only notebooks).
- Deep MLOps experience: model registry, experiment tracking, CI/CD for training and serving, versioning, approval gates.
- Has built or operated drift detection for both data and predictions in production; understands the difference between the two and how to set the right alert thresholds for each.
- Strong Python and Spark / PySpark; fluent in SQL; production experience with a cloud platform and Databricks (or an equivalent lakehouse).
- Comfortable designing train / serve parity patterns and feature pipelines.
- Experience with at least one major MLOps stack (MLflow, Kubeflow, Vertex AI, SageMaker).
- Can write runbooks, lead incident response, and translate business KPIs into model SLOs.
Preferred Qualifications
- Feature store production experience (Databricks Feature Store, Feast, Tecton).
- Experience with retail ML use cases such as demand forecasting, pricing optimisation, assortment planning, recommenders, churn prediction, and uplift modelling.
- Causal inference with DoWhy or EconML, and experimentation design (A/B tests, switchback tests, geo-experiments).
- Vendor or industry certifications such as Databricks Certified Machine Learning Professional or Microsoft Certified: Azure AI Engineer Associate.