About the role
We are seeking an experienced Senior Data Engineer to own the pipeline-standardization and data-quality program for the enterprise lakehouse. This role ships compliance gates that block non-compliant deployments, stands up the data-quality framework, builds the dashboards business users trust, and drives measurable reductions in data incidents across the retail data estate.
Key Responsibilities
- Design, ship, and operate a pipeline-compliance checker that validates naming, metadata, config schema, DQ-rule declarations, and cluster-policy reference on every new deployment.
- Deploy a data-quality framework (Great Expectations, Databricks DQ Rules, or equivalent) across new production pipelines; build a domain onboarding template; configure alert routing by severity.
- Build and publish the Data Quality Dashboard: quality health by domain, source, and table, refreshed in near real time and covering freshness, completeness, and accuracy.
- Establish Source Change Management agreements with key source systems (SLA contracts, change-request process, automated schema-change alerting); map source lineage end-to-end.
- Lead the migration playbook that brings the legacy pipeline estate up to standard; mentor the engineers executing migrations; own the playbook, not every migration.
- Drive data-incident reduction through prevention (compliance gate, DQ framework, Source Change Management, lineage) rather than reactive firefighting; lead incident response and post-mortems for major DQ failures.
- Partner with platform engineering on Event Stream domain-event schemas and data-product contracts.
- Author runbooks, review code at a senior level, and contribute to engineering culture.
Requirements
- Bachelor's degree in Computer Science, Data Engineering, or a related discipline.
- 5+ years designing, building, and operating production data pipelines on a major lakehouse or warehouse (Databricks, Snowflake, BigQuery).
- Strong PySpark and SQL skills; solid understanding of Spark performance tuning at production scale.
- Deep experience with data-quality frameworks (Great Expectations, dbt tests, Soda, Monte Carlo), including defining SLAs, setting thresholds, and tuning alert noise.
- Built and operated medallion / multi-layer lakehouse architectures with explicit transformation layers.
- Solid Git / CI experience for data code; opinions on testing data transformations.
- Comfortable defining and enforcing standards (naming, partitioning, retention, PII tagging) and reviewing PRs against them.
- Cloud platform experience (Azure preferred; AWS / GCP transferable).
Preferred Qualifications
- Streaming experience (Spark Structured Streaming, Delta Live Tables, Flink, Kafka Streams).
- Data modeling discipline (Kimball, Data Vault 2.0) with clear rationale; Unity Catalog production experience (lineage, tags, RLS).
- Retail data exposure — POS, inventory, replenishment, loyalty — and BI optimization for Power BI consumption.
- Vendor certifications such as Databricks Data Engineer Professional or Azure Data Engineer Associate.