SRE - DataPlatform

About the role

Veepee · Hybrid

Join a cross-functional SRE community embedded in a product-oriented Data Platform team of 40–50 engineers, analysts, and data scientists across France and Spain. You'll drive the reliability and scalability of a next-generation Lakehouse platform (anchored on Trino, Iceberg, and on-prem object storage) while leading the transition from public cloud to a resilient hybrid/on-prem architecture.

🎯 TASKS

Platform Reliability & SRE foundations 

  • Own reliability of core data services: Trino, Iceberg, S3 / Ceph, Kafka, Kafka Connect, Schema Registry
  • Define and enforce SLIs/SLOs, error budgets, and on-call runbooks — solid SRE foundations are non-negotiable
  • Build full-stack observability with Prometheus and Grafana: metrics, dashboards, alerting pipelines, and anomaly detection
  • Manage and harden PostgreSQL clusters via Patroni for high-availability control-plane services

Kafka ecosystem: Connect & Schema governance

  • Operate and scale Kafka Connect clusters: connector lifecycle, offset management, dead-letter queues, and task rebalancing
  • Maintain the Schema Registry as the single source of truth for Avro/Protobuf/JSON schemas — enforce compatibility rules and schema evolution policies
  • Monitor consumer lag, connector throughput, and broker health via Prometheus JMX exporters and Grafana dashboards
  • Ensure end-to-end data contract integrity between producers and Iceberg/S3 consumers

Kubernetes, Kube-in-Kube & Crossplane

  • Operate production Kubernetes clusters (GKE/EKS + on-prem) — capacity planning, upgrades, PodDisruptionBudgets, resource quotas
  • Architect and manage Kube-in-Kube topologies to provide strong tenant isolation for data platform workloads — each team gets a dedicated virtual cluster without the overhead of a full physical cluster
  • Automate infrastructure and resource provisioning with Crossplane: define composite resources (XRDs) so data teams can self-serve Kafka topics, Trino namespaces, and S3 buckets through Kubernetes-native APIs
  • Maintain GitOps pipelines for platform deployment and configuration drift detection

Lakehouse architecture & cloud migration

  • Migrate from a public cloud data warehouse to the VeepeeCloud Iceberg-based lakehouse, managing coexistence, schema evolution, and time travel
  • Architect resilient ingestion, transformation, and serving layers around Trino + S3
  • Optimize Trino query performance: memory limits, spilling, cost-based optimizer tuning

Agentic & developer enablement

  • Build agentic self-service tooling so data teams can provision Trino/Iceberg resources and Kafka Connect pipelines autonomously via Crossplane — reducing toil and ops bottlenecks
  • Develop FinOps dashboards (compute, storage, query cost) with Grafana and Prometheus-based cost exporters
  • Write clear technical documentation, runbooks, and internal ADRs

Multi-DC resilience & DRP

  • Design and implement multi-datacenter strategies across FR1 / NL1 — active-active and active-passive topologies
  • Leverage Fast Erasure Coding on object storage (Ceph/S3) to maximize durability with minimal replication overhead
  • Ensure data replication consistency across sites for Iceberg table metadata, Trino catalogs, and Schema Registry subjects
  • Lead DRP exercises: failover playbooks, RTO/RPO validation, postmortems

👉 MUST HAVE skills

    • Strong experience with Kubernetes in production environments
    • Experience with Kube-in-Kube technologies (vCluster or similar)
    • Solid understanding of SRE principles (SLIs/SLOs, error budgets)
    • Experience with Prometheus and Grafana
    • Experience with Infrastructure as Code (Terraform or similar)
    • Experience with Crossplane
    • Familiarity with GitOps workflows
    • Experience with S3 and object storage technologies
    • Experience with PostgreSQL and Patroni
    • Experience with Kafka, Kafka Connect, and Schema Registry
    • Fluent in English

👉 NICE TO HAVE skills

  • Experience with multi-datacenter architectures (FR1/NL1)
  • Experience designing disaster recovery plans and failover playbooks
  • Experience with Fast Erasure Coding (Ceph/S3)
  • Experience with Trino, Iceberg, and Lakehouse technologies
  • Experience with Airflow
  • Experience building agentic self-service platforms
  • Knowledge of FinOps and cost optimization practices
  • Programming experience in Python, Java, or Go

✅ BENEFITS

  • Variable bonus
  • E-learning platform (self-education courses)
  • Meetups & conferences (local and international)
  • Flexible office — up to 2 days remote
  • International teams (France & Spain)

⚙️ RECRUITMENT PROCESS

  • 1️⃣ 30-minute HR Screen with a Veepeeᵀᵉᶜʰ Recruiter

  • 2️⃣ General Technical exchange

  • 3️⃣ Technical exchange with the manager

  • 4️⃣ Team Interview


We are convinced that it is up to you to define the way you work, to develop yourself, and to progress.

At Veepee we guarantee that you can just be yourself!

In support of diversity and inclusion, Veepee is committed to reviewing all applications received on an equal basis.

🔗 COMPANY: For more information about our ecosystem: https://careers.veepee.com/en/home-page-en/
