Staff Data Engineer at LVT

About the role

LVT · Onsite

ABOUT LVT

LVT is redefining how businesses operate in the physical world, moving beyond traditional security solutions to deliver AI-driven, actionable intelligence that makes sites smarter, safer, and more secure. Since pioneering our first mobile, solar-powered units, our commitment to scrappy, hands-on innovation has made us an established leader and one of the fastest-growing companies in intelligent site technology. We are building the next generation of solutions—from our physical units in the field to a powerful Agentic AI platform—that allows our customers to gain unprecedented visibility and control over safety, compliance, and operations. This is your chance to join a cutting-edge team that isn't just watching the world change, but actively building the technology that is changing it.

We’re a team that’s focused on growth and innovation, and we’re proud that our crew, products, and leadership are being recognized for it.

A Top-Tier Growth Company: Named one of the Financial Times’ Fastest Growing Companies 2025 and #10 on the Inc. 5000 Rocky Mountain Regional list for 2025.
Innovative Leadership: Our CEO, Ryan Porter, was named an EY Entrepreneur of the Year 2025, and our CTO, Steve Lindsey, was inducted into the Silicon Slopes CTO Hall of Fame in 2024.
Product & Software Excellence: We were named one of The Software Report’s Top 100 Software Companies of 2023 and are a winner of the Security Today Govies Award for 2025.

ABOUT THIS ROLE

LVT's AI systems are only as good as the data behind them. As we move toward Physical AI, the binding constraint shifts from model architecture to the data flywheel.

We are seeking a Staff Data Engineer to own that flywheel end to end, including logs, sensor telemetry, labels and annotations, and evaluation and benchmark sets. Every AI team trains and evaluates from a single stack that transforms data from the raw source into standardized, versioned, governed datasets.

This is a senior individual-contributor and technical-leadership role; formal people management is not required. You will partner closely with AI/ML research, the ML platform / MLOps function (you own the data side of the contract that defines what a model consumes and emits), and the annotation, edge, and infrastructure teams. You should be equally comfortable discussing dataset schema design, storage and partitioning trade-offs for multimodal data, versioning and migration strategy, and the governance controls that keep sensitive video and sensor data safe.

ROLE RESPONSIBILITIES

Data Flywheel Ownership: Own the end-to-end loop that converts raw edge telemetry and video into labeled training data and frozen evaluation sets, and feeds model outputs back into the next round.
Layered Dataset Pipelines: Build and own the pipelines that register raw source data, standardize it into a single well-defined schema, and join and aggregate it into curated datasets so every team trains, validates, and benchmarks from one consistent store through one reader, rather than copying and reformatting data per use case.
Labels & Annotation Data Lifecycle: Own how labels and semantic annotations are appended to datasets without rewriting source data, then versioned, quality-checked, and served, partnering with annotation and data-operations teams on label production and verification while you own the dataset, storage, and serving side.
Evaluation & Benchmark Sets: Own the frozen, versioned validation and benchmark datasets that make model comparisons valid over time: stable enough that an accuracy delta reflects the model, not a shifting dataset. This includes the review and scrubbing discipline required before any set is shared externally.
Dataset Versioning: Own schema and content versioning so producers can evolve datasets without breaking consumers: opt-in versions, append-without-rewrite for new fields, and the reader/writer indirection that lets data migrate underneath clients on a controlled rollout instead of forced lockstep migrations.
Framework Integration & Self-Serve Access: Own the read/write libraries and integrations researchers depend on (PyTorch/Lightning dataloaders, a simple record-level CRUDL API, and Spark/analytics access and self-service) so AI teams stay focused on model development.
Governance Enforced: Make governance machine-enforced in the flywheel rather than documented after the fact: classification of clips, frames, labels, and embeddings; scrubbing and anonymization in load jobs; and lineage and provenance for every dataset version, annotation campaign, and training input.
Technical Mentorship: Set the data-engineering standards for the flywheel (schema conventions, dataset contracts, quality gates) and mentor ICs as they work toward them, growing the function as the team forms.

OUR IDEAL CANDIDATE

Data Engineering Depth: 8+ years building and operating large-scale data pipelines and data-lake or lakehouse systems in production: ingestion, ETL/ELT, partitioning and storage-format decisions, and the reader/writer libraries consumers rely on.
ML Data Specialty: Has built data pipelines for model training and evaluation, labeled data, and evaluation/benchmark sets, with a working understanding of how data quality and versioning move model results.
Lakehouse Architecture: Strong experience with medallion-style layered data architectures and modern table/lake formats (e.g. Iceberg, Delta, Parquet, or comparable), including schema evolution and dataset versioning.
Multimodal Data at Scale: Experience with large multimodal data (video, image, sensor/telemetry) and the storage and access patterns that make it queryable at scale (denesting, repartitioning, binary-inline vs. reference storage).
Framework Integration: Hands-on with the data side of ML frameworks (PyTorch/Lightning dataloaders and Spark), plus strong Python knowledge.
Governance & Provenance: Practical experience enforcing data governance in pipelines (classification, access control, lineage and provenance, retention), particularly for privacy-sensitive data.
Technical Leadership: A track record of setting data-engineering direction and leveling up engineers (technical leadership; formal management not required).
Education: Bachelor's or Master's in Computer Science, Engineering, or a related field, or equivalent practical experience.

PREFERRED QUALIFICATIONS

Streaming or near-real-time ingestion from edge/IoT sources into a data lake (e.g. Kafka, Lambda, EMR, or similar).
Append-without-rewrite and hash-indexed dataset techniques on open table formats, and dataset/feature-versioning systems.
Generative-AI data work: fine-tuning and evaluation dataset curation for LLMs/VLMs.
Exposing datasets to AI agents through MCP-style query interfaces, with semantic schema and plain-language documentation for retrieval.
Computer-vision / video annotation tooling and workflows (e.g. Encord, Labelbox, or similar).

COMPENSATION

The beginning annual salary range for this role is $171,900 - $221,000 USD and is determined by location, job-related experience, and education/training. Your total earning potential is amplified by a bonus structure tied to meeting goals, and you will become an owner from day one through our employee equity program.

BENEFITS

We believe you do your best work when your whole life is supported. We invest in our crew’s health, families, and financial futures with a benefits package designed to support you inside and outside the office. Full-time benefits include, but are not limited to, comprehensive health, dental, and vision coverage; retirement benefits (401k match up to 4%); and flexible PTO.

LVT IS PROUD TO BE AN EQUAL OPPORTUNITY EMPLOYER. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status. All candidates must pass a drug screening and background check upon employment. Some roles may also require passing a federal background check and fingerprinting. Must be authorized to work in the U.S. If reasonable accommodation is needed to participate in the job application or interview process, and/or to perform essential job functions, please reach out to your recruiter.
