Project Overview:
You will be a key member of the delivery team on a high-stakes data platform migration for a multi-brand iGaming operator. The engagement replaces a legacy Pentaho/Postgres architecture with a modern SQLMesh + Dagster + BigQuery stack, completing before a 3–4X data volume surge in July 2026.
This is hands-on build work. You will work directly in the codebase from day one, embedded within a squad led by a Fractional CTO/Data Architect.
Scope of Work
Phase 1a — Core Migration (Mar–Jul 2026)
● Migrate 128 Pentaho ETL jobs to idempotent SQLMesh models following Medallion architecture (Bronze → Silver → Gold)
● Convert 20+ complex PL/pgSQL trigger functions to event-driven Dagster assets and SQLMesh models
● Build and validate a Reverse ETL pipeline (BigQuery → Postgres) covering 107 tables across 3 SLA tiers: Hot ≤5 min, Warm 15–30 min, Cold daily
● Execute 7-Day Consistency Check with row-count and column-sum reconciliation as the production cutover gate
● Support UAT, production cutover, and 1-month hypercare post-cutover
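To make the cutover gate concrete, here is a minimal sketch of row-count and column-sum reconciliation in Python. The table shape and column names (Turnover, Revenue) follow the financial-drift standard later in this document, but the function itself is illustrative, not the engagement's actual tooling; Decimal sums are used to avoid float drift on money columns.

```python
from decimal import Decimal

def reconcile(source_rows, target_rows, sum_columns):
    """Compare row counts and per-column sums between source and target.

    Returns a dict of check name -> (source_value, target_value, matches).
    """
    report = {
        "row_count": (len(source_rows), len(target_rows),
                      len(source_rows) == len(target_rows))
    }
    for col in sum_columns:
        s = sum(Decimal(str(r[col])) for r in source_rows)
        t = sum(Decimal(str(r[col])) for r in target_rows)
        report[f"sum:{col}"] = (s, t, s == t)
    return report

# Hypothetical sample: one cent of drift on Revenue should fail the gate.
src = [{"Turnover": "10.00", "Revenue": "2.50"},
       {"Turnover": "5.00",  "Revenue": "1.25"}]
tgt = [{"Turnover": "10.00", "Revenue": "2.50"},
       {"Turnover": "5.00",  "Revenue": "1.26"}]
report = reconcile(src, tgt, ["Turnover", "Revenue"])
# Row count and Turnover match; Revenue drifts by 0.01 and fails the gate.
```

In practice these checks would run as queries against Postgres and BigQuery rather than over in-memory rows, but the pass/fail logic is the same.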
Phase 1b — BigQuery Modernisation (Jul–Sep 2026)
● Rewrite 164 BigQuery Scheduled Queries from EXTERNAL_QUERY to native BQ reads
● Rewrite 32 BigQuery Stored Procedures to eliminate CloudSQL dependencies
● Retire 12 CloudSQL connections across 10 brands after 7-day burn-in validation
● 1-month hypercare support post-Phase 1b cutover
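The EXTERNAL_QUERY rewrites in Phase 1b follow a consistent pattern: replace the federated CloudSQL call with a read from the table already replicated into native BigQuery storage. A hedged before/after sketch, with the SQL held in Python strings; the connection, project, dataset, and table names are invented for illustration:

```python
# Legacy scheduled query: federated read through a CloudSQL connection.
legacy_sql = """
SELECT member_id, deposit_amount
FROM EXTERNAL_QUERY(
  'my-project.eu.cloudsql-conn',
  'SELECT member_id, deposit_amount FROM deposits'
)
"""

# Rewritten: native read from the replicated BigQuery table, with a
# partition filter so the scan stays cost-aware.
native_sql = """
SELECT member_id, deposit_amount
FROM `my-project.bronze.deposits`
WHERE deposit_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
"""

assert "EXTERNAL_QUERY" not in native_sql  # CloudSQL dependency removed
```

Retiring a CloudSQL connection then reduces to verifying that no remaining scheduled query or stored procedure references it, which is what the 7-day burn-in validates.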
Technical Requirements — Must Have
● SQLMesh: Incremental models, full-refresh strategies, model testing, Medallion architecture
● Dagster: Asset pipelines, sensors, scheduling, upstream/downstream dependency management
● BigQuery: Partitioning, clustering, MERGE with partition pruning, cost-aware query design, INFORMATION_SCHEMA
● PostgreSQL: Triggers, stored functions, PL/pgSQL fluency, trigger-to-model conversion patterns
● Python: Apache Arrow, Polars (or equivalent) for high-throughput Reverse ETL data movement
● Data validation: row-count reconciliation, column-sum diffing, data-diff tooling
● CI/CD for data: Git PR workflows, automated model testing, Azure DevOps experience
● Idempotency discipline: all models must be re-runnable with zero side effects
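To illustrate the idempotency discipline above: replaying the same batch must leave the destination unchanged. A minimal keyed-upsert sketch in pure Python (the schema and key name are hypothetical) showing that applying a batch twice is a no-op:

```python
def upsert(table, batch, key="id"):
    """Merge batch into table keyed on `key`; last write wins.

    Re-running the same batch yields an identical table (idempotent).
    """
    state = {row[key]: row for row in table}
    for row in batch:
        state[row[key]] = row  # insert or overwrite deterministically
    return sorted(state.values(), key=lambda r: r[key])

table = [{"id": 1, "turnover": 10}]
batch = [{"id": 1, "turnover": 12}, {"id": 2, "turnover": 7}]
once = upsert(table, batch)
twice = upsert(once, batch)  # replaying the same batch
assert once == twice         # zero side effects on re-run
```

The same property is what SQLMesh incremental models and BigQuery MERGE statements are expected to preserve at warehouse scale.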
Nice to Have
● iGaming or sportsbook domain knowledge: member tiers, bet/deposit fact tables, fiscal period logic
● Experience with Datastream, Airbyte, or other CDC ingestion tools
● dbt experience (cross-applicable to SQLMesh patterns)
● Terraform or Pulumi on GCP
● Grafana or Datadog for pipeline observability
Ideal Candidate
● 5+ years in data engineering with a track record of shipping production pipelines — not just building in staging
● Has executed at least one legacy-to-modern DWH migration (not just contributed to one)
● Comfortable reading, debugging, and rewriting PL/pgSQL business logic
● Writes clean, testable SQL and Python — and holds the same standard for teammates
● Works well under an experienced technical lead: takes direction, flags risk early, executes independently
● Does not need hand-holding on tooling — picks up SQLMesh or Dagster quickly if not already proficient
● Available to start March 2026; commitment through Phase 1b preferred
Code Quality Standards:
All models and assets produced must adhere to:
● Explicit partitioning and clustering keys on all destination tables
● No broad DISTINCT — use GROUP BY or window functions
● MERGE with partition pruning for upserts; no DELETE+INSERT anti-patterns
● No temp tables or CTEs materialised to disk
● Zero material drift on financial columns (Turnover, Revenue) — the mandatory acceptance gate
● Python adhering to PEP 8; SQL modular, idempotent, and reusable
Working Norms:
● 3x per week standup with the Fractional CTO — focused on blockers
● Weekly sync reviewing milestones, risks, and forward planning
● All work submitted via Pull Request — reviewed and approved by Client Data Engineering before merge
● Named to the engagement — no rotation without client approval
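The MERGE-with-partition-pruning standard above can be sketched as follows; the statement is held in a Python string for illustration, and all project, dataset, table, and column names are hypothetical. The constant filter on the target's partition column in the ON clause is what lets BigQuery prune the target scan, in contrast to a DELETE+INSERT over the whole table:

```python
# Upsert pattern that restricts the target scan to recent partitions.
merge_sql = """
MERGE `proj.gold.bets_daily` AS t
USING `proj.silver.bets_delta` AS s
ON t.bet_id = s.bet_id
   AND t.bet_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)  -- partition pruning
WHEN MATCHED THEN
  UPDATE SET t.turnover = s.turnover, t.revenue = s.revenue
WHEN NOT MATCHED THEN
  INSERT (bet_id, bet_date, turnover, revenue)
  VALUES (s.bet_id, s.bet_date, s.turnover, s.revenue)
"""
assert "DELETE" not in merge_sql  # no DELETE+INSERT anti-pattern
```

The 3-day lookback window shown here is an assumption; the real window would be set per table from the SLA tier and late-arrival characteristics of the source.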
Application Confirmation
You're applying for the role described above.