Intermarché: Enterprise Data Processing Infrastructure
Data Engineering & Cloud

Multi-cloud ETL infrastructure with Dataform, BigQuery, and Cloud Functions orchestrating data pipelines across GCP and Azure.

Project Overview

The Challenge

Organizations operating across multiple cloud providers (GCP and Azure) face significant challenges synchronizing data between disparate systems, leading to data inconsistencies and manual intervention requirements.

Traditional ETL processes struggle with reliability at scale, lacking proper error handling, retry mechanisms, and monitoring capabilities needed for processing millions of records daily.

Managing complex data transformation workflows with dependencies, scheduling, and version control requires sophisticated orchestration tools that integrate seamlessly with cloud infrastructure.

Manual deployment of cloud functions and data workflows across multiple environments (TEST, UAT, PRD) creates inconsistencies, deployment errors, and extended downtime during updates.

Ensuring data accuracy and consistency across multiple data sources and destinations requires comprehensive validation mechanisms and automated cleanup processes.

Architected and developed enterprise-grade data processing infrastructure on Google Cloud Platform, orchestrating complex ETL pipelines across multi-cloud environments. Built six specialized Cloud Functions for data synchronization between Azure Storage, GCS buckets, BigQuery data warehouse, and PostgreSQL operational databases.

Designed and implemented Dataform-based data transformation workflows with Git version control, automated compilation and validation, and cron-scheduled execution. Developed custom Python client for Dataform API integration, managing release configs, workflow configs, and invocation tracking across multiple environments.

Built comprehensive CI/CD pipeline using CircleCI with automated deployment across TEST, UAT, and PRD environments. Implemented reusable deployment commands with environment-specific configurations, VPC connectivity for secure database access, and shared Python libraries reducing code duplication by 70%.

Technical Architecture

CircleCI CI/CD Pipeline: Automated deployment orchestration with environment-specific configurations (TEST/UAT/PRD), gcloud CLI integration, and approval workflows for production releases

Cloud Functions Ecosystem: Six specialized Python 3.12 functions with shared libraries, VPC connectivity for database access, configurable memory/CPU/timeout settings, and event-driven triggers

Dataform Workflows: SQL-based data transformation pipelines with version control, dependency management, scheduled execution via cron, and comprehensive compilation/validation

BigQuery Data Warehouse: Centralized analytics storage with optimized schemas, partitioned tables, and integration with Dataform for complex transformations

PostgreSQL Operational Database: Production database for real-time operations with Cloud SQL connectivity, automated cleanup, and BigQuery synchronization

Shared Python Libraries: Reusable modules for common operations including settings management, utility functions, and client-specific schemas ensuring consistency across functions
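As an illustration of how reusable, environment-specific deployment settings might translate into a gcloud invocation, here is a minimal sketch. The FunctionSpec class, the name suffix per environment, the region, and the choice of 2nd-gen functions are assumptions made for the example, not details taken from the project. Trigger, source, and entry-point flags are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FunctionSpec:
    """Hypothetical per-function deployment settings."""
    name: str
    memory: str = "512Mi"
    cpu: str = "1"
    timeout_s: int = 600
    max_instances: int = 100
    vpc_connector: Optional[str] = None  # only for functions that reach the database


def build_deploy_command(spec: FunctionSpec, env: str, region: str) -> List[str]:
    """Return the gcloud argv for deploying one function to one environment."""
    if env not in {"TEST", "UAT", "PRD"}:
        raise ValueError(f"unknown environment: {env}")
    cmd = [
        "gcloud", "functions", "deploy",
        f"{spec.name}-{env.lower()}",  # naming scheme is an assumption
        "--gen2",                      # assumption: 2nd-gen functions
        "--runtime=python312",
        f"--region={region}",
        f"--memory={spec.memory}",
        f"--cpu={spec.cpu}",
        f"--timeout={spec.timeout_s}s",
        f"--max-instances={spec.max_instances}",
    ]
    if spec.vpc_connector:
        cmd.append(f"--vpc-connector={spec.vpc_connector}")
    return cmd
```

A CI job could pass the returned list to subprocess.run, so each environment differs only in the parameters it supplies.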

Key Challenges & Solutions

1. Cross-Cloud Data Migration (Azure to GCP)

Implemented the copy-azure-to-gcp Cloud Function with 8 GB of memory and 2 CPUs for parallel processing, added retry logic with exponential backoff, created comprehensive error logging and monitoring, and set a 600-second timeout to accommodate large file transfers.
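Retry with exponential backoff can be sketched as follows. This is a generic illustration, not the project's code: the function name, the default delays, the jitter, and the set of retried exception types are all assumptions.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying transient failures with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay_s.
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            # Random jitter spreads out retries from parallel workers.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Passing the sleep function as a parameter keeps the helper testable without real waiting.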

2. Real-Time ETL Processing at Scale

Built event-driven load-data-to-bigquery function triggered on bucket uploads, implemented automatic schema inference and table creation, added data quality validation before loading, and created comprehensive error handling with detailed logging for troubleshooting.
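The two steps that precede a load, mapping the upload event to a destination table and checking rows before loading, might look like the sketch below. It assumes the event payload carries bucket and name fields, as Cloud Storage object events do. The table-naming rule, the helper names, and the required-field check are hypothetical. The load itself (for example through the BigQuery client with schema autodetection) is omitted.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, List, Set, Tuple


def load_target_from_event(event: Dict[str, str], dataset: str) -> Tuple[str, str]:
    """Map a Cloud Storage object event to (source URI, destination table)."""
    bucket, name = event["bucket"], event["name"]
    # Hypothetical rule: table name is the file stem with unsafe characters replaced.
    stem = PurePosixPath(name).stem
    table = re.sub(r"[^A-Za-z0-9_]", "_", stem).lower()
    if not table:
        raise ValueError(f"cannot derive a table name from object {name!r}")
    return f"gs://{bucket}/{name}", f"{dataset}.{table}"


def validate_rows(rows: List[dict], required: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the rows may be loaded."""
    problems = []
    for index, row in enumerate(rows):
        missing = sorted(key for key in required if row.get(key) in (None, ""))
        if missing:
            problems.append(f"row {index}: missing {', '.join(missing)}")
    return problems
```

Returning problems as a list, rather than raising on the first one, lets the function log every issue in a file before rejecting it.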

3. Dataform Workflow Orchestration

Developed custom Dataform client with Python SDK integration, implemented release config management with Git commitish versioning, created workflow config automation with cron scheduling, and built compilation result tracking with automatic validation and error reporting.
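To make the relationship between release configs and workflow configs concrete, the sketch below builds request bodies as plain dictionaries. The helper names are hypothetical, and the field names (gitCommitish, codeCompilationConfig, releaseConfig, cronSchedule, timeZone) follow the Dataform REST resources as understood here. They should be checked against the API reference before use.

```python
from typing import Optional


def repository_path(project: str, location: str, repository: str) -> str:
    """Fully qualified Dataform repository resource name."""
    return f"projects/{project}/locations/{location}/repositories/{repository}"


def release_config_body(git_commitish: str, default_database: Optional[str] = None) -> dict:
    """Body for a release config pinned to a branch, tag, or commit SHA."""
    body: dict = {"gitCommitish": git_commitish}
    if default_database:
        # Per-environment override of the target BigQuery project.
        body["codeCompilationConfig"] = {"defaultDatabase": default_database}
    return body


def workflow_config_body(
    repo_path: str, release_config_id: str, cron_schedule: str, time_zone: str = "UTC"
) -> dict:
    """Body for a workflow config that runs a release config on a cron schedule."""
    if len(cron_schedule.split()) != 5:
        raise ValueError("cron_schedule must have five space-separated fields")
    return {
        "releaseConfig": f"{repo_path}/releaseConfigs/{release_config_id}",
        "cronSchedule": cron_schedule,
        "timeZone": time_zone,
    }
```

Each environment would get its own release config, with the workflow config referencing it by resource name.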

4. VPC Connectivity for Secure Database Access

Configured VPC connectors for bigquery-to-postgres and clean-postgres-data functions, implemented service account authentication with least-privilege IAM policies, added connection pooling and timeout management, and created automated retry mechanisms for transient network failures.

5. Automated Multi-Environment Deployment

Built CircleCI workflows with environment-specific parameters and approval gates, created reusable deployment commands with parameterized configurations, implemented automatic library injection that rewrites ...libs imports to .libs at deploy time, and added comprehensive deployment validation and rollback capabilities.
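The library-injection step could be done with a small source rewrite like the one below. It assumes that ...libs refers to a Python relative import three levels up, and that the shared libraries are copied next to each function before deployment. The function name and regular expression are illustrative only.

```python
import re

# Matches "from ...libs" at the start of a line (optionally indented),
# but not longer names such as "...libs_extra".
_LIBS_IMPORT = re.compile(r"^(\s*from\s+)\.\.\.libs\b", flags=re.MULTILINE)


def rewrite_libs_imports(source: str) -> str:
    """Rewrite 'from ...libs' imports to 'from .libs', leaving other lines untouched."""
    return _LIBS_IMPORT.sub(r"\1.libs", source)
```

For example, a line reading `from ...libs.settings import Settings` becomes `from .libs.settings import Settings`, while unrelated imports are left as they are.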

Impact & Results

Multi-environment CI/CD pipeline supporting TEST, UAT, and PRD with approval workflows, reducing deployment time by 80% and eliminating manual configuration errors

Scalable Cloud Functions with configurable resources (up to 8 GB of memory and 2 CPUs), timeouts of 600 to 1,200 seconds, and automatic scaling up to 100 instances

Dataform integration with Git-based versioning, automated compilation and validation, cron-scheduled execution, and comprehensive dependency management

Shared library architecture reducing code duplication by 70%, ensuring consistency across all functions, and simplifying maintenance and updates

Comprehensive monitoring with detailed logging, automated health checks and pod resets for GKE clusters, and real-time alerting for pipeline failures

Key Features

  • Multi-cloud data synchronization (Azure to GCP)
  • Six specialized Cloud Functions for ETL operations
  • Dataform workflows with Git-based versioning
  • Automated CI/CD across TEST, UAT, and PRD
  • VPC connectivity for secure database access
  • Shared Python libraries reducing duplication by 70%

Technologies Used

Python, Google Cloud Platform, Dataform, BigQuery, PostgreSQL, Cloud Functions, CircleCI, Kubernetes

Project Gallery

Data Processing Infrastructure

Project Details

Client

Mercio (Intermarché Project)

Timeline

2024

Role

Data Engineer & Cloud Architect

© 2025 Firas Jday. All rights reserved.
