CAE

2020 – 2022 & 2023 – 2024

Data engineering and cloud infrastructure for a global leader in simulation and training.

Overview

Long-term engagement across two periods, building and optimizing the enterprise data warehouse, designing ETL pipelines, deploying real-time dashboards, and architecting serverless data ingestion solutions on Azure.

Key Contributions

Implemented ETL pipelines in Azure Data Factory to ingest data from multiple sources, including on-premises SQL Server and Oracle databases, into the Azure Synapse data warehouse.

Designed numerous data flows in Azure Data Factory and developed Databricks jobs using PySpark for comprehensive data transformation.

Contributed to a standardized Python/PySpark ETL library that streamlines data cleansing and modeling.

Created and optimized T-SQL stored procedures to populate the dimension and fact tables in the data warehouse.

Performed data analysis, liaised with various business domain teams to identify fact and dimension tables, and architected the star schema for the data warehouse.

Deployed a Dash dashboard with near real-time updates for the Healthcare Cloud team, built on time-series data and Python Azure Functions.

Designed and implemented a serverless architecture using Azure Functions v3 (C# on .NET Core) to ingest data from SAP Ariba into the data lake.

Kept the cloud infrastructure running reliably, using ARM templates for provisioning and managing the Azure DevOps pipeline.

Enhanced the development and source control process for a team of over 10 data engineers.

Participated in onboarding new employees by familiarizing them with the solution and setting up their working environment.
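
To make the star-schema modeling mentioned above more concrete, here is a minimal sketch in plain Python of how flat source rows can be split into a dimension table with surrogate keys and a fact table that references it. It is an illustration only, not the project's code: the function name build_star_schema, the column names, and the sample rows are all hypothetical, and in the actual warehouse this kind of load was done with T-SQL stored procedures and PySpark jobs rather than in-memory lists.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def build_star_schema(
    rows: List[Row],
    dim_columns: List[str],
    measure_columns: List[str],
    key_name: str = "dim_key",
) -> Tuple[List[Row], List[Row]]:
    """Split flat rows into one dimension table and one fact table.

    Hypothetical helper for illustration: each distinct combination of
    dim_columns gets an integer surrogate key, and every fact row keeps
    only that key plus its measures.
    """
    surrogate_keys: Dict[Tuple[Any, ...], int] = {}
    dimension: List[Row] = []
    fact: List[Row] = []

    for row in rows:
        natural_key = tuple(row[col] for col in dim_columns)
        if natural_key not in surrogate_keys:
            # First time this combination is seen: assign the next key
            # and add a row to the dimension table.
            surrogate_keys[natural_key] = len(surrogate_keys) + 1
            dim_row: Row = {key_name: surrogate_keys[natural_key]}
            dim_row.update({col: row[col] for col in dim_columns})
            dimension.append(dim_row)

        fact_row: Row = {key_name: surrogate_keys[natural_key]}
        fact_row.update({col: row[col] for col in measure_columns})
        fact.append(fact_row)

    return dimension, fact


# Hypothetical sample data, not taken from the engagement.
sample_rows = [
    {"customer": "Acme", "country": "CA", "amount": 100.0},
    {"customer": "Acme", "country": "CA", "amount": 50.0},
    {"customer": "Globex", "country": "US", "amount": 75.0},
]

dim_customer, fact_sales = build_star_schema(
    sample_rows,
    dim_columns=["customer", "country"],
    measure_columns=["amount"],
    key_name="customer_key",
)
```

With the sample rows, the two "Acme" records share one dimension row, so the dimension table has two rows while the fact table keeps all three. Keeping descriptive attributes in the dimension and only keys and measures in the fact table is what lets reporting queries join and aggregate cheaply.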

Years across two engagements

10+

Data engineers supported

Star Schema

Data warehouse architecture

Technologies Used

Azure, Azure Data Factory, Azure Synapse, Databricks, PySpark, T-SQL, Azure Functions, .NET Core, C#, ARM Templates, Azure DevOps, Dash
