CAE
2020 – 2022 & 2023 – 2024
Data engineering and cloud infrastructure for a global leader in simulation and training.
Overview
Long-term engagement across two periods, building and optimizing the enterprise data warehouse, designing ETL pipelines, deploying real-time dashboards, and architecting serverless data ingestion solutions on Azure.
Key Contributions
Implemented ETL pipelines in Azure Data Factory to ingest data from multiple sources, including on-premises SQL Server and Oracle, into the Azure Synapse data warehouse.
Designed numerous data flows in Azure Data Factory and developed Databricks jobs using PySpark for comprehensive data transformation.
Contributed to a standardized Python/PySpark ETL library that streamlines data cleansing and modeling.
Created and optimized multiple T-SQL stored procedures to populate dimension and fact tables in the data warehouse.
Performed data analysis, liaised with various business domain teams to identify fact and dimension tables, and architected the star schema for the data warehouse.
Deployed a near-real-time Dash dashboard for the Healthcare Cloud team, backed by time-series data and Python Azure Functions.
Designed and implemented a serverless architecture using Azure Functions v3 (C#, .NET Core) to ingest data from SAP Ariba into the data lake.
Kept the cloud infrastructure running reliably, using ARM templates and managing the Azure DevOps pipeline.
Enhanced the development and source control process for a team of over 10 data engineers.
Participated in onboarding new employees by familiarizing them with the solution and setting up their working environment.
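As a rough illustration of the star-schema loading pattern described in the contributions above, the sketch below resolves natural keys to surrogate keys in dimension tables and then writes fact rows that reference them. It is a minimal in-memory Python example, not the production T-SQL or PySpark code; the Dimension class, the load_fact_rows function, and all column names and sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """Hypothetical dimension table: maps a natural key to a surrogate key."""
    name: str
    keys: dict = field(default_factory=dict)

    def surrogate_key(self, natural_key: str) -> int:
        # Add unseen members on first sight, numbering them from 1.
        if natural_key not in self.keys:
            self.keys[natural_key] = len(self.keys) + 1
        return self.keys[natural_key]


def load_fact_rows(source_rows, dim_customer, dim_product):
    """Build fact rows that reference dimensions by surrogate key only."""
    facts = []
    for row in source_rows:
        facts.append({
            "customer_key": dim_customer.surrogate_key(row["customer"]),
            "product_key": dim_product.surrogate_key(row["product"]),
            "amount": row["amount"],  # the measure stays on the fact row
        })
    return facts


# Made-up sample rows, for illustration only.
source_rows = [
    {"customer": "ACME", "product": "simulator", "amount": 120.0},
    {"customer": "ACME", "product": "training", "amount": 80.0},
    {"customer": "Globex", "product": "simulator", "amount": 200.0},
]
dim_customer = Dimension("customer")
dim_product = Dimension("product")
facts = load_fact_rows(source_rows, dim_customer, dim_product)
```

Keeping descriptive attributes in the dimensions and only keys plus measures in the fact rows is what lets each business domain query the same facts through its own dimensions.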
Years across two engagements
Data engineers supported
Data warehouse architecture