
Telco Prime - Scalable & Auditable Dataflow Gen2 Solution for Customer Data


Project Overview

Telco Prime has staked its future on a "Customer 360" initiative, but a foundational data quality crisis is undermining the entire project. For years, customer data has been ingested into the Bronze layer with no validation. The result is a chaotic, untrustworthy dataset, leading to failed marketing outreach and inaccurate analytics. The project delivers a clean Customer_Silver table by designing a repeatable, scalable data cleansing pipeline in Dataflow Gen2, restoring integrity and trust in the data.


Business Problem

Initial data profiling of the Customer_Bronze_Large table confirms the analytics team's complaints. The data is unusable in its current state, and the quality issues are more complex than first reported:

  • Address Inconsistencies: The Address field contains numerous variations for street suffixes ("Street", "St", "Str.", "Road", "Rd.").

  • State Inconsistencies: A new State field has been added, but it contains a mix of abbreviations, full names, and different cases (e.g., "CA", "ca", "Calif.", "California").

  • Contaminated Fields: The Phone field is contaminated with non-numeric characters, including parentheses, dashes, and periods, and some values are clearly invalid.

The business can no longer build on this unstable foundation. You must use Dataflow Gen2 to create a solution that joins the Bronze data with reference lookup tables to produce a clean Customer_Silver table.


Project Objective

  • To ensure maintainability, scalability, and reusability, the solution was built using Microsoft Fabric Dataflow Gen2, leveraging its low-code ETL capabilities and native integration within the Fabric ecosystem.

  • A Published Dataflow Gen2: The dataflow must ingest from the Customer_Bronze table and implement all required cleansing, standardisation, and feature engineering logic to produce the target schema.

  • The Customer_Silver Table: The final, clean table loaded into the Lakehouse, conforming to the schema defined in Exhibit 2.

  • SQL Validation Script: A SQL script with queries that validate the output. The script must include checks to verify that address standardisation was successful, that phone numbers were cleaned, and that the Data_Quality_Flag was correctly implemented (a minimal sketch of such checks follows this list).
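
The queries below are a minimal sketch of what such a validation script could look like, written in T-SQL against the Lakehouse SQL analytics endpoint. The table and column names used here (Customer_Silver, Address_Street, Phone, Data_Quality_Flag) are assumptions based on the design described on this page, not the exact schema from Exhibit 2.

    -- Address standardisation: no raw street-suffix variants should remain
    SELECT COUNT(*) AS remaining_suffix_variants
    FROM Customer_Silver
    WHERE Address_Street LIKE '% St' OR Address_Street LIKE '% St.'
       OR Address_Street LIKE '% Str.' OR Address_Street LIKE '% Rd'
       OR Address_Street LIKE '% Rd.';

    -- Phone cleansing: every non-null phone number must be exactly 10 digits
    SELECT COUNT(*) AS invalid_phone_rows
    FROM Customer_Silver
    WHERE Phone IS NOT NULL
      AND (LEN(Phone) <> 10 OR Phone LIKE '%[^0-9]%');

    -- Data_Quality_Flag: records with neither an address nor a phone number must be flagged 0
    SELECT COUNT(*) AS misflagged_rows
    FROM Customer_Silver
    WHERE Data_Quality_Flag = 1
      AND (Address_Street IS NULL OR Address_Street = '')
      AND Phone IS NULL;

Each query should return 0 when the dataflow has done its job.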


Project Design

  • Designed a Dataflow Gen2–based data cleansing and standardisation pipeline to convert inconsistent Bronze data into a reliable Silver layer table (Customer_Silver). This ensured the solution was easily scalable and maintainable by leveraging Dataflow Gen2's low-code capabilities.

  • Data was merged with the State_lookup table using a Left Join to standardise state names for better analysis (see the first M sketch after this list).

  • The Address column was split into two columns as per the business requirement: Address Unit and Address Street were separated using the Split Column functionality in Dataflow Gen2, with the text split on specific delimiters.

  • Phone number values were cleaned by ensuring that each value contains only digits and is exactly 10 digits long; anything else was set to null (see the second M sketch after this list).

  • The Data_Quality_Flag column was added using the rule that a record must have either a customer address or a customer phone number; if neither is available, the record is flagged as 0.

  • The data ingested into the Silver layer was validated using SQL scripts, as sketched under Project Objective above.
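
To make the join and split steps concrete, here is a minimal Power Query M sketch of the first part of the dataflow. The inline sample tables, the query and column names, and the ", " delimiter are illustrative assumptions standing in for the real Lakehouse source, the State_lookup reference table, and the delimiter logic defined in the exhibits.

    // Query 1: "Customer_Bronze_Standardised" (assumed name) - state lookup merge and address split
    let
        // Inline sample standing in for Customer_Bronze_Large (the real source is the Lakehouse connector)
        Bronze = #table(
            type table [Customer_ID = number, Address = text, State = text, Phone = text],
            {
                {1, "Unit 4, 12 Baker St", "Calif.", "(415) 555-0198"},
                {2, "Unit 9, 78 Harbour Rd.", "ca", "555.0134"},
                {3, "Flat 2, 5 Main Street", "TX", null}
            }
        ),
        // Inline sample standing in for the State_lookup reference table
        StateLookup = #table(
            type table [State_Raw = text, State_Std = text],
            {{"ca", "California"}, {"calif.", "California"}, {"california", "California"}, {"tx", "Texas"}}
        ),
        // Normalise the join key so "CA", "ca" and "Calif." all match the lookup
        WithKey = Table.AddColumn(Bronze, "State_Key",
            each if [State] = null then null else Text.Lower(Text.Trim([State])), type nullable text),
        // Left join keeps every Bronze record even when no lookup match exists
        Merged = Table.NestedJoin(WithKey, {"State_Key"}, StateLookup, {"State_Raw"}, "StateLkp", JoinKind.LeftOuter),
        Expanded = Table.ExpandTableColumn(Merged, "StateLkp", {"State_Std"}, {"State_Standardised"}),
        RemovedKey = Table.RemoveColumns(Expanded, {"State_Key"}),
        // Split Address into unit and street on the assumed ", " delimiter
        // (street-suffix standardisation, e.g. "St" -> "Street", would follow the same lookup/replace pattern and is omitted here)
        SplitAddress = Table.SplitColumn(RemovedKey, "Address",
            Splitter.SplitTextByDelimiter(", ", QuoteStyle.None), {"Address_Unit", "Address_Street"})
    in
        SplitAddress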

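A second query then applies the phone cleansing and the Data_Quality_Flag rule described above; in the dataflow this would be the query whose output destination is the Customer_Silver table in the Lakehouse. Again, this is a sketch under the same assumed names, referencing the previous query.

    // Query 2: "Customer_Silver" (assumed name) - phone cleansing and data quality flag
    let
        // Reference to the previous query in the same dataflow (assumed name)
        Source = Customer_Bronze_Standardised,
        // Keep digits only; a phone number survives only when exactly 10 digits remain, otherwise it becomes null
        CleanPhone = Table.TransformColumns(Source, {{"Phone",
            each
                if _ = null then null
                else
                    let digits = Text.Select(_, {"0".."9"})
                    in if Text.Length(digits) = 10 then digits else null,
            type nullable text}}),
        // Flag = 1 when an address or a phone number is available, 0 when neither is
        WithFlag = Table.AddColumn(CleanPhone, "Data_Quality_Flag",
            each if ([Address_Street] <> null and [Address_Street] <> "") or [Phone] <> null then 1 else 0,
            Int64.Type)
    in
        WithFlag

With the sample rows from the first sketch, "(415) 555-0198" becomes "4155550198", "555.0134" is set to null because only seven digits remain, and the third record still receives a flag of 1 because an address is present.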

Business Problem Solved

  • Trusted Data for Decision-Makers: By restoring confidence in the Telco Prime dataset, both business and analytics teams can now make faster, well-informed decisions with greater stakeholder trust.

  • Always-On Data Quality: The automated Dataflow Gen2 process refreshes data daily, eliminating repetitive manual corrections while continuously maintaining high data standards.

  • Higher Campaign Reach and Response: Standardised and cleansed address information significantly reduced undelivered communications, improving the effectiveness of targeted marketing initiatives.

  • Scalable Analytics Foundation: The refined Silver-layer dataset now serves as a reliable foundation for reporting, advanced analytics, and future machine learning use cases.

  • More Accurate Predictive Insights: Consistent geographic attributes, such as state and location, enhanced the reliability and precision of churn prediction models.

  • Improved Customer Engagement: Verified and validated contact numbers ensured outreach, support calls, and follow-ups reached the intended customers without friction.


Screenshots: validation SQL queries, M code, initial data, and cleaned data.

Contact Information

Whether you’re looking to modernise data platforms, improve analytics reliability, or explore data engineering and analytics engineering opportunities, I’d be glad to connect.


If you have a project in mind or are building a team focused on scalable, business-ready data solutions, let’s start a conversation.
