June 14th, 2021

AzureFunBytes – Intro to Azure Data Factory with @KromerBigData

Jay Gordon
Senior Program Manager

AzureFunBytes is a weekly opportunity to learn more about the fundamentals and foundations that make up Azure. It’s a chance for me to understand more about what people across the Azure organization do and how they do it. Every week we get together at 11 AM Pacific on Microsoft LearnTV and learn more about Azure.

AzureFunBytes animation

Data drives so many of our decisions. Whether it’s determining which products shoppers see first in our online retail store or creating reports for business intelligence, we’ve got so much data! It’s time to learn how to take that data and turn it into human-readable information that helps us keep making the right decisions.

This week on AzureFunBytes, I am joined by Principal Program Manager Mark Kromer to learn how to store and process our big data with Azure Data Factory. Mark discusses the ETL (Extract, Transform, Load) process that gets our data into Azure Data Factory, and I ask him how we can transfer the data we already have to Azure. We also look at how to create pipelines that automate the ingestion of our data from various data stores.
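Before diving into Data Factory itself, it can help to see the ETL pattern in its simplest form. Here is a toy illustration in plain Python (not ADF itself; in Data Factory these steps become pipeline activities), using hypothetical taxi-style records:

```python
# Toy illustration of the Extract-Transform-Load (ETL) pattern.
# The record shape and field names here are hypothetical examples;
# in Azure Data Factory these steps are expressed as pipeline activities.

def extract(rows):
    """Extract: pull raw records from a source (here, an in-memory list)."""
    return list(rows)

def transform(rows):
    """Transform: clean and reshape -- drop records with no fare, normalize city names."""
    return [
        {"city": r["city"].strip().title(), "fare": float(r["fare"])}
        for r in rows
        if r.get("fare")
    ]

def load(rows, sink):
    """Load: write the transformed records to a destination store."""
    sink.extend(rows)
    return len(rows)

raw = [
    {"city": " new york", "fare": "12.50"},
    {"city": "BROOKLYN", "fare": ""},   # dropped: no fare recorded
    {"city": "queens ", "fare": "7.25"},
]
warehouse = []
loaded = load(transform(extract(raw)), warehouse)
print(loaded)                # 2 records survive the transform
print(warehouse[0]["city"])  # New York
```

The same extract → transform → load shape scales up: in ADF the source becomes a connector, the transform becomes a data flow, and the sink becomes a data store like a data lake or Azure Synapse.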

00:04:35 – Intro to Mark
00:09:45 – Let’s meet Data Factory
00:14:48 – CI/CD With Data Factory Pipelines
00:20:32 – Azure Data Factory connector overview
00:31:57 – Demo Time

Our Agenda:

  • Intro to Data Factory
  • Differences between ADF & Synapse
  • Data Flows in ADF & Synapse
  • Data lake ETL patterns
  • Build an ETL flow using taxi sample data (Demo)
  • Q&A

From the Azure documentation, “What is Azure Data Factory?”:

Overview of Data Factory

Azure Data Factory is the platform that solves such data scenarios. It is the cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale. Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores. You can build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.
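To make the idea of a pipeline concrete, here is a minimal sketch of the JSON shape an ADF pipeline definition takes, built as a Python dict with a single Copy activity. The pipeline and dataset names ("CopyTaxiTrips", "TaxiTripSource", "DataLakeSink") are hypothetical placeholders; real definitions are usually authored visually in ADF Studio or deployed via ARM templates:

```python
import json

# Minimal sketch of an Azure Data Factory pipeline definition with one
# Copy activity. All names are hypothetical placeholders -- a real
# definition references datasets and linked services you have created.
pipeline = {
    "name": "CopyTaxiTrips",
    "properties": {
        "activities": [
            {
                "name": "CopyTripData",
                "type": "Copy",
                "inputs": [
                    {"referenceName": "TaxiTripSource", "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "DataLakeSink", "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "ParquetSink"},
                },
            }
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```

The key structural idea is that a pipeline is just an ordered collection of activities, each pointing at input and output dataset references; scheduling is attached separately via triggers.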

Integrate all your data with Azure Data Factory—a fully managed, serverless data integration service. Visually integrate data sources with more than 90 built-in, maintenance-free connectors at no added cost. Easily construct ETL and ELT processes code-free in an intuitive environment or write your own code. Then deliver integrated data to Azure Synapse Analytics to unlock business insights.


Learn about Azure fundamentals with me!

The live stream is available on Twitch, YouTube, and LearnTV at 11 AM PT / 2 PM ET on Thursdays. You can also find the recordings here:

AzureFunBytes on Twitch
AzureFunBytes on YouTube
Azure DevOps YouTube Channel
Follow AzureFunBytes on Twitter

Useful Docs:

Get $200 in free Azure Credit
Microsoft Learn: Introduction to Azure fundamentals
Microsoft Learn: Integrate data with Azure Data Factory or Azure Synapse Pipeline
Microsoft Learn: Data integration at scale with Azure Data Factory or Azure Synapse Pipeline
Azure Data Factory
Azure Data Factory documentation
Azure Data Factory Tutorials
Extract, transform, and load (ETL)
Transferring data to and from Azure
Big data architecture style
Watch our snack-sized video tutorials here to learn more about building ETL with data flows
Follow the Delta Lake tutorial here to build your own lake
Branching and chaining activities in an Azure Data Factory pipeline using the Azure portal
For access to the taxi medallion sample data to build these pipelines on your own, visit Mark’s sample data repo here and look for the trip data and trip fare files.

Author

Jay Gordon
Senior Program Manager

Jay Gordon is a Senior Program Manager with Azure Cosmos DB focused on reaching developer communities. Jay is located in Brooklyn, NY.
