Data best practices vlog with BI specialist, Joseph Hobbs. Season 3 Episode 1 Hobbs encourages you to consider dataflow as an alternative to storing your data in individual tables in SQL.
Video Transcript
Hello everybody, welcome back. My name is Hobbs and this is This Not That or TNT, it’s a BI vlog where we talk about best practices that we've experienced here in the BI setting with our clients. This is season three I'm excited this time to talk more about back-end processes and data flows and data pipelines and the ways that you can make sure you are doing the best you can with what you got to work with.
Welcome back everybody! My name is Hobbs and today I want to talk some about the difference between what in the [Microsoft] Power Platform world is called a dataflow, and what most people think of when they're designing data by points. So, whenever you are spinning up a database of some kind and you imagine how the data would flow down through the different stages to your final reporting or application layer, usually there's a few steps involved in that process. It would be called different things depending on what organization you're with, how your company thinks about this, but the basic steps are these:
- We start off with the source system. This is the production of your data. And usually that produced data needs to land and live somewhere in the exact same state that it was created in. So, we usually call this source, a schema name.
- From there, from the source, then you get into a series of steps where you're staging things and you're transforming things and then at the end of it all you'll end up with fact and dimension tables, perhaps some other schemas depending on your company and your needs.
What I'd like to propose as an alternative to this though, is to think about it from a Power Platform perspective. With the Power Platform mindset, instead of having each of these individual steps producing a table, what you end up doing is building a dataflow. So, for those of you who are familiar with [Microsoft] Power BI, or [Microsoft] Power Apps for that matter, dataflows is an ETL engine (extract, transform and load) that is going to reach out all the way to your source system and, without all of these different landing steps, will take something through the entire pipeline down to your semantic layer where you're really getting ready to report and build applications on top of this. If you combine this with a common data model, which will be a topic for another time, you get immense versatility and standardization and ease of governance.
So, when you're thinking about these pipelines, you've got the option. And in some cases, your data pipeline is complex enough you may need a true SQL Server schema pipeline, one step at a time. But in most cases, I’d say maybe 80% of the time, the clients I work with, a dataflow is going to be a much better approach for you to take.
So, to recap that really quickly. You're always going to need to take data from where it lives to a state where people can really use it for real life business problems. Rather than the older way of storing things as individual tables in SQL, I encourage you to consider what it would look like to use a single computational engine to take it all the way from source all the way down to that semantic layer. And then to be able to check each of those steps one at a time, right there inside of a dataflow. I hope you enjoyed today's episode.
If you're interested in learning more about dataflows or having someone look at your back end and see exactly the pipelines and processes, you're working with, we would love to partner with you and see how it is that we could be of assistance. As always, check us out on social media, come find me on LinkedIn, ask me questions, any way that we can be of assistance to you, we would like to be. And I will see you next time!