TNT 2.6 Upstream Not Downstream

Published by Joseph Hobbs

Data best practices vlog with BI specialist Joseph Hobbs. Season 2, Episode 6 focuses on the ETL stage and suggests starting your calculations further upstream in your data pipeline to help your Power Query steps and reports run faster.

Video Transcript

Hello everybody, welcome back to This Not That. This is a vlog where we talk about best practices within the data world and compare them to a common mistake that we often see in the industry.

Today we’re going to be talking about where it is [best] to do certain kinds of calculations within your data pipeline, and I’m going to walk you through what my advice is on that topic.

Welcome back everybody. So, one of the problems that I often run into, or the conundrums if you will, is where precisely to do some of the ETL (extract, transform and load) work that needs to be done for any given data project. There’s sort of a push and pull going on there. If you do it further upstream in your data pipeline, you get some performance benefits. If you do it further downstream, usually that’s more accessible to your citizen analysts, your civilians if you will, who are able to use those lower-code environments.

My advice is: when you can, do things upstream, not downstream. Why is that?

Talking about data projects, and I work often in the reporting world, you want your report to do as little of the heavy lifting as possible, because performance comes at a real premium at that point. There’s been some research done about the optimal time to wait between a click of the mouse, or an interaction with something, and the result popping up. In a perfect world you really want less than a second for your action to take effect and produce a result. Compare that, if you will, to what you expect in your back-end, upstream processes. When you’ve written a SQL query of some kind, you don’t need a one-second response time; you can have a 10-second or 30-second or 1-minute or much longer response time, and it doesn’t matter. There’s a different expectation there for what it is you’re interacting with. Let me give you a real-world example.

One of the tools I commonly work in is called Power BI. In Power BI you have the option to create a calculated column, where it will go to your source table, or the imported copy that it has, go row by row, and add a value to each row. This works really well; it’s very simple to write and to execute. But the problem is that this occurs post-compression. Earlier in the process, when you’re working in Power Query, which is where you usually do your ETL steps, the query runs and its result is compressed. The calculated column is then added to the end of those results, where it typically compresses less efficiently than the columns that came through the query.

If you move it just that one step upstream, into Power Query, you’re going to reduce your total file size. The file will get smaller AND your report will be faster as a result. You can move it up even further, into SQL, and then your Power Query steps will run faster.

There’s this chain reaction that occurs as you move things upstream.
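As a rough sketch of what moving a calculation up into SQL can look like, here is a small Python example using the standard-library `sqlite3` module. The table `sales`, the view `sales_enriched`, and the column `line_total` are hypothetical names made up for illustration, not something from the video, and SQLite simply stands in for whatever upstream database you use. The idea is that the derived value is computed in the database, so the reporting layer only has to read it.

```python
import sqlite3


def build_upstream_table(conn):
    """Create a hypothetical sales table and a view that precomputes line_total."""
    conn.execute(
        "CREATE TABLE sales (order_id INTEGER, quantity INTEGER, unit_price REAL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(1, 2, 10.0), (2, 1, 25.5), (3, 4, 3.0)],
    )
    # The derived column is calculated upstream, in the database,
    # instead of row by row in the reporting tool.
    conn.execute(
        "CREATE VIEW sales_enriched AS "
        "SELECT order_id, quantity, unit_price, "
        "quantity * unit_price AS line_total "
        "FROM sales"
    )


def load_for_report(conn):
    """Stand-in for the downstream layer: it only reads the finished column."""
    return conn.execute(
        "SELECT order_id, line_total FROM sales_enriched ORDER BY order_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_upstream_table(conn)
rows = load_for_report(conn)
print(rows)  # [(1, 20.0), (2, 25.5), (3, 12.0)]
```

In a real project the report would import `sales_enriched` (or an equivalent table) directly, so neither the query layer nor the report needs its own version of the calculation.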

Now as I said, anytime I give these guidelines there’s always an exception of some kind. So, there are places where the calculation is difficult to do upstream or you don’t have the admin privileges that you need or there’s a particular calculation that you want in a reporting tool that’s hard to replicate precisely in your upstream database. In those instances, sure, do it all the way downstream wherever you would like. But where you can, push those changes upstream and you will get positive results inside your reporting and its response time.

Thank you guys for watching today. I hope you enjoy these tips and tricks, things that I’ve found useful. If you liked this, feel free to follow us, head over to the website, or come to social media and leave me a comment about something you’d like me to talk about. Additionally, the company that I work for, Valorem, would love to partner with you. If you’ve got a [data] project and you want some advice or need some help (back end, front end, anywhere in between), we would love to come alongside and make you successful in the data work that you do.

Hope you all have a great week, and I will see you next time.

Joseph Hobbs

Digital Insights Consultant, Modern Data Experience
