Data best practices vlog with BI specialist, Joseph Hobbs. Season 2 Episode 3 proposes a database design best practice that allows you to make better, higher-level decisions based on your data.
Video Transcript
Hello again everyone, this is Hobbs from Valorem Reply and this is TNT. This is a video series we’re doing where we talk about best practices and common mistakes that we see sort of in the IT/BI/data space. This week we are going to be talking about transactional data.
Welcome back everybody, I am Hobbs and this week I want to talk about something that is of importance to me in particular, it sort of rubs my personality the wrong way, you could call it a pet peeve. So here it is.
A lot of times when I’m interacting with businesses or specifically when I’m interacting with analysts, and I’ve gone to them and I’ve said ‘Hey, what do you want out of this?’ – I work a lot in the front-end reporting/UI/UX space. And often they’ll come back to me and they’ll say ‘well, what I need to do my job, is this raw data here. And if you can provide that raw data to me, I can take everything from there and I can solve my own problem.’
On the surface that’s a great attitude, right? To be able to say, you give me anything, you give me raw data and I will do all of the rest of the work to get to a business decision, no problem at all. But what I want to do, and the reason that people bring me in, is to be able to say ‘Let me do that work for you.’ One of the most concrete ways to do this, when you’re working on a business intelligence or data project, is to focus on aggregate data, not transactional data.
So, what do I mean when I say that?
Transactional data is precisely that: it’s data at the level of the transaction. A thing occurs and you show the thing; something else occurs, you show the next thing; something else occurs, you show that thing. So you end up with one row for every event or transaction that has happened. The problem with this, of course, is that the data is always enormous. There’s always tons and tons of it, and it’s very hard to read and go through.
Aggregate data should be approached from the perspective of asking, ‘When does the event that occurs, or the transaction, matter? At what level do I need to make a decision based on the number or the quantity or the quality of these transactions?’ If you take this approach, you start with the business question sort of in the back of your mind and then move forward into the way you’re designing your data. If you’re going to say form follows function, start with the function: what am I trying to accomplish, what business decision do I need to make? And then back up into the way you design your databases. Often what you will get is aggregated data somewhere in the reporting layer. And I strongly encourage this.
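To make the contrast concrete, here is a minimal sketch in Python of rolling transactional rows up to an aggregate. The sales data, the field names, and the `aggregate_by_region` helper are all invented for illustration, and the assumed business question is "how is each region doing?"; a real project would more likely do this in the database or reporting layer.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactional data: one row for every sale that occurred.
transactions = [
    {"day": date(2024, 1, 1), "region": "East", "amount": 120.0},
    {"day": date(2024, 1, 1), "region": "West", "amount": 80.0},
    {"day": date(2024, 1, 2), "region": "East", "amount": 200.0},
    {"day": date(2024, 1, 2), "region": "East", "amount": 50.0},
    {"day": date(2024, 1, 3), "region": "West", "amount": 95.0},
]


def aggregate_by_region(rows):
    """Roll transactions up to one summary row per region."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for row in rows:
        bucket = summary[row["region"]]
        bucket["count"] += 1
        bucket["total"] += row["amount"]
    return dict(summary)


# Aggregate data: one row per region, the level at which the
# (assumed) business decision is made.
region_summary = aggregate_by_region(transactions)
for region, stats in sorted(region_summary.items()):
    print(region, stats["count"], stats["total"])
```

Five transaction rows become two summary rows, one per region, which is the level the assumed question is asked at.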
In your reports themselves, focus on the business question and try and take out anything that’s transactional and get to something at a higher level, where you can make an honest to goodness business decision based off of it.
Now like every guideline that I present, there’s always going to be an exception. There are analysts who really do need to know something about one particular transaction. And when that occurs, you need to be able to provide that for them so that they can do their job well. But where you can, when designing your data structures, when designing your reports, when thinking about the function and the form of the projects that you’re doing, try and stay away from transactional things and move towards aggregate ones.
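One way to honor that exception is to lead with the aggregate but keep the underlying transactions reachable on request. The following is a minimal Python sketch under that assumption; the rows, the `id` field, and the `total_by_region` and `drill_down` helpers are hypothetical names, not part of any particular tool.

```python
# Hypothetical transaction rows; ids and fields are invented for illustration.
transactions = [
    {"id": 1, "region": "East", "amount": 120.0},
    {"id": 2, "region": "West", "amount": 80.0},
    {"id": 3, "region": "East", "amount": 200.0},
]


def total_by_region(rows):
    """Aggregate view: total amount per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def drill_down(rows, region):
    """Exception path: return the individual transactions behind one aggregate."""
    return [row for row in rows if row["region"] == region]


totals = total_by_region(transactions)          # what the report shows by default
east_detail = drill_down(transactions, "East")  # what an analyst can ask for
```

The report stays at the aggregate level, and the analyst who really does need one particular transaction can still get to it.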
Hope you all enjoyed today’s episode. If you have any thoughts on this or you’ve got some best practices you would like to share, I would love to hear from you. I would love for you to comment on these videos and give me your thoughts. And as always, I will see you next time.