How well should BI analysts understand their data assets/sources?
- Kaloyan Petkov
- Feb 5
- 6 min read
Updated: Feb 7
Nowadays everyone deal with "data assets", whether it is a simple table or complex data lake we all must know how to use. Obviously this is pretty essential for Business intelligence professionals - BI analysts, Data analysts etc. All those roles must "handle" the data assets with ease so to perform their tasks like analysis, dashboarding, data extracting etc.
As everything else there are levels of understanding and knowledge. More skilled BI analysts know well their data assets (sources) and in the eyes of stakeholders can produce "miracles". So what exactly means to know and understand the data asset (source).
We will separate the knowledge in two sections - technical and business. Tech knowledge generally includes the data part - how the asset actually work, while the business knowledge is used to answer what the numbers mean. Both of those are critical for the successful understanding and utilization of the data asset (source). Imagine really good data analyst that knows how the source is structured, but has no idea what the fields mean - when the end users come to him with task he won't be able to properly utilize it. Same thing a business analyst that knows what the data mean, not very useful when you can't work with it right? So it is essential to have both types of understanding. Based on it we also give levels of understanding to each. We go even one step further and give examples with near-real life source that every corporation probably has.
So here is our pyramid of knowledge:

So before we go by each one, lets take an example of data source (asset). In my simple example I have the revenue table of some corporation:

So our data source has 9 fields:
RowID (Small Int Sequential) - showing the ID of each row;
Document Number (VARCHAR/String) - unique document number (no matter whether invoice or credit);
Document Type (VARCHAR/String) - indicates whether the document is either invoice or credit;
Client Code (VARCHAR/String) - the client code to whom we sell;
Product (VARCHAR/String) - the product ID that is being sold;
Segment (VARCHAR/String) - several products are grouped in Segment i.e. higher dimension of Product hierarchy;
Type (VARCHAR/String) - indicates whether the row is part of Actuals or Budget or Forecast;
Billed Revenue (Float/Numeric) - numeric field showing the revenue in each invoice/credit;
Forecast (Float/Numeric) - numeric field showing the forecasted revenue for the particular customer-product combination.
Now this is a fairly sophisticated data source, but if I was task to analyze revenue of a company I would definitely create something similar. Having such a complex table allows me to analyze both actuals and forecast of revenue and provide details if needed. So if you are starting a BI job in corporation you will handle similar stuff!
Lets go over each level in more detail.
Tech Knowledge
"How it is structured"
When dealing with more complex tables you need to know how they structured. In our particular example notice our source is actually two different tables: rows 1-9 represent the actual revenue, while rows 9-16 are a forecast of revenue. What is melting them together are the "common fields" like Client & Product. This means that for each client or product you can actually use the table to show both actuals & forecast. Also notice that Segment field is actually a higher level of Product, basically products grouped together into Segments. Additionally actuals section is further broken into Invoice and credits by column Document Type. All of this is pretty important to know when stakeholders come with questions and tasks.
Potential Trap: You sum up the Value field with no filters and say this is our earned Revenue, which is not correct as it includes also the forecast...
"Refresh Frequency "
That one is easy to find out, the team that has created the asset/source surely will know when is the refreshing timing. But it is not to be underestimated, because knowing the refreshing timing of your source will likely play a role when answering questions to stakeholders.
Potential Trap: the CFO asks you whether your analysis is the final position for the month, and you don't know that this is refreshing once a week and answer yes...
"Level of Detail"
LOD or Level of Detail generally refers to the lowest detail that any metric can be shown by. Going back to our example notice we actually two different LODs. Rows 1-8 where are the actuals you can break the revenue down to Document Number, while rows 9-16 go down only to client level. Meaning you can show the Actual revenue by invoice, but the forecast only for client. Realizing this will help you avoid errors.
Potential Trap: you do put a filter on column Type to keep only actuals but then you count all the rows and presented as 'Number of clients' but in fact this is number of documents...while you should've distinct counted the Client column...
"How to relate to other assets"
This connected to the primary keys of the table, so you have to know how you can join different additional tables to your source. So lets say the company policy is that each segment is managed by different salesperson, in this case simply joining by Segment will suffice. However if the it is also dependent on which is the client then you need to add second condition - Client field.
Potential Trap: You have to add the accountant who received the invoice, but you don't know that it is dependent also on the Document Type and you produce duplicates..
"Data Generation Process"
Knowing how the data behind you numbers is generated is absolutely key for successful usage, but often it requires even deeper understanding of how enterprise systems work. This knowledge allows you to solve much more complex queries.
Potential Trap: You put visualization showing new invoice that have appeared in the data but it doesn't show you the details by product in invoice...you missed that the invoice line details are joined by other table in the system that is refreshed only ones a day.
Business Knowledge
"What different fields mean"
Obviously if you are to make an analysis based on data set you definitely need to know the different fields mean in business terms. To be honest 80% of the fields in corporate data are pretty much self-explanatory, but still every BI analyst need to do the homework and know their data.
Potential Trap: You are tasked counting the number of client accounts that paid revenue, but you mistakenly used client group instead of the leaf-level of clients e.g. client accounts...
"What different aggregations mean"
Another important thing about data set for analysis is knowing how to aggregate the data. Does it make sense for a given field to be an average or sum? Again most of the fields are self-explanatory but if you are making an analysis you better have this knowledge.
Potential Trap: You are analyzing the return of financial investment portfolio of the company and you sum up all the prices of the assets in current period and divide on the same in previous month....while instead you should've calculate each asset's change vs previous month and THEN aggregate on portfolio level.
"What questions can be answered by the data"
In more traditional BI teams the requirements of building analysis come from stakeholders that usually give both the questions and data sources used to answer them. But that is very perfect scenario! If you really want to add value in your company it will be necessary to know the answer to those questions yourself. For given analysis should you use the aggregated data from accounting system or invoice level or journal data. Every case and question is unique but if you have sound business knowledge you can tie the knot of knowledge of corporate data systems.
"How it is related to other data assets"
Knowing how to joint tables is very different from knowing by what to join them or why join them overall. Quite often business stakeholders don't know how the data is organized behind the scenes and at the same time data analysts don't know which table should be directly connected, so it is the job of BI analysts to piece together as they should have both the business knowledge and tech expertise. For example if the task is to report revenue by sales teams, you need to know what is the table connecting the transactional data (invoices) to the sales structure and how it must be joined - only client? or perhaps client and product? This is pure business knowledge but it is vital if you want to add value as a BI analysts.
Conclusion
As in every skill the BI work has levels to it, obviously we will have good, bad, great BI analysts with varying skill level. Furthermore the focus will be different as well - some will more skilled in making dashboards, other will be great at SQL, another will have business knowledge. But I believe that all should strive for building their full portfolio of skills - have the necessary tech skills but don't forget the business knowledge as well. Not only you will add value, but you will future proof your career - Generative AI may write better SQL code than you, but it will never be able to reason what tables should be used for particular analysis...unless it becomes General AI (that is your own brain).
Comments