How we measure tax gaps

The methods we use to estimate tax gaps.

Last updated 31 October 2024

About the methods

There are typically 2 broad methods for estimating tax gaps: top-down and bottom-up (shown in Figure 4).

Top-down methods use externally provided aggregated data sources to estimate the size of the tax base, from which we estimate the theoretical tax liability. The difference between the theoretical tax liability and the amount we receive is the estimated tax gap. A top-down approach is typically used for indirect taxes.
Bottom-up methods involve a detailed examination of data sources, such as tax returns, audit results (including random enquiry programs), risk registers or third-party data-matching information. We use this information to determine the extent of non-compliance across the whole population, from which we estimate the tax gap. A bottom-up approach is typically used for direct taxes. There are 3 types as shown in Figure 5 and described further below.
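In symbols, both methods estimate the same quantity. The notation here is ours and is for illustration only: T* is the theoretical tax liability, R is the amount we receive, B_ext is a tax base taken from external aggregated data, τ is an average effective tax rate (a simplifying assumption), S is the set of examined returns, g_i is the non-compliance found on return i, and w_i is the number of returns in the population that return i stands for.

```latex
\begin{aligned}
\text{tax gap} &= T^{*} - R \\
T^{*} &\approx \tau \, B_{\text{ext}} && \text{(top-down)} \\
\text{tax gap} &\approx \textstyle\sum_{i \in S} w_i \, g_i && \text{(bottom-up)}
\end{aligned}
```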

Figure 4: Our 2 methods to estimate tax gaps

Figure 5: Method for each gap estimate

Choosing the methodology

We choose the methodology that provides the most reliable estimate for each gap we measure. In doing this, we carefully consider the characteristics of each gap, including:

the design of the tax or program
the characteristics of the population
the availability and quality of data.

Assessing these factors helps us decide which method is the most appropriate to use. For example, to use a top-down method we generally require external data. If we don't have a reliable external data source available, we know we'll need to use a bottom-up method to generate a reliable result.

We assess our methodologies for reliability, and where possible test them against alternatives to ensure that we are using the most appropriate methodology. We consult with our engagement, advice and assurance on the options available to us. We also look to other jurisdictions to see what methodologies they use for similar gaps.

We continually work to update and improve our gap estimates. Part of this involves assessing the methodology used, to ensure it's still the most appropriate option. This means we can remain confident that our gap estimates are reliable and credible.

Gap approaches in detail

This section explains in more detail the top-down and bottom-up methods we use to estimate tax gaps.

Top-down methods

A top-down method essentially looks at a system and breaks it down to understand each of its constituent parts and how these work individually. Top-down methods use external information about the system for which we are constructing an estimate.

This method doesn't always show what drives the tax gap; rather, it tells us that a gap exists. An example is the goods and services tax (GST) gap, which uses information from the Australian National Accounts. This data is compiled by the Australian Bureau of Statistics (ABS), so it is independent of the data we collect ourselves, such as audit data.
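As an illustration only, the arithmetic of a top-down gap can be sketched in Python. The figures, the single `taxable_share` adjustment and the function names are hypothetical; a real GST gap estimate makes many more adjustments to national accounts data than this. The sketch assumes Australia's standard 10% GST rate.

```python
GST_RATE = 0.10  # Australia's standard GST rate


def theoretical_gst(total_consumption: float, taxable_share: float,
                    rate: float = GST_RATE) -> float:
    """Theoretical GST liability from an aggregate consumption figure.

    taxable_share is an assumed fraction of consumption that is taxable
    (that is, not GST-free or input-taxed).
    """
    if not 0.0 <= taxable_share <= 1.0:
        raise ValueError("taxable_share must be between 0 and 1")
    return total_consumption * taxable_share * rate


def top_down_gap(theoretical: float, collected: float) -> tuple[float, float]:
    """Return (gap, gap as a share of the theoretical liability)."""
    gap = theoretical - collected
    return gap, gap / theoretical


if __name__ == "__main__":
    # Hypothetical figures in $ billion, for illustration only.
    liability = theoretical_gst(total_consumption=1_000.0, taxable_share=0.6)
    gap, share = top_down_gap(liability, collected=56.0)
    print(f"theoretical {liability:.1f}, gap {gap:.1f} ({share:.1%})")
```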

Bottom-up methods

We have used 3 broad types of bottom-up methods:

random enquiry estimates
statistical-based approaches
model-based approaches.

Random enquiry programs

A random enquiry program (REP) is a process for selecting tax returns for evaluation. As the name suggests, the tax returns are selected at random, so every return has the same likelihood of being chosen.

This is unlike operational audit selection processes, which focus on taxpayers considered to have a higher risk of non-compliance with a potentially large amount of tax at risk. Operational audit data is therefore biased towards this high-risk, high-consequence segment of taxpayers.

In contrast, random selection avoids any systematic selection of segments of the population. It is designed to provide an unbiased representation of taxpayer information.
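A minimal sketch of random selection and extrapolation, assuming a simple random sample with no stratification (real programs may stratify and weight their samples). The return IDs and adjustment amounts are hypothetical.

```python
import random
from statistics import mean


def select_random_returns(return_ids: list[str], sample_size: int, seed: int) -> list[str]:
    """Draw a simple random sample without replacement.

    Every return has the same chance of selection, unlike risk-based
    operational audit selection.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return rng.sample(return_ids, sample_size)


def extrapolate_gap(sample_adjustments: list[float], population_size: int) -> float:
    """Scale the mean adjustment found in the sample up to the whole population.

    Under simple random sampling the sample mean is an unbiased estimate of
    the population mean, so no selection-bias correction is applied here.
    """
    return mean(sample_adjustments) * population_size


if __name__ == "__main__":
    population = [f"return-{i}" for i in range(10_000)]  # hypothetical IDs
    sample = select_random_returns(population, sample_size=5, seed=42)
    # Hypothetical tax adjustments ($) found when the sampled returns were reviewed.
    adjustments = [0.0, 0.0, 100.0, 0.0, 400.0]
    print(sample, extrapolate_gap(adjustments, len(population)))
```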

Statistical-based approaches

Statistical-based approaches use a set of mathematical models to estimate an outcome where it would be impractical to obtain a data set that covers 100% of the population being estimated.

The 2 types of statistical-based approaches used within the tax gap program to estimate various tax gaps are:

regression analysis
extreme value theory.

Regression analysis

Regression analysis is a standard statistical technique for estimating the relationship between one variable and a set of other variables. It can be used to estimate the probability of non-compliance or the magnitude of the tax gap, using all available taxpayer records and compliance results.

To produce reliable and credible results with regression analysis, we need to correct for selection bias. This bias exists because the taxpayers subject to our compliance activities are selected for being higher risk. If we don't adjust for this bias, our estimates are likely to be wrong and our conclusions misleading. We adjust for selection bias using either:

propensity score matching
Heckman’s correction.
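Two small sketches of these corrections, for illustration only. The first performs 1-nearest-neighbour propensity score matching with replacement, assuming the propensity scores (the estimated probability that a taxpayer is selected for compliance activity) have already been fitted, for example by a logistic regression. The second computes the inverse Mills ratio, the extra regressor that Heckman's two-step method adds to the outcome equation; it assumes the selection index `z` comes from a previously fitted probit model. Function names and data are hypothetical.

```python
import bisect
import math


def match_nearest(audited: list[tuple[float, float]],
                  unaudited_scores: list[float]) -> list[float]:
    """1-nearest-neighbour propensity score matching, with replacement.

    audited: (propensity_score, observed_adjustment) for taxpayers we reviewed.
    unaudited_scores: propensity scores for taxpayers we did not review.
    Returns an imputed adjustment for each unaudited taxpayer, taken from the
    reviewed taxpayer whose score is closest.
    """
    if not audited:
        raise ValueError("need at least one audited taxpayer")
    pool = sorted(audited)
    scores = [s for s, _ in pool]
    imputed = []
    for s in unaudited_scores:
        i = bisect.bisect_left(scores, s)
        # The closest score is one of the neighbours either side of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(pool)]
        best = min(candidates, key=lambda j: abs(scores[j] - s))
        imputed.append(pool[best][1])
    return imputed


def inverse_mills_ratio(z: float) -> float:
    """phi(z) / Phi(z), the extra regressor in Heckman's two-step correction.

    z is the fitted index from a probit model of selection for compliance activity.
    """
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return pdf / cdf
```

Matching pairs each unreviewed taxpayer with a reviewed taxpayer who had a similar chance of being selected; nearest-neighbour matching is only one of several matching schemes.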

The benefit of regression analysis is that it is useful in identifying characteristics that help predict whether a taxpayer is non-compliant, as well as characteristics that help predict the degree of non-compliance. Based on these characteristics, or drivers, the size of the tax gap can be estimated for the taxpayers that are modelled to be non-compliant.
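A sketch of how the two kinds of prediction can combine, assuming a logistic model for the probability of non-compliance (its coefficients are hypothetical) and a separate, already fitted model for the amount. Treating taxpayers with a predicted probability of at least 0.5 as non-compliant is our assumption; weighting each amount by its probability is shown as an alternative.

```python
import math


def probability_noncompliant(features: list[float], coefs: list[float],
                             intercept: float) -> float:
    """Logistic regression prediction, computed in a numerically stable way."""
    score = intercept + sum(c * x for c, x in zip(coefs, features))
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)  # avoids overflow for large negative scores
    return e / (1.0 + e)


def gap_for_predicted_noncompliant(probabilities: list[float], amounts: list[float],
                                   threshold: float = 0.5) -> float:
    """Sum the predicted amounts for taxpayers classified as non-compliant."""
    return sum(a for p, a in zip(probabilities, amounts) if p >= threshold)


def expected_gap(probabilities: list[float], amounts: list[float]) -> float:
    """Alternative: weight each predicted amount by its probability of non-compliance."""
    return sum(p * a for p, a in zip(probabilities, amounts))
```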

Extreme value theory

Extreme value theory is appropriate when the data is characterised by extreme outlier observations, for example where the data follows the 80/20 rule: a small share of the data points (around 20%) accounts for most of the total value (around 80%).

This type of data is commonly seen in finance and science. We also see it in data on amendments to tax returns, both positive and negative, whether those amendments come from taxpayer adjustments or from our compliance activities.
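A quick way to check whether data has this shape is to measure the share of the total held by the largest 20% of observations. This sketch uses the absolute size of each amendment, since amendments can be positive or negative; that choice, and the function name, are our assumptions.

```python
def top_share(values: list[float], top_fraction: float = 0.2) -> float:
    """Share of the total absolute value held by the largest `top_fraction` of observations."""
    sizes = sorted((abs(v) for v in values), reverse=True)
    total = sum(sizes)
    if total == 0:
        raise ValueError("values must contain at least one non-zero entry")
    k = max(1, round(len(sizes) * top_fraction))  # number of observations in the top group
    return sum(sizes[:k]) / total
```

A result near 0.8 signals the 80/20 pattern; evenly spread data gives a result near `top_fraction`.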

When we look at extreme values, we look at the relationship between the size of the extreme values and their rank. This relationship is estimated and applied to the population to inform the final tax gap estimate.
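One common way to estimate a rank-size relationship is an ordinary least squares fit of log size against log rank (a Zipf-style plot). The sketch below illustrates that idea under simplifying assumptions; it is not necessarily the estimator used for any published gap, and the function names, population size and values are hypothetical.

```python
import math


def fit_rank_size(values: list[float]) -> tuple[float, float]:
    """Fit log(size) = a + b * log(rank) by ordinary least squares.

    values are the extreme observations (positive sizes); rank 1 is the largest.
    Returns (a, b); b is negative because size falls as rank increases.
    """
    ordered = sorted(values, reverse=True)
    if len(ordered) < 2 or ordered[-1] <= 0:
        raise ValueError("need at least two positive values")
    xs = [math.log(r) for r in range(1, len(ordered) + 1)]
    ys = [math.log(v) for v in ordered]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def predicted_total(a: float, b: float, population_size: int) -> float:
    """Apply the fitted rank-size line to every rank in the population and sum."""
    return sum(math.exp(a) * r ** b for r in range(1, population_size + 1))


if __name__ == "__main__":
    # Hypothetical extreme amendments ($) that follow an exact power law.
    observed = [1000.0 * r ** -1.5 for r in range(1, 11)]
    a, b = fit_rank_size(observed)
    print(b, predicted_total(a, b, population_size=1_000))
```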

Model-based approaches

Where a random enquiry is not suitable, and available data does not match the assumptions required for a statistical approach, we use model-based approaches.
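Read as a rule, this fallback might be sketched as follows. The order in which random enquiry and statistical approaches are considered is our assumption, and the names are hypothetical; in practice the choice rests on the reliability assessment described under "Choosing the methodology".

```python
from enum import Enum


class BottomUpMethod(Enum):
    RANDOM_ENQUIRY = "random enquiry"
    STATISTICAL = "statistical-based"
    MODEL_BASED = "model-based"


def choose_bottom_up_method(random_enquiry_suitable: bool,
                            data_meets_statistical_assumptions: bool) -> BottomUpMethod:
    """Fall back to a model-based approach only when neither alternative fits."""
    if random_enquiry_suitable:
        return BottomUpMethod.RANDOM_ENQUIRY
    if data_meets_statistical_assumptions:
        return BottomUpMethod.STATISTICAL
    return BottomUpMethod.MODEL_BASED
```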

Model-based approaches identify the key themes, factors or channels that contribute to the gap, which are then used to inform the final estimate. Like all our estimates, they draw on all available information, including expert judgment, management information and system data.

These approaches can also be individually referred to as:

micro-analytical simulation
illustrative
channel analysis.

What these approaches have in common is a 3-step structure: the gap is disaggregated into its components, the known information about each component is analysed, and the results are aggregated into a final estimate.
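The disaggregate, analyse, aggregate pattern can be sketched as follows. The channel names, liabilities and non-compliance rates are hypothetical; in practice each channel's figures would come from the expert judgment, management information and system data mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """One component of the gap after disaggregation (all figures hypothetical)."""
    name: str
    theoretical_liability: float  # tax that should be paid through this channel ($)
    noncompliance_rate: float     # estimated share of that liability not paid

    def gap(self) -> float:
        # Analyse: estimate the gap for this channel on its own.
        return self.theoretical_liability * self.noncompliance_rate


def aggregate_gap(channels: list[Channel]) -> float:
    # Aggregate: the final estimate is the sum of the channel estimates.
    return sum(c.gap() for c in channels)


if __name__ == "__main__":
    channels = [
        Channel("channel A", theoretical_liability=1_000.0, noncompliance_rate=0.10),
        Channel("channel B", theoretical_liability=500.0, noncompliance_rate=0.20),
    ]
    print(aggregate_gap(channels))
```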

QC53168