We apply 2 bottom-up statistical methods to estimate the medium business income tax gap.
Previously, the key characteristics of the compliance data for individual entities in the medium business segment were more closely aligned with the assumptions of Extreme Value Theory (EVT). This is no longer the case because compliance coverage and amendment outcomes for medium business individuals have declined notably in recent years.
We have replaced the EVT regression model with a combination of logistic and Poisson Pseudo Maximum Likelihood (PPML) regression models. The method and results are outlined below and combined in Table 1.
- Calculation – medium business individuals
- Calculation – medium business companies
- Limitations
- Updates and revisions to previous estimates
Calculation – medium business individuals
We use 5 steps in applying a logistic regression and a Poisson Pseudo Maximum Likelihood regression to the individuals population.
Step 1: Apply a logistic regression
We apply a logistic regression to calculate the probability of each individual having a tax gap.
The results of ATO-initiated compliance activities as well as client-initiated amendments (positive amendments only) are used to estimate the unique probability that each individual has a tax gap. We analyse the income tax return data to identify relevant demographic and financial variables that would contribute to the prediction of whether or not individuals have a tax gap.
The observations for these variables are then weighted based on the individuals' propensities of being selected for compliance activity, before being included in the logistic regression that models the probability of an individual being non-compliant, resulting in a tax gap.
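The weighting idea in step 1 can be illustrated with a minimal sketch. It assumes that each observation is weighted by the inverse of its selection propensity (the text above does not state the exact weighting formula) and uses a single hypothetical explanatory variable. All names and data values are illustrative, not taken from our model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_weighted_logistic(xs, ys, weights, lr=0.1, epochs=5000):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by weighted gradient ascent."""
    b0, b1 = 0.0, 0.0
    total_w = sum(weights)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y, w in zip(xs, ys, weights):
            p = sigmoid(b0 + b1 * x)
            g0 += w * (y - p)        # gradient of weighted log-likelihood
            g1 += w * (y - p) * x
        b0 += lr * g0 / total_w
        b1 += lr * g1 / total_w
    return b0, b1

# Hypothetical reviewed cases: x is a scaled financial indicator,
# y = 1 if the review found a tax gap, and selection_propensity is the
# estimated chance that the case was selected for compliance activity.
xs = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.8, 2.0]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
selection_propensity = [0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

# Assumption: inverse-propensity weights, so that rarely selected
# types of cases count for more in the fit.
weights = [1.0 / p for p in selection_propensity]

b0, b1 = fit_weighted_logistic(xs, ys, weights)
p_low = sigmoid(b0 + b1 * 0.2)   # estimated probability at a low indicator value
p_high = sigmoid(b0 + b1 * 1.9)  # estimated probability at a high indicator value
```

In practice a statistical package would be used and many variables would enter the model. The sketch only shows how selection weights enter the likelihood.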
Step 2: Apply a Poisson Pseudo Maximum Likelihood regression
We identify individuals that have been subject to ATO-initiated compliance activity with a positive compliance result. We then determine the variables which are most highly associated with having a positive compliance result. This is based on analysing the correlation coefficients in the regression output, while also considering the collinearity of variables with each other.
Next, the observations for those variables are adjusted using the same weights as in the logistic regression above to account for potential selection bias. We use the Poisson Pseudo Maximum Likelihood regression because it better fits data with a high concentration of zero-valued observations in the population.
The key difference between steps 1 and 2 is that step 1 calculates the likelihood of an individual having a tax gap while step 2 calculates the size of each individual's potential tax gap.
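A PPML fit can be sketched as a weighted Poisson regression with a log link, solved here with Newton steps. This is an illustration under stated assumptions: one hypothetical explanatory variable, inverse-propensity weights, and made-up amendment amounts that include zeros. It is not the specification we use.

```python
import math

def fit_ppml(xs, ys, weights, iterations=50):
    """Weighted Poisson pseudo-maximum likelihood for E[y|x] = exp(b0 + b1*x)."""
    mean_y = sum(w * y for w, y in zip(weights, ys)) / sum(weights)
    b0, b1 = math.log(mean_y), 0.0   # start from the weighted mean
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in zip(xs, ys, weights):
            mu = math.exp(b0 + b1 * x)
            g0 += w * (y - mu)           # score (gradient) terms
            g1 += w * (y - mu) * x
            h00 += w * mu                # negative Hessian terms
            h01 += w * mu * x
            h11 += w * mu * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton update, 2x2 solve
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: x is a scaled financial indicator and y is the
# amendment amount ($ thousand). Several outcomes are exactly zero, which
# the log link handles without transforming y.
xs = [0.2, 0.5, 0.8, 1.0, 1.3, 1.6, 2.0, 2.4]
ys = [0, 0, 3, 0, 8, 5, 20, 31]
selection_propensity = [0.2, 0.2, 0.25, 0.25, 0.4, 0.5, 0.5, 0.8]
weights = [1.0 / p for p in selection_propensity]  # assumed weighting scheme

b0, b1 = fit_ppml(xs, ys, weights)
predicted = [math.exp(b0 + b1 * x) for x in xs]  # always positive
```

Because the model is fitted on the level of the amount rather than its logarithm, zero-valued observations stay in the sample, which is the property the paragraph above relies on.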
Step 3: Combine the results from the 2 models
For each individual in the population and each financial year, the estimated unreported tax is multiplied by the estimated probability of non-compliance. These amounts are then summed on a financial year basis to arrive at the total estimated unreported tax for each year.
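The combination in step 3 amounts to an expected-value calculation. The sketch below uses hypothetical records of (financial year, estimated probability of non-compliance, estimated unreported tax in dollars); the figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical per-individual estimates from the two models.
records = [
    ("2018-19", 0.10, 20000.0),
    ("2018-19", 0.50, 4000.0),
    ("2019-20", 0.25, 8000.0),
    ("2019-20", 0.05, 60000.0),
]

totals = defaultdict(float)
for year, probability, unreported_tax in records:
    # Expected unreported tax for this individual in this year.
    totals[year] += probability * unreported_tax
```

Here each year's total is the sum of probability-weighted amounts, so an individual with a large estimated amount but a low probability of non-compliance contributes only a fraction of that amount.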
Step 4: Apply a non-detection uplift factor and non-pursuable debt
We uplift the estimates preceding this step to account for non-compliance that is not detected. This results in a final estimate that has a lower likelihood of understating the true size of the gap in the system, which is not directly observable.
We also seek to quantify the tax gap effects of unreported offshore income by Australians, referred to as offshore tax evasion (OTE). A main assumption of the OTE estimate method is that OTE is borne only by individuals (not companies). The OTE estimate is allocated based on the share of net tax of individuals in each gap population. The estimated OTE amount is then added to the non-detection uplift amount.
We also add in the value of non-pursuable debt. This is debt that the Commissioner of Taxation has assessed as:
- not legally recoverable
- uneconomical to pursue
- unable to be pursued due to another Act.
Step 5: Consolidate the tax gap estimates
We calculate the gross gap by adding the unreported amounts or total expected amendments from step 3 to the non-detection uplift and non-pursuable debt from step 4.
We calculate the net gap by subtracting the total amendment amount from the gross gap. Then we add the net gap to the expected collections to estimate the total theoretical liability.
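The consolidation arithmetic can be checked against the published figures. The sketch below uses the 2015–16 values for medium business individuals from the table that follows. The function name is hypothetical, and the use of total theoretical liability as the denominator for the percentage rows is inferred from the table figures rather than stated in the text.

```python
def consolidate(expected_amendments, non_detection, non_pursuable_debt,
                amendments, expected_collections):
    """Step 5 arithmetic; all inputs in $ million."""
    gross_gap = expected_amendments + non_detection + non_pursuable_debt
    net_gap = gross_gap - amendments
    theoretical_liability = net_gap + expected_collections
    return {
        "gross_gap": gross_gap,
        "net_gap": net_gap,
        "theoretical_liability": theoretical_liability,
        # Assumption: percentages are taken over theoretical liability.
        "gross_gap_pct": 100.0 * gross_gap / theoretical_liability,
        "net_gap_pct": 100.0 * net_gap / theoretical_liability,
    }

# 2015-16 figures for medium business individuals ($m).
result = consolidate(
    expected_amendments=143,
    non_detection=83,
    non_pursuable_debt=5,
    amendments=97,
    expected_collections=1219,
)
```

For 2015–16 this reproduces a gross gap of $231m, a net gap of $134m and a total theoretical liability of $1,353m. In some other years the published components differ from their published totals by $1m, which is consistent with rounding of the underlying amounts.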
Step | Description | 2015–16 | 2016–17 | 2017–18* | 2018–19* | 2019–20* | 2020–21* |
---|---|---|---|---|---|---|---|
1 | Total population (count) | 6,480 | 6,642 | 6,770 | 6,525 | 6,592 | 6,991 |
2 | Total expected amendments ($m) | 143 | 106 | 93 | 93 | 88 | 103 |
3.1 | Non-detection ($m) | 83 | 65 | 60 | 59 | 58 | 66 |
3.2 | Non-pursuable debt ($m) | 5 | 4 | 4 | 4 | 4 | 4 |
4.2 | Gross gap ($m) | 231 | 174 | 157 | 155 | 150 | 172 |
4.3 | Amendments ($m) | 97 | 51 | 19 | 33 | 33 | 33 |
4.4 | Net gap ($m) | 134 | 123 | 138 | 122 | 116 | 139 |
4.5 | Expected collections ($m) | 1,219 | 1,352 | 1,482 | 1,277 | 1,387 | 1,627 |
4.6 | Total theoretical liability ($m) | 1,353 | 1,475 | 1,620 | 1,399 | 1,504 | 1,766 |
4.7 | Gross gap (%) | 17.1% | 11.8% | 9.7% | 11.1% | 9.9% | 9.7% |
4.8 | Net gap (%) | 9.9% | 8.3% | 8.5% | 8.7% | 7.7% | 7.9% |
Calculation – medium business companies
We use 5 steps in applying logistic and linear regressions to the company population.
Step 1: Apply a logistic regression
We analyse the tax return data of companies that have been subject to amendment activities to identify relevant demographic and financial variables that would contribute to the prediction of whether or not businesses have a tax gap.
The observations for these variables are then weighted based on the businesses' propensities of being selected for compliance activity before being included in the logistic regression that models the probability of a company being non-compliant.
We then undertake a Monte Carlo simulation to determine each company's binary status of being either compliant or non-compliant.
Step 2: Apply a linear regression
We analyse the tax return data of known non-compliant companies to identify characteristics of businesses that would contribute to the prediction of the tax gap size if the company were found to be non-compliant.
We apply weights to account for selection bias. Then we apply the linear regression to each company to estimate the potential size of the tax gap.
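A weighted linear regression with one explanatory variable has a closed-form solution, sketched below. The single variable, the inverse-propensity weights and the data values are all assumptions for illustration; the actual model uses variables identified from the tax return data.

```python
def fit_weighted_linear(xs, ys, weights):
    """Weighted least squares for y = intercept + slope * x."""
    total_w = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total_w
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total_w
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical known non-compliant companies: x is a scaled size measure,
# y is the amended tax ($ thousand).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [12.0, 19.0, 33.0, 38.0, 52.0]
selection_propensity = [0.2, 0.25, 0.4, 0.5, 0.8]
weights = [1.0 / p for p in selection_propensity]  # assumed weighting scheme

intercept, slope = fit_weighted_linear(xs, ys, weights)
predicted_gap = intercept + slope * 3.5  # estimate for a company with x = 3.5
```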
The key difference between steps 1 and 2 is that step 1 calculates the likelihood of a company having a tax gap while step 2 calculates the size of each company's potential tax gap.
Step 3: Combine the results from the 2 regressions
We calculate the estimated unreported tax amount for each simulation by adding together the non-compliance amounts from step 2 for all non-compliant businesses predicted in step 1.
We estimate total unreported tax (including amendments) by taking an average of the results from 20,000 simulations.
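The simulation and averaging described in steps 1 to 3 can be sketched as follows. Each simulation draws a compliant or non-compliant status for every company from its estimated probability, then sums the estimated amounts for the companies drawn as non-compliant. The probabilities, amounts, function name and random seed are hypothetical.

```python
import random

def simulate_total_unreported(probabilities, amounts,
                              n_simulations=20000, seed=0):
    """Average, over simulations, of the summed amounts for companies
    drawn as non-compliant."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    running_total = 0.0
    for _ in range(n_simulations):
        running_total += sum(
            amount
            for probability, amount in zip(probabilities, amounts)
            if rng.random() < probability  # Bernoulli draw of status
        )
    return running_total / n_simulations

# Hypothetical step 1 probabilities and step 2 amounts ($ thousand).
probabilities = [0.2, 0.5, 0.1]
amounts = [100.0, 40.0, 300.0]

estimate = simulate_total_unreported(probabilities, amounts)
```

With many simulations the average approaches the sum of each probability multiplied by its amount, which is 70 for the values above. The spread of the individual simulation totals gives an indication of the uncertainty around that average.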
Step 4: Apply a non-detection uplift factor and non-pursuable debt
We uplift the estimates preceding this step to account for non-compliance that is not detected. This results in a final estimate that has a lower likelihood of understating the true size of the gap in the system, which is not directly observable.
We also seek to quantify the tax gap effects of unreported offshore income by Australians, referred to as offshore tax evasion (OTE). A main assumption of the OTE estimate method is that OTE is borne only by individuals (not companies). The OTE estimate is allocated based on the share of net tax of individuals in each gap population. The estimated OTE amount is then added to the non-detection uplift amount.
We also add in the value of non-pursuable debt. This is debt that the Commissioner of Taxation has assessed as:
- not legally recoverable
- uneconomical to pursue
- unable to be pursued due to another Act.
Step 5: Consolidate the tax gap estimates
We calculate the gross gap by adding the unreported amounts from step 3 to the non-detection uplift and non-pursuable debt from step 4.
We calculate the net gap by subtracting the total amendment amount from the gross gap. Then we add the net gap to the expected collections to estimate the total theoretical liability.
Step | Description | 2015–16 | 2016–17 | 2017–18* | 2018–19* | 2019–20* | 2020–21* |
---|---|---|---|---|---|---|---|
1-2 | Total population (count) | 30,464 | 31,676 | 32,776 | 33,721 | 34,319 | 35,600 |
3 | Total expected amendments ($m) | 605 | 667 | 640 | 631 | 683 | 781 |
4.1 | Non-detection ($m) | 302 | 334 | 320 | 316 | 341 | 391 |
4.2 | Non-pursuable debt ($m) | 64 | 53 | 53 | 53 | 53 | 53 |
5.1 | Gross gap ($m) | 972 | 1,054 | 1,013 | 1,000 | 1,077 | 1,225 |
5.2 | Amendments ($m) | 267 | 195 | 145 | 131 | 125 | 125 |
5.3 | Net gap ($m) | 705 | 860 | 867 | 869 | 952 | 1,100 |
5.4 | Expected collections ($m) | 10,484 | 11,589 | 11,661 | 11,037 | 11,734 | 14,319 |
5.5 | Total theoretical liability ($m) | 11,189 | 12,449 | 12,528 | 11,905 | 12,687 | 15,419 |
5.6 | Gross gap (%) | 8.7% | 8.5% | 8.1% | 8.4% | 8.5% | 7.9% |
5.7 | Net gap (%) | 6.3% | 6.9% | 6.9% | 7.3% | 7.5% | 7.1% |
Find out more about our research methodology, data sources and analysis used for creating our tax gap estimates.
Limitations
The following caveats and limitations apply when interpreting this tax gap estimate:
- There is considerable lag between an income year and the completion of our compliance activities for that year. This means gap estimates are subject to revision for a considerable period. Where the projections are deemed inadequate, we have made provisional amendments for FY2018-FY2020 by applying the average amendment where the average is higher than the actual amounts.
- The true extent of non‑detection is unknown and is extremely challenging to measure. There is no international proxy we can apply to the individuals or companies in this population. We assume there will be errors and omissions in our compliance activities due to factors outside our control and limitations in operational capability and capacity.
Updates and revisions to previous estimates
Each year we refresh our estimates in line with the annual report. Changes from previously published estimates occur for a variety of reasons, including:
- improvements in methodology
- revisions to data
- additional information becoming available.
The updated net gap percentages for medium business individuals for this year are slightly higher following the replacement of the EVT regression model with a combination of logistic and Poisson Pseudo Maximum Likelihood (PPML) regression models.
Figure 2: Comparison of previous and current net tax gap estimates (%), 2012–13 to 2020–21
This data is presented in Table 4 as a percentage.
Table 4: Current and previous net medium business income tax gap estimates, 2012–13 to 2020–21
Year | 2012–13 | 2013–14 | 2014–15 | 2015–16 | 2016–17 | 2017–18 | 2018–19 | 2019–20 | 2020–21 |
---|---|---|---|---|---|---|---|---|---|
2023 program | n/a | n/a | n/a | 6.7% | 7.1% | 7.1% | 7.4% | 7.5% | 7.2% |
2022 program | n/a | n/a | 6.7% | 5.9% | 7.1% | 6.4% | 6.9% | 7.0% | n/a |
2021 program | n/a | 5.7% | 6.2% | 6.2% | 6.3% | 6.0% | 6.2% | n/a | n/a |
2020 program | 6.3% | 6.1% | 6.4% | 6.6% | 6.8% | 6.2% | n/a | n/a | n/a |