The wealth of information provided by integrated datasets can create additional risk by increasing the chance of identifying an entity (such as a person or business). Protecting the confidentiality of individuals or organisations in an integrated dataset is a key element in maintaining the ongoing trust of the Australian public.
It is the responsibility of the integrating authority, on behalf of the data custodians, to ensure that information is only released to data users in a way that is not likely to enable identification, either directly or indirectly, of individuals or organisations. High Level Principle 6 states that “Access to potentially identifiable integrated data for statistical and research purposes, outside secure and trusted institutional environments should only occur where: legislation allows; it is necessary to achieve the approved purposes; and meets agreements with source data agencies.”
Simply removing names, addresses and other direct identifiers such as Australian Business Numbers when releasing or archiving statistically integrated datasets may not be sufficient to ensure that individuals or businesses cannot be recognised in the data. The datasets may also need to be confidentialised to ensure that a person or a business does not stand out in the data.
Accordingly, there are two key steps to confidentialising the data:
- de-identification of the data, that is the removal of any direct identifiers (e.g. name, address, Australian Business Number) from the data; and
- removing or altering other information that may allow an individual to be identified, for example, because of a rare characteristic of the individual, or a combination of unique or remarkable characteristics that enable identification.
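The two steps above can be illustrated with a minimal Python sketch. The field names (`name`, `address`, `abn`, `age_group`, `postcode`, `occupation`), the sample records and the `min_count` threshold are all hypothetical and chosen for illustration only; in practice the integrating authority would agree on the identifiers and thresholds with the data custodians. The second function only detects records that stand out; deciding how to remove or alter them is a separate judgement.

```python
from collections import Counter

# Hypothetical field names, for illustration only.
DIRECT_IDENTIFIERS = {"name", "address", "abn"}
QUASI_IDENTIFIERS = ("age_group", "postcode", "occupation")


def de_identify(records):
    """Step 1: drop direct identifiers from every record."""
    return [
        {field: value for field, value in rec.items()
         if field not in DIRECT_IDENTIFIERS}
        for rec in records
    ]


def _combination(rec):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(rec.get(field) for field in QUASI_IDENTIFIERS)


def flag_rare_combinations(records, min_count=2):
    """Step 2 (detection only): return records whose combination of
    quasi-identifier values occurs fewer than min_count times."""
    counts = Counter(_combination(rec) for rec in records)
    return [rec for rec in records if counts[_combination(rec)] < min_count]


# Made-up sample records.
records = [
    {"name": "Person A", "address": "1 Example St", "abn": None,
     "age_group": "30-39", "postcode": "2600", "occupation": "nurse"},
    {"name": "Person B", "address": "2 Example St", "abn": None,
     "age_group": "30-39", "postcode": "2600", "occupation": "nurse"},
    {"name": "Person C", "address": "3 Example St", "abn": None,
     "age_group": "70-79", "postcode": "2600", "occupation": "astronaut"},
]

de_identified = de_identify(records)
at_risk = flag_rare_combinations(de_identified)
```

In this sample, the third record is flagged because no other record shares its combination of age group, postcode and occupation, even though its name and address have already been removed.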
It is the integrating authority’s responsibility to confidentialise the integrated dataset, in accordance with the requirements of the data custodians. The extent to which the data needs to be confidentialised before being provided to a data user will depend on factors such as the legislation governing access to and use of the data, whether consent has been obtained to use the data for the research, or whether ethics committee approval has been given for identifiable data to be released for the project.
If the integrating authority uses facilities such as the Secure Unified Research Environment (SURE) to provide secure access to the data, then the integrating authority should confidentialise the integrated dataset before uploading it to SURE, unless otherwise agreed by the data custodian.
There are two general methods (often referred to as statistical disclosure control methods) used to confidentialise data that are to be disseminated:
- data modification methods (perturbation) which involve changing the data slightly to reduce the risk of disclosure, while retaining as much content and structure as possible; and
- data reduction methods which aim to control or limit the amount of detail available, without compromising the overall usefulness of the information available for research.
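As a rough illustration of the two families of methods, the Python sketch below shows one data modification technique (rounding counts to a base) and two data reduction techniques (banding ages with top-coding, and suppressing small cells). The function names, the rounding base of 3, the ten-year bands, the top-code of 80 and the suppression threshold of 3 are assumptions made for this example, not values prescribed by this guidance; appropriate techniques and parameters should be chosen with the data custodians.

```python
def round_to_base(count, base=3):
    """Data modification: round a non-negative count to the nearest
    multiple of base (ties round up when base is even)."""
    return base * ((count + base // 2) // base)


def band_age(age, width=10, top=80):
    """Data reduction: replace an exact age with a band, top-coding
    all ages at or above top into a single open-ended category."""
    if age >= top:
        return f"{top}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def suppress_small_cells(table, threshold=3):
    """Data reduction: replace counts below threshold with None."""
    return {cell: (n if n >= threshold else None) for cell, n in table.items()}


# Made-up table of counts by age band.
table = {"30-39": 14, "40-49": 7, "80+": 1}

rounded = {cell: round_to_base(n) for cell, n in table.items()}
suppressed = suppress_small_cells(table)
```

With this made-up table, rounding turns the counts 14, 7 and 1 into 15, 6 and 0, while suppression leaves 14 and 7 unchanged and withholds the count of 1. Both approaches reduce the chance that the single person in the "80+" cell stands out, at some cost to the detail available for research.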
For more information on confidentiality, including information on popular techniques for confidentialising data, see the Confidentiality Information Series.
The Office of the Australian Information Commissioner (OAIC) has also released a guide for de-identification to align with the requirements of Australian Privacy Principle (APP) 6, which limits the disclosure of personal information (see “Information policy agency resource 1: De-identification of data and information”). This resource draws extensively on the Confidentiality Information Series referenced above and supports the two steps for confidentialising data for the purposes of APP 6.
For more information about data management see: