Data quality refers to the suitability of data for a specific purpose. Assessing data quality and, where practicable, improving it to increase the chance of success of a data integration project is a matter for agreement between the data custodian and data user. It should be recognised that, while the data may be fit for the purpose for which it was collected (for example, an administrative purpose), it may not necessarily be fit for statistical or research purposes.
In the context of a data integration project, there are two broad considerations.
The first is the data custodian’s responsibility to provide source datasets of an agreed-upon quality, ensuring that data users and the integrating authority are aware of any issues with the quality of the data or limitations on its use. It is important that data custodians are transparent about the quality of their data to enable:
i. the data users to determine whether the data will meet the purpose of their project;
ii. the integrating authority to determine whether it is feasible for the dataset to be linked with other source datasets; and
iii. the integrating authority to properly budget for any data cleansing/standardisation required prior to linking and merging the data.
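To illustrate the kind of cleansing/standardisation referred to in point iii, the sketch below normalises a free-text name field before linking. The function names, cleansing rules, and field names are illustrative assumptions only, not a prescribed method; real projects would apply agreed, documented standardisation rules.

```python
import re

def standardise_name(raw: str) -> str:
    """Illustrative cleansing rule: drop punctuation, collapse
    whitespace, and uppercase so equivalent names compare equal."""
    letters_only = re.sub(r"[^A-Za-z\s]", "", raw)      # remove punctuation/digits
    return re.sub(r"\s+", " ", letters_only).strip().upper()

def standardise_record(record: dict) -> dict:
    """Apply simple, hypothetical standardisation to one source record."""
    return {
        "name": standardise_name(record.get("name", "")),
        "dob": record.get("dob", "").replace(" ", ""),   # keep dates in one format
        "postcode": record.get("postcode", "").strip(),
    }

record = {"name": "  o'brien,  mary ", "dob": "1980-02-01", "postcode": " 2600 "}
print(standardise_record(record))
# → {'name': 'OBRIEN MARY', 'dob': '1980-02-01', 'postcode': '2600'}
```

Applying the same rules to every source dataset before linking reduces spurious non-matches caused purely by formatting differences.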
Data custodians may share details of the data quality through the metadata provided and/or through data quality statements (see Quality assurance section). A benefit to data custodians from their involvement in data integration projects is that, in the process of preparing the data for linkage, necessary and helpful data improvements may be identified. If there are issues with the data received (for example, data items are missing), the integrating authority can raise these issues with the source data custodian, including why they might have occurred, and request that the data be re-transferred. However, to preserve privacy and confidentiality, only the information that was obtained from that source custodian about an individual or organisation should be disclosed or discussed in relation to the data issues (see High Level Principle 5 regarding feedback of information).
The second consideration is the assessment and checking of linked record quality throughout the linking process, once the source datasets have been received. This is the responsibility of the integrating authority. Quality checks should be performed throughout each stage of the linkage process, in particular prior to data extraction and during the data preparation and linking stages; for more information on these stages, refer to Project delivery.
There are many measures available to assess the quality of the linked records. Two common methods are assessing the accuracy of the linked records (for example, checking whether linked records are correct and refer to the same individual or business) and comparing the number of linked records with the number of expected links. For more information on checking linked data quality, refer to Sheet 5 of the Data Linking Information Series.
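The two measures above can be sketched numerically. In this hypothetical example (the function names and all figures are illustrative, not drawn from any actual project), accuracy is estimated from a clerically reviewed sample of links, and the link rate compares links found with links expected:

```python
def link_rate(num_linked: int, num_expected: int) -> float:
    """Proportion of expected links that were actually found."""
    return num_linked / num_expected

def estimated_precision(sample_reviewed: int, sample_correct: int) -> float:
    """Share of clerically reviewed links judged to refer to the same unit."""
    return sample_correct / sample_reviewed

# Hypothetical figures: 9,120 links found where ~10,000 were expected,
# and 188 of a 200-link review sample judged correct.
print(f"link rate: {link_rate(9120, 10000):.1%}")        # → link rate: 91.2%
print(f"precision: {estimated_precision(200, 188):.1%}")  # → precision: 94.0%
```

A low link rate may indicate missed links (or an over-estimate of expected links), while low precision indicates records linked in error; both are typically reviewed together before the linked dataset is released for use.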
Related Information
The ABS supports an extensive range of initiatives to manage data quality. For further information see Data Quality Management.
For more information on data management see:
- Providing metadata
- Data quality
- Confidentialising data