Access and validation

Governance details

Documents or webpages that describe the overall governance of the data source, including the processes and procedures for data capture and management, data quality checks and validation results, and the rules governing data access or use for research purposes.

Biospecimen access

Are biospecimens available in the data source (e.g., tissue samples)?

No

Access to subject details

Can individual patients/practitioners/practices included in the data source be contacted?

No

Description of data collection

Datasets are provided by various organisations and are anonymised before being loaded into SAIL Databank.
The data provider splits each dataset into two components:
1) Demographic Component – this holds the identifying information to be anonymised.
2) Content Component – this holds other details, such as diagnosis, medication, etc.


The demographic component (1) is sent to Digital Health and Care Wales, where it is validated and each record is anonymised and assigned a unique, non-identifiable code. This code, together with minimal information on gender, area of residence and week of birth, is then sent to SAIL Databank.

The content component (2) is sent directly to SAIL Databank where the two components of the dataset (1 and 2) are linked together. The complete de-identified dataset can now be accessed for research, subject to approvals.
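
The split-file process described above can be sketched in code. This is a minimal illustration only, not SAIL Databank's or Digital Health and Care Wales's actual implementation: the function names, the `row_id` join key, the keyed hash used to derive the code, the use of the outward postcode as a stand-in for area of residence, and the sample record are all assumptions made for the example.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical key held only by the trusted third party (never by the databank).
TTP_KEY = b"illustrative-key-only"

# Fields treated as identifying in this sketch (an assumption, not an official list).
IDENTIFYING = ("nhs_number", "name", "date_of_birth", "postcode")


def split_record(row_id, record):
    """Data provider: split one record into demographic and content components.

    Both components carry the same temporary row_id so they can be rejoined.
    """
    demographic = {"row_id": row_id, "gender": record["gender"]}
    demographic.update({k: record[k] for k in IDENTIFYING})
    content = {"row_id": row_id}
    content.update({k: v for k, v in record.items()
                    if k not in IDENTIFYING and k != "gender"})
    return demographic, content


def anonymise(demographic):
    """Trusted third party: replace identifiers with a code and coarse attributes."""
    code = hmac.new(TTP_KEY, demographic["nhs_number"].encode(),
                    hashlib.sha256).hexdigest()
    dob = demographic["date_of_birth"]
    week_start = dob - timedelta(days=dob.weekday())  # Monday of the birth week
    return {
        "row_id": demographic["row_id"],
        "person_code": code,
        "gender": demographic["gender"],
        "area": demographic["postcode"].split()[0],  # stand-in for area of residence
        "week_of_birth": week_start,
    }


def link(anonymised, content):
    """Databank: rejoin the two components on row_id, then discard row_id."""
    if anonymised["row_id"] != content["row_id"]:
        raise ValueError("components do not belong to the same record")
    merged = {**anonymised, **content}
    del merged["row_id"]
    return merged


# Entirely fictitious record used for illustration.
record = {
    "nhs_number": "9990000001", "name": "Jones", "date_of_birth": date(1980, 5, 15),
    "gender": "F", "postcode": "SA2 8PP", "diagnosis": "asthma", "medication": "salbutamol",
}
demographic, content = split_record(1, record)
linked = link(anonymise(demographic), content)
```

In this sketch the party that sees the identifiers never sees the clinical content, and the databank only ever receives the code and the coarse attributes alongside the content.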

Event triggering registration

Event triggering registration of a person in the data source

Birth
Practice registration
Residency obtained

Event triggering de-registration of a person in the data source

Death
Practice deregistration
Emigration

Event triggering creation of a record in the data source

Receipt of healthcare; other events such as prescriptions or letters sent to the patient; creation of disease registries.

Data source linkage

Linkage

Is the data source described created by the linkage of other data sources (pre-linked data source) and/or can the data source be linked to other data sources on an ad-hoc basis?

Yes

Linkage description, pre-linked

Linkage based on demographic details and NHS number.

Linkage description, possible linkage

Combination of exact (NHS number) and probabilistic linkage
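
A combined exact and probabilistic strategy of this kind can be sketched as follows. This is a simplified illustration, not the matching algorithm used by SAIL Databank: the agreement weights, the threshold, the function names and the sample records are hypothetical, and a plain weighted-agreement score stands in for a full probabilistic model.

```python
# Hypothetical agreement weights and acceptance threshold; real systems
# typically estimate these from the data rather than fixing them by hand.
WEIGHTS = {"name": 4.0, "date_of_birth": 5.0, "gender": 1.0, "postcode": 3.0}
THRESHOLD = 9.0


def match_score(a, b):
    """Sum the weights of the fields on which two records agree."""
    return sum(w for f, w in WEIGHTS.items()
               if a.get(f) is not None and a.get(f) == b.get(f))


def link_record(incoming, population):
    """Return (matched person or None, method used)."""
    # Step 1: exact match on NHS number when one is present.
    nhs = incoming.get("nhs_number")
    if nhs:
        for person in population:
            if person.get("nhs_number") == nhs:
                return person, "exact"
    # Step 2: fall back to the best-scoring candidate above the threshold.
    best = max(population, key=lambda p: match_score(incoming, p), default=None)
    if best is not None and match_score(incoming, best) >= THRESHOLD:
        return best, "probabilistic"
    return None, "unmatched"


# Entirely fictitious population used for illustration.
population = [
    {"id": "A", "nhs_number": "9990000001", "name": "jones",
     "date_of_birth": "1980-05-15", "gender": "F", "postcode": "SA2 8PP"},
    {"id": "B", "nhs_number": "9990000002", "name": "evans",
     "date_of_birth": "1975-01-02", "gender": "M", "postcode": "CF10 3AT"},
]

exact = link_record({"nhs_number": "9990000002"}, population)
fuzzy = link_record({"name": "jones", "date_of_birth": "1980-05-15",
                     "postcode": "SA2 8PP"}, population)
none = link_record({"name": "smith", "gender": "M"}, population)
```

Trying the exact match first keeps the cheaper, unambiguous route for records that carry an NHS number, and leaves the scored comparison for records where that number is missing or finds no match.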

Linked data source 1

Pre-linked

Is the data source described created by the linkage of other data sources?

Yes

Data source, other

Numerous datasets from multiple sources linked.

Linkage strategy

Probabilistic

Linkage variable

NHS number, name, date of birth, gender, postcode

Linkage completeness

Varies by dataset; >99% for key datasets

Linked data source 2

Pre-linked

Is the data source described created by the linkage of other data sources?

No

Data source, other

SAIL's system enables linkage of additional datasets as acquired by SAIL or brought in for specific projects. Any Welsh data with required demographic fields can be processed and linked to existing datasets in the system.

Linkage variable

NHS number, name, date of birth, gender, postcode

Linkage completeness

Varies by dataset

Data management specifications that apply for the data source

Data source refresh

Monthly

Informed consent for use of data for research

Possibility of data validation

Can validity of the data in the data source be verified (e.g., access to original medical charts)?

Yes

Data source preservation

Are records preserved in the data source indefinitely?

Yes

Approval for publication

Is an approval needed for publishing the results of a study using the data source?

Yes

Data source last refresh

Common Data Model (CDM) mapping

CDM mapping

Has the data source been converted (ETL-ed) to a common data model?

Yes

CDM Mappings

Data source ETL status

Completed

Data source ETL specifications (link)

Data source ETL CDM version

5.3.1

Data source ETL frequency

12 months
