Study type

Study topic

Other

Study topic, other

Validation study in electronic health data research

Study type

Non-interventional study

Scope of the study

Validation of study variables (exposure, outcome, covariate)

Data collection methods

Secondary use of data
Non-interventional study

Non-interventional study design

Cross-sectional
Study drug and medical condition

Name of medicine, other

Several combinations of drugs indicative of a comorbidity

Additional medical condition(s)

Several comorbidities that are common, non-communicable and pharmacologically treated, and for which the medications are reasonably specific to the condition.
Population studied

Short description of the study population

Adult patients who, between 2004 and 2024, were registered with a UK GP practice contributing to the Optimum Patient Care Research Database (OPCRD; https://opcrd.optimumpatientcare.org/).

Age groups

  • Adult and elderly population (≥18 years)
    • Adults (18 to < 65 years)
      • Adults (18 to < 46 years)
      • Adults (46 to < 65 years)
    • Elderly (≥ 65 years)
      • Adults (65 to < 75 years)
      • Adults (75 to < 85 years)
      • Adults (85 years and over)
Study design details

Study design

This is a retrospective data linkage study whose design follows that of an evaluation of medical tests for classification and prediction.

Main study objective

The primary aim of this study is to assess the validity of prescription records for identifying patient comorbidities. A secondary aim is to identify predictors of misclassification.

Setting

A random sample of the study population is drawn from the OPCRD. For each sampled patient, a random index date between 2004 and 2024 is assigned such that a full 3-year follow-up period fits within that window. During the follow-up period, all medical records are searched for a diagnosis indicative of a specific disease (gold standard), and all prescription records are searched for a prescription indicative of the same disease (index test).
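The sampling and classification steps above can be sketched as follows. This is a minimal illustration, not the study's implementation: the exact window boundaries (1 January 2004 to 31 December 2024), the approximation of 3 years as 1,095 days, and all function names are assumptions. Each patient ends up in one cell of a 2x2 table comparing the index test (prescription) with the gold standard (diagnosis).

```python
import random
from datetime import date, timedelta

# Assumed study window; the record only states "between 2004 and 2024".
STUDY_START = date(2004, 1, 1)
STUDY_END = date(2024, 12, 31)
# Assumed approximation of the 3-year follow-up period.
FOLLOW_UP_DAYS = 3 * 365


def draw_index_date(rng: random.Random) -> date:
    """Draw a uniformly random index date that leaves room for a full follow-up."""
    latest_index = STUDY_END - timedelta(days=FOLLOW_UP_DAYS)
    span = (latest_index - STUDY_START).days
    return STUDY_START + timedelta(days=rng.randint(0, span))


def any_record_in_window(record_dates, index_date: date) -> bool:
    """True if any record falls in [index_date, index_date + follow-up)."""
    end = index_date + timedelta(days=FOLLOW_UP_DAYS)
    return any(index_date <= d < end for d in record_dates)


def classify(diagnosis_dates, prescription_dates, index_date: date) -> str:
    """Place one patient in a 2x2 cell.

    Gold standard: a disease-specific diagnosis in follow-up.
    Index test: a disease-specific prescription in follow-up.
    """
    disease = any_record_in_window(diagnosis_dates, index_date)
    test_positive = any_record_in_window(prescription_dates, index_date)
    if test_positive and disease:
        return "TP"
    if test_positive:
        return "FP"
    if disease:
        return "FN"
    return "TN"
```

For example, `classify([date(2011, 1, 1)], [date(2012, 1, 1)], date(2010, 6, 1))` returns `"TP"` because both records fall inside the follow-up window.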

Outcomes

Sensitivity, specificity, positive and negative predictive values, and positive and negative likelihood ratios will be calculated for each disease. To identify predictors of misclassification, test accuracy will also be calculated within strata of the study population.
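The accuracy measures listed above follow directly from the 2x2 counts. The sketch below uses the standard definitions (for example, LR+ = sensitivity / (1 - specificity)); the class and function names are hypothetical, and returning NaN for a zero denominator (as can happen in small strata) is an illustrative choice rather than a rule from the study protocol.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts comparing the index test (prescriptions) with the gold standard (diagnoses)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @staticmethod
    def _ratio(num: float, den: float) -> float:
        # Return NaN when the denominator is zero, e.g. no diseased patients in a stratum.
        return num / den if den else math.nan

    def sensitivity(self) -> float:
        return self._ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        return self._ratio(self.tn, self.tn + self.fp)

    def ppv(self) -> float:
        return self._ratio(self.tp, self.tp + self.fp)

    def npv(self) -> float:
        return self._ratio(self.tn, self.tn + self.fn)

    def lr_positive(self) -> float:
        # LR+ = sensitivity / (1 - specificity)
        return self._ratio(self.sensitivity(), 1 - self.specificity())

    def lr_negative(self) -> float:
        # LR- = (1 - sensitivity) / specificity
        return self._ratio(1 - self.sensitivity(), self.specificity())


def accuracy_by_stratum(rows):
    """Build one 2x2 table per stratum from (stratum, cell) pairs, cell in TP/FP/FN/TN."""
    counts = defaultdict(Counter)
    for stratum, cell in rows:
        counts[stratum][cell] += 1
    return {s: TwoByTwo(c["TP"], c["FP"], c["FN"], c["TN"]) for s, c in counts.items()}
```

With hypothetical counts `TwoByTwo(tp=90, fp=10, fn=30, tn=870)`, sensitivity is 0.75, the positive predictive value is 0.9 and LR+ is 66.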

Data analysis plan

The probability of misclassification among test-positive and test-negative patients (i.e., the complements of the positive and negative predictive values), conditional on potential predictors (covariates), will be estimated using multilevel logistic regression models. Observations from patients of the same GP practice are likely to be correlated; therefore, the practice identifier will be included in every model as a random effect.
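Modelling misclassification separately for test-positive and test-negative patients means building two analysis datasets with a binary outcome: among test positives, a false positive counts as misclassified; among test negatives, a false negative does. The sketch below shows only this data preparation step; the record structure and field names are hypothetical, and the covariates are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class PatientResult:
    patient_id: str
    practice_id: str  # GP practice identifier, used as the random effect
    cell: str         # 'TP', 'FP', 'FN' or 'TN' from the 2x2 classification
    covariates: dict = field(default_factory=dict)  # hypothetical predictors


def regression_datasets(results):
    """Split patients into the two modelling datasets.

    PPV model: test-positive patients; misclassified = 1 for false positives.
    NPV model: test-negative patients; misclassified = 1 for false negatives.
    """
    ppv_rows, npv_rows = [], []
    for r in results:
        row = {"practice_id": r.practice_id, **r.covariates}
        if r.cell in ("TP", "FP"):
            ppv_rows.append({**row, "misclassified": int(r.cell == "FP")})
        elif r.cell in ("FN", "TN"):
            npv_rows.append({**row, "misclassified": int(r.cell == "FN")})
        else:
            raise ValueError(f"unknown cell {r.cell!r}")
    return ppv_rows, npv_rows
```

Each dataset could then be fitted in Stata with a random-intercept logistic model, for example `melogit misclassified i.age_band i.sex || practice_id:`, assuming the covariates have been encoded as numeric categorical variables. The variable names here are illustrative only.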
Sensitivity analyses may include: (a) a shorter study period (1 year); (b) alternative code lists (e.g., only codes indicative of a severe case); (c) a survival analysis of time to prescription; and (d) an alternative definition of a test-positive record (e.g., two or more prescriptions).
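Several of these sensitivity analyses amount to changing parameters of the index test. The sketch below, with hypothetical function names, makes the follow-up length (analysis a) and the minimum number of prescriptions (analysis d) configurable, and derives a (time, event) pair for a time-to-prescription analysis (analysis c). It assumes censoring only at the end of follow-up; in practice, deregistration or death would also need to be handled.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def is_test_positive(prescription_dates: Iterable[date], index_date: date,
                     follow_up_days: int = 3 * 365,
                     min_prescriptions: int = 1) -> bool:
    """Index test with a configurable window and prescription threshold.

    Assumed main analysis: follow_up_days=3*365, min_prescriptions=1.
    Analysis (a): follow_up_days=365. Analysis (d): min_prescriptions=2.
    """
    end = index_date + timedelta(days=follow_up_days)
    n = sum(1 for d in prescription_dates if index_date <= d < end)
    return n >= min_prescriptions


def time_to_first_prescription(prescription_dates: Iterable[date], index_date: date,
                               follow_up_days: int = 3 * 365) -> Tuple[int, int]:
    """Return (days, event); event=0 means censored at the end of follow-up."""
    end = index_date + timedelta(days=follow_up_days)
    in_window = [d for d in prescription_dates if index_date <= d < end]
    if in_window:
        return (min(in_window) - index_date).days, 1
    return follow_up_days, 0
```

For a patient indexed on 1 January 2015 with prescriptions on 1 March 2015 and 1 June 2016, the record is test positive under the 3-year, two-prescription definition but not under a 1-year, two-prescription definition.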
Data management and analysis will be conducted using Microsoft SQL Server and Stata (V18), respectively.