HbA1c was chosen as the illustrative example for four main reasons: (i) T2DM is a clinical context currently under debate following the recent approval of SGLT-1/2 inhibitors, whose prescription requires specific HbA1c values; (ii) T2DM can be accurately identified in both clinical and administrative databases, since every clinical process related to this condition (drug prescriptions, outpatient visits, clinical examinations, hospital admissions) can be retrieved from these data sources; (iii) HbA1c values are expected to be well recorded in the clinical data source for most T2DM patients (i.e., around 30% missing values), which allows the use of multiple imputation (MI) methods; (iv) this patient population is characterized by comorbidities that can be defined consistently in clinical and administrative databases to form the covariate vector for the model imputing HbA1c values. Although this algorithm was not developed for prognostic purposes, we complied with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. To develop a model estimating HbA1c values, and thereby identify the patients with diabetes eligible for SGLT-2 inhibitors (ATC: A10BK*; A10BD*), we excluded, in both data sources, those already prescribed these medications at any time during the look-back period. Again in both databases, we included patients prescribed metformin in 2018 (i.e., at least two prescriptions) and adherent to this medication, defined as a variable medication possession ratio (VMPR) ≥80%. Namely,
VMPR was operationally defined as the cumulative number of days covered by each prescription (i.e., the number of prescribed daily doses) divided by the variable number of days of follow-up of each drug user. Finally, for HSD only, the study event date was the date of the highest HbA1c value recorded after metformin use during 2018. Thus, in line with the eligibility criteria for SGLT-2 inhibitors, HSD was used to develop and test the algorithm estimating HbA1c values ≥7%, which are not available in the administrative data source. Given the presence of common covariates in the HSD and ReS databases, the combination of beta coefficients, composing the algorithm
obtained with HSD, was adopted to estimate the missing values of HbA1c in the ReS data source. The demographic and clinical determinants used to develop the imputation algorithm and to apply it to the ReS database were operationally defined using ICD-9-CM and ATC codes, following the same harmonization process described previously.
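
As an illustration of the two computational steps described above (the adherence filter based on VMPR, and the transfer of coefficients estimated in HSD to patients in ReS), a minimal Python sketch is given below. It rests on assumptions that are not stated in the text: the model is taken to be a logistic regression for HbA1c ≥7%, VMPR is not capped at 100%, and the covariate names, coefficients, and intercept are hypothetical placeholders rather than the estimates obtained in this study.

```python
import math

ADHERENCE_THRESHOLD = 0.80  # VMPR >= 80%, as stated in the text


def vmpr(days_covered_per_prescription, follow_up_days):
    """Cumulative days covered by prescriptions divided by days of follow-up."""
    if follow_up_days <= 0:
        raise ValueError("follow_up_days must be positive")
    return sum(days_covered_per_prescription) / follow_up_days


def is_adherent_metformin_user(days_covered_per_prescription, follow_up_days):
    """Inclusion rule from the text: at least two prescriptions and VMPR >= 80%."""
    return (
        len(days_covered_per_prescription) >= 2
        and vmpr(days_covered_per_prescription, follow_up_days) >= ADHERENCE_THRESHOLD
    )


def predicted_probability(coefficients, intercept, covariates):
    """Apply fixed coefficients to one patient's covariates.

    ASSUMPTION: a logistic model for HbA1c >= 7%; the text only states that
    a combination of beta coefficients is applied to common covariates.
    """
    linear_predictor = intercept + sum(
        beta * covariates[name] for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# HYPOTHETICAL values for illustration only, not the study's estimates.
HYPOTHETICAL_BETAS = {"age_decades": 0.10, "obesity": 0.50, "insulin_use": 1.20}
HYPOTHETICAL_INTERCEPT = -1.0

# One hypothetical ReS patient with ten 30-day prescriptions over 365 days.
res_patient = {"age_decades": 6.5, "obesity": 1, "insulin_use": 0}
prescriptions = [30] * 10

if is_adherent_metformin_user(prescriptions, follow_up_days=365):
    p = predicted_probability(HYPOTHETICAL_BETAS, HYPOTHETICAL_INTERCEPT, res_patient)
    print(f"Estimated probability of HbA1c >= 7%: {p:.2f}")
```

Because the covariates are harmonized across the two databases, the same dictionary keys can be populated from either source, which is what allows coefficients fitted in one database to be applied unchanged in the other.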