Comparative effectiveness research (CER) and patient-centered outcomes research (PCOR) routinely use secondary data (eg, insurance claims, health records). Leveraging secondary data requires effective and accurate record linkage (RL), that is, matching records that belong to the same individual across different data sets. The absence of common, error-free, unique identifiers across data sources complicates RL and forces the use of identifying information (eg, names) to ensure proper linkage. This, in turn, raises privacy concerns. While automated methods are useful, high-quality RL requires human interaction (eg, setting parameters, building training data sets, validating results). Consequently, managing errors from imperfect and complex real-world data requires human access to identifiable data.
