Matching in Nested Case-Control Studies

We developed a simulation tool to explore trade-offs in statistical efficiency when different matching criteria are used to create a nested case-control study from a larger cohort. In multivariable analyses of cancer outcomes in cohort studies, Cox proportional hazards models are commonly used, and the resulting hazard ratio (HR) is often interpreted as an estimate of the incidence rate ratio (IRR). When paired with appropriate analytic methods, a nested case-control study, which uses all or a subset of the cases along with matched controls, estimates the rate ratio that would otherwise have been obtained from a full cohort analysis. Because the nested case-control design requires collecting and measuring exposure, covariate, and biomarker data on fewer subjects than a full cohort analysis, it is logistically efficient. This is particularly appealing for studies of biomarkers measured in biological specimens collected from subjects when they entered the cohort: biomarker assays are often expensive, and the specimens are valuable and need to be used as efficiently as possible.
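For reference, the standard Cox model relation behind this interpretation, written for a single binary exposure x (our notation, not taken from the Match-o-matic): the HR is the ratio of the two groups' hazards at any follow-up time t, and if each group's hazard is a constant λ_x, it equals the ratio of incidence rates, which is estimated from the number of events a_x and the total person-time PT_x in each group.

```latex
\begin{aligned}
h(t \mid x) &= h_0(t)\, e^{\beta x}, \qquad
\mathrm{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = e^{\beta}, \\
\text{if } h(t \mid x) = \lambda_x:\quad \mathrm{HR} &= \frac{\lambda_1}{\lambda_0}, \qquad
\widehat{\mathrm{IRR}} = \frac{a_1 / \mathrm{PT}_1}{a_0 / \mathrm{PT}_0}
\end{aligned}
```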

In a nested case-control study, controls are chosen by incidence density sampling: as each case arises, one or more controls are randomly selected from all subjects who have accrued at least the same length of follow-up as the case, that is, from the subjects still at risk when the case occurred. Thus every nested case-control study matches on duration of follow-up. Controls can also be matched to cases on additional covariates so that cases and controls are similar on those characteristics. Conditional logistic regression analysis of the nested case-control data then yields a consistent estimate of the incidence rate ratio that would otherwise have been estimated from a full cohort analysis.
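A minimal Python sketch of incidence density sampling, assuming a cohort stored as records of follow-up time and case status. `Subject`, `risk_set_sample`, and its parameters are hypothetical names for illustration, not part of the Match-o-matic. Cases are processed in order of event time, and each one's controls are drawn at random from the other subjects whose follow-up is at least as long as the case's event time, so a subject can serve as a control more than once and can later become a case.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    sid: int        # subject identifier
    time: float     # follow-up time until the event or censoring
    is_case: bool   # True if follow-up ended with the outcome event


def risk_set_sample(cohort, n_controls=1, seed=None):
    """Incidence density (risk-set) sampling of controls.

    Returns a list of (case_id, [control_ids]) matched sets. Cases whose risk
    set holds fewer than n_controls other subjects are skipped (dropped).
    """
    rng = random.Random(seed)
    matched_sets = []
    for case in sorted((s for s in cohort if s.is_case), key=lambda s: s.time):
        # Risk set: everyone else still under follow-up at the case's event time.
        risk_set = [s.sid for s in cohort if s.sid != case.sid and s.time >= case.time]
        if len(risk_set) < n_controls:
            continue
        matched_sets.append((case.sid, rng.sample(risk_set, n_controls)))
    return matched_sets


cohort = [
    Subject(1, 2.0, True),
    Subject(2, 5.0, False),
    Subject(3, 3.0, True),
    Subject(4, 1.0, False),   # censored before any case occurs: never eligible
    Subject(5, 4.0, False),
]
print(risk_set_sample(cohort, n_controls=1, seed=42))
```

In this toy cohort, subject 4 is never eligible because its follow-up ends before the first case, while case 3 is eligible as a control for case 1 because it is still at risk at time 2.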

Matching controls to cases on covariates involves trade-offs, and matching criteria need to be chosen that best balance them. Matched analyses adjust for the confounding effects of the matched factors, often with greater statistical efficiency than an unmatched design whose analysis includes those factors as variables in the regression model. The downside is that matching requires ‘Goldilocks’ criteria that are neither too stringent nor too lenient. Overly stringent criteria can leave a case with no qualifying control. That case is then dropped from the analysis, which reduces statistical power and can introduce selection bias. Conversely, overly lenient criteria can produce case-control pairs that are still quite dissimilar on the matched factors, leaving residual confounding in the analysis.
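Both failure modes show up even in a tiny example. The hypothetical sketch below adds one covariate-matching criterion, a caliper on a single continuous covariate (here labeled age at entry), to 1:1 risk-set sampling. The caliper rule, the tuple layout, and `match_with_caliper` are illustrative assumptions, not the Match-o-matic's own matching rules.

```python
import random


def match_with_caliper(cohort, caliper, seed=None):
    """1:1 risk-set sampling with an extra caliper criterion on one covariate.

    cohort: list of (sid, follow_up_time, is_case, covariate) tuples.
    caliper: largest allowed |case covariate - control covariate|.
    Returns (pairs, dropped): matched (case_id, control_id) pairs and the ids
    of cases for which no qualifying control exists.
    """
    rng = random.Random(seed)
    pairs, dropped = [], []
    for sid, time, _, x in sorted((s for s in cohort if s[2]), key=lambda s: s[1]):
        # Eligible: still at risk at the case's event time AND within the caliper.
        eligible = [c[0] for c in cohort
                    if c[0] != sid and c[1] >= time and abs(c[3] - x) <= caliper]
        if eligible:
            pairs.append((sid, rng.choice(eligible)))
        else:
            dropped.append(sid)   # case is lost from the analysis
    return pairs, dropped


# (sid, follow-up time, is_case, age at entry); hypothetical toy data
cohort = [
    (1, 2.0, True, 50),
    (2, 5.0, False, 52),
    (3, 3.0, True, 70),
    (4, 6.0, False, 61),
    (5, 4.0, False, 49),
]
ages = {sid: age for sid, _, _, age in cohort}
for caliper in (1, 10):
    pairs, dropped = match_with_caliper(cohort, caliper, seed=0)
    gaps = [abs(ages[case] - ages[ctrl]) for case, ctrl in pairs]
    print(f"caliper={caliper}: pairs={pairs} age gaps={gaps} dropped={dropped}")
```

With a 1-year caliper, case 3 has no qualifying control and is dropped. Widening the caliper to 10 years keeps case 3 but pairs it with a control whose age differs by 9 years, which is how residual confounding can enter.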

The Match-o-matic models these trade-offs by having users create a “true” exposure in a simulated cohort and specify its associations with the candidate matching factors. Users can then select matching criteria and compare the exposure's estimated rate ratio from the unmatched case-control study, the matched case-control study, and the full cohort. We built the Match-o-matic with R Shiny, and it is hosted by RStudio. The link to the Match-o-matic is here.
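A compressed sketch of this kind of comparison, written in Python under our own assumptions (it is not the app's R code): a binary exposure e with a true rate ratio of 2, a binary factor c that triples the rate and is associated with exposure, constant hazards, and censoring at 10 years. All function names and parameter values are hypothetical. The sketch compares the full-cohort crude and c-adjusted (Mantel-Haenszel) rate ratios with 1:1 nested case-control estimates, unmatched and matched on c. For 1:1 pairs and a binary exposure, the conditional logistic regression estimate reduces to the discordant-pair ratio n10/n01, which is what `ncc_pair_or` computes.

```python
import math
import random
from bisect import bisect_left


def simulate_cohort(n, beta, gamma, p_exposed, base_rate, t_max, rng):
    """Hypothetical cohort: c ~ Bernoulli(0.5), P(e = 1 | c) = p_exposed[c],
    constant hazard base_rate * exp(beta * e + gamma * c), censoring at t_max.
    Returns (sid, follow_up_time, is_case, e, c) tuples."""
    cohort = []
    for sid in range(n):
        c = int(rng.random() < 0.5)
        e = int(rng.random() < p_exposed[c])
        t_event = rng.expovariate(base_rate * math.exp(beta * e + gamma * c))
        cohort.append((sid, min(t_event, t_max), t_event <= t_max, e, c))
    return cohort


def full_cohort_irr(cohort, adjust_for_c):
    """Events per person-time rate ratio, crude or Mantel-Haenszel-stratified on c."""
    num = den = 0.0
    for level in ((0, 1) if adjust_for_c else (None,)):
        sub = [s for s in cohort if level is None or s[4] == level]
        events = [sum(s[2] for s in sub if s[3] == e) for e in (0, 1)]
        pt = [sum(s[1] for s in sub if s[3] == e) for e in (0, 1)]
        total = pt[0] + pt[1]
        num += events[1] * pt[0] / total
        den += events[0] * pt[1] / total
    return num / den


def ncc_pair_or(cohort, match_on_c, rng):
    """1:1 risk-set sampling (optionally also exact-matched on c), then n10 / n01."""
    strata = {}
    for s in cohort:
        strata.setdefault(s[4] if match_on_c else 0, []).append(s)
    n10 = n01 = 0
    for members in strata.values():
        members.sort(key=lambda s: s[1])      # order the stratum by follow-up time
        times = [s[1] for s in members]
        for sid, time, is_case, e, _ in members:
            if not is_case:
                continue
            first = bisect_left(times, time)  # members[first:] are still at risk
            if len(members) - first < 2:
                continue                      # no eligible control: case dropped
            control = members[rng.randrange(first, len(members))]
            while control[0] == sid:          # a case cannot be its own control
                control = members[rng.randrange(first, len(members))]
            if e != control[3]:               # only discordant pairs are informative
                if e:
                    n10 += 1
                else:
                    n01 += 1
    return n10 / n01


rng = random.Random(2024)
cohort = simulate_cohort(20000, beta=math.log(2), gamma=math.log(3),
                         p_exposed={0: 0.2, 1: 0.6}, base_rate=0.01, t_max=10.0, rng=rng)
print("full cohort, crude:      ", round(full_cohort_irr(cohort, adjust_for_c=False), 2))
print("full cohort, adjusted:   ", round(full_cohort_irr(cohort, adjust_for_c=True), 2))
print("NCC 1:1, unmatched on c: ", round(ncc_pair_or(cohort, match_on_c=False, rng=rng), 2))
print("NCC 1:1, matched on c:   ", round(ncc_pair_or(cohort, match_on_c=True, rng=rng), 2))
```

With these settings, confounding by c should pull the crude full-cohort and unmatched case-control estimates well above 2, while the c-adjusted full-cohort estimate and the c-matched case-control estimate should land near 2. Exact values vary with the random seed.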