So you have conducted a clinical study and used the data to train a machine learning classifier, perhaps embedded in a device, to predict the presence or absence of a disease (or of a risk factor for a disease) in future subjects, and you are now planning a validation (confirmatory) clinical trial for registrational purposes?
In our 10+ years of experience in this area, we see the following as the major risks that a validation trial fails to show the desired level of predictive accuracy on new subjects:
Inadequate training: This relates to the training set sample size and to how much of the real-world heterogeneity the training data actually capture. Training of machine learning models/algorithms for diagnostic purposes is often constrained by time and resources. Moreover, there is no straightforward way to estimate the required sample size without making assumptions that may be unrealistic, e.g., about prevalence/incidence, model complexity, and the bias-variance trade-off. This makes it challenging to ensure that adequate training has been achieved before starting the validation trial. Cross-validation often yields over-optimistic accuracy estimates, which can be compounded by overfitting to the training data.
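To make the cross-validation pitfall concrete, here is a minimal sketch on purely synthetic data (the sample size, feature count, and logistic regression model are arbitrary choices for illustration): when a model-building step such as feature selection sees data outside its training folds, the cross-validated accuracy becomes optimistic even though the labels are pure noise.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Pure-noise data: 100 subjects, 5000 candidate features, random labels,
# so the true achievable accuracy is 50%.
X = rng.normal(size=(100, 5000))
y = rng.integers(0, 2, size=100)

# Leaky: selecting features on the full data set before cross-validation
# lets label information bleed into every fold and inflates the estimate.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5)

# Honest: the feature selection is re-fit inside each training fold.
pipe = make_pipeline(SelectKBest(f_classif, k=20),
                     LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=5)

print(f"leaky CV accuracy:  {leaky.mean():.2f}")   # typically well above 0.5
print(f"honest CV accuracy: {honest.mean():.2f}")  # close to chance (0.5)
```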
Population heterogeneity: Training data are often collected from a small number of local sites, while the validation trial typically needs to be conducted across different geographical locations to establish generalizability and to support registration. This increases the risk of failure in the validation trial, for example when the feature distributions in the validation set show much larger heterogeneity than those in the training set.
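One pre-trial diagnostic for this risk, sketched below on synthetic data, is a "domain classifier" check: train a model to distinguish subjects from the training sites from a pilot sample of the intended validation population. A cross-validated AUC near 0.5 is reassuring; an AUC well above 0.5 signals covariate shift. The cohorts, sample sizes, and random forest here are illustrative assumptions only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X_train = rng.normal(0.0, 1.0, size=(300, 10))  # local training sites
X_pilot = rng.normal(0.4, 1.5, size=(200, 10))  # broader multi-site pilot

# Label each subject by cohort and try to predict that label from features.
X = np.vstack([X_train, X_pilot])
d = np.r_[np.zeros(len(X_train)), np.ones(len(X_pilot))]
auc = cross_val_score(RandomForestClassifier(random_state=0), X, d,
                      cv=5, scoring="roc_auc").mean()
print(f"cohort-discrimination AUC: {auc:.2f}")  # ~0.5: no detectable shift
```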
Inadequate testing before validation: Constraints such as time, resources, and the availability of a holdout set may result in inadequate testing for robustness, generalizability, and reproducibility. For probability or score classifiers, this can also mean sub-optimal selection of the decision threshold and, consequently, poor accuracy on the validation data.
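When a holdout set is available, the decision threshold should at least be tuned on it rather than on the training data. The sketch below does this on synthetic data by maximizing Youden's J (sensitivity + specificity - 1); the data set, model, and selection criterion are placeholders for whatever the protocol pre-specifies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a locked probability classifier plus a holdout set.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8],
                           random_state=0)
X_tr, X_hold, y_tr, y_hold = train_test_split(X, y, test_size=0.5,
                                              stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Pick the operating point on the holdout ROC curve, not on training data.
fpr, tpr, thresholds = roc_curve(y_hold, clf.predict_proba(X_hold)[:, 1])
best = np.argmax(tpr - fpr)  # Youden's J = sensitivity + specificity - 1
print(f"threshold = {thresholds[best]:.3f}, "
      f"sensitivity = {tpr[best]:.2f}, specificity = {1 - fpr[best]:.2f}")
```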
To mitigate such risks, we propose, and have supported several sponsors with, an adaptive approach to augmented training + validation. An example of such an approach is depicted in the figure below:
An example of an adaptive augmented-training + validation trial design
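The figure compresses several decisions into one flow. The skeleton below is a deliberately simplified, hypothetical rendering of that decision logic, not the design itself: every function name, threshold, and sample size is an illustrative assumption, and the study-specific operations (model training/locking, subject accrual, accuracy evaluation) are passed in as placeholders.

```python
def adaptive_trial(train_and_lock, recruit, evaluate,
                   n_interim=200, n_final=600,
                   acc_go=0.85, max_augmentations=2):
    """Hypothetical sketch: interim look(s) with optional training augmentation.

    train_and_lock(extra_data=None) -> locked model
    recruit(n)                      -> list of newly accrued subjects
    evaluate(model, subjects)       -> accuracy estimate in [0, 1]
    """
    model = train_and_lock()  # classifier locked before validation starts
    for _ in range(max_augmentations + 1):
        interim = recruit(n_interim)  # prospectively accrued subjects
        if evaluate(model, interim) >= acc_go:
            # "Go": complete accrual; the model stayed locked, so interim and
            # final subjects can be pooled for the confirmatory estimate.
            final = recruit(n_final - n_interim)
            return evaluate(model, interim + final)
        # "No go": fold the interim data into training, re-lock the model,
        # and restart validation with fresh, independent subjects.
        model = train_and_lock(extra_data=interim)
    raise RuntimeError("stopping rule reached without meeting the go criterion")
```

In a real design, of course, the interim rules, type I error control, and blinding arrangements would be pre-specified in the protocol and agreed with the regulator.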
The above adaptive strategy can also be generalized into a seamless training + validation trial, in which a learning curve (accuracy vs. training set size) is followed to decide when to start the validation phase.
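As a rough illustration of such a learning curve, the sketch below (synthetic data, an arbitrary model, and an unspecified stopping rule) computes cross-validated accuracy at increasing training set sizes; a plateau, e.g., successive gains below a pre-specified delta, suggests that further training data buys little and validation can begin.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=3000, n_features=30, random_state=0)
sizes, _, test_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8), cv=5,
    shuffle=True, random_state=0)

# Accuracy vs. training set size; flattening gains signal diminishing returns.
for n, s in zip(sizes, test_scores.mean(axis=1)):
    print(f"n = {n:4d}  CV accuracy = {s:.3f}")
```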
Our statisticians and data scientists have experience both in setting up efficient machine learning workflows and in designing risk-mitigated validation studies. We can also support the regulatory approval of such adaptive study designs and, more generally, all statistical aspects of training and validating machine learning classifiers.