Leveraging Machine Learning Using Digital Twins in Alzheimer’s Disease Clinical Research and Beyond
Three Practical Applications Surrounding the Intersection of Machine Learning, Precision Medicine, and Clinical Trials
This is part 2 of a Substack series on the different roles of machine learning in clinical trials. I recommend checking out part 1 as many of the concepts discussed there may help in better understanding this piece.
A famous scene in the movie The Matrix depicts the protagonist, Neo, choosing between the red pill and the blue pill offered by the enigmatic Morpheus. Once the choice is made, Morpheus warns, it cannot be reversed. In some sense, delivering care to patients is similar. Although an array of options may be available, once a decision is made, the patient is sent down a path from which they cannot return. Yet that “counterfactual,” or what would have happened to the patient under the other treatment choices, matters a great deal. For instance, a patient could have declined rapidly regardless of the treatment chosen, indicating that the choice of treatment made no difference. This is essentially why clinical trials have control arms: to give us a comparator group against which to estimate the counterfactual.
To inform patient-level decision-making, we often use randomized clinical trials (RCTs), which compare treatment groups via aggregated outcomes (e.g. averages). The canonical example is where the outcomes of the active treatment group are contrasted with the outcomes of those who received the placebo. Such results, though useful, can be a poor guide to a specific patient’s outcome if treatment effects change depending on individual characteristics. This idea has long been captured in the statistical notions of “effect modification” and the ecological fallacy. More recently, these concepts have coalesced into what we now call “precision medicine”: the idea of targeting the right treatments to the right patients at the right time.
The need for precision medicine is salient across many disease areas, including Alzheimer’s disease (AD). For example, one study found that the recently FDA-approved drug lecanemab had a higher rate of harmful side effects for carriers of the APOE4 allele. These findings indicate that APOE4 status is important to consider before prescribing lecanemab. Beyond APOE4 status, there may be other factors that modify the effectiveness and safety of lecanemab, such as race, comorbidities, and lifestyle choices, that we may want to weigh when making treatment decisions. Here, the evidence we could use to make informed decisions would come from subgroup analyses within clinical trials, which look at how the drug’s effect differed across single attributes (e.g. sex, BMI). However, these results have two main limitations: the sample sizes are often small, leading to potentially spurious findings, and they do not consider the impact of all the attributes simultaneously. The concept of a “digital twin” goes a step further than subgroup analyses by integrating a potentially large set of patient attributes to predict patient outcomes such as cognitive decline.
The concept of digital twins begins with the question “what is the best control for you?” Unsurprisingly, the answer is an exact copy of you in a parallel universe where we can see what would happen under another treatment path. Since we do not currently possess the same technology as Doc Brown’s DeLorean time machine in Back to the Future, we instead substitute the theoretical “exact copy” of you with patients who are similar to you with respect to disease progression. We can do this by first training a machine learning (ML) algorithm on historical data from previous trials, electronic health records, and insurance claims. Then, using that model, a digital twin would be computed by predicting your outcome, based on your specific characteristics, under a supported treatment option. In AD, the authors in this paper used vital signs, blood tests, baseline cognitive function, and biomarkers to train a model to predict AD progression. Subsequently, this model was applied to a new set of patients to generate patient outcomes under what the authors considered to be the standard of care, or the best available treatment currently in practice.
While the fundamental framework of RCTs has remained unchanged since the days of Sir Ronald A. Fisher in the 1920s, both trial design and statistical methodology have evolved. Digital twins will perhaps bring the next iteration in how we design, run, and analyze RCTs. In the field of AD and beyond, digital twins can be used to both streamline the discovery of new treatments and improve outcomes with the treatments we already have. In this post, I discuss three important use cases of digital twins in AD clinical research: reducing sample sizes and costs in clinical trials, enriching clinical trials with those most likely to experience cognitive decline, and informing treatment decisions by predicting best responders to treatments.
Use Case #1: Reducing Sample Sizes and Costs in Clinical Trials
Opportunities
In my previous Substack post, I discussed how adjusting for factors that predict a patient’s outcome under the control treatment can reduce the variability, or statistical noise, of treatment effect estimates. This reduction, in turn, allows investigators to recruit fewer patients while still achieving the same level of power, or probability of detecting a treatment effect given that it is truly there. In that post, I further covered how Unlearn.ai was using models trained on historical data to predict outcomes under the control (i.e. a prognostic model). This prognostic score would then be adjusted for in the final statistical model. This approach is called prognostic covariate adjustment (PROCOVA). The main idea is that if we can determine the variance reduction that PROCOVA can provide before trial recruitment, then we can enroll fewer patients at the same power level. As a result, we could save trial costs and complete trials more quickly, potentially boosting the number of treatments we are able to test in a given period.
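To make the variance-reduction intuition concrete, here is a minimal simulation sketch (my own illustration, not Unlearn’s actual method) comparing an unadjusted treatment effect estimate against one adjusted for a hypothetical prognostic score. All numbers and variable names are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical setup: a prognostic score that strongly predicts the
# outcome, plus a randomized treatment with a true effect of -1.5.
prognostic_score = rng.normal(0, 1, n)
treatment = rng.integers(0, 2, n)
outcome = 2.0 * prognostic_score - 1.5 * treatment + rng.normal(0, 1, n)

def effect_and_se(X, y):
    """OLS estimate and standard error for the treatment coefficient
    (assumed to be the second column of the design matrix X)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])

ones = np.ones(n)
unadj = effect_and_se(np.column_stack([ones, treatment]), outcome)
adj = effect_and_se(np.column_stack([ones, treatment, prognostic_score]), outcome)

# The adjusted analysis yields a noticeably smaller standard error,
# which translates into fewer patients needed for the same power.
print(f"unadjusted SE: {unadj[1]:.3f}, adjusted SE: {adj[1]:.3f}")
```

The better the prognostic score predicts the outcome, the larger the gap between the two standard errors, which is exactly the quantity one would try to pin down before recruitment.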
Unlearn’s digital twin product uses the prognostic model in a different manner than PROCOVA. While the same historical data is used to train an initial model, Unlearn’s method is deployed to simulate the outcomes of patients in the treatment arm as if they had been given control. Then, the treatment effect is calculated from the difference between the actual outcome under the treatment arm and the predicted outcome under control. In a sense, the digital twin approach yields a quasi-single-arm trial – quasi because patients are still recruited to the control arm, but their outcomes are not used directly to calculate the treatment effect. Instead, they are used to fine-tune the predicted control outcomes. That is, the actual outcomes of control patients are used to validate and potentially correct the prognostic model before it is applied to the patients in the treatment arm. The central point is that the number of patients needed to fine-tune the prognostic model is smaller than what a traditional RCT would require, while still maintaining the same power level.
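The following sketch illustrates the quasi-single-arm idea under toy assumptions of my own (a deliberately miscalibrated one-feature prognostic model and a small control arm used only for bias correction); it is not Unlearn’s actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical pre-trained prognostic model: predicts a patient's
# outcome under control from a single baseline feature x.
# It is slightly miscalibrated on purpose.
def prognostic_model(x):
    return 1.8 * x + 0.3

n_treated, n_control = 300, 100          # control arm is deliberately small
x_t = rng.normal(0, 1, n_treated)
x_c = rng.normal(0, 1, n_control)
true_effect = -1.0
y_t = 2.0 * x_t + true_effect + rng.normal(0, 1, n_treated)
y_c = 2.0 * x_c + rng.normal(0, 1, n_control)

# Step 1: use the small control arm to estimate and correct the
# prognostic model's average bias (fine-tuning, crudely).
bias = np.mean(y_c - prognostic_model(x_c))

# Step 2: digital twins = corrected predicted control outcomes
# for the patients who actually received treatment.
twin_control = prognostic_model(x_t) + bias

# Step 3: the treatment effect is the mean difference between observed
# treated outcomes and their twins' predicted control outcomes.
effect_estimate = np.mean(y_t - twin_control)
print(f"estimated effect: {effect_estimate:.2f} (true: {true_effect})")
```

Note that the mean-bias correction here is the simplest possible fine-tuning step; the open questions in the next section (how many control patients are enough, and how prediction uncertainty propagates) apply directly to it.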
Considerations
Two aspects of digital twins in this context require more research to flesh out. First, it is not clear how we should determine, a priori, the number of patients to recruit in order to validate and correct our prognostic model and ensure our predictions are not biased. The ideal outcome would be a digital twin trial result that matches what we would have obtained from a traditional clinical trial with a full control arm. Second, special caution must be taken to incorporate the variability of ML predictions into sample size calculations and treatment effect estimates. Each prediction a model makes carries a certain level of uncertainty, and this uncertainty must be propagated into the analysis and kept within a reasonable threshold, much as confidence intervals capture sampling uncertainty. Thus, a large risk is that the sample size needed for sufficiently precise predictions could far outnumber that of a traditional control arm, rendering the approach of limited use. To some extent, these issues can be studied and mitigated through initial analyses and simulations prior to trial enrollment.
There are also practical concerns around acquiring historical data to train prognostic models, which I discussed in my previous Substack post. In particular, curating data to match the exact control treatment and inclusion-exclusion criteria of the present trial can be difficult. I argued that most observational datasets would likely fail to meet these requirements depending on the available covariates, how the covariates were measured, and whether the care received would be the same as if the patient had been enrolled in a trial.
Use Case #2: Trial Enrichment on Expected Disease Progression
Opportunities
In disease areas with no known cure like AD, a drug is usually deemed successful if disease progression in the treated arm is slower than in the control arm. That comparison, in part, depends on how long we run the trial. As an extreme example, if a trial testing a drug intended to slow cognitive decline lasted for merely 30 days, it would be inconclusive because we have not observed the patients for long enough to draw meaningful conclusions. Alternatively, if we observed patients for five years, many subjects would experience meaningful cognitive decline, and we would easily be able to detect a treatment effect if the drug truly works. Nevertheless, we would probably know whether the drug worked well before the five years were up. This means that running the trial for the full length would unnecessarily waste resources and delay potentially efficacious drugs from reaching patients in need. In this sense, trialists aim to minimize the length of a trial while still keeping it long enough to detect a meaningful treatment effect and collect safety data. One way this is done is through trial subject “enrichment.”
Enrichment boils down to specifically recruiting patients who have a higher chance of progressing in their disease more quickly than others. That way, we can run the trial for far less time than if we just randomly sampled the population, and still observe a treatment effect if it exists. In MCI and early AD trials, it is common to recruit individuals with at least one APOE4 allele at disproportionate rates because they are more likely to experience rapid cognitive decline.
Yet APOE4 status is not the only predictor of more rapid cognitive decline, and to incorporate a wider range of subjects into enrichment, we could use digital twins. That is, based on baseline characteristics and medical history, each patient wishing to enroll in a clinical trial could have a prognostic score generated via her digital twin. Then, only patients above a certain threshold would be recruited, while those below could be tracked for potential enrollment in future trials if their prognosis changes. The ethics of disclosing prognostic status would need to be considered and are similar to the considerations surrounding APOE4 disclosure. Furthermore, if patients knew they were susceptible to decline, their outcomes might be influenced by this knowledge.
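As a minimal sketch of this screening step, assume each applicant has already received a predicted rate of decline from some prognostic model (the pool, the distribution, and the 70th-percentile cutoff below are all my own illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical applicant pool: each applicant has a model-predicted
# one-year cognitive decline score (higher = faster expected decline).
n_applicants = 1000
predicted_decline = rng.gamma(shape=2.0, scale=1.5, size=n_applicants)

# Enrich: enroll only applicants above the 70th percentile of predicted
# decline; defer the rest for possible future enrollment.
threshold = np.quantile(predicted_decline, 0.70)
enrolled = predicted_decline >= threshold

print(f"enrolled {enrolled.sum()} of {n_applicants}; "
      f"mean predicted decline {predicted_decline[enrolled].mean():.2f} "
      f"vs pool mean {predicted_decline.mean():.2f}")
```

The enriched cohort’s higher average predicted decline is exactly what lets a trial detect a slowing of progression over a shorter follow-up period; the threshold itself would be a design choice balancing recruitment feasibility against expected event rates.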
Considerations
If we are to use prognostic scores for trial recruitment, then we must ensure they are generalizable and translatable into practice. This can be achieved by the AD community jointly settling on a common prognostic score that could be used in both trial and real-world settings. Such an undertaking is not without precedent; the Society of Thoracic Surgeons (STS) gauges cardiovascular surgery operative risk via the STS predicted risk of mortality score (STS-PROM). Furthermore, the STS-PROM is commonly used as an inclusion criterion in clinical trials. In AD, a similar score could be developed using large, openly available datasets such as the National Alzheimer’s Coordinating Center and the Alzheimer’s Disease Neuroimaging Initiative.
Use Case #3: Precision Medicine and Finding Best Responders to Treatments
Opportunities
All patients are different in some way. As such, much of the art of medical practice is choosing the best treatment course on a patient-by-patient basis. While some patients may benefit from one medication, others may not. Digital twins can be employed to maximize the patient benefit of existing, approved treatments. For example, though lecanemab gained FDA approval, the Institute for Clinical and Economic Review (ICER) voted 12-3 that the evidence surrounding the drug “is not adequate to demonstrate a net health benefit of lecanemab when compared to supportive care alone.” ICER’s opinion suggests that more work needs to be done to identify which subpopulations could benefit the most from lecanemab. One potential path forward is using digital twins to inform treatment decisions.
A naive approach would be to build a model that predicts a patient’s outcome if they were to take lecanemab and use these predictions alone to drive treatment decisions. For a more robust approach, however, we must go a step further and establish the counterfactual, or what would happen if they received the current standard of care. Given enough data on the novel treatment, each patient could have two digital twins: one under the proposed treatment and another under the standard of care. The difference in predicted outcomes between the two would then support treatment decisions, depending on which option it favored.
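The two-twin comparison can be sketched as follows. Both prediction functions, their coefficients, and the example patients are hypothetical toy assumptions (loosely echoing the earlier point that APOE4 carriers may fare worse on the drug); real models would be fitted to trial and real-world data and validated.

```python
# Hypothetical twin models: predicted 18-month cognitive change
# (more negative = worse) from two baseline features. Coefficients
# are invented for illustration only.
def predict_under_drug(age, apoe4):
    # assumed: the drug slows decline overall but is markedly
    # worse for APOE4 carriers
    return -1.0 - 0.05 * (age - 70) - 2.5 * apoe4

def predict_under_soc(age, apoe4):
    # assumed: standard of care, APOE4-neutral
    return -2.0 - 0.04 * (age - 70)

# Toy patients: (age, APOE4 carrier status)
patients = [(68, 0), (75, 1), (82, 0)]
for age, apoe4 in patients:
    # benefit > 0 means the drug twin is predicted to do better
    benefit = predict_under_drug(age, apoe4) - predict_under_soc(age, apoe4)
    choice = "drug" if benefit > 0 else "standard of care"
    print(f"age {age}, APOE4 {apoe4}: predicted benefit {benefit:+.2f} -> {choice}")
```

Under these assumptions, the non-carriers are steered toward the drug while the APOE4 carrier is not, which is precisely the kind of individualized recommendation that, as the next section argues, would itself need validation in a pragmatic trial.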
Considerations
Crucially, precision medicine with digital twins should be examined in pragmatic trials versus standard of care. It is not enough to assume adequate model predictive performance will ensure good patient outcomes. In other disease areas, such trials are ongoing. For example, a Cleveland Clinic-sponsored trial is testing a personalized treatment recommendation system versus usual care to see if it can improve outcomes of type 2 diabetes patients (NCT05181449). As more trials of these types are conducted in different settings, general frameworks can be developed, which can then be applied to dementia.
Future Directions for Digital Twins
The stage is set for digital twins to play a bigger role in clinical research: we have the methodology, the data, and, most importantly, many unmet clinical needs in AD and other fields. I have discussed three use cases that can potentially save clinical trial costs, shorten the time until approval, and better target the treatments we already have to the patients who need them the most. For these reasons, biotech companies, academic researchers, and healthcare systems alike should be investigating how digital twins can assist clinical research and practice. In this vein, there are many next steps:
1. For quasi-single-arm digital twin trials, studying the robustness of statistical estimates that rely on prognostic models, including establishing guidelines on the adequate sample size required to be recruited in the control arm.
2. For enrichment on the prognostic score, collaborating to establish a common model that can be used in both clinical trials and practice much like the STS PROM score.
3. For precision medicine, building models that weigh different treatment decisions and designing clinical trials that validate them.
The way healthcare is researched and practiced has rapidly advanced since the advent of evidence-based medicine in the 1960s. This includes everything from the growing amounts of data in electronic health records and insurance claims to developments in trial design and statistical methodology. In this vein, digital twins have great potential to usher in the next iteration of medicine.