Proprietary Medical Advice
Pay no attention to Epic's proprietary medical decision-support algorithm behind its closed-source curtain
In a curiously timed post on its ‘Cool Things’ blog, the electronic health records (EHR) software behemoth Epic Systems touted the efficacy of its predictive models for identifying acute medical events. The June 28th post described one such model as able to detect the potential presence of sepsis earlier in a patient’s hospital stay than previous detection models could.
Sepsis is a life-threatening medical complication wherein the body’s natural response to the presence of an infection causes further damage to tissues and organs. Despite an estimated 1.7 million adult cases and approximately 270,000 deaths annually in the United States alone, the clinical guidelines for its detection, diagnosis, and coding have been a moving target for decades.1
A February 2020 analysis in the Journal of Thoracic Disease described the remarkable extent to which epidemiological studies of sepsis trends have relied on inconsistent clinical criteria and medical coding. It also explored several potential biases that muddy research into whether the prevalence of diagnoses is rising or falling:
Unaccounted-for changes in ICD-9-CM codes over the course of individual studies and across different studies, with particular changes and expansions to the sepsis-specific codes.
There is no surefire diagnostic test for sepsis, so diagnosis necessarily incorporates clinician subjectivity.
Increasing use of hospice for end-of-life care in the US, which confounds the oft-used metric of in-hospital mortality rates.
Given that recent literature in clinical journals demonstrates that so many aspects of this life-threatening medical episode remain equivocal, it stands to reason that broad collaboration and participation among clinical researchers would serve the objectives of all involved parties. Said better and more broadly, Yochai Benkler, of Harvard Law School’s Berkman Center for Internet and Society, writes:
Open collaborative innovation [recognizes] that the best people to solve a given problem are unlikely to work for the firm facing the problem, and that models of innovation that allow diverse people, from diverse settings, to work collaboratively on the problem will lead to better outcomes than production models that enforce strict boundaries at the edge of the firm.3
Conversely, drawing conclusions from the existing body of research, encoding those conclusions as algorithmic alerts in hospital EHRs, and then making the algorithm proprietary, which shields it from independent third-party validation, seems imprudent at best.
Indeed, one week before Epic’s blog post, researchers at the University of Michigan’s Michigan Medicine published an analysis of the Epic Sepsis Model’s (ESM) predictive accuracy in JAMA Internal Medicine. Summarizing their findings, Karandeep Singh, MD, MMSc, stated, “in essence, they developed the model to predict sepsis that was recognized by clinicians at the time it was recognized by clinicians. However, we know that clinicians miss sepsis.”4
The ESM is based upon data from all medical episodes coded as sepsis, with the onset of sepsis defined as the time that the physician identified it and intervened.5 Again, this data was amassed over a period during which both the coding of sepsis and the very criteria for identifying it changed.
A key additional focus of their research was the model’s propensity to cause alert fatigue, the desensitization that occurs when physicians are presented with too many warnings.
Alert fatigue is a critical consideration for designers of Clinical Decision Support Systems (CDSS), which link patient-specific information in EHRs with evidence-based knowledge to generate case-specific guidance messages through a rule-based or algorithm-based software program.6
On this point, the Michigan Medicine researchers—
Found the ESM to have poor discrimination and calibration in predicting the onset of sepsis at the hospitalization level. When used for alerting at a score threshold of 6 or higher (within Epic’s recommended range), it identifies only 7% of patients with sepsis who were missed by a clinician (based on timely administration of antibiotics), highlighting the low sensitivity of the ESM in comparison with contemporary clinical practice. The ESM also did not identify 67% of patients with sepsis despite generating alerts on 18% of all hospitalized patients, thus creating a large burden of alert fatigue.
They concluded that “the increase and growth in deployment of proprietary models has led to an underbelly of confidential, non–peer-reviewed model performance documents that may not accurately reflect real-world model performance.”
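The alert-fatigue figures quoted above can be turned into a rough per-alert estimate. The following is a minimal sketch, not the researchers’ method: it takes the quoted 67% miss rate (so roughly 33% sensitivity) and the 18% alert rate from the excerpt, and combines them with a sepsis prevalence that is purely an assumption made here for illustration, since the excerpt does not state one. The function name alert_burden and the constant ASSUMED_PREVALENCE are hypothetical.

```python
# Rough sketch: how many alerts would point at a real sepsis case?
# Sensitivity and alert rate come from the quoted excerpt; the prevalence
# is an ASSUMPTION for illustration only (the excerpt does not give one).

def alert_burden(sensitivity: float, alert_rate: float, prevalence: float) -> dict:
    """Estimate positive predictive value and alerts per true sepsis case.

    sensitivity: fraction of sepsis patients who trigger an alert
    alert_rate:  fraction of all hospitalized patients who trigger an alert
    prevalence:  fraction of all hospitalized patients who have sepsis
    """
    for name, value in (("sensitivity", sensitivity),
                        ("alert_rate", alert_rate),
                        ("prevalence", prevalence)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must be in (0, 1], got {value}")

    # Fraction of all patients who both have sepsis and trigger an alert.
    true_alert_fraction = sensitivity * prevalence
    if true_alert_fraction > alert_rate:
        raise ValueError("inputs are inconsistent: more true alerts than alerts")

    ppv = true_alert_fraction / alert_rate
    return {"ppv": ppv, "alerts_per_true_case": 1.0 / ppv}


SENSITIVITY = 1.0 - 0.67   # excerpt: the ESM "did not identify 67% of patients with sepsis"
ALERT_RATE = 0.18          # excerpt: alerts on 18% of all hospitalized patients
ASSUMED_PREVALENCE = 0.07  # hypothetical value, not taken from the excerpt

if __name__ == "__main__":
    result = alert_burden(SENSITIVITY, ALERT_RATE, ASSUMED_PREVALENCE)
    print(f"Estimated share of alerts that are true sepsis cases: {result['ppv']:.1%}")
    print(f"Estimated alerts per true sepsis case: {result['alerts_per_true_case']:.1f}")
```

Under any plausible prevalence, the point is the same one the researchers make: when alerts fire on nearly a fifth of patients but catch only a third of true cases, most alerts a clinician sees will not correspond to sepsis.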
This specific research comes against a backdrop of countless analogous automations introduced throughout the COVID-19 pandemic.
For example, Epic also introduced the fittingly, if ominously, named Deterioration Index for triaging COVID-19 patients. (For the moment, set aside any discomfort with investing in algorithms to ration hospital beds and ventilators rather than investing in a sufficient supply of hospital beds and ventilators.)
When not subject to external review, these algorithmic approaches often codify health inequities without necessary socioeconomic contextualization. They likewise may draw conclusions from historical medical episodes whose outcomes were contingent upon clinicians’ subjective judgment.
What outcomes can be expected from machine learning models trained on a historical dataset of physician-recommended candidates for knee replacement, a procedure shown in some studies to be unconsciously recommended to male patients three times more often than to female patients?
What will a proprietary algorithm encode from datasets of patients receiving treatment for spinal disorders, wherein patients treated as part of workers’ compensation cases exhibit worse outcomes?
How can a closed-source model assure anyone that social contexts are considered in an algorithm’s development? How can external parties even validate the claims made by an algorithm’s proprietor? An article on the subject in the Journal of the American Medical Informatics Association describes one such approach:
For example, a biomarker-based algorithm to diagnose ovarian cancer has a cost of $897 per patient (http://vermillion.com/2436-2/). Assume we want to validate this algorithm in a center that has 20% malignancies in the target population. If we want to recruit at least 100 patients in each outcome group, following current recommendations for validation studies, the study needs at least 500 patients. This implies a minimum cost of $448,500 in order to obtain useful information about whether this algorithm works in this particular center. It is important to emphasize this is just the cost required to judge whether the algorithm has any validity in this setting; there is no guarantee that it will be clinically useful.
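The arithmetic in the quoted passage can be checked with a short sketch. It assumes, as the passage does, that the binding constraint is getting enough patients in the rarer outcome group; the function names min_validation_cohort and validation_cost are hypothetical, and the prevalence is passed as a whole-number percentage so the calculation stays in exact integer arithmetic.

```python
# Sketch of the validation-cost arithmetic from the quoted passage.
# All inputs (100 per group, 20% malignancy, $897 per patient) come from the quote.

def min_validation_cohort(min_per_group: int, prevalence_percent: int) -> int:
    """Smallest cohort expected to contain min_per_group patients in BOTH outcome groups.

    The rarer outcome group is the bottleneck, so the cohort is sized on it.
    """
    if not 0 < prevalence_percent < 100:
        raise ValueError("prevalence_percent must be between 1 and 99")
    rarer_percent = min(prevalence_percent, 100 - prevalence_percent)
    # Integer ceiling division: ceil(min_per_group * 100 / rarer_percent)
    return -(-(min_per_group * 100) // rarer_percent)


def validation_cost(cohort_size: int, cost_per_patient: int) -> int:
    """Total cost of running the algorithm once on every patient in the cohort."""
    return cohort_size * cost_per_patient


if __name__ == "__main__":
    cohort = min_validation_cohort(min_per_group=100, prevalence_percent=20)
    cost = validation_cost(cohort, cost_per_patient=897)
    print(f"Minimum cohort: {cohort} patients")  # prints 500
    print(f"Minimum cost: ${cost:,}")            # prints $448,500
```

The result matches the figures in the quote: 500 patients and $448,500, before any question of clinical usefulness is addressed.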
The ovarian cancer example shouldn’t be read as cynical: the philosophy that you must pay to become a customer before being given the means to validate an algorithm is part of Epic’s official retort to the Michigan Medicine publication. Epic’s spokesperson stated to the industry website Health IT Analytics:
The full mathematical formula and model inputs are available to administrators on their systems. Accuracy measurements and information on model training are also on Epic’s UserWeb, which is available to our customers.