Predicting future patient healthcare expenditures from chest X-rays using deep learning: a pilot study
Chest x-ray data
All procedures in this study were approved by the Institutional Review Board of the University of California, San Francisco Medical Center, California, USA, and performed in accordance with applicable guidelines and regulations. The same board waived the requirement for informed consent for the retrospective use of chest x-ray (CXR) data. All participants were non-obstetric adult patients who presented to the emergency department (ED) between July 1, 2012 and November 30, 2016 and received a chest x-ray in the ED or at an outpatient facility on the day of ED presentation. A total of 34,743 frontal chest radiographs belonging to 30,823 patients were initially identified, each associated with the patient’s age, gender, zip code, and corresponding patient cost at the UCSF Medical Center within 1, 3, or 5 consecutive years. At our facility, frontal chest radiographs for adult patients are typically obtained at 100-120 kVp with automatic exposure control. The distance between the source and the detector is fixed at 72 inches unless modified for particular reasons. Lateral chest radiographs were excluded. More than 90% of the images were acquired on GE and Philips equipment.
Health expenditure data
Health expenditure data were obtained from the cost accounting unit of the institution’s hospital financial department. Total health expenditure was the sum of direct and indirect expenditure attributed to the patient’s hospital stay, pharmacy, laboratory, imaging, surgeries, and medical consultations over the period in which the patient was included in the study. We used total health expenditure over the subsequent 1, 3, and 5 years as the outcome.
Of the 34,743 chest x-rays, 12,869 (37.0%) were excluded from data processing because of missing patient information (Suppl. Fig. S1). Of the excluded x-rays, 11,857 (92.1%) had no information available on healthcare expenditures. The remaining 1,012 excluded chest radiographs (7.9%) included 8 with no associated gender, 128 with no associated postal code, and 876 whose postal code could not be matched to median income.
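The exclusion counts above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the reported figures; the variable names are ours and do not come from the study's code.

```python
# Counts reported in the text above.
total_cxr = 34_743
no_expenditure = 11_857
no_gender = 8
no_postal_code = 128
unmatched_postal_code = 876

# Exclusions for reasons other than missing expenditure data.
other_missing = no_gender + no_postal_code + unmatched_postal_code
excluded = no_expenditure + other_missing


def pct(part, whole):
    """Percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(excluded, pct(excluded, total_cxr))           # 12869 37.0
print(pct(no_expenditure, excluded))                # 92.1
print(other_missing, pct(other_missing, excluded))  # 1012 7.9
```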
Exploratory data analysis
A pairwise chi-square test was performed between cost groups (above and below median patient expenditures) and patient demographic variables such as age group, gender, geographic area, and race. The effects of demographic variables (gender, age, race, and median income by postal code area) and their second-order interaction terms on log10-transformed costs were analyzed in a multifactor ANOVA. A missing data analysis was performed to compare the 1,004 excluded CXRs (not counting the 8 of unknown gender) with the 21,872 included CXRs. We analyzed the association of missingness with demographic variables (sex, geographic area, and race) and with the highest and lowest spenders using the chi-square test, and its association with age using a t-test (Supplementary Table S1). Of the 21,872 chest x-rays with 1-year expenditures, 9,477 (43.3%) lacked 3-year expenditure amounts and 20,073 (91.8%) lacked 5-year expenditure amounts. To investigate the contribution of survival bias (i.e., patients living longer may have lower medical costs because they were healthier or did not have to pay for end-of-life care), we compared the median expenditures of patients who dropped out with those of patients who remained by the 3rd and 5th year after the CXR.
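As an illustration of the chi-square comparison between cost groups and a demographic variable, the following minimal sketch computes a Pearson chi-square test for a 2x2 table using only the standard library. The counts are hypothetical, the function name is ours, and no continuity correction is applied; the source does not state which software or correction was used for this analysis.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    No continuity correction is applied. Returns (statistic, p_value).
    With 1 degree of freedom, the chi-square survival function
    reduces to erfc(sqrt(x / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts (NOT study data):
# rows = cost group (above / below median), columns = gender.
counts = [[5600, 5336], [5200, 5736]]
stat, p = chi_square_2x2(counts)
print(f"chi2 = {stat:.2f}, p = {p:.4g}")
```

For variables with more than two levels (age group, race, geographic area), the same expected-count logic applies, but the p-value requires the general chi-square distribution with (rows − 1)(columns − 1) degrees of freedom.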
Regression models were developed to predict healthcare expenditures, and binary classification models were developed to predict whether a participant’s healthcare expenditures were in the top 50%. The regression and classification models were each developed in four versions: (T, for “tabular data”) a baseline model that relies solely on the patient’s sex, age, and median income by postal code as input; (X, for “X-rays”) a ResNet [11] with only CXR images as input; (TX1) separately trained T and X models combined at the final stage; and (TX2) a modified ResNet trained end to end with CXR images, age, sex, and median income by postal code as input. The baseline (T) regression and classification models were gradient-boosting regressors [12,13] and an AdaBoost classifier [14], respectively, implemented in the Python package scikit-learn with default settings. The CXR-only (X) regression and classification models were a modified ResNet18 model and a modified ResNet50 model, respectively. For the combined TX1 model (Suppl. Fig. S2), the raw softmax score (classification) or the final output (regression) of the X model was concatenated to the categorical data and then processed with the T model approach to produce the final output. For the combined TX2 model (Suppl. Fig. S3), the neural network architecture of the X model was modified after the final convolutional layers to allow concatenation of the categorical data within the end-to-end neural network model. See Supplementary Figs. S2 and S3 and the eAppendix for implementation details.
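The late-fusion idea behind TX1 for the regression case can be sketched as follows: the output of a separately trained image model is appended to the tabular features, and a gradient-boosting regressor is fitted on the combined input. All data here are synthetic, the image-model output is a random placeholder rather than a real ResNet prediction, and the train/test slice is arbitrary; this is an illustration under those assumptions, not the authors' implementation (see their eAppendix for that).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins (NOT study data): sex, age, median income by postal code.
tabular = np.column_stack([
    rng.integers(0, 2, n),
    rng.integers(18, 90, n),
    rng.normal(80_000, 20_000, n),
])

# Placeholder for the final output of the separately trained CXR model (X).
x_model_output = rng.normal(3.5, 0.5, n)

# Synthetic regression target standing in for patient cost.
cost_target = 0.8 * x_model_output + 0.005 * tabular[:, 1] + rng.normal(0, 0.1, n)

# TX1: append the X-model output to the tabular features,
# then fit the same kind of model used for the T baseline.
stacked = np.column_stack([tabular, x_model_output])
tx1 = GradientBoostingRegressor(random_state=0)  # otherwise default settings
tx1.fit(stacked[:150], cost_target[:150])
pred = tx1.predict(stacked[150:])

print(stacked.shape, pred.shape)  # (200, 4) (50,)
```

For the classification variant, the same pattern would use `AdaBoostClassifier` with the X model's softmax score as the appended column.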
All versions of ResNet were initialized with weights pre-trained on ImageNet [15,16]. For all models, hyperparameters such as the learning rate, linear layer dimension, and number of linear layers were empirically optimized through random search [17]. After hyperparameter tuning and training, the models were evaluated on the pre-split test set [18]. The training, validation, and test sets were split by patient ID number so that no patient’s CXRs appear in more than one set. The outputs of the classification models were assessed using the area under the receiver operating characteristic curve (ROC-AUC) and the F1 score. The ROC-AUCs of the classification models were compared pairwise using the DeLong method [19]. The outputs of the regression models were assessed using Pearson’s R and Spearman’s ρ. Confidence intervals (95%) were calculated for all statistics. Training and evaluation were conducted separately for 1-year (21,872 CXRs), 3-year (12,395 CXRs), and 5-year (1,779 CXRs) expenditure. Since the 1-year expenditure data were the most comprehensive, all subsequent analyses should be assumed to be based on 1-year expenditure unless otherwise stated.
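A patient-level split of the kind described above can be sketched by shuffling unique patient IDs and assigning each patient, rather than each image, to one set. The split fractions, seed, and IDs below are illustrative assumptions; the text does not state the ratio used.

```python
import random


def split_by_patient(patient_ids, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign each patient (not each image) to train, validation, or test.

    Returns a list of split names aligned with `patient_ids`.
    The default fractions are illustrative only.
    """
    unique_ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique_ids)
    n_train = int(fractions[0] * len(unique_ids))
    n_val = int(fractions[1] * len(unique_ids))
    assignment = {}
    for rank, pid in enumerate(unique_ids):
        if rank < n_train:
            assignment[pid] = "train"
        elif rank < n_train + n_val:
            assignment[pid] = "val"
        else:
            assignment[pid] = "test"
    return [assignment[pid] for pid in patient_ids]


# Hypothetical image-level records: several patients have more than one CXR.
image_patient_ids = ["p01", "p01", "p02", "p03", "p03", "p03", "p04", "p05",
                     "p06", "p07", "p08", "p08", "p09", "p10"]
splits = split_by_patient(image_patient_ids)

# Collect the patients in each split; the sets are disjoint by construction.
patients_per_split = {}
for pid, split in zip(image_patient_ids, splits):
    patients_per_split.setdefault(split, set()).add(pid)
```

Splitting on patient ID instead of image index prevents a model from being evaluated on a patient whose other radiographs it saw during training.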
For the error analysis, we asked whether the difference between the actual cost and the predicted cost was correlated with any of the patient demographic factors. A linear model used the absolute percentage difference (|actual cost − predicted cost|/actual cost) as the dependent variable and patient gender, race, age, median income, and overall actual cost as covariates.
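The dependent variable of the error analysis can be computed directly from the formula above. The function name and the cost values are hypothetical, and the guard against non-positive actual cost is our addition.

```python
def percentage_error(actual_cost, predicted_cost):
    """Return |actual - predicted| / actual, as defined in the text above."""
    if actual_cost <= 0:
        # Our own guard: the ratio is undefined or misleading otherwise.
        raise ValueError("actual cost must be positive for a relative error")
    return abs(actual_cost - predicted_cost) / actual_cost


# Hypothetical (actual, predicted) 1-year costs.
pairs = [(1000.0, 1500.0), (20000.0, 15000.0), (400.0, 400.0)]
errors = [percentage_error(a, p) for a, p in pairs]
print(errors)  # [0.5, 0.25, 0.0]
```

Because the denominator is the actual cost, the same absolute error produces a larger percentage error for a low-cost patient than for a high-cost one, which is one reason to include actual cost as a covariate.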