Finding the best trade-off between performance and interpretability in predicting hospital length of stay using structured and unstructured data

Franck Jaotombo; Luca Adorni; Badih Ghattas; Laurent Boyer

doi:10.1371/journal.pone.0289795

Journal article

Open access

Peer reviewed

Franck Jaotombo, Luca Adorni, Badih Ghattas and Laurent Boyer

PLOS ONE, Vol.18(11)

30/11/2023

DOI: https://doi.org/10.1371/journal.pone.0289795

Keywords: hospital length of stay; explainable AI; data fusion; structured and unstructured data; clinical transformers

Abstract

Objective: This study aims to develop high-performing Machine Learning and Deep Learning models for predicting hospital length of stay (LOS) while enhancing interpretability. We compare the performance and interpretability of models trained only on structured tabular data with those of models trained only on unstructured clinical text data, and of models trained on mixed data.

Methods: The structured data was used to train fourteen classical Machine Learning models, including advanced ensemble trees, neural networks and k-nearest neighbors. The unstructured data was used to fine-tune a pre-trained Bio Clinical BERT Transformer Deep Learning model. The structured and unstructured data were then merged into a tabular dataset after vectorization of the clinical text and dimensionality reduction through Latent Dirichlet Allocation. The study used the free and publicly available Medical Information Mart for Intensive Care (MIMIC) III database and the open AutoML library AutoGluon. Performance is evaluated with respect to two types of random classifiers, used as baselines.

Results: The best model trained on structured data demonstrates high performance (ROC AUC = 0.944, PRC AUC = 0.655) with limited interpretability; its most important predictors of prolonged LOS are the levels of blood urea nitrogen and of platelets. The Transformer model displays good but lower performance (ROC AUC = 0.842, PRC AUC = 0.375) with richer interpretability, providing more specific in-hospital factors including procedures, conditions, and medical history. The best model trained on mixed data achieves both a high level of performance (ROC AUC = 0.963, PRC AUC = 0.746) and a much broader scope of interpretability, including pathologies of the intestine, the colon, and the blood; infectious diseases; respiratory problems; procedures involving sedation and intubation; and vascular surgery.

Conclusions: Our results outperform most state-of-the-art models in LOS prediction in terms of both performance and interpretability. Data fusion between structured and unstructured text data may significantly improve performance and interpretability.
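The abstract reports each model with two metrics, ROC AUC and PRC AUC, and judges them against random baselines. The sketch below is an illustration only, not the authors' evaluation code: it computes both metrics in plain Python on made-up toy labels and scores. The abstract does not say which two random classifiers were used, so the sketch assumes the simplest reference point, an uninformative scorer, whose expected ROC AUC is 0.5 and whose expected PRC AUC equals the positive-class prevalence. All function and variable names here are hypothetical.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Assumes scores are distinct; with ties the result depends on sort order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_pos


def prevalence(y_true: Sequence[int]) -> float:
    """Share of positive labels: the expected PRC AUC of an uninformative scorer."""
    return sum(y_true) / len(y_true)


# Toy data (invented for illustration): 1 = prolonged stay, 0 = not prolonged.
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.3, 0.2, 0.1]

print(f"ROC AUC = {roc_auc(y_true, scores):.3f} (uninformative reference: 0.500)")
print(f"PRC AUC = {average_precision(y_true, scores):.3f} "
      f"(uninformative reference: {prevalence(y_true):.3f})")
```

Because the PRC AUC reference depends on how rare prolonged stays are, a PRC AUC is best read relative to the prevalence in the evaluated cohort, whereas the ROC AUC reference stays at 0.5 regardless of class balance.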

Files and links (3)

pdf: pone.0289795-3 (Open Access, CC BY 4.0)
pdf: journal.pone.0289795 (Open Access, CC BY 4.0)
url: https://doi.org/10.1371/journal.pone.0289795 (Published, Version of record; Open)

Details

Title: Finding the best trade-off between performance and interpretability in predicting hospital length of stay using structured and unstructured data
Creators: Franck Jaotombo (Author) - Aix-Marseille Université
Luca Adorni (Author)
Badih Ghattas (Author) - Aix-Marseille Université
Laurent Boyer (Author) - Assistance Publique Hôpitaux de Marseille
Publication Details: PLOS ONE, Vol.18(11)
Format: 22 p.
Number of pages: 22 p.
Identifiers: 9918193309453
Academic Unit: Department of Operations, Data & Artificial Intelligence; QUANT - Quantitative Methods in Business
Language: English
Resource Type: Journal article
Local Fields: 742959

InCites Highlights

These are selected metrics from the InCites Benchmarking & Analytics tool that relate to this contribution.

Collaboration types: Domestic collaboration; International collaboration
Citation topics: 6 Social Sciences; 6.185 Communication; 6.185.2797 AI Ethics
Web of Science research areas: Computer Science, Artificial Intelligence