Supervised Classification of Healthcare Text Data Based on Context-Defined Categories

Bolívar Gómez, Sergio; Nieto Reyes, Alicia; Rogers, Heather L.

doi:10.3390/math10122005

Identifiers

URI: https://hdl.handle.net/10902/27216

DOI: 10.3390/math10122005

ISSN: 2227-7390

Date

2022

Published in

Mathematics, 2022, 10(12), 2005

Publisher

MDPI

Link to the publication

https://doi.org/10.3390/math10122005

Keywords

Artificial Neural Networks

Decision Tree

Logistic LASSO

Natural Language Processing

Qualitative Data

Supervised Classification

Support Vector Machines

Text Data Analysis

Abstract

Achieving a good success rate in supervised classification of a text dataset, where the relationship between a text and its label can be extracted from the context but not from isolated words, remains an important challenge for statistics and machine learning. For this purpose, we present a novel mathematical framework. We then conduct a comparative study of established classification methods for the case where the relationship between the text and the corresponding label is clearly depicted by specific words in the text. In particular, we use logistic LASSO, artificial neural networks, support vector machines, and decision-tree-like procedures. This methodology is applied to a real case study involving mapping Consolidated Framework for Implementation Research (CFIR) constructs to health-related text data, and it achieves a prediction success rate of over 80% when the first 55% of the text, or more, is used for training and the remainder for testing. The results indicate that the methodology can be useful to accelerate the CFIR coding process.
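
The evaluation protocol mentioned in the abstract (train on the first part of the text, test on the rest) can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `sequential_split`, `success_rate`, and `majority_baseline` are hypothetical, the fragments and the labels "A" and "B" are made-up toy data, and the majority-label classifier is only a placeholder where a method such as logistic LASSO or a support vector machine would be plugged in.

```python
from collections import Counter
from typing import Callable, Sequence


def sequential_split(items: Sequence, train_fraction: float = 0.55):
    """Split items in their original order: the first part trains, the rest tests."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(len(items) * train_fraction)
    return list(items[:cut]), list(items[cut:])


def success_rate(predicted: Sequence, actual: Sequence) -> float:
    """Fraction of test fragments whose predicted label matches the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def majority_baseline(train_labels: Sequence[str]) -> Callable[[str], str]:
    """Placeholder classifier: always predicts the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _text: most_common


# Toy, made-up fragments and labels (assumption: not data from the study).
fragments = [f"fragment {i}" for i in range(20)]
labels = ["A"] * 12 + ["B"] * 8

# The same ordered cut is applied to texts and labels, with no shuffling.
train_x, test_x = sequential_split(fragments)
train_y, test_y = sequential_split(labels)

classify = majority_baseline(train_y)
rate = success_rate([classify(text) for text in test_x], test_y)
```

Keeping the split ordered rather than shuffled mirrors the scenario in which a coder labels the beginning of a document and a classifier is asked to label what follows; any real classifier can replace `majority_baseline` as long as it maps a text fragment to a label.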
