Supervised Classification of Healthcare Text Data Based on Context-Defined Categories

Bolívar Gómez, Sergio; Nieto Reyes, Alicia; Rogers, Heather L.

doi:10.3390/math10122005

dc.contributor.author	Bolívar Gómez, Sergio
dc.contributor.author	Nieto Reyes, Alicia
dc.contributor.author	Rogers, Heather L.
dc.contributor.other	Universidad de Cantabria	es_ES
dc.date.accessioned	2023-01-16T15:24:12Z
dc.date.available	2023-01-16T15:24:12Z
dc.date.issued	2022
dc.identifier.issn	2227-7390
dc.identifier.other	MTM2017-86061-C2-2-P	es_ES
dc.identifier.other	MCIN/AEI/10.13039/501100011033	es_ES
dc.identifier.uri	https://hdl.handle.net/10902/27216
dc.description.abstract	Achieving a good success rate in supervised classification analysis of a text dataset, where the relationship between the text and its label can be extracted from the context, but not from isolated words in the text, is still an important challenge facing the fields of statistics and machine learning. For this purpose, we present a novel mathematical framework. We then conduct a comparative study between established classification methods for the case where the relationship between the text and the corresponding label is clearly depicted by specific words in the text. In particular, we use logistic LASSO, artificial neural networks, support vector machines, and decision-tree-like procedures. This methodology is applied to a real case study that involves mapping Consolidated Framework for Implementation Research (CFIR) constructs to health-related text data, and it achieves a prediction success rate of over 80% when the first 55% of the text, or more, is used for training and the remainder for testing. The results indicate that the methodology can be useful to accelerate the CFIR coding process.	es_ES
dc.description.sponsorship	A.N.-R. is supported by Grant MTM2017-86061-C2-2-P funded by “ERDF A way of making Europe” and MCIN/AEI/10.13039/501100011033. For H.L.R., this study was funded by Instituto de Salud Carlos III through the project “PI17/02070” (co-funded by the European Regional Development Fund/European Social Fund “A way to make Europe”/“Investing in your future”) and the Basque Government Department of Health project “2017111086”. The funding bodies had no role in the design of the study; the collection, analysis, or interpretation of data; or the writing of the manuscript. The APC was paid by PI17/02070.	es_ES
dc.format.extent	31 p.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	MDPI	es_ES
dc.rights	© 2022 by the authors	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	*
dc.source	Mathematics, 2022, 10(12), 2005	es_ES
dc.subject.other	Artificial Neural Networks	es_ES
dc.subject.other	Decision Tree	es_ES
dc.subject.other	Logistic LASSO	es_ES
dc.subject.other	Natural Language Processing	es_ES
dc.subject.other	Qualitative Data	es_ES
dc.subject.other	Supervised Classification	es_ES
dc.subject.other	Support Vector Machines	es_ES
dc.subject.other	Text Data Analysis	es_ES
dc.title	Supervised Classification of Healthcare Text Data Based on Context-Defined Categories	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.relation.publisherVersion	https://doi.org/10.3390/math10122005	es_ES
dc.rights.accessRights	openAccess	es_ES
dc.identifier.DOI	10.3390/math10122005
dc.type.version	publishedVersion	es_ES