MEM and MEM4PP: new tools supporting the parallel generation of critical metrics in the evaluation of statistical models

Homocianu, Daniel; Tirnauca, Cristina

doi:10.3390/axioms11100549

Ver/Abrir

MEMMEM4PP.pdf (4.214Mb)

Identificadores

URI: https://hdl.handle.net/10902/28194

DOI: 10.3390/axioms11100549

ISSN: 2075-1680

Fecha

2022-10-12

Derechos

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/)

Publicado en

Axioms, 2022, 11(10), 549

Enlace a la publicación

https://doi.org/10.3390/axioms11100549

Palabras clave

Regression and classification models

Collinearity and reverse causality checks

Performance analysis

Automation and parallelization tools

Resumen/Abstract

This paper describes MEM and MEM4PP as new Stata tools and commands. They support the automatic reporting and selection of the best regression and classification models by adding supplemental performance metrics based on statistical post-estimation and custom computation. In particular, MEM provides helpful metrics, such as the maximum acceptable variance inflation factor (maxAcceptVIF) together with the maximum computed variance inflation factor (maxComputVIF) for ordinary least squares (OLS) regression, the maximum absolute value of the correlation coefficient in the predictors' correlation matrix (maxAbsVPMCC), the area under the curve of receiving operator characteristics (AUC-ROC), p and chi-squared of the goodness-of-fit (GOF) test for logit and probit, and also the maximum probability thresholds (maxProbNlogPenultThrsh and maxProbNlogLastThrsh) from Zlotnik and Abraira risk-prediction nomograms (nomolog) for logistic regressions. This new tool also performs the automatic identification of the list of variables if run after most regression commands. After simple successive invocations of MEM (in a .do file acting as a batch file), the collectible results are produced in the console or exported to specially designated files (one .csv for all models in a batch). MEM4PP is MEM?s version for parallel processing. It starts from the same batch (the same .do file with its path provided as a parameter) and triggers different instances of Stata to parallelly generate the same results (one .csv for each model in a batch). The paper also includes some examples using real-world data from the World Values Survey (the evidence between 1981 and 2020, version number 1.6). They help us understand how MEM and MEM4PP support the testing of predictor independence, reverse causality checks, the best model selection starting from such metrics, and, ultimately, the replication of all these steps

Colecciones a las que pertenece

D21 Artículos [438]

© 2022 by the authors.
Licensee MDPI, Basel, Switzerland.
This article is an open access article
distributed under the terms and
conditions of the Creative Commons
Attribution (CC BY) license (https://
creativecommons.org/licenses/by/
4.0/)

Excepto si se señala otra cosa, la licencia del ítem se describe como © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/)