© The Author(s) 2024. This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Schizophrenia, 2024, 10(1), 89
Several multivariate prognostic models have been published to predict outcomes in patients with first episode psychosis (FEP), but
it remains unclear whether those predictions generalize to independent populations. Using a subset of demographic and clinical
baseline predictors, we aimed to develop and externally validate different models predicting functional outcome after a FEP in the
context of a schizophrenia-spectrum disorder (FES), based on a previously published cross-validation and machine learning
pipeline. A crossover validation approach was adopted in two large, international cohorts (EUFEST, n = 338, and the PSYSCAN FES
cohort, n = 226). Scores on the Global Assessment of Functioning scale (GAF) at 12-month follow-up were dichotomized to
differentiate between poor (GAF current < 65) and good outcome (GAF current ≥ 65). Pooled non-linear support vector machine
(SVM) classifiers trained on the separate cohorts identified patients with a poor outcome with cross-validated balanced accuracies
(BAC) of 65–66%, but BAC dropped substantially when the models were applied to patients from a different FES cohort
(BAC = 50–56%). A leave-site-out analysis on the merged sample yielded better performance (BAC = 72%), highlighting the benefit
of combining data from different study designs to overcome calibration issues and improve model transportability. In conclusion,
our results indicate that validation of prediction models in an independent sample is essential in assessing the true value of the
model. Future external validation studies, as well as attempts to harmonize data collection across studies, are recommended.
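The validation strategy described above can be illustrated with a minimal sketch. This is not the authors' published pipeline; it is a hedged approximation assuming scikit-learn, synthetic data, and invented predictor/site variables, showing the key steps: dichotomizing 12-month GAF scores at 65, training a non-linear (RBF) SVM on baseline predictors, and estimating balanced accuracy with leave-site-out cross-validation.

```python
# Hedged sketch of the reported analysis steps (synthetic data, assumed
# scikit-learn pipeline; not the authors' actual code or predictors).
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 8))              # baseline demographic/clinical predictors (synthetic)
gaf_12m = rng.integers(30, 95, size=n)   # simulated 12-month GAF scores
y = (gaf_12m < 65).astype(int)           # 1 = poor outcome (GAF < 65), 0 = good outcome
site = rng.integers(0, 5, size=n)        # recruitment-site labels for leave-site-out CV

# Non-linear SVM with standardized inputs; class_weight="balanced"
# counters class imbalance, in the spirit of optimizing balanced accuracy.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))

bac = cross_val_score(clf, X, y, groups=site,
                      cv=LeaveOneGroupOut(), scoring="balanced_accuracy")
print(f"leave-site-out BAC per site: {np.round(bac, 2)}, mean = {bac.mean():.2f}")
```

Because the predictors here are random noise, the sketch's BAC hovers near chance; the point is the structure of the evaluation, where each held-out fold corresponds to an entire site, mimicking transport of the model to an unseen population.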