Chinese Academy of Sciences Institutional Repositories Grid
Using Machine Learning to Predict the Occurrence of Tardive Dyskinesia and the Treatment Response to Ginkgo Biloba

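Document type: Dissertation

Author: ULUDAG KADIR
Defense date: 2023-01
Document subtype: Doctoral
Degree-granting institution: University of Chinese Academy of Sciences
Degree-granting site: Institute of Psychology, Chinese Academy of Sciences
Other contributors: Zhang Xiangyang (张向阳)
Keywords: machine learning; tardive dyskinesia prediction; genetics; biomarkers; antipsychotic drug type
Degree: Doctor of Science
Major: Applied Psychology
Alternative title: Predicting Tardive Dyskinesia and Treatment Response to Ginkgo Biloba by Using Machine Learning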
English abstract:
Background: Tardive dyskinesia (TD) is a common extrapyramidal symptom that substantially affects many patients with schizophrenia (SCZ) who have taken antipsychotics (AP) for more than three months. Many studies have investigated its relationship with environmental and biological parameters, including genetic, epigenetic, hormonal, clinical, physiological, and immunological variables. However, previous findings are inconsistent, and only a few studies have aimed to predict the risk of developing TD with novel machine learning (ML) models such as random forest (RF), neural network (NN), naive Bayes (NB), or support vector machine (SVM) models. Furthermore, treatment with the antioxidant Ginkgo biloba extract (EGB) has been suggested to reduce the severity of TD symptoms, so it is necessary to identify the factors that influence treatment response to potential therapeutics such as EGB and APs.
Methods: In the first section of the study (first hypothesis), we investigated the prevalence, clinical correlates, and risk factors of TD in Chinese patients with chronic SCZ; 901 Chinese inpatients with SCZ were included. In the second section (second hypothesis), our goal was to predict TD in this sample. For the third hypothesis, we included only male smokers in order to reduce the impact of confounding factors. Specifically, 338 male smokers with chronic SCZ were recruited from Hebei Rongjun Hospital in Baoding, China, to build an RF model. TD was assessed with the Schooler and Kane criteria: 165 patients were diagnosed with TD and 173 were not. To validate the RF model, a comparable sample of 74 male smokers was selected from Beijing HuiLongGuan Hospital; 14 of these patients had TD (18.92%) and 60 did not (81.08%).
Next, for the ML model built on a subsample of 76 patients, only the RF method was used, because the ML model had predicted TD successfully with over 70% accuracy. Single nucleotide polymorphisms (SNPs) were analyzed using polymerase chain reaction (PCR)-based methods. In addition, patients were grouped by superoxide dismutase (SOD) (fifth hypothesis) and by AP type (sixth hypothesis) to identify the most predictive factors with RF, NN, NB, and SVM models. In the third section (first hypothesis), we investigated the effects of EGB treatment on TD occurrence and the role of genetics. One hundred fifty patients were recruited from Beijing HuiLongGuan Hospital; 75 received a placebo and the remaining 75 received EGB. As the second hypothesis, we investigated genetic factors that predict response to EGB treatment, specifically the association of treatment response with the language subscale of the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS), defined by the minimum clinically meaningful difference in the language score.
Results: For the third hypothesis, the RF model achieved an accuracy of 88% (sensitivity: 90.6%; specificity: 85.5%) in the Hebei Rongjun Hospital sample, and the area under the receiver operating characteristic (ROC) curve (AUC) was 94%. The top five predictors, in order, were neutrophils, body mass index (BMI), lymphocytes, heart rate (HR), and hemoglobin (HGB).
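The AUC reported above summarizes the whole ROC curve in one number. As a minimal sketch, assuming TD is the positive class and that the model outputs a continuous risk score per patient, the AUC equals the probability that a randomly chosen TD patient receives a higher score than a randomly chosen non-TD patient (the Mann-Whitney formulation). The function name `roc_auc` and the example scores are hypothetical and are not taken from the thesis.

```python
from itertools import product


def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    scores_pos: risk scores of TD (positive) cases.
    scores_neg: risk scores of non-TD (negative) cases.
    Each positive/negative pair earns 1 if the positive case scores higher,
    0.5 for a tie, and 0 otherwise; the mean over all pairs is the AUC
    (equivalently, the Mann-Whitney U statistic divided by n_pos * n_neg).
    """
    if not scores_pos or not scores_neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    credit = 0.0
    for p, n in product(scores_pos, scores_neg):
        if p > n:
            credit += 1.0
        elif p == n:
            credit += 0.5
    return credit / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical risk scores, for illustration only.
    td_scores = [0.9, 0.8, 0.7, 0.4]
    non_td_scores = [0.6, 0.3, 0.2, 0.1]
    print(roc_auc(td_scores, non_td_scores))  # 0.9375
```

Here 15 of the 16 TD/non-TD pairs are ranked correctly, so the AUC is 15/16. The pairwise loop costs O(n_pos x n_neg) comparisons, which is acceptable for samples of a few hundred patients; a rank-based computation scales better for larger data.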
When the same optimized RF model, with the parameters built on the Hebei Rongjun Hospital data, was tested on the independent Beijing HuiLongGuan Hospital sample, it reached an accuracy of 71.6% and a specificity of 86.8%, but a sensitivity of only 7.1%. For the fourth hypothesis, the RF model reached an accuracy of 81.25%, with a sensitivity of 88.8% and a specificity of 83%, and an AUC above 80%. The most predictive factors, in order, were prolactin (PRL), tumor necrosis factor-alpha (TNF-α), interleukin-2 (IL-2), globulin (GLO), and potassium. The most discriminative genetic feature was SOD, followed in order by monoamine oxidase A (MAOA) 272, brain-derived neurotrophic factor (BDNF) 196, TNF-α, Leptin 242, MAOA 309, catechol-O-methyltransferase (COMT), IL-6R, Leptin 367, MAOA 351, and BDNF 270. In the AP-based models, the NB model predicted TD in the second-generation antipsychotic (SGA) group with over 80% accuracy, but it did not reach the same accuracy in the first-generation antipsychotic (FGA) group, possibly because of the smaller sample size (n=25). By contrast, the logistic regression (LR) model predicted TD with over 80% accuracy. For the first hypothesis of the third section, an SVM model using genetic variables predicted the outcome of EGB treatment for TD with an accuracy of 74%; SVM performed best among all ML models, and the other ML models also exceeded 70% accuracy. For the second hypothesis of the third section, SVM and LR predicted EGB treatment effectiveness, measured by the RBANS language subscale, with 91% accuracy, whereas the sensitivity was only 0% because the unbalanced sample yielded no true positives.
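The external-validation and RBANS results above pair high accuracy or specificity with very low sensitivity. A minimal sketch of why, assuming TD (or treatment response) is the positive class: accuracy is the prevalence-weighted mean of sensitivity and specificity, so when one class dominates, accuracy mostly tracks performance on that class. The names `ConfusionMatrix` and `accuracy_from_rates` are hypothetical, and the confusion-matrix counts in the example are invented for illustration; only the 14/60 class split and the 7.1%/86.8% rates come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier; TD is assumed to be the positive class."""
    tp: int  # TD cases predicted as TD
    fn: int  # TD cases predicted as non-TD
    tn: int  # non-TD cases predicted as non-TD
    fp: int  # non-TD cases predicted as TD

    @property
    def sensitivity(self) -> float:
        # True positive rate; raises ZeroDivisionError if there are no positive cases.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # True negative rate; raises ZeroDivisionError if there are no negative cases.
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    @property
    def balanced_accuracy(self) -> float:
        # Unweighted mean of the two class-wise rates; not inflated by class imbalance.
        return (self.sensitivity + self.specificity) / 2


def accuracy_from_rates(sensitivity: float, specificity: float, n_pos: int, n_neg: int) -> float:
    """Accuracy as the prevalence-weighted mean of sensitivity and specificity."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


if __name__ == "__main__":
    # Validation split from the text: 14 TD and 60 non-TD patients,
    # with the reported sensitivity (7.1%) and specificity (86.8%).
    print(round(accuracy_from_rates(0.071, 0.868, 14, 60), 3))  # 0.717

    # Hypothetical counts: a model that labels every case negative
    # in a sample with 9 positives and 91 negatives.
    all_negative = ConfusionMatrix(tp=0, fn=9, tn=91, fp=0)
    print(all_negative.accuracy, all_negative.sensitivity, all_negative.balanced_accuracy)  # 0.91 0.0 0.5
```

The weighted mean (about 71.7%) is close to the reported 71.6%, which shows that the validation accuracy was driven almost entirely by the 60 non-TD patients. Balanced accuracy, or reporting sensitivity and specificity separately, avoids this distortion.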
Conclusion: Our findings indicate that TD is a common movement disorder and that specific demographic and clinical variables are risk factors for its development (first section). We found that certain sociodemographic, genetic, treatment-related, hormonal, and clinical variables are important for predicting the risk of developing TD, with above-chance accuracy. In addition, categorizing patients by AP type may improve the accuracy of TD prediction models (second section). In the third section, the results for the first hypothesis indicate that the response of TD to EGB treatment can be predicted, and the results for the second hypothesis indicate that cognitive function, measured by the RBANS language subscale score, can be predicted from parameters related to various genetic variables.
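Chinese abstract:
Background: Tardive dyskinesia (TD) is a common extrapyramidal symptom that seriously affects many patients with schizophrenia (SCZ) who have taken antipsychotics (AP) for more than three months. Many studies have investigated the relationship between TD and various environmental and biological parameters, such as genetic, epigenetic, hormonal, clinical, physiological, and immunological variables. However, previous studies have not reached clear conclusions, and only a few have used novel machine learning (ML) models, such as random forest (RF), neural network (NN), naive Bayes (NB), or support vector machine (SVM) models, specifically to predict the risk of developing TD. Furthermore, because treatment with the antioxidant Ginkgo biloba extract (EGB) has been suggested to partially improve the severity of TD symptoms, it is necessary to detect the relevant factors that determine or influence the response to antipsychotic drugs.
Methods: In the first section of the study (first hypothesis), we investigated the prevalence, clinical correlates, and risk factors of TD in Chinese patients with chronic SCZ. A total of 901 Chinese inpatients with SCZ were included. In the second section (second hypothesis), our goal was to predict TD status in this sample. Next, for the third hypothesis, we included only male smokers to predict TD occurrence while reducing the influence of confounding factors. Specifically, we selected 338 male smokers with chronic SCZ from Hebei Rongjun Hospital in Baoding, China, to build the RF algorithm. TD was assessed with the Schooler and Kane criteria; 165 of these patients were diagnosed with TD and the other 173 were not. To validate the RF algorithm, we selected a similar sample of 74 male smokers from Beijing HuiLongGuan Hospital, of whom 14 had TD (18.92%) and the remaining 60 did not (81.08%). Next, for the ML model (n=76), we used only the RF method, because the ML model had successfully predicted TD with over 70% accuracy. Single nucleotide polymorphisms (SNPs) were analyzed using polymerase chain reaction (PCR)-based methods. In addition, patients were analyzed by superoxide dismutase (SOD) (fifth hypothesis) and by AP type (sixth hypothesis) to identify the most predictive factors using RF, NN, NB, and SVM models. In the third section of the study (first hypothesis), we investigated the effect of EGB treatment on TD occurrence and the role of genetics. We recruited 150 patients from Beijing HuiLongGuan Hospital; 75 received a placebo and the rest received EGB. In addition, as the second hypothesis, we investigated genetic factors related to EGB treatment. Specifically, we examined the association between EGB treatment response and the language subscale of the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS) according to the minimum clinically meaningful difference in the language score.
Results: In the first section, as investigated under the first hypothesis, patients with TD were more likely than non-TD patients to be male, had a higher smoking rate, and had a longer duration of illness (DOI). In addition, compared with non-TD patients, TD patients had higher Positive and Negative Syndrome Scale (PANSS) total, PANSS negative subscale, and cognitive subscale scores, but lower mean levels of metabolic biomarkers. For the second hypothesis (second section), the RF model predicted TD with an accuracy of 79% (sensitivity: 75.9%; specificity: 97.3%; area under the curve (AUC): 91%). In the AP-based models, the NB model predicted TD in the second-generation antipsychotic (SGA) group with over 80% accuracy. However, it did not predict TD in the first-generation antipsychotic (FGA) group as accurately as in the SGA group, possibly because of the smaller sample size (n=25). By contrast, the logistic regression (LR) model predicted TD successfully, with over 80% accuracy. For the first hypothesis of the third section, an SVM model using genetic variables predicted the outcome of EGB treatment for TD with an accuracy of 74%. LR performed best among all ML models. Other ML models also predicted TD with over 70% accuracy. For the second hypothesis of the third section, SVM and LR predicted EGB treatment effectiveness, measured by the RBANS language subscale, with 91% accuracy, whereas the sensitivity was only 0% because the unbalanced sample yielded no true positives.
Conclusion: Our findings indicate that TD is a common movement disorder and that specific demographic and clinical variables are risk factors for its development (first section). We found that certain sociodemographic, genetic, treatment-related, hormonal, and clinical variables play an important role in predicting the risk of developing TD (with above-chance accuracy). In addition, classifying patients by AP type can improve the accuracy of TD prediction models (second section). Furthermore, in the third section, the first hypothesis showed that the treatment response of TD to EGB can be predicted, and the second hypothesis showed that cognitive function, measured by the RBANS language subscale score, can be predicted using parameters related to various genetic variables.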
Language: Chinese
Source URL: http://ir.psych.ac.cn/handle/311026/44833
Collection: Institute of Psychology / Laboratory of Social and Engineering Psychology
Recommended citation
GB/T 7714
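ULUDAG KADIR. 使用机器学习来预测迟发性运动障碍的发生及银杏叶的治疗反应 (Using Machine Learning to Predict the Occurrence of Tardive Dyskinesia and the Treatment Response to Ginkgo Biloba)[D]. Institute of Psychology, Chinese Academy of Sciences. University of Chinese Academy of Sciences. 2023.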

Ingestion method: OAI harvesting

Source: Institute of Psychology

Unless otherwise specified, all content in this system is protected by copyright, and all rights are reserved.