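Chinese Academy of Sciences Institutional Repositories Grid: Research on Site Effect Correction Methods for Multi-center Resting-state Brain Imaging Big Data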

Chinese Academy of Sciences Institutional Repositories Grid

Research on Site Effect Correction Methods for Multi-center Resting-state Brain Imaging Big Data (静息态脑影像多中心大数据站点效应校正方法研究)

Document type: Dissertation


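Author	王瑀薇 (Wang Yuwei)
Defense date	2024-06
Document subtype	Doctoral
Degree-granting institution	University of Chinese Academy of Sciences
Place of conferral	Institute of Psychology, Chinese Academy of Sciences
Other contributor	严超赣 (Yan Chao-Gan)
Keywords	Harmonization; resting state; multi-site; big data
Degree	Doctor of Science (理学博士)
Major	Cognitive Neuroscience
Other title	Harmonization on Site Effect of Multi-site Resting-state Functional Magnetic Resonance Big Imaging Data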
Abstract (English)	Brain imaging has entered the era of big data: large samples increase statistical power and improve the reproducibility of research. Pooling data across centers is currently one of the main routes to such large samples. It makes the acquisition of time-consuming, labor-intensive brain imaging data more efficient (for example, through multi-center prospective and retrospective cohort studies). It also allows groups that are hard to obtain at a single site because of selection bias (for example, sites that scan only or mainly healthy participants), such as patients with depression, to be merged into datasets with rich demographic information. This in turn enables subgroup analyses and pathological exploration that are not feasible with small samples. However, multi-center pooling introduces site effects, a factor with a large statistical effect that strongly interferes with biological effects. Site effects are the between-site variability in data caused by differences in scanning procedures, experimental setups, environments, and participants. The portion of this variability that is not biologically relevant is defined as noise, and it hinders detection of the effects a study targets. Previous work on removing site effects has mostly focused on structural imaging, such as T1-weighted imaging and diffusion tensor imaging (DTI). This is mainly because these modalities have good contrast and reflect brain tissue structure, and because their image features allow techniques from artificial intelligence to be transferred and applied. For resting-state brain imaging, by contrast, existing studies lack dedicated method development and comparison. The limited choice of methods and the uncertainty about them seriously hinder both the pooling of resting-state functional MRI (fMRI) big data and the research built on it.
Furthermore, existing evaluations of these methods have several shortcomings. First, they lack experimental designs aimed at the multi-center problem, use data with complex and disordered internal structure, and leave the relationship between site information and other biologically relevant information unclear, all of which weakens the evaluation. Second, they do not examine the reproducibility of multi-center data after site effect correction, which is a key issue for resting-state brain imaging, and neglecting it leaves the evaluation incomplete. Third, because the relationship between site and biological information is often unclear and practical data collection faces many difficulties, it is necessary to consider how well correction methods work when site and biological information are completely confounded; this directly determines the feasibility of data pooling and the criteria for screening data. To address site effect correction in brain imaging and offer practical methodological recommendations for pooling multi-center resting-state data, this study collected popular and cutting-edge correction methods and comprehensively evaluated them on residual site effects, individual identification rate, reproducibility, replicability, and the ability to handle complete confounding of gender and site. The Subsampling Maximum-Mean-Distance Algorithm (SMA) performed best overall, so we recommend it for site effect correction in multi-center resting-state brain imaging. Building on this conclusion, we further explored how SMA should be used. SMA requires a target site to be specified; the data distributions of the other sites are then aligned to it so that site effects become homogeneous. However, the influence of the target site's characteristics on the correction results was unknown.
We therefore selected two important features from this problem space, the sample size of the target site and its demographic distribution, as experimental factors. The correction results were better in individual identification rate and in the test-retest reliability and stability of statistical results when the target site had a larger sample size and the smallest demographic difference from the other sites. From this we proposed a heuristic formula that accounts for both the relative sample sizes of the sites and the differences in their demographic distributions, giving users a direct reference for choosing the target site when applying SMA. Based on these findings, we developed the DPABI Harmonization toolbox on the MATLAB platform, which integrates all the site effect correction methods covered in this thesis. The toolbox is flexible and easy to use, supports multi-center research, and helps realize the potential of pooled data. In summary, we applied the Subsampling Maximum-Mean-Distance Algorithm to resting-state functional brain imaging, were the first to discuss parameter settings for this application, and proposed a heuristic formula for target site selection. To foster a healthy methodological ecosystem, we also developed an integrated toolbox that includes multiple classes of site correction methods, so that the field can jointly test how well these methods generalize.
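The abstract describes SMA as aligning the data distribution of each site to that of a target site. As a rough illustration of how a distance between two sites' distributions can be measured, the sketch below computes a biased estimate of the squared maximum mean discrepancy (MMD) with a Gaussian kernel. This is not the thesis's SMA implementation and not code from DPABI Harmonization: the choice of MMD as the distance, the kernel, the `gamma` value, the function names, and the synthetic "site" data are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix exp(-gamma * ||a_i - b_j||^2) for rows of a and b."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd2(x, y, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD; rows are subjects."""
    return (
        rbf_kernel(x, x, gamma).mean()
        + rbf_kernel(y, y, gamma).mean()
        - 2.0 * rbf_kernel(x, y, gamma).mean()
    )


# Synthetic stand-ins for per-subject feature vectors (hypothetical data).
rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(200, 5))   # hypothetical target site
same = rng.normal(0.0, 1.0, size=(200, 5))     # site with the same distribution
shifted = rng.normal(1.0, 1.0, size=(200, 5))  # site with a shifted mean

mmd_same = mmd2(target, same, gamma=0.1)
mmd_shifted = mmd2(target, shifted, gamma=0.1)
print(f"same distribution:    {mmd_same:.4f}")
print(f"shifted distribution: {mmd_shifted:.4f}")
```

In this toy setting the site with the shifted mean yields a clearly larger value than the site drawn from the target's distribution, which is the kind of gap a distribution-alignment method would aim to shrink.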
Abstract (Chinese)	脑影像数据已进入大数据时代，大样本可以提高统计效力和研究的可重复性。多中心数据汇聚是当前获得大数据的一个主要途径，其不仅可以提高耗时耗力的脑影像数据的获取效率 (如多中心协作的前瞻性队列和回顾性队列)，还可以通过汇聚不同扫描站点的数据，把在单一站点内因选择偏差 (比如因研究需要只采集/主要采集正常人群体等)而难以获得的组别 (如抑郁症群体)合并成具有较丰富人口学信息数据集，从而实现亚组的划分以及病理学上的探索等在小样本上不具有可行性的研究。然而多中心数据的汇集会伴随站点效应，该因子统计学上具有较强的效应，对生物学效应具有较强的干扰效果。站点效应是指由扫描程序、实验设置、环境和被试差异引起的扫描站点间的数据产生的变异。这种效应的非生物学相关部分被定义为噪声，有碍于研究的目标效应的发现。既往已有的站点效应去除的研究多针对结构像，如 T1 加权影像 (T1-weighted Image)和磁共振弥散张量成像 (Diffusion Tensor Imaging, DTI)，这主要是由于该类成像具有较好的对比度，可以反映大脑组织结构的构成，同时该类成像技术得到的脑影像具有的图像特征允许人工智能领域的相关技术的迁移应用。但对于静息态脑影像，已有研究缺少具有针对性的方法的开发以及比较，方法学上有限的选择和不确定性无形之中都严重阻碍了静息态功能磁共振脑影像大数据的汇聚，以及基于其的研究的开展。而且，已有的研究在评估相关方法时存在一些不足，包括缺乏针对多中心问题进行实验设计、所使用的数据内部结构复杂混乱以及站点信息和其它生物学相关信息关系不明等，导致对方法的评估效力不足；缺乏对校正站点效应后的多中心数据在可重复性上的探讨，对于静息态脑影像数据来说，可重复性是一个关键的问题，对其的忽视也促成了评估的不全面性；此外，由于站点信息和生物学信息之间关系的不明确性，以及实际操作中的种种困难，有必要考虑站点和生物学信息完全混淆的情况下的站点效应校正方法的实用性，这将直接决定了数据汇聚的可行性和筛选条件。为了解决脑影像领域的站点效应校正问题，给出适合于静息态脑影像多中心数据汇聚时进行校正的方法学上的实践建议，本研究针对静息态多中心数据汇集时的站点效应问题收集当下流行和前沿的校正方法，并对其进行了剩余站点效应、个体识别率、可重复性、可复制性和应对性别和站点完全混淆情况能力的全面评估。经过以上测试评估得出，子采样最大均值差异算法 (Subsampling Maximum-Mean-Distance Algorithm, SMA)的综合表现最优。因此我们推荐使用该算法作为静息态多中心站点效应校正方法。基于该评估结论，我们进一步对 SMA 的使用进行探究。在使用 SMA 进行站点效应校正时，需要设定目标站点 (target site)，从而令其他站点的数据分布向其看齐，从而获得同质的站点效应。然而，目标站点的不同特质对校正结果的影响尚且未知。因此，我们在其问题空间中选取了两个重要的特征——目标站点的样本量和人口学分布特征，作为目标并进行实验。实验结果表明，具有更大样本量，同时和其他站点的人口学分布差异最小的目标站点的校正结果在个体识别率，统计结果的重测信度和稳定性方面都更佳。由此出发，我们提出一个启发式公式，同时考虑站点间的样本量相对大小和人口学分布差异大小，在使用 SMA 进行校正多中心站点效应时，该启发式公式可以方便快捷地为使用者提供直接的目标站点参考。基于以上研究结论，我们在 MATLAB 平台开发了 DPABI Harmonization 工具箱，集合了本文所涉及的所有站点效应校正方法。该工具箱灵活易用，可以助力多中心研究的开展，发挥数据潜能。综上，我们开创性地将子采样最大均值距离算法应用至静息态功能脑影像领域，并首先对其在应用时的参数设置问题进行探讨，提出了用于目标站点选择的启发式公式。为了更好推动方法学生态良好循环，在此基础上，开发集成工具包，囊括多类站点校正方法，用于促进领域内共同检验方法的泛化性。
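Language	Chinese
Source URL	[http://ir.psych.ac.cn/handle/311026/48004]
Collection	Institute of Psychology_Laboratory of Cognitive and Developmental Psychology
Recommended citation (GB/T 7714)	王瑀薇. 静息态脑影像多中心大数据站点效应校正方法研究[D]. 中国科学院心理研究所. 中国科学院大学. 2024.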

Ingest method: OAI harvesting

Source: Institute of Psychology
