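Chinese Academy of Sciences Institutional Repositories Grid: The Objectiveness of Writing Assessments: A Comparison between the Multistage Rating Augmentation Method and the Holistic Rating Method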

Chinese Academy of Sciences Institutional Repositories Grid

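The Objectiveness of Writing Assessments: A Comparison between the Multistage Rating Augmentation Method and the Holistic Rating Method (作文评分的客观性：分步增值评分模式和整体双评评分模式的比较研究)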

Document type: Thesis


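Author	刘斯佳 (Liu Sijia)
Degree type	Master's (equivalent educational level)
Defense date	2015-07
Degree-granting institution	Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Place conferred	Beijing
Keywords	writing test; holistic rating; multistage rating augmentation; many-facet Rasch model; rating efficiency
Alternative title	The Objectiveness of Writing Assessments: A Comparison between the Multistage Rating Augmentation Method and the Holistic Rating Method
Degree discipline	Psychology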
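Chinese abstract	Essay-based assessment is favored because it is thought to reflect actual ability better, yet the quality and objectivity of its scoring have been questioned. Scoring quality has two aspects: first, how closely scores correspond to ability, as quantified by statistical indices; second, the practical value of scores, that is, whether they can be validly interpreted and used. This study is the first to adopt the multistage rating augmentation method introduced by Wang et al. (2012) alongside the traditional holistic double-rating method, and it uses traditional psychometric models and the many-facet Rasch model to examine the scoring of 500 answer scripts from a national-level writing examination under each method. A subset of the scripts was additionally scored by experts in order to examine the error and the rating efficiency of the two methods. Study 1 found that inconsistent ratings were very common under both methods; compared with the traditional double-rating method, the multistage rating augmentation method produced a more reasonable score distribution and better rating consistency. Study 2 found that the multistage rating augmentation method had lower rating error than the traditional double-rating method; indeed, the error of a single (first) rating under the multistage method was smaller than that of the double-rating result. Study 3 found that the multistage rating augmentation method produced a more reasonable distribution of probability curves. Study 4, using the many-facet Rasch model, found that individual raters performed better in terms of rating misfit and score discrimination under the multistage rating augmentation method. Study 5 found that the multistage rating augmentation method achieved better rating efficiency while keeping rating error under control. Finally, Study 6 found that raters of different genders and educational backgrounds performed better in terms of rating misfit and score discrimination under the multistage rating augmentation method. The results indicate that the multistage rating augmentation method does improve a range of statistical indices of the scores, and that it is also superior in terms of the valid interpretation and practical use of scores. The method therefore makes a constructive contribution to both aspects of scoring quality. Future research could examine the rating quality of different types of raters, and could analyze the rating dimensions quantitatively in order to simplify and clarify the scoring criteria for subjective test items. Keywords: writing test; holistic rating; multistage rating augmentation; many-facet Rasch model; rating efficiency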
English abstract	Writing assessment is regarded as better representing candidates' real-life abilities and is therefore widely favored, but the objectiveness of such assessment remains questionable. The quality of an assessment has two aspects: first, how well test scores represent the targeted ability of an individual; second, how the test results are interpreted and used in practice. This research is the first to apply the multistage rating augmentation method of writing assessment introduced by Wang et al. (2012) alongside the traditional holistic rating method, and it assesses their soundness in scoring 500 papers from a national writing assessment, using both traditional psychometric methods and the many-facet Rasch model. A subset of the papers was also rated by a panel of experts in order to assess rating error and efficiency. Study 1 suggests that under both methods the two raters often did not assign the same score level, but that the multistage rating augmentation method showed a more reasonable score distribution and better consistency. Study 2 suggests that the multistage rating augmentation method showed smaller rating errors than the holistic rating method, even when only one set of multistage rating scores was included. Study 3 suggests that the multistage rating augmentation method showed a better distribution of category probability curves than the traditional holistic rating method. Study 4 found better rater performance in terms of misfit and discrimination under the multistage rating augmentation method than under the holistic rating method. Study 5 suggests that the multistage rating augmentation method achieved better rating efficiency than the traditional holistic rating method, keeping rating errors low on the basis of a single rater. Lastly, Study 6 showed that raters of different genders and education levels performed better in terms of misfit and discrimination under the multistage rating augmentation method than under the holistic rating method. The results show that the multistage rating augmentation method improves how well test scores represent the targeted abilities in a statistical sense, and that it is also superior in terms of the practical interpretation and use of the tests. Thus, the multistage rating augmentation method makes a constructive contribution to both aspects of the quality of subjective assessment. Future studies could investigate the rating quality of different categories of raters, and could quantify the dimensions of writing assessment in order to simplify and clarify the scoring criteria for subjective assessment.
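Subject	Applied Psychology
Language	Chinese
Source URL	[http://ir.psych.ac.cn/handle/311026/20553]
Collection	Institute of Psychology_Research Laboratory of Health and Genetic Psychology
Author affiliation	Institute of Psychology, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Liu Sijia. The Objectiveness of Writing Assessments: A Comparison between the Multistage Rating Augmentation Method and the Holistic Rating Method[D]. Beijing. Graduate School of the Chinese Academy of Sciences. 2015.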

Ingest method: OAI harvesting

Source: Institute of Psychology

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.