In previous studies, the reliability of peer assessment has been inferred by comparing marks from peer raters with those obtained from teachers. These approaches, whether employing correlation analysis or mark differences, rest on the implicit assumption that teacher marks represent the benchmark or 'gold standard' by which the reliability or accuracy of peer scores can be gauged. This paper reports the outcomes of teacher and peer assessments of the oral presentation skills of 119 final year engineering students within a fourth year Communication subject in 1996. A major part of the assessment for this subject is based on seminar presentations of their thesis project. An average of 5 staff assess each seminar presentation, together with a similar number of peer assessors. Because the data sets contained multiple assessments by both peers and teaching staff for every student's presentation, separate reliability estimates could be made for each group using ANOVA techniques. Analysis of these data revealed that teacher assessments yielded much higher levels of agreement than did peer assessments. However, further analysis using the reverse Spearman-Brown formula indicated that scores based on the average of four or more peer raters are likely to be more reliable than ratings provided by a single teacher assessor. Findings are discussed in relation to concerns raised in the literature on the reliability of peer assessments, and the conventional notion of teacher assessment as the benchmark or 'gold standard' by which the reliability of multiple peer assessments is estimated is questioned.
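The reverse Spearman-Brown step can be sketched as follows. This is a minimal illustration of the standard Spearman-Brown prophecy formula and its inverse, not the paper's analysis; the reliability values used are hypothetical, as the abstract does not report the study's actual coefficients.

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Reliability of the average of k raters, given single-rater reliability
    (Spearman-Brown prophecy formula)."""
    return k * r_single / (1 + (k - 1) * r_single)

def reverse_spearman_brown(r_composite: float, k: int) -> float:
    """Single-rater reliability recovered from the reliability of a
    k-rater average (reverse Spearman-Brown formula)."""
    return r_composite / (k - (k - 1) * r_composite)

# Illustrative values only: suppose a 5-rater peer average has reliability 0.80.
r_single_peer = reverse_spearman_brown(0.80, 5)   # implied single peer-rater reliability
r_four_peers = spearman_brown(r_single_peer, 4)   # projected reliability of a 4-peer average
```

Under this logic, even if each individual peer rater agrees less with others than teachers do among themselves, averaging over four or more peers can yield a composite score whose reliability exceeds that of a single teacher's rating.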