Measuring rater consistency: an investigation into the effects of two testing instruments on raters' scores