The inter-rater reliability statistics measure how consistently two reviewers applied topic tags to the same set of OPENEND and STRING questionnaire responses. Ideally, both reviewers received the same training on the tags and then worked independently, reading each response and assigning the appropriate topic tags. The greater their agreement, the more confident you can be that the results of the content analysis are meaningful.

Select Options

The inter-rater reliability statistics can only be computed between two reviewers. Pick the two reviewers whose tag assignments you wish to compare.

The percent complete information roughly indicates each reviewer's progress. A reviewer who had tagged all of the responses (including blank responses) for every OPENEND or STRING question with defined topic tags would be listed as 100% complete.
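
As a rough illustration only (not QPL's actual code), the percent complete figure can be thought of as the share of responses a reviewer has tagged across every question that has topic tags defined. The function and parameter names below are hypothetical.

def percent_complete(tagged_counts, total_respondents, tagged_questions):
    # tagged_counts[q]    - responses (blank ones included) the reviewer has
    #                       tagged so far for question q
    # total_respondents   - number of respondents in the data set
    # tagged_questions    - OPENEND/STRING questions with topic tags defined
    reviewed = sum(tagged_counts.get(q, 0) for q in tagged_questions)
    total = total_respondents * len(tagged_questions)
    return 100.0 * reviewed / total if total else 0.0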

[Screenshot, Step 1: the Select Options screen]

View Report

The Inter-Rater Reliability Report shows several common measures of reliability between two coders: Cohen's Kappa, Krippendorff's Alpha-Reliability, and the percent agreement for all of the OPENEND and STRING questions that have been coded. Each tag is treated independently and is set either to "Yes," the code was applied, or "No," the code was not applied. The summary counts for the Yes and No agreement combinations are also shown in the report.
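
The sketch below shows one way these statistics can be computed for binary Yes/No codes from two reviewers. It is an illustration of the standard formulas (percent agreement, Cohen's Kappa, and Krippendorff's Alpha for nominal data with two coders and no missing values), not QPL's own source code; the lists are assumed to hold one "Yes"/"No" entry per tagged response for a given tag.

def agreement_stats(codes_a, codes_b):
    # codes_a and codes_b are parallel lists of "Yes"/"No" codes, one entry
    # per tagged response for a single tag, from reviewers A and B.
    n = len(codes_a)
    pairs = list(zip(codes_a, codes_b))

    # Summary counts of the Yes/No agreement combinations shown in the report
    yy = sum(1 for a, b in pairs if a == "Yes" and b == "Yes")
    nn = sum(1 for a, b in pairs if a == "No" and b == "No")
    yn = sum(1 for a, b in pairs if a == "Yes" and b == "No")
    ny = sum(1 for a, b in pairs if a == "No" and b == "Yes")

    # Percent agreement: share of responses where both reviewers gave the same code
    p_o = (yy + nn) / n

    # Cohen's Kappa: agreement corrected for the agreement expected by chance,
    # using each reviewer's own marginal Yes/No rates
    p_yes_a = (yy + yn) / n
    p_yes_b = (yy + ny) / n
    p_e = p_yes_a * p_yes_b + (1 - p_yes_a) * (1 - p_yes_b)
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 1.0

    # Krippendorff's Alpha (nominal data, two coders, no missing values):
    # 1 - observed/expected disagreement, with expected disagreement based on
    # the pooled Yes/No frequencies of both coders
    n_yes = 2 * yy + yn + ny
    n_no = 2 * nn + yn + ny
    disagreements = yn + ny
    alpha = (1 - disagreements * (2 * n - 1) / (n_yes * n_no)
             if n_yes and n_no else 1.0)

    return {"n": n, "yes_yes": yy, "no_no": nn, "yes_no": yn, "no_yes": ny,
            "percent_agreement": 100 * p_o, "kappa": kappa, "alpha": alpha}

# Example: one tag, five tagged responses
print(agreement_stats(["Yes", "No", "Yes", "No", "No"],
                      ["Yes", "No", "No", "No", "No"]))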

To account for cases that were reviewed by only one Administrator, this report presumes the second Administrator's codes to be "No" for all of the tags that apply to that question.
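
Continuing the sketch above, this rule can be illustrated as follows: if a response appears in only one Administrator's coding, the other Administrator's entry defaults to "No" for every tag on that question before the statistics are computed. The data structures here are hypothetical, and the two aligned lists could then be passed to a function like agreement_stats above.

def align_codes(codes_a, codes_b, tags):
    # codes_a and codes_b map a response ID to the set of tags each
    # Administrator applied; a response missing from one map is treated as
    # all "No" for that Administrator.
    ids = sorted(set(codes_a) | set(codes_b))
    a = ["Yes" if t in codes_a.get(r, set()) else "No" for r in ids for t in tags]
    b = ["Yes" if t in codes_b.get(r, set()) else "No" for r in ids for t in tags]
    return a, b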

According to Jim Fields, Assistant Director, GAO, "Academic studies often accept a Kappa or Alpha score of 0.7 as being acceptable. The acceptable level for a GAO study is based on the purposes of the analysis and the risks associated with the use of the data in the GAO report. The pattern of the errors should also be examined." A score of 1.0 indicates perfect agreement.

[Screenshot, Step 2: the Inter-Rater Reliability Report, showing Cohen's Kappa and Krippendorff's Alpha-Reliability]

In the example above, n (the number of responses that have been tagged) is less than N (the total number of respondents), because many respondents did not answer these questions and the Administrators chose not to tag blank responses.
