A perennial issue at every college and university is how to measure teacher quality. It is important because it directly influences decisions about retention, tenure, and promotion. Everyone complains about basing such decisions on end-of-course evaluations. This column will explore a recent study by Scott Carrell and James West, undertaken at the United States Air Force Academy (USAFA), that strongly suggests that such evaluations are even less useful than commonly believed and that the greatest long-term learning does not come from those instructors who receive the strongest evaluations at the end of the class.
The study authors chose to measure teacher effectiveness in Calculus I by examining value added in both Calculus I and Calculus II: comparing student performance on course assessments for each instructor after controlling for differences in student preparation and background, including academic background, SAT verbal and math scores, sex, and race and ethnicity. It is generally acknowledged that better teachers produce better results in their students, but this has only been extensively studied in elementary students, and even there it is not without its problems. The authors reference a 2010 study by Rothstein that shows a strong positive correlation between the quality of fifth grade teachers and student performance on assessments taken in fourth grade, suggesting a significant selection bias: The best students seek out the best teachers. This may be even truer at the university level, where students have much more control over whom they take a class with. For this reason, Carrell and West were very careful to measure the comparability of the classes. At USAFA, everyone takes Calculus I, and there is little personal choice in which section to take, so such selection bias is less likely to occur. The authors also tested for backward correlation, that is, whether having one of the best Calculus II instructors was correlated with higher grades in Calculus I, and found no evidence of it.
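The value-added idea described above can be illustrated with a minimal sketch. It assumes a single control variable (a prior score such as SAT math) and a plain least-squares fit; the study itself controlled for several variables, and nothing here should be read as the authors' actual model. The function names and the toy numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x with one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def value_added(records):
    """Estimate a simple value-added score per instructor.

    records: iterable of (instructor, prior_score, exam_score) tuples.
    Step 1: predict each exam score from the prior score alone,
            pooling all students regardless of instructor.
    Step 2: average, per instructor, how far students landed above
            or below that prediction (the residual).
    """
    records = list(records)
    intercept, slope = fit_line(
        [prior for _, prior, _ in records],
        [exam for _, _, exam in records],
    )
    residuals = defaultdict(list)
    for instructor, prior, exam in records:
        residuals[instructor].append(exam - (intercept + slope * prior))
    return {name: mean(vals) for name, vals in residuals.items()}


if __name__ == "__main__":
    # Made-up data: two instructors whose students have identical prior scores.
    toy = [
        ("A", 500, 62), ("A", 600, 72), ("A", 700, 82),
        ("B", 500, 58), ("B", 600, 68), ("B", 700, 78),
    ]
    print({name: round(score, 6) for name, score in value_added(toy).items()})
```

In this sketch, a positive score means an instructor's students did better than their prior scores alone would predict. The two-step approach is only reasonable when students are spread across sections without regard to preparation, which is the condition the USAFA setting is described as approximating.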
The authors had a large sample with which to work: all of the students who took Calculus I from fall 2000 through spring 2007, over 10,000 students and 91 instructors. The faculty makeup at USAFA is unusual among post-secondary institutions. There is a small core of permanent faculty. Only 15% of Calculus I instructors held the rank of Associate or Full Professor, and only 31% held a doctorate in mathematics or a mathematical science. Most of the teaching is done by officers who hold a master’s degree and are doing a rotation through USAFA. The average teaching experience among all Calculus I instructors was less than four years. Because of this, there is tight control on these courses, which facilitates a careful statistical study. There are common syllabi and examinations. All instructors get to see the examinations before they are given, so there is opportunity, if an instructor so wishes, to “teach to the test,” emphasizing those parts of the curriculum that are known to be important for the assessment.
Positive student responses to the following evaluation prompts all had a positive influence on student performance in Calculus I, significant at the 0.05 level:
- Instructor’s ability to provide clear, well-organized instruction.
- Value of questions and problems raised by instructor.
- Instructor’s knowledge of course material.
- The course as a whole.
- Amount you learned in the course.
- The instructor’s effectiveness in facilitating my learning in the course.
On the other hand, faculty rank, highest degree, and years of teaching experience were negatively correlated with examination performance in Calculus I but positively correlated with performance in Calculus II. The effect was statistically significant for years of teaching experience, both for the negative impact in Calculus I and for the positive impact in Calculus II.
The suggested implication is that less experienced instructors tend to focus on the particular skills and abilities needed to succeed in the next assessment and that students like that approach. Experienced instructors may pay more attention to the foundational knowledge that will serve the student in subsequent courses, and students appear to be less immediately appreciative of what these instructors are able to bring to the class.
This study strongly suggests that end-of-course student evaluations are, at best, an incomplete measure of an instructor’s effectiveness. It also suggests a long-term weakness of simply preparing students for their next assessment, though it should be emphasized that this is merely a guess as to why less experienced instructors appear to get better performance from their students in Calculus I assessments.
At Macalester College, we recognize the importance of students' reflection on what they learned months or years earlier. When a promotion or tenure case comes to the Personnel Committee, we collect online student evaluations of that faculty member from all of the students who have taken a course with him or her over roughly the past five years, combining recent and current appraisals with longer-term assessments of the effect that instructor has had.
Scott E. Carrell and James E. West, 2010. “Does Professor Quality Matter? Evidence from Random Assignment of Students to Professors.” Journal of Political Economy 118 (3): 409–432. Available at http://www.nber.org/papers/w14081
Jesse Rothstein, 2010. “Teacher Quality in Educational Production: Tracking, Decay, and Student Achievement.” Quarterly Journal of Economics 125 (1): 175–214. Available at http://gsppi.berkeley.edu/faculty/jrothstein/published/rothstein_vam_may152009.pdf