Instructional Rounds & Inter-Rater Reliability

Radiologists are trained to look at very specific sources of data—MRIs, CT scans, and x-rays, among others—and provide accurate judgments about what they see.

As instructional leaders, we aspire to do the same in classroom observations. But can we?

If you show the same image to half a dozen radiologists, there should be substantial inter-rater reliability on the question of whether:

  • Everything is normal, or
  • Something is definitely wrong, or
  • More information is needed due to ambiguities

In radiology, measuring inter-rater reliability is both possible and essential. But achieving rock-solid inter-rater reliability isn't always possible.

Is the same true in instructional leadership and supervision? The increasingly popular practice of “instructional rounds” provides some critical clues.

Rounds Isn’t Radiology

Can we achieve inter-rater reliability in teacher observations? This is an issue of growing importance, because many school districts are convening principals—as a form of professional development—for instructional rounds visits focused on calibrating observation ratings.

Other districts are using non-school-based observers to provide a second rating when a teacher receives an unfavorable review—much like a hospital showing an MRI to another radiologist for a second opinion.

If we’re going to conduct high-stakes teacher evaluations, the thinking goes, they need to be valid and reliable, and inter-rater reliability is a strong indicator that an evaluator is being fair.

How is it working for our profession? Is the “rounds” model holding up? Does the “second opinion” add value?

First, we need to understand what medical rounds are. When conducting rounds, doctors discuss all of the available information about the patient, not just a single source of data like an MRI.

Second, while I’m no doctor, I suspect that if you asked a group of physicians how well their rounds achieve inter-rater reliability, you’d be met with a puzzled look. An accurate diagnosis leading to an effective course of treatment—not inter-rater reliability—is the goal of medical rounds.

And yet in the instructional rounds process, we’ve been stretching one-shot classroom observations beyond what they’re capable of telling us.

In short, we’ve been treating classroom observations like MRIs.

Inter-Rater Reliability and Instructional Leadership’s “MRI”

The “MRI” for teaching is the classroom observation. There’s no doubt—when leaders convene to hold instructional rounds, the “image” they’re looking at is a lesson or part of a lesson.

Typically, instructional rounds works like this: A group visits a classroom, observes for a while, then departs to discuss what they saw (with or without looping in the teacher they observed).

There are variations on the process, but at the core of rounds is a classroom observation by a group of outsiders, escorted by the teacher’s supervisor.

We'd like to believe that a single observation can yield precise insights about a teacher's strengths and weaknesses, especially if we put our heads together. Because we all have evidence, and we can compare notes, we think we're radiologists all looking at the same scan.

Does the instructional rounds process turn us into the educational equivalent of radiologists?

No—or if it does, only in a very narrow sense. Observing a lesson is simply the tip of the iceberg of teaching practice.

Missing Context

When we attempt the rounds process, we’re typically working with relatively little information—the information we can ascertain from observing a lesson.

For certain topics, this isn’t a problem. If you want to see how well the teacher uses a certain questioning strategy, or how he handles student behavior, and if the lesson gives you the opportunity to see what you came to see, rounds can be productive.

If you all observe the teacher using a questioning strategy, it’s reasonable to work toward some degree of consensus about the effectiveness of the practice you observed.

But it’s entirely unreasonable to expect to come to consensus about the teacher’s overall effectiveness. It’s simply too broad a question for the narrow data available. Bringing along more observers doesn’t help.

To make broader judgments, we need richer data. High-performance instructional leaders have more information, which isn’t readily available to a team of outsiders. They’re in classrooms daily, and in every classroom every two weeks. This provides enormously helpful context for what happens in an observed lesson, and makes the available evidence much more useful.

If you’re doing rounds, make sure you aren’t mining your observations for insights they can’t provide. And if you’re using outside observers, make sure they spend enough time in the classroom to have meaningful context for what they see during formal observations.

A Better Goal

Inter-rater reliability is only a useful construct when each of the raters has sufficient information. Otherwise, they're achieving precision without accuracy—producing tightly clustered ratings that, while close to each other, miss the mark.

But what can be gained from having half a dozen or more observers discuss what they see in a lesson? Plenty.

Even with incomplete information, a rounds team can benefit from using the language of their shared instructional framework to discuss the teaching practice they observed.

The problem? The team won’t always get the information they hope to gather on a specific topic.

If no students misbehave (good!), they won’t see how the teacher responds to misbehavior. If the teacher doesn’t use the preferred questioning strategy during the observed lesson, there’s simply no evidence to discuss.

With those factors in mind, here are four recommendations for getting more out of instructional rounds—instead of trying to achieve inter-rater reliability.

Four Ways To Get More From Instructional Rounds

1. Go in with open eyes
Don’t expect to see a particular strategy at a particular time, unless it’s a strategy that should be used every single day in every single lesson.

Instead, stay attuned to what the teacher is trying to accomplish with the lesson. That’s a much fairer basis for judging the effectiveness of a lesson, and will lead to much more relevant discussions.

2. Record evidence in the language of your instructional framework
The more familiar you are with your framework, the more you’ll be able to capture salient points in your notes. Better evidence will make for a better discussion afterward.

3. Choose a focus with plenty of evidence
Don’t start your discussion with “warm” and “cool” feedback. Unless it was an unmitigated disaster, you probably have very little basis for drawing an overall conclusion. The effectiveness of a lesson depends on what happened before and after the lesson, which you don’t get to see in a brief visit.

Instead, start your discussion by looking closely at your instructional framework. What elements seem most salient, given what you observed? Where do you have the most—or the most interesting—evidence to discuss?

4. Look for descriptors of practice
This is where the "calibration" discussion can become productive. Once you've decided what you actually have enough evidence on, you can seek to align that evidence with your evaluation rubric or instructional framework.

In the better frameworks, like Danielson’s, you’ll find leveled descriptors of each practice, so you’ll have a reference point other than personal opinions.

Remember, you’re calibrating the way you match evidence to the rubric, not calibrating your judgments of the teacher’s overall performance. You didn’t see the teacher’s overall performance—this is radiology, not true medical rounds.

Toward Better Professional Development for Instructional Leaders

The rounds process can be powerful if we keep our focus on the right goals. It should be a part of every principal’s professional development.

But even more important is getting administrators into their own teachers’ classrooms more often. That’s why I created the 21-Day Instructional Leadership Challenge, which to date has helped more than 3,200 administrators in 50 countries develop the daily habit of providing evidence-rich feedback to their teachers.

If you’re interested in bringing the Challenge to your district, please get in touch. It’s a free program that anyone can join at any time, and if you’d like to bring me out to kick things off, contact me for rates and availability.

About Justin Baeder

Justin Baeder helps school administrators increase their productivity through the High-Performance Instructional Leadership Network. Learn More »
