Implementing Evidence-Based Programs with Fidelity [Part 2]

The story of the last decade of professional learning has been around the idea of “evidence for practice.” With ESSA’s definition of professional learning, it’s clear that programs have to be linked to research that shows impact on students.

But choosing a research-backed framework is the easy part. Translating it into practice — also known as “implementing with fidelity” — is a taller order.

Today: Part 2 of a conversation with two real-life practitioners on how to optimize these drivers in a school district setting, track implementation success in real time, and understand what works and why. Click here for Part 1!

(Quotes have been lightly edited and condensed for clarity.)

Don’t just say “go for it” — putting the right foundations in place

A solid framework is a great first step, but it’s not the full scope of the process. Once you’re planning your implementation, it’s critical to consider the other steps you may take to put a foundation in place and ensure a successful launch:

Kim Swartz, Central Rivers Area Education Agency:
I think for us, the Innovation Configuration map was really important. It was one of the first things we had to do because it paints the picture of what this should [ultimately] look like. Also, keeping the training ongoing — this is not a one-and-done by any means. In fact, we talked to districts about it [in terms of] the three to five year scope.

Districts can choose to do this work with us or not, but we want to be really up front with them that this takes time, it is hard work, and we need the administration and coaches to be in on it with us. We’re building the capacity of their coaches to eventually take over and do this work on their own, so having those conversations up front is really important.

[KickUp lets us] collect data that we’ve never had before, and because of it we’ve been able to have these really powerful conversations. We can start to see what might be going on at an individual building or a grade level or for individual teachers. Being able to collect that data from those IC maps and actually having it there during conversations with the administration or building leadership team has been really powerful, and it’s driven what our professional learning looks like in the future because of that.

Audience question: How many days of training did staff need on the IC maps when you first began using them?

Both speakers talked about the importance of a “go slow to go fast” mindset — making sure staff were settled with the framework and core ideas before full-scale implementation:

A few years back we had about five days of training for our whole entire agency, but [now] it’s kind of an ongoing learning process. It takes a long time to put these maps together, it’s not something that you do overnight. When we worked on our literacy IC map last year we only had four components that we worked on and eventually got really good at, then did the rest of it over the summer.

Image framework courtesy of the Carnegie Foundation for the Advancement of Teaching

Audience question: Once the program is launched, how do you know whether or not you’re achieving fidelity?

Carlye Norton, KickUp:
[The above image] is a pretty typical approach to change. We have an idea for change, we put really good planning into place, and we frequently still don’t get to quality that’s also reliable at scale. What implementation improvement science really argues is that the missing piece there is data, and cycles of data that allow you to get feedback iteratively and make those changes as you’re going. But again, that’s the science, and we’re here to talk about what practitioners are actually doing.

Greg Kibler, Youngstown City School District:
When we started rolling out, the Gradual Release of Responsibility framework has some sub-components to it. We started with just some of the basics and kind of created our own little learning cycle with it. We would train our administration and coaches first; the next week we would train our teachers; and then we would kind of say “Okay, let’s go do it and let’s see what we’re actually doing.”

We had a requirement to see every teacher once a week and provide feedback on that bite-sized action step. So that really helped us collect a lot of data and allowed us to see where, within a building or a district or a grade level or a team, people were struggling with it. That was kind of our immediate feedback: we were taking it a little bit slow, going back to that in-depth kind of knowledge, making sure everybody’s on the same page before moving forward with the whole framework. We [also] set internal goals of how much data we wanted to collect, and once we got there we would produce a regular review of it, so every five weeks we were looking at this data along with some other points to monitor how it was coming along.

When we would review [the data] we would look for what we were seeing in the classroom outside of our visits as well. Is this something we’re seeing when we’re walking through? Is it a change that happens when we just kind of walk by and don’t actually go in the classroom? So some of those outside-the-data implementation notes were taken into consideration as well.

When we go out and do observations, we meet with either the administrator and the coach or sometimes a leadership team, and pull up the data to see where they’re at. We’ll slice it and dice it in lots of different ways, but we always have those conversations so that we know what the next steps are based on what we’re seeing right now.

We never had the opportunity to do that before, because we didn’t have the data to look at. [KickUp] has really given us the opportunity to be able to do that.
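The "slice it and dice it" review described above, looking at observation data by building or grade level, could be sketched roughly as follows. This is only an illustration: the record layout, the 1-4 score scale, the building names, and the 2.5 threshold are all assumptions, not KickUp's data model or either district's actual process.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observation records: (building, grade, score on a 1-4 scale).
observations = [
    ("North", 3, 4), ("North", 3, 3), ("North", 4, 2),
    ("South", 3, 2), ("South", 3, 1), ("South", 4, 3),
]

def mean_score_by(records, key_index):
    """Average the score (last field) for each value of the chosen field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[-1])
    return {key: mean(scores) for key, scores in groups.items()}

# The same records, viewed through two different lenses.
by_building = mean_score_by(observations, 0)
by_grade = mean_score_by(observations, 1)

# Assumed cutoff: groups averaging below it are flagged for follow-up.
SUPPORT_THRESHOLD = 2.5
needs_support = [b for b, score in by_building.items() if score < SUPPORT_THRESHOLD]

print(by_building, by_grade, needs_support)
```

Running the same summary on a fixed schedule (for example, the five-week review cycle mentioned earlier) would make it possible to compare groups from one cycle to the next.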

Audience question: How do you make sure all your implementers are on the same page with assessing progress?

One of our [major concerns] is inter-rater reliability. We have 12 literacy consultants that are going out to do this work, and it’s really important that we’re all on the same page. We have the IC map, but how does that come to life in the classroom? What does it actually look like, and do we all have the same agreement about what that is?

(Click here to see Greg talk more about YCSD’s inter-rater alignment strategies!)

Swartz, cont’d:
We really focus on that inter-rater reliability amongst our team. We look at videos of instruction and rate [them] on the IC map, but we also do the same thing when we’re out doing observations with coaches and administrators… so that we’re comparing apples to apples.

The part that’s really beneficial to teachers is to have that conversation afterwards. We’ll do that in different ways [depending on instructional context and schedules]. It could be our literacy consultant that might have that conversation, it could be in combination with the coach or the administrator, it could be just the administrator, it could be the coach — but again it’s bite-size pieces of information, about what they’re doing well and considerations for next steps.
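One common way to put a number on the inter-rater reliability discussed above is to have two observers rate the same lessons and compute their percent agreement alongside Cohen's kappa, which corrects for agreement expected by chance. The sketch below is an illustration only: the speakers do not say which statistic they use, and the A-D rating labels and sample ratings are assumptions.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Share of lessons on which both raters chose the same level."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same label at random,
    # given how often each rater used each label.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of the same eight video lessons on one IC map component.
consultant = ["A", "B", "B", "C", "A", "D", "B", "C"]
coach      = ["A", "B", "C", "C", "A", "D", "B", "B"]

agreement = percent_agreement(consultant, coach)
kappa = cohens_kappa(consultant, coach)
print(f"agreement={agreement:.2f}, kappa={kappa:.2f}")
```

A kappa that is much lower than the raw agreement suggests that part of the apparent consensus is due to chance, which is one signal that a recalibration session may be worth scheduling.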

Data courtesy of Central Rivers Area Education Agency

Swartz, cont’d:
[Above] you can see 58% of teachers received an A or B on their observation in [an area] that is actually kind of what Greg talked about — “I do, we do, you do.” Then on the right-hand side is where teachers reflected and where they felt they were, so 84% of the teachers feel like they are in that A/B category.

This is a large group at a building level, but what it helps us to see is that when we’re doing observations, we’re not seeing teachers in necessarily those categories where they’re seeing themselves. It goes back to that whole idea about how sometimes at the beginning of the year they’ll mark themselves higher because they don’t know what they don’t know — but the implication for us is, let’s dig deeper. Let’s see where those differences are, and what kind of professional learning we need to provide based on that.
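The comparison described above (58% of teachers observed at A or B versus 84% self-rating there) amounts to computing the same share from two data sources and looking at the gap. A minimal sketch follows, assuming each teacher has one observed rating and one self-rating on an A-D scale; the teacher IDs and ratings are made up for illustration.

```python
# Rank of each level, best first; the A-D scale is an assumption.
LEVEL_RANK = {"A": 0, "B": 1, "C": 2, "D": 3}

def share_in_top(ratings, top_levels=("A", "B")):
    """Fraction of ratings that fall in the top levels."""
    return sum(r in top_levels for r in ratings) / len(ratings)

# Hypothetical data: teacher ID -> rating for one IC map component.
observed   = {"t1": "A", "t2": "C", "t3": "B", "t4": "C", "t5": "D"}
self_rated = {"t1": "A", "t2": "B", "t3": "B", "t4": "A", "t5": "C"}

observed_share = share_in_top(observed.values())
self_share = share_in_top(self_rated.values())
gap = self_share - observed_share

# Teachers who placed themselves at a better level than observers did.
rated_self_higher = [
    teacher for teacher in observed
    if LEVEL_RANK[self_rated[teacher]] < LEVEL_RANK[observed[teacher]]
]

print(f"observed={observed_share:.0%}, self={self_share:.0%}, gap={gap:.0%}")
print(rated_self_higher)
```

The teacher-level list is what supports the "dig deeper" step: it points to where the follow-up conversations and targeted professional learning might start.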

We were able to utilize a snow day to have all of our administrators in for a session on exactly this. We watched films of teachers teaching and had side-by-side conversations about “Does this look like this piece of evidence to you? What makes it that piece of evidence? How do we know we are looking for the right things?” Those recalibration meetings really struck a chord with the people that were doing these observations. Without it I think we would have definitely been over self-inflated rather than having that critical eye on what we should be seeing.

Audience question: What’s the timeline for looking at inter-rater reliability in a program rollout?

I feel like you have to do it right away. Everybody has to be on the same page at the very beginning when you start. Of course it’s an ongoing thing that you need to continue to do, but you know for sure at the beginning and then partway through you need to come back and recalibrate.

I’d also add that it’s important to focus not just on how you’re rating, but also the [consistency of the] feedback you’re giving. We struggled with that a little bit — feedback was very minimal, and so teachers weren’t getting much out of it. We worked on it by putting up some anonymous pieces of feedback and having discussions [with implementers] around “If you were the teacher getting this, how would you be able to use it?”

That’s such an important piece. We really studied Hattie’s work around feedback and what that should look like.

How to coordinate observations with multiple schools and conflicting schedules

Because we work with multiple schools, just getting into the buildings and doing those observations becomes a challenge. We really had to work with districts around where we were going to be a minimum of once a month — but you need to be doing observations more often than that. So we actually came up with a document [that outlines] “If you join us in this work here’s what you can expect from us as an AEA, here’s what we expect from you as an administrator, and here’s what we expect from you maybe as a coach or as a leadership team.”

It’s been really helpful to have that [both] up front and ongoing, and in creating a plan about how those observations will happen and how we’re going to make sure that teachers are getting feedback in the most effective way. There’s a nice function in KickUp where you can actually email that feedback to that teacher so if there’s just no way somebody can get in there for a face-to-face conversation, we at least [directly communicate with] them.

Audience question: How do you connect professional learning data to student outcomes?

You use your lens, you pull your data, you look at the things that you’re doing that show student outcomes. So if you’re doing PBIS, relate that to discipline referrals. For academic improvement, it depends on what you’re using. For example, I’ll say that at Youngstown, we have DIBELS as a benchmark assessment in our younger grades, and then we use NWEA MAP in our 4th through 12th. So connecting that, we really can look at our cycles of implementation in GRR and those benchmark scores.

That’s the connection we try to make. Some of our teachers, although not all of them, do utilize one of our platforms for some short cycle assessments, which show if we’re seeing the same kind of positive effects [in teachers as in students]. It’s very hard, because it depends on what level or lens you’re looking at. Are you looking district-wide? A building? A teacher? [Isolating the relationship between] two things that you’re really looking at — for us it’s GRR and student achievement on our state assessment — is a little bit harder, because that’s a [big] chunk of time. But if you look at something that you’re seeing on a smaller scale I think that’s easier to [draw conclusions from].

Some of the research that Joellen Killion has advocated for around assessing impact is the idea that, if you’re making a logic model that defines your program’s goals and outcomes, then the key is being clear about which outcomes and when. In that context, you might define short-term outcomes as “educators are building the knowledge needed to implement the program.” Mid-term outcomes are “You’re seeing large-scale shifts in instructional practice.” And those long-term outcomes are student achievement. And so if you’ve collected data in all three of those buckets, then you have a pretty strong case that [the long-term outcome] is at least correlated to [the short-term one].

Audience question: What advice would you give to someone who wants to start doing this work, but is afraid of taking that first step?

My advice would be “Go slow to go fast.” Start in one small area — we have schools that have even just started with a grade level. Understand that it’s going to take time for people to learn. As a team, we had to start really small and now we’re growing it and we maybe are growing it almost too fast, because it’s getting kind of crazy! Just know there’s going to be bumps along the road, but you’re learning together and it’s the process that’s important.

It’s about knowing your audience, so to speak. You know your building or your district. You know how open they are to trying something new, how much time it’s going to take for them to learn and know and do. Plot that out carefully in your planning and maybe start with something simple to test “Is this even going to be feasible down the road, or am I jumping into something too big and should start with even smaller stuff?”

Watch the full video of Kim and Greg’s conversation here!
