
Should You Use No Reward Markers? Examining the Debate

By trainer@canines... on 04/01/2010

Filed in - Training Theory - Reaching the Animal Mind


What is a No Reward Marker (NRM), and is it a useful tool or an awful mistake?

Should a good clicker trainer use an NRM, and, if so, when?

It’s out there, lurking. At times you feel it stalking just behind you. At last it springs as someone asks, “Why don’t you tell your dog it was wrong?”

The NRM debate has been reopened once more.

The debate arises in cycles, but next time you’ll be prepared for it, no matter how stealthily it creeps.

What is an NRM, anyway?

On the surface, an NRM is rather straightforward. At times, though, there is considerable debate regarding its true nature. The No Reward Marker is usually described as “conditioned extinction,” as its intention is to inform the learner that no reinforcement awaits down the path he is considering. One of the most well-known examples of NRM is the children’s game Hot & Cold, where feedback of “getting warmer” guides the participant to a goal object, while “colder” indicates that the participant should try another route.

At first glance, this looks like a good use of continuous feedback. However, a closer examination reveals that the “cold” feedback is really unnecessary. Savvy players start by spinning until they hear “hot”; they do not waste their time passing through the room experimenting with how many “cold” responses they can get. In fact, the lack of a “hot” response is equivalent to a “cold” response, as anyone who has played a shaping game can attest. In the clicker trainer’s version of Hot & Cold, the feedback is click and no-click respectively. In both the children’s version and the clicker version, the cold answer (“cold” or no click) adds no further information.

Testing NRMs in humans

At a Shedd Aquarium training workshop, Ken Ramirez led us through a variety of training games to develop skills in timing, cuing, chaining, and more. After several days, he gave us a new challenge: train a human subject to perform three simple cued behaviors using both a conditioned reinforcer and an NRM. Our task would not be complete unless the three behaviors could be performed successfully—and the subject could recognize and define our NRM stimulus.

The results were amazing. Even though we had been discussing NRMs so recently and the concept was fresh in our minds (unlike the minds of our usual animal subjects), only one learner out of fifteen guessed that the extra stimulus was supposed to be useful data as an NRM.

Meanwhile, every learner exhibited frustration, and even occasional aggression (sometimes veiled as jokes and sometimes not). About half of the learners never completed the tasks in the allotted time, while they had been highly successful in the other games.

In my own learner, I saw cue inversion (frustration and confusion with the NRM caused confusion in other areas) and a general loss of enthusiasm. Even though I was making everything as plain and simple as possible—marking errors with the same precision as I would click correct responses, and trying to follow errors with a chance for success—I could see her attitude souring.

Yet as a group we’d done well. Ours was the first session in the years of Ken’s teaching where someone had not stormed out angrily during the NRM challenge.

This experience cemented my current opinion on the NRM. With this confusion and frustration in humans who already knew the NRM concept, why risk those feelings with those who cannot discuss it with us?

Conditioned extinction or aversive?

There is far more to the NRM debate than this, however. Stand back, as this is where I'll step on some toes…

By the time an NRM has real meaning for the learner, it has become positive punishment.

An NRM may cue extinction, but in doing so it also signals a loss of opportunity. The chance of earning reinforcement has closed. If the subject changes his behavior to avoid the NRM—and that is the whole point of its use—then the NRM is by definition an aversive. It may be a mild aversive or it may be severe, depending upon the learner’s mindset, but it is a stimulus the learner is actively working to avoid. Because the trainer introduces the NRM upon the learner’s mistake (adds an aversive stimulus that modifies behavior), the NRM is positive punishment.

Is the NRM necessarily evil? Probably not, but it's not the completely neutral stimulus that many claim. The punishment continuum runs from fairly mild to extremely harsh, and it is the learner who interprets the severity of any given punisher. If a trainer wishes to avoid the use of positive punishment, he should be aware of all its forms, including the form of an NRM.

Observe a contestant on a game show. When he answers a question and then hears the buzzer marking a wrong answer, does his body language indicate that the buzzer is a neutral stimulus, serving only as useful data? Certainly not! The disappointed contestant may exhibit slumping posture, frustrated displacement gestures, perhaps profanity—even if he does not lose points or money, only the opportunity to earn more of the same. For someone who really wants to be right, being wrong is quite aversive. (A learner who doesn’t care about being right is facing a motivation problem, not a data problem; an NRM won’t help and may even hinder the development of motivation.)

Broken contracts

Some trainers use NRMs not only to shape a new behavior, but to indicate any mistake a learner makes, including a failure to respond properly to a cue (no response or an incorrect response). For example, if a trainer sends a dog to select a scented object from a collection and the dog retrieves the wrong one, the trainer might say “oops” as the dog picks up the incorrect object.

While, superficially, this seems to be relevant data, it can break down careful training. Positively-trained cues are themselves tertiary reinforcers. An NRM after a failed cue breaks the contract of reinforcement, offering P+ after a tertiary reinforcer—and creates serious risk of poisoning the cue (and rendering it useless for future use in chains).

(Note: If you find yourself using an NRM after a cue, review the cue. Why isn't it working? The issue is probably not the NRM at all!)

Many animals (and humans) exhibiting stress in challenging conditions are stressed not only by the tasks they face, but by the changing schedules of reinforcement and the increased chance of punishment. Is the dog really finding scent discrimination so difficult—or is the dog frustrated by the learning conditions?

Is this data necessary?

Proponents argue that NRMs are simply data to inform the learner. They say that it’s not fair to leave a dog guessing; it’s kinder to tell him what’s not working.

Why tell the dog that he wasn't successful? This question is usually asked in a more philosophical way, but I mean it very practically: if the dog needs an NRM to realize that he isn't being reinforced, the trainer has screwed up badly. Why doesn't the dog know already? Clicker training is pretty much yes/no. If training has been set up so that the dog can't tell if he's been successful, and he needs supplemental information, then something is wrong! (See “Fixing Behaviors without an NRM” for more on this.)

Fixing Behaviors without an NRM

Training my dog Shakespeare the (admittedly silly) behavior of putting his head into a bucket for a shaping demo was going smoothly, until I inadvertently reinforced my paw-oriented dog for moving his paw as he dipped his head. Within seconds I had a dog convinced that I wanted his right paw in the bucket! While many trainers might have resorted to an NRM to discourage the paw behavior, I chose to repair the behavior using only good timing and careful placement of reinforcement.

You’ll see in the video how Shakespeare is frustrated by his low rate of success at first. Would his attitude have been improved if I’d told him that his behavior was incorrect? Would an NRM have helped him know exactly how to modify the behavior the way I wanted, or would the NRM have been associated with bucket interaction itself? To fix the behavior, I tightened up my timing and temporarily reduced criteria. With the resulting jump in rate of reinforcement, the learner was able to quickly grasp what I wanted and retain it. (Edits in the video are solely to save time as Shakespeare located and ate his treats.) You’ll see the superstitious paw movement persist and then fade under the weight of reinforcement for the desired behavior.

Note that it was far more tempting for Shakespeare to place his paw in the bucket as he approached it from a distance; the test for this behavior was his ability to approach from across the room and place his head in the bucket cleanly.

Can it ever be useful?

So is it always wrong to mark a behavior as non-reinforcing? Keep in mind that blanket generalizations are always wrong (irony intended!). Some informational cues could be called NRMs, because they signal the lack of potential for reinforcement—a red light rather than the more common and cuing green light. My dogs have learned if I say “shoo” while I’m at the computer, I’m not available to play, while at other times a nose poke might elicit attention. In this situation, “shoo” is a signal that future offered behaviors will not be reinforced. (Most pet owners will recognize that our pets know a host of these types of cues, mostly non-verbal.)

Most of the time, however, I see NRMs used as a crutch where the initial training was not clean and precise. This puts the burden of the trainer’s mistake on the learner, who didn’t receive adequate data in the first place and must now sort through additional cues, stimuli, and frustration. The vast majority of the time, the “need” for an NRM can be avoided through proper attention to training basics—good timing, appropriate criteria, and a high rate of reinforcement.

I think there is an application for NRMs in a situation where click/non-click is not clear to the subject, but these situations are rare and most trainers will not encounter them. This makes training an NRM “in case of need” a waste of effort. Spend your time training more cleanly in the first place and you’ll never need the NRM.

Alternatives to a punishing NRM

So what’s a trainer to do when a learner errs? There are several alternatives to the NRM as unintentional punisher. A time-out (usually the removal of the trainer’s attention and/or opportunity) is negative punishment, rather than positive punishment. A least-reinforcing stimulus (LRS, a complete lack of response from the trainer or environment) is true extinction, and is generally the best response to an error. A trainer working at a good pace (15-20 reps per minute for a simple behavior) may pause only a second for an LRS and then move on with the next repetition, but that is enough to note the error and its (lack of) consequence. (I use an LRS at the 59-second point in the video “Fixing Behaviors without an NRM.”)

Training without aversives

Even potentially useful tools can be harmful, especially if they are a crutch for the sloppy use of preferred tools. In Animal Training: Successful Animal Management Through Positive Reinforcement, Ken Ramirez wrote, “I frequently discourage [novice] trainers from ever conditioning a ‘no’ signal, because if there is not a signal for ‘no’ it cannot be overused.”

In the process of writing this article, even though I am arguing against the use of NRMs, I have found myself using more NRMs in my own training—having them on my mind made me more likely to use them even though I knew better!

While it is true that many learners can work through NRMs, it is equally true that many cannot (and many who can, do better without). It is a difficult habit for a trainer to break. Having the option can create the opportunity or even the need. As songwriter Jonathan Coulton noted, “We do what we must because we can.” To avoid aversives in training, be aware of them in all forms, and plan accordingly.

About the author

Laura VanArendonk Baugh, CPDT, KPACTP, started playing with animals at an early age and never grew out of it. She owns Canines In Action, Inc. in Indianapolis, where she lives with her tolerant husband and her dobermans. Laura is also a Karen Pryor Academy faculty member.

NRMs

Submitted by allmeansall on Wed, 2010/12/15 - 2:01pm.

This is an excellent, thoughtful article. Thank you for the reminder to stay away from NRMs! I use them out of habit, unintentionally sometimes, and that is not good, clean training! My dog has shut down practicing agility because of too many "oops" and not enough rewards. I changed that and now she is joyfully running a course at full speed, anticipating take off at a "wait". Now I say "try again" cheerfully, but don't even need that. It is more a crutch for me than for her! I need to continue to learn good habits and better training...even after many years of professionally training! Keep these wonderful articles on the KPCT site coming please; I am so grateful for such good information over the net!

Bonnie

Scream this from the rooftops!

Submitted by Criosphynx on Thu, 2010/07/22 - 11:33am.

Great article! I began to get away from NRMs when my dog would cringe and shut down the second I used one...this made me realize she saw it as a verbal correction...or P+, which I did not want.

The other dogs seemed either frustrated by them, or indifferent, and would learn despite them. I also find, like you explained to the other poster, that in many "real world" contexts, they act as an interrupter, or simply as a cue to reorient to the handler. :)

NRM to define boundaries

Submitted by Alyna on Tue, 2010/04/20 - 9:44pm.

You ask "why tell a dog he wasn't successful?" and I wanted to outline a situation in which it is crucial (I think) to communicate to my dog that I don't want her to do something. My example: a park without a secure enclosure. I use "Uh-uh" as a NRM when I want to communicate boundaries to my dog, as in "don't go beyond that point" in a park that doesn't have a fence. She approaches my imaginary border and I say "uh-uh"; she turns around and I reward her for staying within my boundaries. Giving her treats IN the park is great, but teaching her that I DON'T want her to go beyond a certain border - how else would I convey that without a NRM? The park near me has hedges that she could easily run through. She knows that anytime she has even begun to look like she is about to go through those hedges, she gets a NRM. When she turns away from the hedges, she gets a reward. I'm not trying to undermine her success - I am only using NRMs to help her be safe.

We would yell "No!" if a child were about to run into a street or touch a hot stove, not simply yell their name and redirect their attention or ask for a different behavior. Instinct is to tell them INSTANTLY what not to do. Having a NRM allows me to do that with my dog. It isn't just our words that communicate but our tone and volume. I think that as good as my dog is about coming when called, in a stressful situation, will she come if my voice sounds completely different? I've tried to train for this but I just don't think I can replicate the tone that actual fear puts into your voice.

What about body language that acts as a NRM? i.e. stopping when a dog is pulling and moving forward when leash is loose or turning away when they jump or removing the treat from sight if an undesirable behavior is presented like hopping up for the treat? What about saying "Ouch" and ending play when a dog gets too rough? Aren't these all ways of telling a dog that they weren't successful and aren't these important training techniques?

I think this is a really interesting topic and I'm really looking forward to learning more and this article has sparked a lot of questions for me (obviously!). Good read!

NRMs vs Cues vs Punishment

Submitted by trainer@canines... on Tue, 2010/04/27 - 2:57am.

Thanks for writing! You bring up some excellent questions, and I'm not sure I can answer them thoroughly in the space of a comment, but I'll see what I can do.

First of all, if we ignore our human language assumptions and look only at the dog's behavior, I'd guess that in the park your dog is demonstrating that "uh-uh" is a cue to check in with you, rather than a true NRM. (If it really were a true NRM, we likely wouldn't be having this conversation, as you would have had to indicate to your dog only a couple of times that the hedge was a boundary.)

Alternately, your "uh-uh" may be a punisher, serving to reduce the behavior of going through the hedge. (If it is reducing the occurrence of behavior, it is by definition a punisher, regardless of whether or not you personally feel it's an aversive.)

An interrupting cue -- which is any cue strong enough to override a behavior in progress -- is a great way to block unwanted or unsafe behavior. Interestingly, I've found that my practice with OC has replaced what you describe as instinct, and I tend to yell instructions in case of emergency rather than a simple (and less useful) "no." Just a few moments ago, I let my dogs out and immediately heard a dogfight start -- there were strange dogs in my fenced yard! I ran out into the dark and as another of my dogs started toward the fight, I shouted "come!" Telling her "no" wouldn't have helped the situation -- no, what? No, don't notice that strange dog in your yard fighting with your mate? No, don't run that way? No, don't look at me as you pass? This was definitely a safety issue, but I needed what Kathy Sdao calls a trumping cue, not an NRM.

While "no!" might seem perfectly clear to me, it really doesn't convey much information. One of my dogs would freeze at hearing that -- which I think is what you want, from your comment. But another of my dogs would bolt, possibly sending her right into danger. It's much safer for me to call "come" or "down," or even "freeze" if you want to train that, which cues a specific response rather than hoping for inhibition under stress. (If you have trained "no!" to mean "freeze," then it's a behavioral cue, not an NRM!)

Yes, body language and environmental aspects can become cues or NRMs, of course, as I mentioned. But don't confuse removing attention or removing a treat with NRMs; both are negative punishment, if they affect future behavior. Again, consider the whole picture: if a dog jumps for the treat, and I ignore the behavior or negatively punish it, do I really need to inform him that it didn't work -- or does he realize that jumping was a waste of effort? If he doesn't recognize the difference between success and failure, I'm doing something wrong as a trainer, and adding an NRM won't fix my mistakes!

If my body language serves as an incidental NRM (I reach for the door, dog jumps, I return my hand to my side), it still conveys information to the dog, of course -- but there is no need for me to add a formal NRM, nor to consciously add it to my toolbox.

training duration without NRMs

Submitted by trainer@canines... on Tue, 2010/04/20 - 11:26am.

I've had a couple of questions from this article about training duration; I just posted a blog entry about using "300 Peck" to train duration without the use of a verbal NRM. See http://blog.caninesinaction.com/2010/04/laev-throws-me-a-bone/ if interested.

Thanks!

Submitted by izimi on Sat, 2010/04/10 - 5:57am.

A very good article! Thank you! I have been thinking of introducing a NRM, but after reading this I have changed my mind.

Marked Time Outs

Submitted by khartshorn on Fri, 2010/04/02 - 10:33am.

I'm curious - what do you think of a marker (basically an NRM) for a time out? I've used those before when the reinforcement was environmental (i.e. window barking / chasing) to signal a time out and more closely mark the behavior that I'd like to eliminate.

Delta signals

Submitted by trainer@canines... on Thu, 2010/04/08 - 12:43pm.

More accurately, marking a time-out (if it serves as P-) is a Delta signal/stimulus (a conditioned punisher), which is also an aversive that might be mild or severe depending upon the learner. Both NRMs and Deltas effectively become positive punishment with use.

An additional point on both NRMs and Deltas -- remember that if punishment is going to work, it will work in very few applications -- so an intended NRM or a Delta which needs to be repeated more than 2-3 times (not per session, but in a behavior's lifetime) is not being used effectively.

Could you better explain the

Submitted by Kaitlyn Heideman on Fri, 2010/04/02 - 2:12am.

Could you better explain the experiment with the NRM in humans? NRM stimulus? I'm a little confused.

Human experiment

Submitted by Alyna on Wed, 2010/04/21 - 10:19pm.

I'm with Kaitlyn! I would like more info. My understanding of using a NRM is that it is "charged" to mean "nope - not what I want" just like charging a clicker means (YES! You got it!), so if this is the case, even with humans, then you would make it clear what the reward markers were and the NRMs were prior to trying to teach any behavior. If this were the case, how did the humans get frustrated? Wouldn't it have made learning their task faster?

Similarly, you don't add the cue "Sit" until you know you can elicit a sit from a dog. You wouldn't use a NRM marker without making it mean something first - or else, how on earth would a dog (or person) understand it? This is why I guess I don't understand how this human experiment actually gives accurate data on how dogs perceive a NRM - if a human is frustrated because they don't have enough information, a click would be just as meaningless if it weren't made clear what the click means. Unless I'm just not understanding the human experiment altogether...which is why I would love more info! Please!

the human NRM experience

Submitted by trainer@canines... on Tue, 2010/04/27 - 2:42am.

Unfortunately, I'm going to be a bit vague on the NRM experiment in humans, because it's a teaching tool which Ken Ramirez uses regularly, and if I explain in too much detail it would spoil the experience for others.

However, we all understood that our learners would not automatically understand our arbitrary NRM stimulus. Remember, part of our test was to condition the NRM so that our learner could define it -- if we trained the behaviors and yet our learner could not define the meaning of the stimulus, we'd failed!

Now, this was an exalted company of trainers. Many of our number had written books, taught seminars and workshops, presented at Expo, and more -- this was an advanced group. You would recognize these names. With all of their expertise at conditioning stimuli, only ONE learner guessed that the stimulus was supposed to be an NRM, and she wasn't really certain of it. Many realized that their NRMs might not be as clear to their animals as the trainer assumes! (It's easy for us to assume the learner knows the NRM, but does he really? Or is he just offering more behavior despite the NRM?)

It's a bit tough to try with a knowing human, because if I tell you I'm conditioning an NRM, you of course recognize it. :) But our critters don't have the luxury of listening to our abstract debates on the utility of an NRM; all they know is that sometimes when they offer behavior, you tell them you aren't going to pay.

Interestingly, I have for years almost never "charged the clicker" with dogs, but jumped right in and they picked it up promptly. (Other species may vary.) Yet an NRM has to be carefully conditioned, and I have never seen a conditioning protocol which did not frustrate the learner. My rough rule of thumb regarding training techniques is, would I do it with a grizzly bear? And no, I don't think it's wise to pull back food from a grizzly! ;)