At an annual meeting of the Association for Behavior Analysis, where over 2,000 behavioral scientists gather each year, a woman professor with whom I was acquainted told me she had organized, among her students, a Rat Olympics. I was excited! What a good way to interest students in operant conditioning!
Alas, when I saw her videos, I was disappointed. Students were luring rats to climb ropes by holding food in front of them. They were doing such things as baiting or shooing rats through tunnels, or causing jumps by putting a rat on a wooden bridge with a gap in it, placing food on the other end of the bridge, and then gradually widening the gap.
"Why didn't you use clickers?" I asked the teacher.
"Oh, they didn't have time! They only had a few weeks," she responded. What? Time for what? All this luring and shooing was a pretty slow business, after all.
Time, she meant, to condition the clicker.
Science is supposed to be logical and based on proven facts, but scientists are humans, after all. They develop customs and practices in laboratory work that are based on nothing more than history, opinions, assumptions, and even superstitions. This idea that developing a conditioned reinforcer is a complex and difficult task, and that it must precede any "training" you will do by means of a conditioned reinforcer, is an example: it's not a fact, it's just a laboratory tradition.
Clicker training, or Skinnerian shaping as we practice it today, involves two kinds of conditioning. The first is classical (Pavlovian), in which some association is made, more or less unconsciously, between a stimulus (a particular smell, say) and what it makes you think of (pizza, your boyfriend, new cars, the dentist's office). Pairing the click with a treat, initially simultaneously and then with an occasional delay, click-then-treat, therefore creates this Pavlovian conditioning, plus some tolerance for a slightly delayed treat.
The second kind of conditioning is operant: if I, the learner, do a particular thing, I can make the click happen. I press the button on the parking garage machine, a ticket comes out and the barrier swings up, letting me into the garage. This is a conscious association; the learner deliberately engages in a repeat of whatever it did before, in expectation of a reinforcer. That's what Skinner meant by the word "operant." The learner is the operator; the learner runs the machine.
In the case of our learners, we want them to offer whatever behavior we clicked, hoping to make a click happen again. We first create the classical conditioning association, by clicking and instantly delivering a treat two or three times. When we see that the animal has noticed the food and is eating it and looking for more, we immediately choose some specific behavior to click. It needs to be something the animal is already doing anyway: looking at the trainer, say, or maybe looking at or smelling a target object we've presented. We click and treat during that behavior, several times in rapid succession. Now, right away, we are adding in the operant conditioning, using the clicker to mark some particular behavior as it is happening.
At some point, the learner figures it out and begins offering the behavior "on purpose." Now we have a properly conditioned reinforcer: it means, in the Pavlovian connection, "Click means treat is coming;" and it means, in the Skinnerian connection, "What YOU did, made me click." It marks, or identifies, a new operant behavior. And we did it all in the very first training session, probably in the very first two or three minutes.
Furthermore, in that very first session, we don't just sit there reinforcing the newly learned behavior over and over again. God forbid the animal should think there is only one way of making clicks happen. So we might go on to reinforce other behaviors, as well as to shape additional criteria for the one we started with.
For example, in debarking a kennel at a shelter, I might organize a few volunteers to go up and down the line clicking and treating any dog with its mouth shut (i.e., a quiet dog). And then, while some dogs still need to be clicked only for not barking, for dogs that are now quiet I might suggest that they click and treat any dog that is not jumping (i.e., any dog with four feet on the floor).
When most of the dogs are a) quiet and b) not jumping, I might move people on to clicking for eye contact. Most of the dogs would now tend to be looking into people's faces anyway, as the volunteers pass up and down the line of cages: "See, I'm quiet, see, I'm standing still, click me, click me." Looking up to see a person's eyes tends to lead to sitting, and bingo, we can now capture sitting, too.
By the time a dog that was once barking and bouncing around is sitting quietly at its cage door, that dog has learned not just one but four different ways to make a person click. It has "learned to learn." It has also discovered that giving one's attention to humans can really pay off; and it is well on the way to being adoptable. (This procedure takes about ten or twenty minutes, depending on how many dogs and volunteers you have, and it can be permanent, needing only to be refreshed briefly and sporadically, especially when new dogs come in.)
What happened with rats and pigeons in the operant laboratory was quite different, I think. The investigators only needed one behavior, pressing the lever or pecking a key. The learners were often required to do it over and over, many times, for each food reward. The investigators were not, as a rule, interested in variations of behavior; quite the opposite. If a rat associated some unrelated act, backing up, say, with the delivery of food, that "superstitious" behavior would interfere with the lever-pressing and might skew the experimental results.
To avoid this inconvenient accident, researchers made sure, with a minimum of two hundred reinforcers delivered randomly, that any accidental associations of behavior and food delivery were deconditioned, or extinguished, by going unreinforced in a blizzard of deliberately random events.
So this became the rule, in laboratories and textbooks and college classes: you have to go through an elaborate conditioning procedure to develop a conditioned reinforcer, before you use it. I was taught that, in 1963, by my dolphin training teacher, Ron Turner, a graduate student of behaviorism at Columbia University. You MUST make sure the dolphin gets a whistle and a fish at least two hundred times, with no particular associations, not at the same time, not in the same spot, not in association with any other behavior, to avoid the development of superstitious behavior. What a pain in the neck, especially with a newly caught dolphin that wasn't eating very well anyway, and might not take more than ten or twenty fish in a day.
And actually, it was a waste of time. You go to all this trouble to teach the animal that the stimulus you're using, the whistle or the click or the blink or whatever, means NOTHING except "food is coming." And then you want to turn around and ask the animal to learn the very opposite: "No, hey, now the click means you get food if you do that again." Maybe it made sense in the laboratory. In our world, where we want a huge variety of operant behavior as soon as possible, and we want it all on cue (under stimulus control, which is another level of operant behavior) as soon as possible, we want the animal to know and value its own magnificent ability to Make People Do Stuff from the very beginning.
You do have to maintain that Pavlovian connection, as Jesús Rosales-Ruiz and his students have elegantly proven in their research. People who make a habit of delaying the treat, of not having treats handy, or sometimes forgetting to treat, of substituting petting when the animal hates to be petted, may find that the clicker becomes less meaningful because the automatic conditioning has deteriorated. Often the giveaway, with dogs, at least, is that the animal fails to stop what it's doing when it hears the click, but continues the behavior, while watching your hand; hand movement has become conditioned instead.
So, "charging up the clicker" is an integral part of the first training session; and perhaps in later training sessions you might begin with two or three closely paired clicks and treats, just to keep the Pavlovian conditioning strong. And "charging up the clicker" by deliberately giving ten or twenty clicks and treats in rapid succession, unrelated to any particular behavior, is a remedial technique one might use with a dog that has stopped responding to the clicker because of some accidental deconditioning by the trainer.
But any long, randomized process you may have been told about is, I think, an artifact; a useless leftover of a laboratory procedure that is due more to custom (where did the magic number 200 come from, anyway?) than to science, and that in any case undermines the operant learning, and the important function of the clicker as event marker, which is the whole point of clicker training.