Thus we begin the second half of the quarter…
We have already discussed the success of instrumental conditioning, specifically that of reinforcement training. To go one step deeper, we will now look at how exactly reinforcement is administered. Bijou focuses on the specifics of reinforcement and hypothesizes that “intermittent reinforcement (whether fixed or irregular in pattern) markedly increases resistance to extinction as compared to continuous reinforcement” (Bijou, 47). To discuss this, it is helpful to define the most common types of reinforcement patterns, as quoted in the article:
1. Continuous reinforcement: In the experimentally defined situation, a response is reinforced on each occasion of its occurrence.
2. Intermittent reinforcement: A reinforced occurrence of a response is preceded or succeeded on at least one occasion by an unreinforced occurrence of the response. No differentiation is made among the terms descriptive of this procedure; namely, intermittent reinforcement, partial reinforcement and periodic conditioning.
a) Interval intermittent reinforcement: The pattern of a reinforcement is controlled by temporal events in the external environment.
b) Ratio intermittent reinforcement: The pattern of a reinforcement is dependent on the subject’s behavior and follows a specific ratio.
c) Fixed and Variable patterns of intermittent reinforcement: The relationship between the reinforced and nonreinforced is either fixed or variable. Both interval and ratio may be either fixed or variable in pattern.
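To make these definitions concrete, here is a minimal Python sketch of how a few of the patterns could be written down as rules. It is only an illustration: the function names are hypothetical, and the fixed-interval rule (reward the first response made once a set amount of time has passed since the last reward) is an assumed, common way of operationalizing the interval definition, not something taken from Bijou's article.

```python
def continuous(n_responses):
    """Continuous reinforcement: every response is reinforced."""
    return [True] * n_responses


def fixed_ratio(n_responses, ratio):
    """Fixed ratio: every `ratio`-th response is reinforced."""
    return [i % ratio == 0 for i in range(1, n_responses + 1)]


def fixed_interval(response_times, interval):
    """Fixed interval (assumed operationalization): reinforce the first
    response made at least `interval` seconds after the previous
    reinforcement, with the clock starting at time 0."""
    reinforced = []
    last_reward_time = 0.0
    for t in response_times:
        if t - last_reward_time >= interval:
            reinforced.append(True)
            last_reward_time = t
        else:
            reinforced.append(False)
    return reinforced


def is_intermittent(reinforced):
    """Intermittent: at least one reinforced and one unreinforced response."""
    return any(reinforced) and not all(reinforced)
```

Each function returns one True/False value per response, so `is_intermittent(continuous(6))` is False while `is_intermittent(fixed_ratio(10, 5))` is True. A variable pattern would replace the constant `ratio` or `interval` with one that changes from reward to reward.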
Bijou’s experiment focuses on the following question: “For a given number of reinforcements, is there a difference in the extinctive behavior of two groups of preschool children when the training of one group is on a continuous reinforcement pattern, and the training of the other group is on a variable intermittent schedule with reinforcement following 20% of the responses?” (Bijou, 48). To answer this, Bijou uses a box with two holes, one above the other: the top is the input hole and the bottom is the output hole. The subjects place a rubber ball in the top hole, and this is the response that is rewarded. A motor-driven machine dispenses trinkets as rewards. The experimenters explain to the subjects how the box works and allow the children to play with the apparatus. One group of children, group A, is rewarded with a trinket on each of six consecutive responses. The second group, group B, is also rewarded six times, but the rewards are spread over 30 responses, specifically trials 1, 6, 13, 17, 23, and 30. For both groups, the extinction period follows immediately: the children receive no reinforcement for three and a half minutes, regardless of their behavior.
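As a quick check on the two training schedules, the short Python sketch below works out the proportion of reinforced responses and the spacing between rewards. The trial numbers come from the description above; the variable and function names are just illustrative.

```python
# Reinforced trial numbers during training, as described above.
GROUP_A_REINFORCED = [1, 2, 3, 4, 5, 6]      # 6 trials, all rewarded
GROUP_B_REINFORCED = [1, 6, 13, 17, 23, 30]  # 6 rewards over 30 trials


def reinforcement_rate(reinforced_trials, total_trials):
    """Proportion of training responses that were reinforced."""
    return len(reinforced_trials) / total_trials


def gaps(reinforced_trials):
    """Number of trials from each reward to the next one."""
    return [b - a for a, b in zip(reinforced_trials, reinforced_trials[1:])]


print(reinforcement_rate(GROUP_A_REINFORCED, 6))   # 1.0
print(reinforcement_rate(GROUP_B_REINFORCED, 30))  # 0.2
print(gaps(GROUP_B_REINFORCED))                    # [5, 7, 4, 6, 7]
```

Group B's rewards are spaced 4 to 7 trials apart, which is what makes the schedule variable rather than fixed, while the overall rate works out to the 20% named in Bijou's question.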
From the data collected, extinction is faster for the group that was continuously reinforced (group A, with 100% of training responses reinforced) than for the group that was reinforced intermittently (group B, with 20% reinforced). The mean number of responses during the extinction period is 15.3 for group A and 22.0 for group B.
Bijou goes on to run a second experiment in which the trinket dispenser is now accentuated by a buzzer. Everything else is identical to the first experiment. The same general trend occurs: the continuously reinforced group extinguishes faster than the intermittently reinforced group. The mean number of responses during the extinction period is 13.0 for group A and 26.2 for group B.
Bijou’s experiments provide two interesting points. First, “for a given number of reinforcements, a variable ratio intermittent distribution is associated with more resistance to extinction than a continuous schedule” (Bijou, 52). This is interesting because one would intuitively think that a continuous schedule would reinforce a behavior better than an intermittent schedule would. An increase in contingency is usually related to greater learning. Therefore, one might assume that such a strongly learned behavior would be robust and would resist extinction. However, this is not the case. What could account for this?
Hypothesis 1) Because an intermittent type of reinforcement does not reward the subject on every trial, the subject could simply assume that an unrewarded trial is one of those trials on which he or she will not be rewarded. These subjects have learned that, to be rewarded, they have to endure trials with no reward, and therefore they cannot tell that they have entered an extinction phase or that anything has changed. However, if the continuously reinforced subjects are not rewarded, this is not “normal,” and they realize that something has changed. The “surprise” factor is different.
Hypothesis 2) The intermittent type of reinforcement already establishes a moderate amount of frustration in the subjects. They are a bit upset that they do not receive rewards all the time, but this negative feeling is minor. The continuous type of reinforcement establishes no frustration in the other group of subjects, since they are always rewarded. Once no reward is administered, both group A and group B will be greatly frustrated, but the change from the initial level of frustration to the final level will differ between the groups. Group A will feel a greater change in frustration, while group B, already a bit frustrated, will feel a smaller change.
The second point is that the difference in the mean number of responses between group A and group B is greater in the second experiment than in the first (Bijou, 52). This suggests that the buzzer, as a more distinctive auditory stimulus, serves as a stronger conditioned reinforcer. The buzz increases the salience of the reward, and because of this the difference between the means is greater.
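The size of that gap can be worked out directly from the four means reported above. The Python sketch below only does that arithmetic; the dictionary layout and function name are illustrative, and it says nothing about statistical significance, which would need the full data from the article.

```python
# Mean number of responses during extinction, as reported above.
MEAN_EXTINCTION_RESPONSES = {
    "experiment 1 (no buzzer)": {"A": 15.3, "B": 22.0},
    "experiment 2 (buzzer)": {"A": 13.0, "B": 26.2},
}


def group_difference(means):
    """Group B mean minus group A mean, rounded to one decimal place."""
    return round(means["B"] - means["A"], 1)


for name, means in MEAN_EXTINCTION_RESPONSES.items():
    print(name, group_difference(means))
# experiment 1 (no buzzer) 6.7
# experiment 2 (buzzer) 13.2
```

With the buzzer, the gap between the groups is roughly twice as large (13.2 versus 6.7 responses), with group A responding less and group B responding more than in the first experiment.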
As we conclude this post, there is one thing I would like to mention. This study held the number of rewarded trials constant, so that both group A and group B received 6 rewards. However, because group B was rewarded intermittently, group B participated in the experiment for a longer amount of time and for more trials (group A had only 6 trials, all of which were rewarded, and group B had 30 trials, of which 6 were rewarded). This difference, rather than the type of reinforcement pattern, could account for the results. Therefore, to investigate this, future research could hold the total number of trials (the sum of the reinforced and nonreinforced trials) constant, so that the duration of the experiment is the same for both groups. Comparing those data with the data in Bijou’s article could then explain this phenomenon further and more accurately.
Bijou, S. W. (1957). Patterns of reinforcement and resistance to extinction in young children. Child Development, 28(1), 47-54.