The Art of Natural Dressage

Working with the Horse's Initiative
 Post subject: Variable Reinforcement
PostPosted: Thu Oct 30, 2008 6:17 pm 
Moderator

Joined: Thu May 17, 2007 8:18 pm
Posts: 4941
Location: Alberta
I need a refresher course. I could go google to get the information I need, but I thought that it would be more helpful to amass the ideas here, from the wealth of knowledge our forum members have.

It's time for me to begin to approach variable or intermittent rewarding with Tam. Partly as an experiment, but also to help round out my knowledge of clicker training. I can read all about it, but if I don't practice it, I'm not really going to know how to do it.

To this point, I still basically reward in the exact same way and nearly the exact same frequency with any given behavior as I do when it's at its very beginning stages. I may have more duration, but the rewards come on a very regular and expected basis.

Can anyone outline to me under what circumstances you will begin a variable reinforcement schedule, and how...and maybe why would be good to know too?

Thank you!

_________________
"Ride reverently, as if each step is the axis on which the earth revolves"


PostPosted: Thu Oct 30, 2008 7:05 pm 

Joined: Sat Feb 09, 2008 12:03 am
Posts: 1351
Location: Washington, Maine USA
Karen wrote:

Can anyone outline to me under what circumstances you will begin a variable reinforcement schedule, and how...and maybe why would be good to know too?

Thank you!


Hey Karen! Great idea!

Here's a definition, always a good place to start a discussion:

INTERMITTENT REINFORCEMENT
A term that applies to schedules of reinforcement in which only some responses are reinforced; ratio and interval schedules are common examples. A person trained on an intermittent schedule of reinforcement will continue making a response during extinction for a longer period of time than a person trained on a continuous schedule. Intermittent reinforcement also produces more responding for fewer reinforcers, thus reducing the problem of satiation.
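
To make the difference concrete, here's a small Python sketch of the two kinds of schedule as a simple yes/no decision per response (the function name and the one-in-three ratio are just illustrative, not part of the definition above):

Code:
import random

def reinforce(schedule, ratio=3):
    # Decide whether a single correct response earns the reinforcer.
    # 'continuous' pays every time; 'variable_ratio' pays on average
    # one time in `ratio`. Names and numbers are illustrative only.
    if schedule == "continuous":
        return True
    if schedule == "variable_ratio":
        return random.random() < 1.0 / ratio
    raise ValueError(f"unknown schedule: {schedule}")

# Ten responses on each schedule: the continuous row is all treats,
# the variable-ratio row pays unpredictably, roughly one in three.
for schedule in ("continuous", "variable_ratio"):
    outcomes = ["treat" if reinforce(schedule) else "-" for _ in range(10)]
    print(f"{schedule:15s}", " ".join(outcomes))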


More later...

Brenda

_________________
http://www.youtube.com/user/Lucy04574
http://www.youtube.com/user/Jack04574


PostPosted: Thu Oct 30, 2008 7:31 pm 

Joined: Sat Feb 09, 2008 12:03 am
Posts: 1351
Location: Washington, Maine USA
Hi Karen and all,

The way I look at it, whenever I raise my criteria, be it for quality, duration, # of reps, or chaining behaviors together, I am using intermittent reinforcement, or more specifically a variable ratio schedule of reinforcement. In the process of raising my criteria, I am always extinguishing previous steps. For example, steps 1 and 2 are extinguished (or in the process of being extinguished) when I move on to step 3/4, IOW step 2 doesn't get reinforced anymore.

The most important thing that I keep in mind when I train is the risk of getting stuck at an intermediate step up the ladder on the way to my goal. For example, Lucy's canter right now is pretty primitive, especially in duration/# of strides. IOW, it has SOME components I like, but there are a lot I still need to shape and install. So I am trying to raise my criteria to 3 or 4 strides, regardless of quality right now, just some duration and/or effort.

So if I keep clicking stride, stride, HOP too many times, it will be more and more difficult to extinguish that step in the future and get say 3 or 4 strides or 20, etc. So, being aware of that little black hole, now I am thinking up a new plan, thinking up ways to 'get the behavior', going back to groundwork, getting her canter more fluid and rhythmic, more strides, etc., before I try to chain it into a riding sequence.

So in the process of shifting my behavior criteria up and down within a certain range (say between step 2 and 4 of a 10 step goal), I am in fact almost always using intermittent reinforcement VS continuous reinforcement (say reinforcing step 2 forever, EVERY time it is performed).
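
Here's a toy sketch of that 'shifting range' idea in Python, just to illustrate the mechanics; the 10-step ladder and the window numbers are made up:

Code:
import random

# A toy version of shifting the criteria range up the ladder: reinforce
# anything inside the current window (say steps 2-4 of a 10-step goal),
# and slide the window up once the top of it appears. Attempts that now
# fall below the window go unreinforced, which is what makes the overall
# schedule intermittent. All numbers are invented for illustration.
GOAL = 10
low, high = 2, 4

for trial in range(1, 21):
    offered = random.randint(low - 1, min(high + 1, GOAL))  # pretend attempt
    clicked = offered >= low
    if clicked and offered >= high and high < GOAL:
        low, high = low + 1, high + 1                        # raise criteria
    print(f"trial {trial:2d}: step {offered} offered, "
          f"{'click/treat' if clicked else 'no click'}, window now {low}-{high}")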

Some food for thought...

Brenda

_________________
http://www.youtube.com/user/Lucy04574

http://www.youtube.com/user/Jack04574


PostPosted: Thu Oct 30, 2008 7:59 pm 

Joined: Sat Feb 09, 2008 12:03 am
Posts: 1351
Location: Washington, Maine USA
A few more helpful definitions:

CONTINUOUS REINFORCEMENT
is a schedule of reinforcement in which every response is reinforced. This technique is usually used when a person is first learning a behavior, particularly in shaping procedures.


EXTINCTION
Is a process in which a response is repeated without reinforcement. When we extinguish a behavior, we withhold the reinforcement that has maintained that behavior in the past so that responses go unreinforced. Note that extinguish is not the same as eliminate. There are many ways of eliminating a behavior besides extinction.

These are all sort of related, so hopefully this helps to look at the bigger picture!!

I do use continuous reinforcement early in the shaping process, often when on step 1, which also adds some classical conditioning effect to the behavior and setting.

An important difference between intermittent reinforcement and continuous reinforcement is how resistant to extinction the behavior becomes. If you think about it, when a behavior is on a continuous schedule, i.e. reinforced EVERY time, and you then decide to stop reinforcing, the animal will quickly NOTICE the absence of the reinforcer and give up performing that behavior.

On the other hand, if the reinforcer happens on an unpredictable but still supportive schedule (like ratio schedules), the animal will be more persistent at performing the behavior, i.e. keep trying to get the reinforcer longer instead of giving up!
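
One crude way to picture that is to look at the longest dry spell the animal has already worked through in training; a leaner schedule produces longer dry spells, so an endless one (extinction) takes longer to stand out. A rough Python sketch, with invented numbers:

Code:
import random

def longest_unreinforced_run(ratio, responses=200):
    # Longest run of responses that went unreinforced while training on a
    # schedule paying on average 1 in `ratio` (ratio=1 is continuous).
    # Purely a toy model, not behavioural data.
    run, worst = 0, 0
    for _ in range(responses):
        if random.random() < 1.0 / ratio:
            worst, run = max(worst, run), 0
        else:
            run += 1
    return max(worst, run)

for ratio in (1, 3, 10):
    print(f"trained at ~1 in {ratio:2d}: longest gap already experienced = "
          f"{longest_unreinforced_run(ratio)}")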

Hope these ideas and definitions help get the ball rolling??

Brenda

_________________
http://www.youtube.com/user/Lucy04574

http://www.youtube.com/user/Jack04574


PostPosted: Fri Oct 31, 2008 4:47 am 

Joined: Sun May 20, 2007 5:52 am
Posts: 1852
Location: Taiwan, via NZ
Brenda, your writing is as useful and illuminating as ever! Thanks for your definitions and explanations. This helps me out a lot in a funny way.

Karen asked me to contribute to this thread, because she knows that I used to expound the "variable reinforcement schedule". I was just going to pop on over here and say, "Oh well, I'm not really a good person to ask, because I gave up using it, and since I did, training is going so much better for Sunrise and me."

And that brought up the question for me of WHY, if variable reinforcement is so effective (as I believe in theory it must be), training has improved since dispensing with it. Curious!

Now, reading your explanations, I realise I haven't stopped using it at all, I'm just using it in a different way, which I'm obviously better at, and am integrating variability into our training without even being aware of it. Wow! Cool!

Quote:
So if I keep clicking stride, stride, HOP too many times, it will be more and more difficult to extinguish that step in the future and get say 3 or 4 strides or 20, etc


Yes yes! This was one of the problems I touched on in my diary last night. In teaching ramener, because my imagination at that time wouldn't furnish me with alternative things to teach that would lead to the whole, I dwelled too long on reinforcing her first incorrect attempts. Now it's very difficult to extinguish.

Other tasks that I've taught since have progressed much better.. and I know what it is that I am doing better.. I just hadn't made the connection and realised that this IS variable reinforcement. :!: Brilliant! So now, I still capture and reinforce the first incorrect attempts to give her encouragement.. but then I ask for something more, or different.. so the reinforcement comes slower. And I can throw in some simple secondary task to up the rate of reinforcement again if she becomes frustrated or lost.

Thanks for enlightenment!

I wonder though if it was this that Karen was asking about, or the method of VR used in Bridge and Target training, which takes the same principle and applies it differently.

It is this that I don't use any more.

This method is to only reinforce a certain percentage of the Terminal Bridges (the "clicks"), and to do it randomly, so that the horse will not know which click (or other TB) will lead to a reinforcer. You still use encouragers, or the Intermediate Bridge, in the same way.
I know that this method works, as many people are successfully using it. I am not one of them. :oops:
I found that it was just too confusing for me and the horse.. and unnecessary, because I have IB or keep going signals that can create a similar effect without the confusion.

For example, in BandT.. you might
Cue Click
Cue Click
Cue Click Treat
Cue Click
Cue Click Treat
Cue Click
Cue Click
Cue Click
Cue Click Treat
Cue Click
Cue Click Treat.

In CT you can
Cue good good
Cue Click treat
Cue good good good
Cue good good
Cue Click treat
Etc.
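
For anyone who likes to see the two patterns spelled out, here's a small Python sketch of them; the 40% treat probability and the number of 'goods' are just placeholders, not anything prescribed by B and T or clicker training:

Code:
import random

def bridge_and_target_run(cues=8, treat_probability=0.4):
    # Every correct response gets the terminal bridge (click), but only a
    # random fraction of clicks lead to a treat. The 0.4 is a placeholder.
    for n in range(1, cues + 1):
        line = f"cue {n}: click"
        if random.random() < treat_probability:
            line += " -> treat"
        print(line)

def clicker_run(cues=5, max_goods=3):
    # The clicker-style alternative: 'good' keep-going signals stretch the
    # behaviour between clicks, and every click is followed by a treat.
    for n in range(1, cues + 1):
        goods = random.randint(0, max_goods)
        print(f"cue {n}: " + "good " * goods + "click -> treat")

bridge_and_target_run()
print("---")
clicker_run()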

In B and T you are supposed to begin with the random reinforcement right at the start of training, as soon as your horse has connected the bridge with the reward.

I was struggling with a lot of things when I was trying to use this method, so maybe it was not a very fair trial.. I'm sure not! B and T is not so easy to learn as there's little information out there being shared.. and unless you can get to a clinic, I think much is a mystery.

It did initially cause her to try extra hard to get the "right" click, but I found that it led to a lot of frustration as well (probably from my own ineptness in using it), and I did find that Sunrise seemed to be doubting the bridge signal, and often choosing to do her own thing rather than hang about for carrots, which she loves.

The moment I gave it up and started reinforcing wildly (something I picked up from watching your video of paint drying, Karen :lol: ), I suddenly found far greater enthusiasm and clarity myself, and Sunrise LOVED it.. and we've never looked back.

And since then I've been puzzling over why I've got a much better motivated horse when in theory she should be complacent.. :lol:
Thanks Brenda for some food for thought.

Karen, you said:
Quote:
To this point, I still basically reward in the exact same way and nearly the exact same frequency with any given behavior as I do when it's at its very beginning stages. I may have more duration, but the rewards come on a very regular and expected basis.


I'm wondering, if you have more duration, how can the rewards be coming at the same frequency?

Are you not able to just add in some good good goods between clicks, and slowly dispense with some of the clicks for old, well-learned, everyday behaviours?

For example, you must have clicked and treated each step when you first asked Tam to walk with you, but now I'm sure you don't click every step.. you're asking for something else and clicking that.

I think this question you're asking is one a few of us are struggling with at the moment. When and how do we start just expecting that our horses can perform certain things without needing continual reinforcement.. move on to adolescence and adulthood.
Did you see that nice little video clip that I found and posted a while ago of a clicker trainer working with a well trained horse? It was exactly this they were trying to address.

Cheers,
Sueh

_________________
I have not sought the horse of bits, bridles, saddles and shackles,
But the horse of the wind, the horse of freedom, the horse of the dream. [Robert Vavra]


PostPosted: Fri Oct 31, 2008 5:17 am 

Joined: Sun May 20, 2007 5:52 am
Posts: 1852
Location: Taiwan, via NZ
Brenda.. just thought of another question I want to ask you:

Following on from your definition of extinction as withholding reinforcement until a behaviour is no longer performed.. how do you take that next step of withholding the reinforcement WITHOUT the behaviour being extinguished, for simple tasks that you don't want to have to keep clicking and treating forever.. a simple walk, for example?

Cheers, Sue

_________________

I have not sought the horse of bits, bridles, saddles and shackles,

But the horse of the wind, the horse of freedom, the horse of the dream. [Robert Vavra]


PostPosted: Fri Oct 31, 2008 12:22 pm 

Joined: Sat Feb 09, 2008 12:03 am
Posts: 1351
Location: Washington, Maine USA
windhorsesue wrote:
Karen, you said:
Quote:
To this point, I still basically reward in the exact same way and nearly the exact same frequency with any given behavior as I do when it's at its very beginning stages. I may have more duration, but the rewards come on a very regular and expected basis.


I'm wondering, if you have more duration, how can the rewards be coming at the same frequency?

Sue


Yes! I agree Sue!

Karen,

From reading your training accounts, I'm pretty sure you are using intermittent reinforcement, as in duration, # of reps (twofers, threefers, tenfers??), and chaining??

Maybe what you are referring to is where you have got the behavior to where you want it, and it just stays there on a continuous schedule, reinforced every time??? A sort of stand-alone behavior, like maybe GOTM, or some stretches??? Since these behaviors may never be a part of a chain, and are always performed independently, then continuous reinforcement is all you need??

Also, I have found it very beneficial to keep 'difficult' behaviors, either mentally or physically demanding, on a high rate of reinforcement, not necessarily a continuous schedule. Keeping a high rate of reinforcement along with "Varying the Variables" (another dog-training term) is intermittent reinforcement, and IMO, VERY good for foundation work! And I 'think' that's what you are doing?

So for Lucy, handling her front feet forward to rasp is both physically demanding (off balance on 3 legs), AND also mentally taxing (some fear baggage). So I keep it on a high rate of reinforcement, i.e. she gets cookies EVERY time! But the duration is always a bit different, and how I rasp changes, and other 'variables' are added, like changing location, etc., so I would say that that behavior is on an intermittent schedule, some reps are harder than others, even tho she gets treats for every 'lift'??
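
A little sketch of that 'vary the variables but treat every lift' idea, with invented durations, tasks and locations standing in for the real ones:

Code:
import random

# Every lift is reinforced (1:1 treat rate), but no two reps are identical,
# so the reps themselves range from easy to hard. The lists below are
# invented examples, not the actual setup.
hold_seconds = [5, 8, 12, 20]
tasks = ["just hold", "light rasp", "full rasp"]
places = ["stall", "barn aisle", "yard"]

for rep in range(1, 6):
    print(f"rep {rep}: {random.choice(hold_seconds)}s "
          f"{random.choice(tasks)} in the {random.choice(places)} -> treat")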

Hope some of this helps???

Brenda

_________________
http://www.youtube.com/user/Lucy04574

http://www.youtube.com/user/Jack04574


PostPosted: Fri Oct 31, 2008 12:44 pm 

Joined: Sat Feb 09, 2008 12:03 am
Posts: 1351
Location: Washington, Maine USA
windhorsesue wrote:
Brenda.. just thought of another question I want to ask you:

Following on from your definition of extinction as withholding reinforcement until a behaviour is no longer performed.. how do you take that next step of withholding the reinforcement WITHOUT the behaviour being extinguished, for simple tasks that you don't want to have to keep clicking and treating forever.. a simple walk, for example?

Cheers,Sue


Good question! Well, from the animal's point of view, the first few times you withhold reinforcement, they may 'think' the behavior is on extinction, and you can get a wide variety of responses to that, from shutting down, to aggression, as well as the desirable response of continuing to try or offer the behavior, or even a bit of frustration or energy that adds to the behavior, or not!

As long as you GRADUALLY increase the # of steps, or the time interval of your walks, fluctuating easy-hard-easy, etc., you should be o.k. The telling is in the animal's response! If you get an undesirable response, you took too big a step. If you are not gaining on your chosen behavior, you took too small a step??
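
If it helps, here's a toy Python plan for that easy-hard-easy thinning on a simple walk; the step counts are invented, and the only point is that the average ask drifts up while the individual asks bounce around it:

Code:
import random

def walk_plan(start=4, goal=40, sessions=10):
    # The average ask (steps before a click) drifts upward across sessions,
    # but each actual ask bounces easy-hard-easy around that average, so the
    # horse never learns that every ask is harder than the last.
    # All numbers are illustrative.
    asks = []
    for s in range(sessions):
        mean = start + (goal - start) * s / (sessions - 1)
        asks.append(max(1, round(random.uniform(0.5 * mean, 1.5 * mean))))
    return asks

print("steps to ask for before each click:", walk_plan())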

One of the positive side effects of intermittent reinforcement is persistence: the willingness to gamble, to not give up cuz the reinforcer is just around the corner! And that is resistance to extinction! Think slot machines! Very successful at teaching people to keep performing without continuous reinforcement!!

I think when we describe a 'clicker savvy', or 'clicker wise' animal, that is one of the most important traits that they have acquired thru the shaping experience: persistence!!

Brenda

_________________
http://www.youtube.com/user/Lucy04574

http://www.youtube.com/user/Jack04574


PostPosted: Fri Oct 31, 2008 12:55 pm 

Joined: Sat Feb 09, 2008 12:03 am
Posts: 1351
Location: Washington, Maine USA
windhorsesue wrote:
So now, I still capture and reinforce the first incorrect attempts to give her encouragement..
Sue


Yes! And those 'incorrect' attempts, tho, are still a shadow of what you want in the end, so it's really just early criteria setting?? So, the 'step one doesn't look anything like the final step' concept!!!

And if you keep reinforcing for a range, say reinforce step 3, 4, or 5, you end up with an average of reinforcing step 4, which is a type of intermittent reinforcement, a variable ratio schedule (more on that another day), and this fluidity within the range, as well as shifting the range up the criteria ladder as you go, helps to keep you from getting stuck on an intermediate step and not reaching your final goal!

Brenda

_________________
http://www.youtube.com/user/Lucy04574

http://www.youtube.com/user/Jack04574


PostPosted: Fri Oct 31, 2008 1:12 pm 

Joined: Sat Feb 09, 2008 12:03 am
Posts: 1351
Location: Washington, Maine USA
windhorsesue wrote:
The moment I gave it up and started reinforcing wildly (something I picked up from watching your video of paint drying, Karen :lol: ), I suddenly found far greater enthusiasm and clarity myself, and Sunrise LOVED it.. and we've never looked back.

And since then I've been puzzling over why I've got a much better motivated horse when in theory she should be complacent.. :lol:
Thanks Brenda for some food for thought.

Sue


What I think you did was begin using a much higher rate of reinforcement to start with. This of course helps learning get a foothold, but also I think there is a very valuable classical conditioning effect, i.e. training/being with you/the sky is blue = good things, and that emotional component is a wonderful side effect of positive reinforcement training!

I have incorporated many of the BnT concepts into my training, and it has been very helpful. I LOVE the way BnT defines behaviors thru targeting (surprise surprise!), and that has been a tremendous asset to my animals' learning as well as my own! I taught my CR instructor to explain riding tasks to me using targets, and it is soooo helpful for me!!

But I have never adopted the 'thin' reinforcement schedule that BnT recommends when starting a behavior. Also, my terminal bridge:reinforcer ratio is always 1:1 and is not variable, so the animal knows when the treat is coming, very important with horses I think, for safety. And like you, I use an intermediate bridge/keep going signal like 'good' to support longer or harder behaviors.

Thanks for sharing! Great topic!

Brenda

_________________
http://www.youtube.com/user/Lucy04574

http://www.youtube.com/user/Jack04574


PostPosted: Fri Oct 31, 2008 1:50 pm 

Joined: Fri Sep 21, 2007 4:10 am
Posts: 3688
Location: Pacific Northwest U.S.
Karen wrote:
I need a refresher course. I could go google to get the information I need, but I thought that it would be more helpful to amass the ideas here, from the wealth of knowledge our forum members have.

It's time for me to begin to approach variable or intermittent rewarding with Tam. Partly as an experiment, but also to help round out my knowledge of clicker training. I can read all about it, but if I don't practice it, I'm not really going to know how to do it.



I'm trying to recall where in clicker training (as say Kurland practices) there is an intermediate bridge used. Maybe I'm just sleepy, but I think the IB, or intermediate bridge came from SATS, and Kayce Cover. Correct me if I'm wrong.

Karen wrote:
To this point, I still basically reward in the exact same way and nearly the exact same frequency with any given behavior as I do when it's at it's very beginning stages. I may have more duration, but the rewards come on a very regular and expected basis.

Can anyone outline to me under what circumstances you will begin a variable reinforcement schedule, and how...and maybe why would be good to know too?

Thank you!


You are discussing two things: the IB and VR.

I'm not sure it's a rule, but I've read that the best gauge as to when to switch to VR is when the animal knows what you want, but you want to make the response more solid and more energetic.

I read someone recently (sorry, whoever you are, I can't remember the name of the forum) who said that rewarding about one in three responses works well for VR.

As for the IB, I have a dog, Rio, who, given his past, likely had a lot of chaos in his handling. He's highly dependent on cues from his handler, even to the point of suppressing his canine instincts.

He has virtually no chase responses. He ignores deer, cats, everything, oddly enough, except squirrels. Those, if they come too close, turn him into a frenzied killing machine. Weird.

Nevertheless, this past of his, and his personality, show themselves in his confusion about what is wanted.

It's as though he knows clearly what's wanted, and even makes a move to comply when asked, for instance, to go to his RioBed. That's a spot by the wood stove behind the Alpha Female's (Kate's) recliner chair.

He would make circles going in that direction, just as he does with other requests when he really doesn't want to comply.

He wants the reward, but he also wants to be doing something else. Like lying at Kate's feet, one of his favorite spots.

IB has been changing this. Not immediately or fast, but he is more often now going in a direct line to his RioBed.

That, combined with VR, has shown some change in his behavior. I reward about one in three, and when he takes a good, direct, energetic path to his RioBed, he gets a jackpot.
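
Roughly, that rule looks like this little sketch (the numbers and labels are illustrative, not exact):

Code:
import random

def reward(response, vr_probability=1 / 3):
    # Reward roughly one response in three, and jackpot an especially good
    # one (a direct, energetic path to the RioBed). Numbers are approximate.
    if response == "direct and energetic":
        return "jackpot"
    return "treat" if random.random() < vr_probability else "praise only"

for response in ["slow circle", "hesitant", "direct and energetic",
                 "slow circle", "hesitant"]:
    print(f"{response:22s} -> {reward(response)}")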

The trick now is to fade the treats. That's often my downfall. I just feel so strongly that effort should always be rewarded.

Problem is, that makes the reward the goal, rather than the behavior itself.

Donald

_________________
Love is Trust, trust is All
~~~~~~~~~
So say Don, Altea, and Bonnie the Wonder Filly.


PostPosted: Fri Oct 31, 2008 2:35 pm 

Joined: Sun May 20, 2007 5:52 am
Posts: 1852
Location: Taiwan, via NZ
Quote:
Hope some of this helps???


Don't know about Karen but it sure is helping me! :wink:
:D
Will have to read it all over again tomorrow and make sure I've collected all the meaning. Wonderful stuff, thanks Brenda!

Another Q for you: if you had to recommend one book on R+ training as a training companion.. which one would it be?

I'm full of q's at the mo.. thanks for your patience!
Sue

_________________

I have not sought the horse of bits, bridles, saddles and shackles,

But the horse of the wind, the horse of freedom, the horse of the dream. [Robert Vavra]


PostPosted: Fri Oct 31, 2008 4:28 pm 

Joined: Sat Feb 09, 2008 12:03 am
Posts: 1351
Location: Washington, Maine USA
Donald Redux wrote:
I'm trying to recall where in clicker training (as say Kurland practices) there is an intermediate bridge used. Maybe I'm just sleepy, but I think the IB, or intermediate bridge came from SATS, and Kayce Cover. Correct me if I'm wrong.


Hi Donald,

I know BnT uses and defines the IB, but it may have come into fashion with mammal trainers???... dunno.

In clicker training, a keep going signal or KGS is often used.

Tho the IB and the KGS may differ some, the scientific name is a tertiary reinforcer, named for the way it acquires its reinforcing properties thru the secondary reinforcer to the primary reinforcer.

Brenda

_________________
http://www.youtube.com/user/Lucy04574

http://www.youtube.com/user/Jack04574


PostPosted: Fri Oct 31, 2008 4:54 pm 

Joined: Sat Feb 09, 2008 12:03 am
Posts: 1351
Location: Washington, Maine USA
windhorsesue wrote:
Another Q for you: if you had to recommend one book on R+ training as a training companion.. which one would it be?

I'm full of q's at the mo.. thanks for your patience!
Sue


No problem!

Well, I have collected a number of textbooks over the years, usually lower-level human psych, or learning and memory, stuff like that. They can often be gotten pretty cheap used from college bookstores, or online. New, tho, they often cost a bundle!

I also took a Learning and Memory college psych course many years ago that was a good start, and then I could read other texts and articles with a better understanding of the concepts.

IMO the best teaching is to explore/research a concept and then try to put it into practice! Concepts like the IB, KGS, negative reinforcement/time out, extinction, classical conditioning effects, adding and transferring cues, jackpots, stimulus control, chaining, fading, etc. are all fun to try out to see how they might fit into your training program! For example, with simple targeting, you could explore a number of the above concepts and stay busy for months!!

But the one dog trainers rave about is by Ken Ramirez, 'Animal Training: Successful Animal Management Through Positive Reinforcement', tho I haven't read it, VERY pricey! Also Pam Reid's 'Excel-erated Learning' was popular in agility training circles. And there are a few other animal training ones, usually dog related, that have come out since I retired!

Maybe others will have more recommendations!

Brenda

_________________
http://www.youtube.com/user/Lucy04574

http://www.youtube.com/user/Jack04574


PostPosted: Fri Oct 31, 2008 5:03 pm 

Joined: Sat Feb 09, 2008 12:03 am
Posts: 1351
Location: Washington, Maine USA
OOOPS! Sorry Donald...

BnT is Bridge and Target, same as SATS.

Your encouraging words sound like a Keep Going Signal (KGS), whereas the IB in SATS is continuous, and the silence or cessation of it is used as a 'cold' marker, a 'you're on the wrong track' idea. I do like the cessation idea, but I just find it very tiring to use an IB for long periods of time, tho I did teach my dogs a few quick things with an IB followed by a C/T. So yes, as you say, they differ in execution, tho they are both technically types of tertiary reinforcers.

An example of a tertiary, secondary, and primary is one that I'm sure you and others have experienced:

So, animal performs desirable behavior:

"Good' (tertiary/3)
Click (secondary/2)
Treat (primary/1)

So the reinforcing properties of the tertiary come thru its association with the secondary, which is conditioned by the primary. And you can have 4th and 5th level reinforcers, and we probably do, we just aren't aware of them, but the animal just might be!!

And as far as a mutual language goes, operant conditioning, extinction, classical conditioning, and desensitization are what make the world go round, cuz all creatures seek out reinforcement, try to avoid aversives, and just want to be stress free if possible!! IMO the horse 'language' that everyone touts these days is mostly applied operant conditioning, tho heavy on the negative reinforcement/aversive side. And wild dogs, say, use a lot of pressure and aversives to communicate with each other, and that may be 'natural', but that doesn't mean that more positive reinforcement won't work when humans train them! And I think the only reason positive reinforcement CAN work for human trainers is that we CAN control the resources the animal wants and needs, and use that to reinforce the behaviors that we like.

Sorry about the acronyms, it's just quicker... just let me know if there was one I didn't clarify???

I am hungry right now, so I think I will seek out lunch (mmmmmm)!!!

Brenda

_________________
http://www.youtube.com/user/Lucy04574

http://www.youtube.com/user/Jack04574

