In 3.(b) do you want us to run learning several times?
Post date: Oct 27, 2011 10:1:44 PM
In 3. (b) it says 'Adjust the rewards until the RL agent converges on two distinct policies'. Do you want us to run the learning with 'number of runs=1' or do you want us to run the learning several times (each with 20,000,000 steps) and average over the number of runs?
For part (b) it's enough to learn with 'number of runs=1', as I'm looking for a qualitative description of the policy arrived at (as opposed to quantitative values). However, you may wish to run learning with the same parameters several times to see whether the policy learnt was a one-off or something that is reliably converged on.
For part (c), you should perform multiple runs and compute averages, etc.
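
As a rough illustration of repeating learning and averaging, here is a minimal Python sketch. Note that `run_learning` is a hypothetical stand-in and not part of the assignment software: it just returns a made-up policy and a noisy reward so that the example runs on its own. You would replace it with whatever call runs your simulator once. The policy names and reward values are placeholders too.

```python
import random
import statistics
from collections import Counter


def run_learning(seed):
    """Hypothetical stand-in for one learning run (NOT the assignment's API).

    Returns (policy, avg_reward), where policy is a tuple of actions.
    Replace the body with a call to the real simulator.
    """
    rng = random.Random(seed)
    # Placeholder behaviour: pick one of two made-up policies and a noisy reward.
    policy = rng.choice([("left", "left"), ("right", "right")])
    avg_reward = 1.0 + rng.gauss(0, 0.1)
    return policy, avg_reward


def summarise(num_runs):
    """Run learning num_runs times; count policies and average the rewards."""
    results = [run_learning(seed) for seed in range(num_runs)]
    policies = Counter(policy for policy, _ in results)
    rewards = [reward for _, reward in results]
    return policies, statistics.mean(rewards), statistics.stdev(rewards)


policies, mean_reward, sd_reward = summarise(10)
print(policies)                 # how often each policy was converged on
print(mean_reward, sd_reward)   # average reward and its spread across runs
```

The policy counts speak to the part (b) question of whether a policy is a one-off, while the mean and standard deviation are the kind of averages part (c) asks for.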