RL Assignment 2011
RL Assignment Q&A
For "statistical significance" in 3(c), should we calculate a numerical result or only make some qualitative comments?
I sent you a question that you've not answered?
In 1. Is it allowed to come up with different games?
In 1. the assignment says "Come up with a task, different to the examples given [in the lectures or course book] ..." but then it says "Briefly (i) describe the task, e.g. play Backgammon", which is in the lectures.
In 2. the V or Q values of states 7 or 3 weren't given. Are they needed?
In 2. we don't have to update the whole of Table 1?
In 3(b) what does "behaviour of a converged policy" mean?
In 3(c) I don't want to increase the sample size as the specification says 5.
In 3(c) I'm confused due to lack of statistical knowledge.
In 3(b) do you want us to run learning several times?
In 3(b) is it enough just to write down the values for time step 20,000,000 and treat them as the 'limit' values?
In 3(c) can we choose the same set of rewards as in (b)?
In Question 1 what about task X?
Is it okay to email you the graph?
Is there something I need to do to get the visualization working?
