In 2, the V or Q values of states 7 or 3 weren't given. Are they needed?
Post date: Nov 08, 2011 11:07:12 AM
Calculate the updates with the update step of the deterministic Q-learning algorithm, as shown in Table 13.1 of Tom M. Mitchell, Machine Learning (1997), and reproduced in the lecture slides. You shouldn't need the V-values of any states. Also, since the agent doesn't visit states 7 or 3, you don't need Q-values for those states.
The algorithm only updates the Q-values of the state-action pairs the agent actually executed, so those are the only Q-values you need to update.
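As a rough illustration, here is a minimal Python sketch of that update step, Q(s, a) <- r + gamma * max over a' of Q(s', a'). The state numbers, action names, rewards, initial Q-values, and gamma = 0.9 below are made up for the example and are not the values from the assignment; the function name q_update is also just a placeholder.

```python
def q_update(q, state, action, reward, next_state, actions, gamma=0.9):
    """One deterministic Q-learning update for the executed (state, action).

    q maps (state, action) pairs to Q-values; missing entries count as 0.
    Only the entry for the executed pair is written.
    """
    # Best Q-value available from the state the agent landed in.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    q[(state, action)] = reward + gamma * best_next
    return q[(state, action)]


# Hypothetical setup (assumed values, not the assignment's).
actions = ["left", "right"]
q = {(2, "right"): 100.0}

# The agent is in state 1, executes "right", receives reward 0,
# and ends up in state 2.
new_value = q_update(q, 1, "right", 0.0, 2, actions)
```

In this sketch the update reads only the Q-values of the state the agent moved into (state 2) and writes only the entry for the executed pair (1, "right"). States the agent never enters, such as 7 or 3, are never looked up, which is why their values are not needed.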