Heriot Watt University, Interaction Lab
Home
Academic Interests
Research Projects
Publications
Kenneth C Scott-Brown, Julia Allan, Leif Azzopardi, Marjon van der Pol, Paul Crook, Mark Bamford, Claire Moncrieffe, Donna McAvoy & Ian Reynolds. "Service-Please: an interactive healthy eating serious game application for tablet computer"
Paul A. Crook and Oliver Lemon. "Lossless Value Directed Compression of Complex User Goal States for Statistical Spoken Dialogue Systems"
Paul A. Crook and Oliver Lemon. "Accurate Probability Estimation of Hypothesised User Acts for POMDP Approaches to Dialogue Management"
Paul A. Crook and Oliver Lemon. "Representing Uncertainty about Complex User Goals in Statistical Dialogue Systems"
Paul A. Crook, Brieuc Roblin, Hans-Wolfgang Loidl and Oliver Lemon. "Parallel Computing and Practical Constraints when applying the Standard POMDP Belief Update Formalism to Spoken Dialogue Management"
Paul A. Crook, Simon Keizer, Zhuoran Wang, Wenshuo Tang, and Oliver Lemon. "Real User Evaluation of a POMDP Spoken Dialogue System Using Automatic Belief Compression"
Paul A. Crook, Zhuoran Wang, Xingkun Liu and Oliver Lemon. "A Statistical Spoken Dialogue System using Complex User Goals and Value Directed Compression"
Downloads
Honours and MSc Projects
Invited Talks / Tutorials
Public Engagement
Teaching
announcements
attachments
RL Assignment 2011
RL Assignment Q&A
For "statistical significance" in 3(c), should we calculate a numerical result or only make some qualitative comments?
I sent you a question that you've not answered?
In 1. Is it allowed to come up with different games?
In 1. the assignment says "Come up with a task, different to the examples given [in the lectures or course book] ..." but then it says "Briefly (i) describe the task, e.g. play Backgammon", which is in the lectures.
In 2, the V or Q values of states 7 or 3 weren't given. Are they needed?
In 2. we don't have to update the whole of Table 1?
In 3(b) what does "behaviour of a converged policy" mean?
In 3(c) I don't want to increase the sample size as the specification says 5.
In 3(c) I'm confused due to lack of statistical knowledge.
In 3(b) do you want us to run learning several times?
In 3(b) is it enough just to write down the values for time step 20,000,000 and treat them as the 'limit' values?
In 3(c) can we choose the same set of rewards as in (b)?
In Question 1 what about task X?
Is it okay to email you the graph?
Is there something I need to do to get the visualization working?
RL Assignment D