cntk.contrib.deeprl.agent.qlearning module
Deep Q-learning and its variants.
class QLearning(config_filename, o_space, a_space)
Bases: cntk.contrib.deeprl.agent.agent.AgentBaseClass
Q-learning agent.
Including:
- DQN https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
- Prioritized Experience Replay https://arxiv.org/pdf/1511.05952.pdf
- Dueling Network https://arxiv.org/pdf/1511.06581.pdf
- Double Q Learning https://arxiv.org/pdf/1509.06461.pdf
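To illustrate how two of the variants listed above differ, the following sketch compares the bootstrap target of plain DQN with that of Double Q Learning for a single transition. It is not taken from this module's implementation; the function names `dqn_target` and `double_dqn_target` and their parameters are hypothetical, chosen only for this example.

```python
def dqn_target(reward, q_target_next, gamma, terminal=False):
    """DQN target: bootstrap from the target network's maximum Q-value."""
    if terminal:
        return reward  # no bootstrapping past the end of an episode
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, q_online_next, q_target_next, gamma, terminal=False):
    """Double Q Learning target: the online network selects the action,
    the target network evaluates it."""
    if terminal:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


# Hypothetical Q-values for the next state, one entry per action.
q_online_next = [1.0, 2.0, 0.5]
q_target_next = [3.0, 1.5, 0.0]

plain = dqn_target(1.0, q_target_next, gamma=0.5)
double = double_dqn_target(1.0, q_online_next, q_target_next, gamma=0.5)
```

With these made-up values the plain target uses the target network's largest estimate, while the double target uses the target network's estimate for the action the online network prefers, which is what reduces overestimation.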
end(reward, next_state)
Last observed reward/state of the episode (which then terminates).
Parameters: - reward (float) – amount of reward returned after previous action.
- next_state (object) – observation provided by the environment.
start(state)
Start a new episode.
Parameters: state (object) – observation provided by the environment.
Returns: - action (int) – action chosen by agent.
- debug_info (dict) – auxiliary diagnostic information.
step(reward, next_state)
Observe one transition and choose an action.
Parameters: - reward (float) – amount of reward returned after previous action.
- next_state (object) – observation provided by the environment.
Returns: - action (int) – action chosen by agent.
- debug_info (dict) – auxiliary diagnostic information.
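The three methods above are meant to be called in a fixed order over an episode: start once, step after every non-terminal transition, and end on the terminal one. The sketch below shows that calling pattern only. So that it runs without CNTK, it uses a hypothetical stand-in agent (`RandomAgent`) and a hypothetical toy environment (`CorridorEnv`); neither is part of this module, and a real QLearning instance would additionally need a configuration file and observation/action space objects.

```python
import random


class RandomAgent:
    """Hypothetical stand-in with the same start/step/end interface."""

    def __init__(self, num_actions, seed=0):
        self._num_actions = num_actions
        self._rng = random.Random(seed)

    def start(self, state):
        # Returns (action, debug_info), like the documented start().
        return self._rng.randrange(self._num_actions), {}

    def step(self, reward, next_state):
        # Returns (action, debug_info), like the documented step().
        return self._rng.randrange(self._num_actions), {}

    def end(self, reward, next_state):
        # Receives the final reward/state; nothing to learn in this stub.
        pass


class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to cell `length`."""

    def __init__(self, length=3):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Action 1 moves right; any other action moves left (floored at 0).
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos == self.length
        return self.pos, (1.0 if done else 0.0), done


def run_episode(agent, env, max_steps=1000):
    """Drive one episode and return the sum of rewards."""
    state = env.reset()
    action, _ = agent.start(state)
    total = 0.0
    for _ in range(max_steps):
        next_state, reward, done = env.step(action)
        total += reward
        if done:
            agent.end(reward, next_state)  # terminal transition
            return total
        action, _ = agent.step(reward, next_state)
    return total  # step budget exhausted; end() is not called in this sketch


episode_return = run_episode(RandomAgent(num_actions=2), CorridorEnv())
```

Note that the reward passed to step or end is the one produced by the previous action, which is why the first call, start, takes only a state.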