cntk.contrib.deeprl.agent.qlearning module

Deep Q-learning and its variants.

class QLearning(config_filename, o_space, a_space)

Bases: cntk.contrib.deeprl.agent.agent.AgentBaseClass

Q-learning agent.

Including:
  • DQN: https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
  • Prioritized Experience Replay: https://arxiv.org/pdf/1511.05952.pdf
  • Dueling Network: https://arxiv.org/pdf/1511.06581.pdf
  • Double Q Learning: https://arxiv.org/pdf/1509.06461.pdf
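To make the difference between two of the listed variants concrete, here is a minimal sketch of the bootstrap targets used by DQN and by Double Q Learning, following the formulas in the linked papers. It is not taken from this module: the function names, argument names, and the sample Q-values are all hypothetical, and Q-values are plain Python lists rather than CNTK tensors.

```python
def dqn_target(reward, next_q_target, gamma, terminal):
    """DQN target: bootstrap from the target network's largest Q-value."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_target)


def double_q_target(reward, next_q_online, next_q_target, gamma, terminal):
    """Double Q target: the online network picks the action,
    the target network supplies that action's value."""
    if terminal:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for the next state, one entry per action.
next_q_online = [1.0, 2.0, 0.5]
next_q_target = [3.0, 1.5, 0.0]

y_dqn = dqn_target(1.0, next_q_target, 0.9, terminal=False)
y_double = double_q_target(1.0, next_q_online, next_q_target, 0.9, terminal=False)
```

With these sample values the Double Q target is the smaller of the two, because the action is chosen by one network and evaluated by the other, which is the mechanism the Double Q Learning paper uses to reduce overestimation.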

end(reward, next_state)

Observe the last reward and state of the episode, which then terminates.

Parameters:
  • reward (float) – amount of reward returned after previous action.
  • next_state (object) – observation provided by the environment.
enter_evaluation()

Set up before evaluation.

save(filename)

Save model to file.

save_parameter_settings(filename)

Save parameter settings to file.

set_as_best_model()

Copy current model to best model.
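One plausible way to combine enter_evaluation() and set_as_best_model() is to evaluate periodically and keep a snapshot whenever the score improves. The sketch below illustrates that pattern only. StubAgent, maybe_promote, and the scores are hypothetical; the stub borrows the two method names from this reference but does not reproduce how QLearning implements them.

```python
import copy


class StubAgent:
    """Hypothetical stand-in; only the method names come from the reference."""

    def __init__(self):
        self.weights = [0.0]
        self.best_weights = None
        self.evaluating = False

    def enter_evaluation(self):
        # Assumed meaning: switch the agent into evaluation mode.
        self.evaluating = True

    def set_as_best_model(self):
        # Assumed meaning: snapshot the current model as the best one.
        self.best_weights = copy.deepcopy(self.weights)


def maybe_promote(agent, evaluate, best_score):
    """Evaluate the agent and promote its model if the score improved."""
    agent.enter_evaluation()
    score = evaluate(agent)
    if score > best_score:
        agent.set_as_best_model()
        return score
    return best_score


agent = StubAgent()
best = float("-inf")
# Pretend training produced three successive models with these scores.
for weights, score in [([0.1], 1.0), ([0.2], 3.0), ([0.3], 2.0)]:
    agent.weights = weights
    best = maybe_promote(agent, lambda a, s=score: s, best)
```

After the loop the stub holds the weights of the second model, since the third evaluation scored lower and did not trigger a promotion.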

start(state)

Start a new episode.

Parameters:
  • state (object) – observation provided by the environment.
Returns:
  • action (int) – action chosen by agent.
  • debug_info (dict) – auxiliary diagnostic information.
step(reward, next_state)

Observe one transition and choose an action.

Parameters:
  • reward (float) – amount of reward returned after previous action.
  • next_state (object) – observation provided by the environment.
Returns:
  • action (int) – action chosen by agent.
  • debug_info (dict) – auxiliary diagnostic information.
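The start, step, and end methods together describe one episode: start receives the first observation, step receives each intermediate transition, and end receives the final one. The sketch below shows that call order with stand-ins so it runs without CNTK. RandomAgent, CountdownEnv, and run_episode are hypothetical, and the code assumes start and step return an (action, debug_info) pair, as the Returns entries suggest.

```python
import random


class RandomAgent:
    """Hypothetical stand-in exposing the same start/step/end methods."""

    def __init__(self, num_actions, seed=0):
        self._num_actions = num_actions
        self._rng = random.Random(seed)
        self.transitions = 0  # number of (reward, next_state) pairs observed

    def start(self, state):
        return self._rng.randrange(self._num_actions), {}

    def step(self, reward, next_state):
        self.transitions += 1
        return self._rng.randrange(self._num_actions), {}

    def end(self, reward, next_state):
        self.transitions += 1


class CountdownEnv:
    """Hypothetical toy environment that terminates after `horizon` steps."""

    def __init__(self, horizon=5):
        self._horizon = horizon
        self._t = 0

    def reset(self):
        self._t = 0
        return self._t

    def step(self, action):
        self._t += 1
        done = self._t >= self._horizon
        return self._t, 1.0, done  # (next_state, reward, done)


def run_episode(agent, env):
    """Drive one episode: start once, step until terminal, then end."""
    state = env.reset()
    action, _ = agent.start(state)
    total_reward = 0.0
    while True:
        next_state, reward, done = env.step(action)
        total_reward += reward
        if done:
            agent.end(reward, next_state)  # final transition, no action needed
            return total_reward
        action, _ = agent.step(reward, next_state)


agent = RandomAgent(num_actions=2)
total = run_episode(agent, CountdownEnv(horizon=5))
```

Note that the terminal transition goes to end rather than step, so the agent is never asked for an action it cannot take.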