cntk.contrib.deeprl.agent.shared.customized_models module

Customized Q function or (unnormalized) log of policy function.

If the models from cntk.contrib.deeprl.agent.shared.models are not adequate, write your own model as a function that takes two required arguments, ‘shape_of_inputs’ and ‘number_of_outputs’, and two optional arguments, ‘loss_function’ and ‘use_placeholder_for_input’, and returns a dictionary containing ‘inputs’, ‘outputs’, ‘f’ and ‘loss’. In the config file, set QRepresentation or PolicyRepresentation to the path (module_name.function_name) of the function; QLearning/PolicyGradient will then locate it automatically. A sketch of such a function is given below.
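As a minimal sketch of this contract (the two-hidden-layer architecture and the name my_q_model are illustrative; the key semantics assumed here are that ‘inputs’ holds the state variable, ‘outputs’ the training target variable, ‘f’ the network output node and ‘loss’ the training loss, so consult the built-in models in cntk.contrib.deeprl.agent.shared.models for the exact convention):

    import numpy as np
    import cntk as C

    def my_q_model(shape_of_inputs, number_of_outputs,
                   loss_function=None, use_placeholder_for_input=False):
        # State input: a placeholder to be replaced later, or a real input_variable.
        if use_placeholder_for_input:
            inputs = C.ops.placeholder(shape=shape_of_inputs)
        else:
            inputs = C.ops.input_variable(shape=shape_of_inputs, dtype=np.float32)

        # Training target, one value per action (illustrative choice).
        targets = C.ops.input_variable(shape=(number_of_outputs,), dtype=np.float32)

        # Two dense hidden layers and a linear output layer, one output per action.
        f = C.layers.Sequential([
            C.layers.Dense(64, activation=C.ops.relu),
            C.layers.Dense(64, activation=C.ops.relu),
            C.layers.Dense(number_of_outputs, activation=None)
        ])(inputs)

        # Squared loss by default, as described above.
        if loss_function is None:
            loss = C.losses.squared_error(f, targets)
        else:
            loss = loss_function(f, targets)

        return {'inputs': inputs, 'outputs': targets, 'f': f, 'loss': loss}

With this function saved in, say, my_models.py (a hypothetical module name), the config file would set QRepresentation = my_models.my_q_model.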

conv_dqn(shape_of_inputs, number_of_outputs, loss_function=None, use_placeholder_for_input=False)[source]

Example convolutional neural network for approximating the Q value function.

This is the model used in the original DQN paper https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf.

Parameters:
  • shape_of_inputs – tuple of input (array) dimensions.
  • number_of_outputs – dimension of the output, equal to the number of possible actions.
  • loss_function – if not specified, squared loss is used by default.
  • use_placeholder_for_input – if True, the inputs are placeholders and must be replaced later with an actual input_variable.
Returns: a Python dictionary with string-valued keys including ‘inputs’, ‘outputs’, ‘loss’ and ‘f’.
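A usage sketch follows; the input shape (four stacked 84×84 frames) and the action count (six) are illustrative assumptions, not values mandated by the function:

    from cntk.contrib.deeprl.agent.shared.customized_models import conv_dqn

    # Illustrative shapes: 4 stacked 84x84 grayscale frames, 6 discrete actions.
    model = conv_dqn(shape_of_inputs=(4, 84, 84), number_of_outputs=6)
    q_values = model['f']    # network output node (one Q value per action)
    loss = model['loss']     # training loss node

Alternatively, the same network is selected from a config file by setting QRepresentation = cntk.contrib.deeprl.agent.shared.customized_models.conv_dqn, as described at the top of this module.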