cntk.contrib.deeprl.agent.shared.customized_models module¶
Customized Q function or (unnormalized) log of policy function.
If the models in cntk.contrib.deeprl.agent.shared.models are not adequate, write your own model as a function. The function takes two required arguments, ‘shape_of_inputs’ and ‘number_of_outputs’, and two optional arguments, ‘loss_function’ and ‘use_placeholder_for_input’, and returns a dictionary containing ‘inputs’, ‘outputs’, ‘f’ and ‘loss’. In the config file, set QRepresentation or PolicyRepresentation to the path (module_name.function_name) of the function; QLearning/PolicyGradient will then find it automatically.
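The contract above can be sketched as follows. This is a minimal, hypothetical example (the name my_q_model and the stand-in values are illustrative, not part of the library): plain-Python stand-ins replace the actual CNTK operations so that only the required signature and the shape of the returned dictionary are shown. Here ‘f’ is assumed to be the model function itself and ‘outputs’ its application to the inputs.

```python
def my_q_model(shape_of_inputs, number_of_outputs,
               loss_function=None, use_placeholder_for_input=False):
    """Custom model sketch returning the dictionary QLearning/PolicyGradient expects."""
    # In a real model this would be a cntk input variable or placeholder,
    # depending on use_placeholder_for_input.
    inputs = {'shape': shape_of_inputs,
              'is_placeholder': use_placeholder_for_input}
    # Stand-in for the network; a real model would build CNTK layers here.
    f = lambda x: [0.0] * number_of_outputs
    outputs = f(inputs)
    # Fall back to squared loss when no loss_function is supplied.
    loss = loss_function if loss_function is not None else 'squared_loss'
    return {'inputs': inputs, 'outputs': outputs, 'f': f, 'loss': loss}
```

A function with this signature, referenced as module_name.my_q_model in the config file, would satisfy the interface described above.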
conv_dqn(shape_of_inputs, number_of_outputs, loss_function=None, use_placeholder_for_input=False)[source]¶
Example convolutional neural network for approximating the Q value function.
This is the model used in the original DQN paper https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf.
Parameters:
- shape_of_inputs – tuple of array (input) dimensions.
- number_of_outputs – dimension of output, equal to the number of possible actions.
- loss_function – if not specified, squared loss is used by default.
- use_placeholder_for_input – if true, inputs have to be replaced later with actual input_variable.
Returns: a Python dictionary with string-valued keys including ‘inputs’, ‘outputs’, ‘loss’ and ‘f’.
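Wiring a custom function in by its dotted path might look like the following INI-style excerpt. This is a hedged sketch: the section name and the module my_models are hypothetical placeholders; only the QRepresentation key and the module_name.function_name value format come from the description above.

```ini
; Hypothetical config excerpt -- section name is illustrative.
[QLearningAlgo]
QRepresentation = my_models.my_q_model
```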