cntk.contrib.deeprl.agent.shared.customized_models module

Customized Q function or (unnormalized) log of policy function.

If the models from cntk.contrib.deeprl.agent.shared.models are not adequate, write your own model as a function that takes two required arguments, ‘shape_of_inputs’ and ‘number_of_outputs’, and two optional arguments, ‘loss_function’ and ‘use_placeholder_for_input’, and returns a dictionary containing ‘inputs’, ‘outputs’, ‘f’ and ‘loss’. In the config file, set QRepresentation or PolicyRepresentation to the path (module_name.function_name) of the function; QLearning/PolicyGradient will then locate it automatically. A sketch of such a function is given below.
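As a minimal sketch of this contract (the two-hidden-layer architecture and the name my_q_model are illustrative; the key semantics assumed here are that ‘inputs’ holds the state variable, ‘outputs’ the training target variable, ‘f’ the network output node and ‘loss’ the training loss, so consult the built-in models in cntk.contrib.deeprl.agent.shared.models for the exact convention):

    import numpy as np
    import cntk as C

    def my_q_model(shape_of_inputs, number_of_outputs,
                   loss_function=None, use_placeholder_for_input=False):
        # State input: a placeholder to be replaced later, or a real input_variable.
        if use_placeholder_for_input:
            inputs = C.ops.placeholder(shape=shape_of_inputs)
        else:
            inputs = C.ops.input_variable(shape=shape_of_inputs, dtype=np.float32)

        # Training target, one value per action (illustrative choice).
        targets = C.ops.input_variable(shape=(number_of_outputs,), dtype=np.float32)

        # Two dense hidden layers and a linear output layer, one output per action.
        f = C.layers.Sequential([
            C.layers.Dense(64, activation=C.ops.relu),
            C.layers.Dense(64, activation=C.ops.relu),
            C.layers.Dense(number_of_outputs, activation=None)
        ])(inputs)

        # Squared loss by default, as described above.
        if loss_function is None:
            loss = C.losses.squared_error(f, targets)
        else:
            loss = loss_function(f, targets)

        return {'inputs': inputs, 'outputs': targets, 'f': f, 'loss': loss}

With this function saved in, say, my_models.py (a hypothetical module name), the config file would set QRepresentation = my_models.my_q_model.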

conv_dqn(shape_of_inputs, number_of_outputs, loss_function=None, use_placeholder_for_input=False)[source]

Example convolutional neural network for approximating the Q value function.

This is the model used in the original DQN paper https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf.

Parameters:
  • shape_of_inputs – tuple of input (array) dimensions.
  • number_of_outputs – dimension of the output, equal to the number of possible actions.
  • loss_function – if not specified, squared loss is used by default.
  • use_placeholder_for_input – if True, the inputs are placeholders and must be replaced later with an actual input_variable.
Returns: a Python dictionary with string-valued keys including ‘inputs’, ‘outputs’, ‘loss’ and ‘f’.
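A usage sketch follows; the input shape (four stacked 84×84 frames) and the action count (six) are illustrative assumptions, not values mandated by the function:

    from cntk.contrib.deeprl.agent.shared.customized_models import conv_dqn

    # Illustrative shapes: 4 stacked 84x84 grayscale frames, 6 discrete actions.
    model = conv_dqn(shape_of_inputs=(4, 84, 84), number_of_outputs=6)
    q_values = model['f']    # network output node (one Q value per action)
    loss = model['loss']     # training loss node

Alternatively, the same network is selected from a config file by setting QRepresentation = cntk.contrib.deeprl.agent.shared.customized_models.conv_dqn, as described at the top of this module.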