Builders¶
Builders combine several operations to implement more complex components, such as recurrent and LSTM networks.
-
struct dynet
::
LSTMBuilder
¶ - #include <lstm.h>
LSTMBuilder creates an LSTM unit with coupled input and forget gates as well as peephole connections.
More specifically, the dynamics of this cell are given by the following equations:
\( \begin{split} i_t & =\sigma(W_{ix}x_t+W_{ih}h_{t-1}+W_{ic}c_{t-1}+b_i)\\ \tilde{c_t} & = \tanh(W_{cx}x_t+W_{ch}h_{t-1}+b_c)\\ c_t & = c_{t-1}\circ (1-i_t) + \tilde{c_t}\circ i_t\\ & = c_{t-1} + (\tilde{c_t}-c_{t-1})\circ i_t\\ o_t & = \sigma(W_{ox}x_t+W_{oh}h_{t-1}+W_{oc}c_{t}+b_o)\\ h_t & = \tanh(c_t)\circ o_t\\ \end{split} \)
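As a reading aid for the equations above, the following is a minimal scalar sketch of one time step of the coupled-gate cell. It does not use DyNet; the names `CoupledLSTMWeights`, `CoupledLSTMState`, `sigmoid` and `coupled_lstm_step` are hypothetical and exist only for this illustration, and every matrix is reduced to a single number.

```cpp
#include <cmath>

// Scalar stand-ins for the weight matrices and biases of the equations above.
// All names here are illustrative assumptions, not part of the DyNet API.
struct CoupledLSTMWeights {
  double W_ix, W_ih, W_ic, b_i;  // input gate, with peephole W_ic on c_{t-1}
  double W_cx, W_ch, b_c;        // candidate cell value
  double W_ox, W_oh, W_oc, b_o;  // output gate, with peephole W_oc on c_t
};

struct CoupledLSTMState {
  double c;  // memory cell c_t
  double h;  // hidden output h_t
};

inline double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// One time step for scalar x_t, given the previous state (c_{t-1}, h_{t-1}).
inline CoupledLSTMState coupled_lstm_step(const CoupledLSTMWeights& w, double x,
                                          const CoupledLSTMState& prev) {
  const double i = sigmoid(w.W_ix * x + w.W_ih * prev.h + w.W_ic * prev.c + w.b_i);
  const double c_tilde = std::tanh(w.W_cx * x + w.W_ch * prev.h + w.b_c);
  // Coupled gates: the forget gate is implicitly (1 - i).
  const double c = prev.c + (c_tilde - prev.c) * i;
  // The output gate peeks at the updated cell c_t, not c_{t-1}.
  const double o = sigmoid(w.W_ox * x + w.W_oh * prev.h + w.W_oc * c + w.b_o);
  return CoupledLSTMState{c, std::tanh(c) * o};
}
```

Because the forget gate is tied to `1 - i`, the new cell value is always an interpolation between the previous cell and the candidate value.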
Inherits from dynet::RNNBuilder
Public Functions
-
dynet::LSTMBuilder
LSTMBuilder
()¶ Default constructor.
-
dynet::LSTMBuilder
LSTMBuilder
(unsigned layers, unsigned input_dim, unsigned hidden_dim, Model &model)¶ Constructor for the LSTMBuilder.
- Parameters
layers
: Number of layers
input_dim
: Dimension of the input \(x_t\)
hidden_dim
: Dimension of the hidden states \(h_t\) and \(c_t\)
model
: Model holding the parameters
-
unsigned dynet::LSTMBuilder
num_h0_components
() const¶ Number of components in
h_0
For
LSTMBuilder
, this corresponds to
2 * layers
because it includes the initial cell state \(c_0\)
- Return
2 * layers
-
std::vector<Expression> dynet::LSTMBuilder
get_s
(RNNPointer i) const¶ Get the final state of the hidden layer.
For
LSTMBuilder
, this consists of a vector of the memory cell values for each layer (l1, l2, l3), followed by the hidden state values
- Return
- {c_{l1}, c_{l2}, …, h_{l1}, h_{l2}, …}
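The state layout described above (all cell values first, then all hidden values, for a total of `2 * layers` components) can be illustrated with a small helper that splits such a vector into its two halves. This is only a sketch: `split_state` is a hypothetical name, and the template parameter `T` stands in for `Expression` so that the example compiles without DyNet.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Splits a state laid out as {c_1, ..., c_L, h_1, ..., h_L} into (cells, hiddens).
// T is a placeholder for the expression type; this helper is not part of DyNet.
template <typename T>
std::pair<std::vector<T>, std::vector<T>> split_state(const std::vector<T>& s,
                                                      std::size_t layers) {
  // An LSTM state has 2 * layers components: one cell and one hidden value per layer.
  assert(s.size() == 2 * layers);
  const auto mid = s.begin() + static_cast<std::ptrdiff_t>(layers);
  return std::make_pair(std::vector<T>(s.begin(), mid), std::vector<T>(mid, s.end()));
}
```

With three layers, the first three entries of the result are the memory cells of layers l1 to l3 and the last three are the corresponding hidden states.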
-
void dynet::LSTMBuilder
set_dropout
(float d)¶ Set all dropout rates to the same value.
This has the same effect as
set_dropout(d,d_h,d_c)
except that all the dropout rates are set to the same value.
- Parameters
d
: Dropout rate to be applied on all of \(x,h,c\)
-
void dynet::LSTMBuilder
set_dropout
(float d, float d_h, float d_c)¶ Set the dropout rates.
The dropout implemented here is an adaptation of the variational dropout with tied weights introduced in Gal, 2016. More specifically, dropout masks \(\mathbf{z_x}\sim \mathrm{Bernoulli}(1-d_x)\), \(\mathbf{z_h}\sim \mathrm{Bernoulli}(1-d_h)\), \(\mathbf{z_c}\sim \mathrm{Bernoulli}(1-d_c)\) are sampled at the start of each sequence. The dynamics of the cell are then modified to:
\( \begin{split} i_t & =\sigma(W_{ix}(\frac 1 {1-d_x} {\mathbf{z_x}} \circ x_t)+W_{ih}(\frac 1 {1-d_h} {\mathbf{z_h}} \circ h_{t-1})+W_{ic}(\frac 1 {1-d_c} {\mathbf{z_c}} \circ c_{t-1})+b_i)\\ \tilde{c_t} & = \tanh(W_{cx}(\frac 1 {1-d_x} {\mathbf{z_x}} \circ x_t)+W_{ch}(\frac 1 {1-d_h} {\mathbf{z_h}} \circ h_{t-1})+b_c)\\ c_t & = c_{t-1}\circ (1-i_t) + \tilde{c_t}\circ i_t\\ & = c_{t-1} + (\tilde{c_t}-c_{t-1})\circ i_t\\ o_t & = \sigma(W_{ox}(\frac 1 {1-d_x} {\mathbf{z_x}} \circ x_t)+W_{oh}(\frac 1 {1-d_h} {\mathbf{z_h}} \circ h_{t-1})+W_{oc}(\frac 1 {1-d_c} {\mathbf{z_c}} \circ c_{t})+b_o)\\ h_t & = \tanh(c_t)\circ o_t\\ \end{split} \)
For more detail on why the scaling is applied, see the “Unorthodox” section of the documentation.
- Parameters
d
: Dropout rate \(d_x\) for the input \(x_t\)
d_h
: Dropout rate \(d_h\) for the output \(h_t\)
d_c
: Dropout rate \(d_c\) for the cell \(c_t\)
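The mask sampling and the \(\frac 1 {1-d}\) scaling used in the dropout equations can be sketched in plain C++ as follows. This is an illustration under assumptions rather than DyNet's implementation: `sample_scaled_mask` and `apply_mask` are hypothetical helpers, and vectors of `double` stand in for expressions.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Samples z ~ Bernoulli(1 - d) for each component and pre-multiplies it by
// 1 / (1 - d), so that the expected value of mask[k] * x[k] is x[k].
// Hypothetical helper for illustration; not a DyNet function.
inline std::vector<double> sample_scaled_mask(std::size_t dim, double d, std::mt19937& rng) {
  assert(d >= 0.0 && d < 1.0);
  std::bernoulli_distribution keep(1.0 - d);
  std::vector<double> mask(dim);
  for (double& m : mask) m = keep(rng) ? 1.0 / (1.0 - d) : 0.0;
  return mask;
}

// Elementwise product of a mask with an input vector.
inline std::vector<double> apply_mask(const std::vector<double>& mask,
                                      const std::vector<double>& x) {
  assert(mask.size() == x.size());
  std::vector<double> out(x.size());
  for (std::size_t k = 0; k < x.size(); ++k) out[k] = mask[k] * x[k];
  return out;
}
```

In the variational scheme described above, a mask would be sampled once at the start of a sequence and the same mask passed to `apply_mask` at every time step, rather than resampled per step.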
-
void dynet::LSTMBuilder
disable_dropout
()¶ Set all dropout rates to 0.
This is equivalent to
set_dropout(0)
or
set_dropout(0,0,0)
-
void dynet::LSTMBuilder
set_dropout_masks
(unsigned batch_size = 1)¶ Set dropout masks at the beginning of a sequence for a specific batch size.
If this function is not called on batched input, the same mask will be applied across all batch elements. Use this to apply a different mask to each batch element.
- Parameters
batch_size
: Batch size
-
dynet::LSTMBuilder
-
struct dynet
::
VanillaLSTMBuilder
¶ - #include <lstm.h>
VanillaLSTMBuilder creates a “standard” LSTM, i.e. one with decoupled input and forget gates and no peephole connections.
This cell runs according to the following dynamics:
\( \begin{split} i_t & =\sigma(W_{ix}x_t+W_{ih}h_{t-1}+b_i)\\ f_t & = \sigma(W_{fx}x_t+W_{fh}h_{t-1}+b_f+1)\\ o_t & = \sigma(W_{ox}x_t+W_{oh}h_{t-1}+b_o)\\ \tilde{c_t} & = \tanh(W_{cx}x_t+W_{ch}h_{t-1}+b_c)\\ c_t & = c_{t-1}\circ f_t + \tilde{c_t}\circ i_t\\ h_t & = \tanh(c_t)\circ o_t\\ \end{split} \)
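For comparison with the coupled cell, here is a minimal scalar sketch of one time step of these dynamics, including the constant \(+1\) added inside the forget gate. It does not use DyNet; `VanillaLSTMWeights`, `VanillaLSTMState`, `logistic` and `vanilla_lstm_step` are hypothetical names introduced for the illustration.

```cpp
#include <cmath>

// Scalar stand-ins for the weight matrices and biases of the equations above.
// All names here are illustrative assumptions, not part of the DyNet API.
struct VanillaLSTMWeights {
  double W_ix, W_ih, b_i;  // input gate
  double W_fx, W_fh, b_f;  // forget gate
  double W_ox, W_oh, b_o;  // output gate
  double W_cx, W_ch, b_c;  // candidate cell value
};

struct VanillaLSTMState {
  double c;  // memory cell c_t
  double h;  // hidden output h_t
};

inline double logistic(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// One time step for scalar x_t, given the previous state (c_{t-1}, h_{t-1}).
inline VanillaLSTMState vanilla_lstm_step(const VanillaLSTMWeights& w, double x,
                                          const VanillaLSTMState& prev) {
  const double i = logistic(w.W_ix * x + w.W_ih * prev.h + w.b_i);
  // The +1 shifts the forget gate towards "remember" when weights are small.
  const double f = logistic(w.W_fx * x + w.W_fh * prev.h + w.b_f + 1.0);
  const double o = logistic(w.W_ox * x + w.W_oh * prev.h + w.b_o);
  const double c_tilde = std::tanh(w.W_cx * x + w.W_ch * prev.h + w.b_c);
  const double c = prev.c * f + c_tilde * i;
  return VanillaLSTMState{c, std::tanh(c) * o};
}
```

Unlike the coupled variant, the input and forget gates are computed independently, and no gate reads the cell value directly.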
Inherits from dynet::RNNBuilder
Public Functions
-
dynet::VanillaLSTMBuilder
VanillaLSTMBuilder
()¶ Default Constructor.
-
dynet::VanillaLSTMBuilder
VanillaLSTMBuilder
(unsigned layers, unsigned input_dim, unsigned hidden_dim, Model &model, bool ln_lstm = false)¶ Constructor for the VanillaLSTMBuilder.
- Parameters
layers
: Number of layers
input_dim
: Dimension of the input \(x_t\)
hidden_dim
: Dimension of the hidden states \(h_t\) and \(c_t\)
model
: Model holding the parameters
ln_lstm
: Whether to use layer normalization
-
void dynet::VanillaLSTMBuilder
set_dropout
(float d)¶ Set all dropout rates to the same value.
This has the same effect as
set_dropout(d,d_r)
except that all the dropout rates are set to the same value.
- Parameters
d
: Dropout rate to be applied on all of \(x,h\)
-
void dynet::VanillaLSTMBuilder
set_dropout
(float d, float d_r)¶ Set the dropout rates.
The dropout implemented here is the variational dropout with tied weights introduced in Gal, 2016. More specifically, dropout masks \(\mathbf{z_x}\sim \mathrm{Bernoulli}(1-d_x)\), \(\mathbf{z_h}\sim \mathrm{Bernoulli}(1-d_h)\) are sampled at the start of each sequence. The dynamics of the cell are then modified to:
\( \begin{split} i_t & =\sigma(W_{ix}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{ih}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_i)\\ f_t & = \sigma(W_{fx}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{fh}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_f+1)\\ o_t & = \sigma(W_{ox}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{oh}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_o)\\ \tilde{c_t} & = \tanh(W_{cx}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{ch}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_c)\\ c_t & = c_{t-1}\circ f_t + \tilde{c_t}\circ i_t\\ h_t & = \tanh(c_t)\circ o_t\\ \end{split} \)
For more detail on why the scaling is applied, see the “Unorthodox” section of the documentation.
- Parameters
d
: Dropout rate \(d_x\) for the input \(x_t\)
d_r
: Dropout rate \(d_h\) for the output \(h_t\)
-
void dynet::VanillaLSTMBuilder
disable_dropout
()¶ Set all dropout rates to 0.
This is equivalent to
set_dropout(0)
or
set_dropout(0,0)
-
void dynet::VanillaLSTMBuilder
set_dropout_masks
(unsigned batch_size = 1)¶ Set dropout masks at the beginning of a sequence for a specific batch size.
If this function is not called on batched input, the same mask will be applied across all batch elements. Use this to apply a different mask to each batch element.
- Parameters
batch_size
: Batch size
-
dynet::VanillaLSTMBuilder
-
struct dynet
::
RNNBuilder
¶ - #include <rnn.h>
Interface for constructing an RNN, LSTM, GRU, etc.
Subclassed by dynet::DeepLSTMBuilder, dynet::FastLSTMBuilder, dynet::GRUBuilder, dynet::LSTMBuilder, dynet::SimpleRNNBuilder, dynet::TreeLSTMBuilder, dynet::VanillaLSTMBuilder
Public Functions
-
dynet::RNNBuilder
RNNBuilder
()¶ Default constructor.
-
RNNPointer dynet::RNNBuilder
state
() const¶ Get pointer to the current state.
- Return
- Pointer to the current state
-
void dynet::RNNBuilder
new_graph
(ComputationGraph &cg, bool update = true)¶ Initialize with new computation graph.
Call this to reset the builder when you are working with a newly created ComputationGraph object.
- Parameters
cg
: Computation graph
update
: Update internal parameters while training
-
void dynet::RNNBuilder
start_new_sequence
(const std::vector<Expression> &h_0 = {})¶ Reset for new sequence.
Call this before add_input and after new_graph when starting a new sequence on the same computation graph.
- Parameters
h_0
: Values used to initialize the hidden layers at timestep 0
-
Expression dynet::RNNBuilder
set_h
(const RNNPointer &prev, const std::vector<Expression> &h_new = {})¶ Explicitly set the output state of a node.
- Return
- The hidden representation of the deepest layer
- Parameters
prev
: Pointer to the previous state
h_new
: The new hidden state
-
Expression dynet::RNNBuilder
set_s
(const RNNPointer &prev, const std::vector<Expression> &s_new = {})¶ Set the internal state of a node (for lstms/grus)
For RNNs without internal states (SimpleRNN, GRU…), this has the same behaviour as
set_h
- Return
- The hidden representation of the deepest layer
- Parameters
prev
: Pointer to the previous state
s_new
: The new state. Can be
{new_c[0],...,new_c[n]}
or
{new_c[0],...,new_c[n], new_h[0],...,new_h[n]}
-
Expression dynet::RNNBuilder
add_input
(const Expression &x)¶ Add another timestep by reading in the variable x.
- Return
- The hidden representation of the deepest layer
- Parameters
x
: Input variable
-
Expression dynet::RNNBuilder
add_input
(const RNNPointer &prev, const Expression &x)¶ Add another timestep, with arbitrary recurrent connection.
This allows you to define a recurrent connection to
prev
rather than to
head[cur]
. This can be used to construct trees, implement beam search, etc.
- Return
- The hidden representation of the deepest layer
- Parameters
prev
: Pointer to the previous state
x
: Input variable
-
void dynet::RNNBuilder
rewind_one_step
()¶ Rewind the last timestep.
- This DOES NOT remove the variables from the computation graph; it only means that the next time step will see a different previous state. You can rewind as many times as you want.
-
RNNPointer dynet::RNNBuilder
get_head
(const RNNPointer &p)¶ Return the RNN state that is the parent of
p
- This can be used in implementing complex structures such as trees, etc.
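The interplay between add_input with an explicit previous state, rewind_one_step and get_head can be pictured with a toy model in which every state simply records the index of its parent. The sketch below is an assumption-laden illustration, not DyNet's bookkeeping: `StateTree` is a hypothetical class, plain `int` indices stand in for `RNNPointer`, and `-1` is assumed to denote the state before any input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of recurrent state bookkeeping: state k remembers its parent index.
// Hypothetical class for illustration; the real builder may differ in detail.
class StateTree {
 public:
  // Index of the current state (-1 before any input has been added).
  int state() const { return cur_; }

  // Adds a state whose recurrent connection comes from the current state.
  int add_input() { return add_input(cur_); }

  // Adds a state whose recurrent connection comes from `prev` instead.
  int add_input(int prev) {
    assert(prev >= -1 && prev < static_cast<int>(head_.size()));
    head_.push_back(prev);
    cur_ = static_cast<int>(head_.size()) - 1;
    return cur_;
  }

  // Moves the current pointer back to its parent; no recorded state is deleted.
  void rewind_one_step() {
    assert(cur_ >= 0);
    cur_ = head_[static_cast<std::size_t>(cur_)];
  }

  // Returns the parent of state p.
  int get_head(int p) const {
    assert(p >= 0 && p < static_cast<int>(head_.size()));
    return head_[static_cast<std::size_t>(p)];
  }

 private:
  std::vector<int> head_;  // head_[k] is the parent of state k
  int cur_ = -1;
};
```

Branching twice from the same parent, as a beam search or tree construction would, amounts to calling `add_input(prev)` several times with the same `prev`; each call yields a new state with the same parent.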
-
void dynet::RNNBuilder
set_dropout
(float d)¶ Set Dropout.
- Parameters
d
: Dropout rate
-
void dynet::RNNBuilder
disable_dropout
()¶ Disable Dropout.
In general, you should disable dropout at test time
-
virtual Expression dynet::RNNBuilder
back
() const = 0¶ Returns node (index) of most recent output.
- Return
- Node (index) of most recent output
-
virtual std::vector<Expression> dynet::RNNBuilder
final_h
() const = 0¶ Access the final output of each hidden layer.
- Return
- Final output of each hidden layer
-
virtual std::vector<Expression> dynet::RNNBuilder
get_h
(RNNPointer i) const = 0¶ Access the output of any hidden layer.
- Return
- Output of each hidden layer at the given step
- Parameters
i
: Pointer to the step whose output you want to access
-
virtual std::vector<Expression> dynet::RNNBuilder
final_s
() const = 0¶ Access the final state of each hidden layer.
This returns the state of each hidden layer, in a format that can be used in start_new_sequence (i.e. including any internal cell for LSTMs and the likes)
- Return
- vector containing, if it exists, the list of final internal states, followed by the list of final outputs for each layer
-
virtual std::vector<Expression> dynet::RNNBuilder
get_s
(RNNPointer i) const = 0¶ Access the state of any hidden layer.
See
final_s
for details
- Return
- Internal state of each hidden layer at the given step
- Parameters
i
: Pointer to the step whose state you want to access
-
virtual unsigned dynet::RNNBuilder
num_h0_components
() const = 0¶ Number of components in
h_0
- Return
- Number of components in
h_0
-
virtual void dynet::RNNBuilder
copy
(const RNNBuilder &params) = 0¶ Copy the parameters of another builder.
- Parameters
params
: RNNBuilder you want to copy parameters from.
-
void dynet::RNNBuilder
save_parameters_pretraining
(const std::string &fname) const¶ This function saves all the parameters associated with a particular RNNBuilder’s derived class to a file.
This should not be used to serialize models; it should only be used to save parameters for pretraining. If you are interested in serializing models, use the boost serialization API against your model class.
- Parameters
fname
: File you want to save your model to.
-
void dynet::RNNBuilder
load_parameters_pretraining
(const std::string &fname)¶ Loads all the parameters associated with a particular RNNBuilder’s derived class from a file.
This should not be used to serialize models; it should only be used to load parameters from pretraining. If you are interested in serializing models, use the boost serialization API against your model class.
- Parameters
fname
: File you want to read your model from.
-
dynet::RNNBuilder
-
struct dynet
::
SimpleRNNBuilder
¶ - #include <rnn.h>
This provides a builder for the simplest RNN with tanh nonlinearity.
The equation for this RNN is : \(h_t=\tanh(W_x x_t + W_h h_{t-1} + b)\)
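A single step of this recurrence can be written out in plain C++ as a sketch. Nothing here is DyNet code: `Vec`, `Mat` and `simple_rnn_step` are hypothetical names, and matrices are stored as vectors of rows purely for readability.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major: Mat[r][c]

// Computes h_t = tanh(W_x x_t + W_h h_{t-1} + b) for one time step.
// Illustrative helper only; not part of the DyNet API.
inline Vec simple_rnn_step(const Mat& W_x, const Mat& W_h, const Vec& b,
                           const Vec& x, const Vec& h_prev) {
  const std::size_t hidden = b.size();
  assert(W_x.size() == hidden && W_h.size() == hidden && h_prev.size() == hidden);
  Vec h(hidden);
  for (std::size_t r = 0; r < hidden; ++r) {
    assert(W_x[r].size() == x.size() && W_h[r].size() == hidden);
    double z = b[r];
    for (std::size_t k = 0; k < x.size(); ++k) z += W_x[r][k] * x[k];
    for (std::size_t k = 0; k < hidden; ++k) z += W_h[r][k] * h_prev[k];
    h[r] = std::tanh(z);
  }
  return h;
}
```

Running a sequence amounts to feeding the returned vector back in as `h_prev` for the next input.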
Inherits from dynet::RNNBuilder
Public Functions
-
dynet::SimpleRNNBuilder
SimpleRNNBuilder
(unsigned layers, unsigned input_dim, unsigned hidden_dim, Model &model, bool support_lags = false)¶ Builds a simple RNN.
- Parameters
layers
: Number of layers
input_dim
: Dimension of the input
hidden_dim
: Hidden layer (and output) size
model
: Model holding the parameters
support_lags
: Allow for auxiliary output?
-
Expression dynet::SimpleRNNBuilder
add_auxiliary_input
(const Expression &x, const Expression &aux)¶ Add auxiliary output.
Returns \(h_t=\tanh(W_x x_t + W_h h_{t-1} + W_y y + b)\) where \(y\) is an auxiliary output TODO : clarify
- Return
- The hidden representation of the deepest layer
- Parameters
x
: Input expression
aux
: Auxiliary output expression
-
dynet::SimpleRNNBuilder