# Builders

Builders combine several operations to implement more complex structures such as recurrent and LSTM networks.

struct dynet::LSTMBuilder
#include <lstm.h>

LSTMBuilder creates an LSTM unit with coupled input and forget gates as well as peephole connections.

More specifically, here are the equations for the dynamics of this cell:

$$\begin{split} i_t & =\sigma(W_{ix}x_t+W_{ih}h_{t-1}+W_{ic}c_{t-1}+b_i)\\ \tilde{c_t} & = \tanh(W_{cx}x_t+W_{ch}h_{t-1}+b_c)\\ c_t & = c_{t-1}\circ (1-i_t) + \tilde{c_t}\circ i_t\\ & = c_{t-1} + (\tilde{c_t}-c_{t-1})\circ i_t\\ o_t & = \sigma(W_{ox}x_t+W_{oh}h_{t-1}+W_{oc}c_{t}+b_o)\\ h_t & = \tanh(c_t)\circ o_t\\ \end{split}$$
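
The equations above can be traced by hand with a minimal sketch. It assumes scalar inputs and states (input and hidden dimension 1) so that every weight is a single number; `CoupledLSTMParams`, `CellState` and `coupled_lstm_step` are hypothetical names for illustration and are not part of DyNet.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar parameters (input_dim = hidden_dim = 1); not DyNet types.
struct CoupledLSTMParams {
  double W_ix, W_ih, W_ic, b_i;  // input gate, with peephole weight W_ic
  double W_cx, W_ch, b_c;        // candidate cell value
  double W_ox, W_oh, W_oc, b_o;  // output gate, with peephole weight W_oc
};

struct CellState {
  double c;  // memory cell c_t
  double h;  // output h_t
};

inline double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// One time step of the coupled-gate LSTM equations.
CellState coupled_lstm_step(const CoupledLSTMParams& p, double x,
                            const CellState& prev) {
  const double i =
      sigmoid(p.W_ix * x + p.W_ih * prev.h + p.W_ic * prev.c + p.b_i);
  assert(i >= 0.0 && i <= 1.0);  // a gate always lies in [0, 1]
  const double c_tilde = std::tanh(p.W_cx * x + p.W_ch * prev.h + p.b_c);
  // Coupled gates: the forget gate is implicitly (1 - i).
  const double c = prev.c + (c_tilde - prev.c) * i;
  // The output gate peeks at the new cell value c_t, not c_{t-1}.
  const double o = sigmoid(p.W_ox * x + p.W_oh * prev.h + p.W_oc * c + p.b_o);
  return CellState{c, std::tanh(c) * o};
}
```

With all parameters set to zero, both gates evaluate to 0.5 and the candidate to 0, so the cell value is simply halved at each step.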

Inherits from dynet::RNNBuilder

Public Functions

LSTMBuilder()

Default constructor.

LSTMBuilder(unsigned layers, unsigned input_dim, unsigned hidden_dim, Model &model)

Constructor for the LSTMBuilder.

Parameters
• layers: Number of layers
• input_dim: Dimension of the input $$x_t$$
• hidden_dim: Dimension of the hidden states $$h_t$$ and $$c_t$$
• model: Model holding the parameters

unsigned num_h0_components() const

Number of components in h_0

For LSTMBuilder, this corresponds to 2 * layers because it includes the initial cell state $$c_0$$

Return
2 * layers

std::vector<Expression> get_s(RNNPointer i) const

Get the final state of the hidden layer.

For LSTMBuilder, this consists of a vector of the memory cell values for each layer (l1, l2, l3, ...), followed by the hidden state values for each layer.

Return
{c_{l1}, c_{l2}, ..., h_{l1}, h_{l2}, ...}
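
The state layout described above (all cell values first, then all hidden values, for a total of 2 * layers components) can be sketched as follows. `Vec`, `pack_state` and `hidden_of_layer` are hypothetical stand-ins for illustration; DyNet itself stores `Expression` objects rather than plain vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a per-layer state vector; hypothetical, not dynet::Expression.
using Vec = std::vector<double>;

// An LSTM state holds one cell value and one hidden value per layer.
std::size_t num_h0_components(std::size_t layers) { return 2 * layers; }

// Pack per-layer values as {c_l1, ..., c_lN, h_l1, ..., h_lN}.
std::vector<Vec> pack_state(const std::vector<Vec>& c,
                            const std::vector<Vec>& h) {
  assert(c.size() == h.size());
  std::vector<Vec> s(c);
  s.insert(s.end(), h.begin(), h.end());
  return s;
}

// Hidden value of layer l (0-based) inside a packed state.
const Vec& hidden_of_layer(const std::vector<Vec>& s, std::size_t l) {
  const std::size_t layers = s.size() / 2;
  assert(s.size() % 2 == 0 && l < layers);
  return s[layers + l];
}
```

A vector packed this way has the size that `num_h0_components()` reports, which is the shape the documentation describes for initial states.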

void set_dropout(float d)

Set all dropout rates to a single value.

This has the same effect as set_dropout(d,d_h,d_c), except that all the dropout rates are set to the same value.

Parameters
• d: Dropout rate to be applied on all of $$x,h,c$$

void set_dropout(float d, float d_h, float d_c)

Set the dropout rates.

The dropout implemented here is an adaptation of the variational dropout with tied weights introduced in Gal, 2016. More specifically, dropout masks $$\mathbf{z_x}\sim \mathrm{Bernoulli}(1-d_x)$$, $$\mathbf{z_h}\sim \mathrm{Bernoulli}(1-d_h)$$, $$\mathbf{z_c}\sim \mathrm{Bernoulli}(1-d_c)$$ are sampled at the start of each sequence. The dynamics of the cell are then modified to:

$$\begin{split} i_t & =\sigma(W_{ix}(\frac 1 {1-d_x} {\mathbf{z_x}} \circ x_t)+W_{ih}(\frac 1 {1-d_h} {\mathbf{z_h}} \circ h_{t-1})+W_{ic}(\frac 1 {1-d_c} {\mathbf{z_c}} \circ c_{t-1})+b_i)\\ \tilde{c_t} & = \tanh(W_{cx}(\frac 1 {1-d_x} {\mathbf{z_x}} \circ x_t)+W_{ch}(\frac 1 {1-d_h} {\mathbf{z_h}} \circ h_{t-1})+b_c)\\ c_t & = c_{t-1}\circ (1-i_t) + \tilde{c_t}\circ i_t\\ & = c_{t-1} + (\tilde{c_t}-c_{t-1})\circ i_t\\ o_t & = \sigma(W_{ox}(\frac 1 {1-d_x} {\mathbf{z_x}} \circ x_t)+W_{oh}(\frac 1 {1-d_h} {\mathbf{z_h}} \circ h_{t-1})+W_{oc}(\frac 1 {1-d_c} {\mathbf{z_c}} \circ c_{t})+b_o)\\ h_t & = \tanh(c_t)\circ o_t\\ \end{split}$$

For more detail on why the scaling is applied, see the “Unorthodox” section of the documentation.

Parameters
• d: Dropout rate $$d_x$$ for the input $$x_t$$
• d_h: Dropout rate $$d_h$$ for the output $$h_t$$
• d_c: Dropout rate $$d_c$$ for the cell $$c_t$$
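
As a rough illustration of the masking step, the sketch below samples a mask once and applies it with the $$\frac 1 {1-d}$$ scaling folded in. It is a plain C++ sketch under assumed names (`sample_scaled_mask`, `apply_mask`) and uses `std::vector<double>` in place of DyNet expressions; it is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Sample z ~ Bernoulli(1 - d) per component and pre-multiply by 1 / (1 - d).
// Intended to be called once at the start of a sequence.
std::vector<double> sample_scaled_mask(std::size_t dim, double d,
                                       std::mt19937& rng) {
  assert(d >= 0.0 && d < 1.0);
  std::bernoulli_distribution keep(1.0 - d);
  std::vector<double> mask(dim);
  for (double& m : mask) m = keep(rng) ? 1.0 / (1.0 - d) : 0.0;
  return mask;
}

// Component-wise product of a mask with a vector.
std::vector<double> apply_mask(const std::vector<double>& mask,
                               const std::vector<double>& v) {
  assert(mask.size() == v.size());
  std::vector<double> out(v.size());
  for (std::size_t k = 0; k < v.size(); ++k) out[k] = mask[k] * v[k];
  return out;
}
```

Because the masks are sampled at the start of each sequence, the same mask would be passed to `apply_mask` at every time step; separate masks would be drawn for $$x$$, $$h$$ and $$c$$ with their respective rates.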

void disable_dropout()

Set all dropout rates to 0.

This is equivalent to set_dropout(0) or set_dropout(0,0,0)

void set_dropout_masks(unsigned batch_size = 1)

Set dropout masks at the beginning of a sequence for a specific batch size.

If this function is not called on batched input, the same mask will be applied across all batch elements. Use this to apply different masks to each batch element.

Parameters
• batch_size: Batch size
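
The difference between a shared mask and per-element masks can be sketched as below. This is a simplified illustration with hypothetical helpers (`sample_batch_masks`, `mask_for`) over plain vectors, not the way DyNet stores batched expressions.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Mask = std::vector<double>;

// Draw `batch_size` independent scaled masks of length `dim`.
std::vector<Mask> sample_batch_masks(std::size_t dim, std::size_t batch_size,
                                     double d, std::mt19937& rng) {
  assert(batch_size >= 1 && d >= 0.0 && d < 1.0);
  std::bernoulli_distribution keep(1.0 - d);
  std::vector<Mask> masks(batch_size, Mask(dim));
  for (Mask& mask : masks)
    for (double& m : mask) m = keep(rng) ? 1.0 / (1.0 - d) : 0.0;
  return masks;
}

// Mask used for batch element b: a single mask is shared by every element,
// otherwise each element gets its own.
const Mask& mask_for(const std::vector<Mask>& masks, std::size_t b) {
  assert(!masks.empty());
  assert(masks.size() == 1 || b < masks.size());
  return masks.size() == 1 ? masks[0] : masks[b];
}
```

With the default batch size of 1, every batch element ends up reading the same mask, which matches the behaviour described above.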

struct dynet::VanillaLSTMBuilder
#include <lstm.h>

VanillaLSTMBuilder creates a “standard” LSTM, i.e. with decoupled input and forget gates and no peephole connections.

This cell runs according to the following dynamics:

$$\begin{split} i_t & =\sigma(W_{ix}x_t+W_{ih}h_{t-1}+b_i)\\ f_t & = \sigma(W_{fx}x_t+W_{fh}h_{t-1}+b_f)\\ o_t & = \sigma(W_{ox}x_t+W_{oh}h_{t-1}+b_o)\\ \tilde{c_t} & = \tanh(W_{cx}x_t+W_{ch}h_{t-1}+b_c)\\ c_t & = c_{t-1}\circ f_t + \tilde{c_t}\circ i_t\\ h_t & = \tanh(c_t)\circ o_t\\ \end{split}$$
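
For comparison with the coupled variant, here is a minimal scalar sketch of these dynamics. It assumes input and hidden dimension 1; `VanillaLSTMParams`, `CellState` and `vanilla_lstm_step` are hypothetical names and not DyNet API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar parameters (input_dim = hidden_dim = 1); not DyNet types.
struct VanillaLSTMParams {
  double W_ix, W_ih, b_i;  // input gate
  double W_fx, W_fh, b_f;  // forget gate (independent of the input gate)
  double W_ox, W_oh, b_o;  // output gate
  double W_cx, W_ch, b_c;  // candidate cell value
};

struct CellState {
  double c;  // memory cell c_t
  double h;  // output h_t
};

inline double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// One time step of the decoupled-gate LSTM without peepholes.
CellState vanilla_lstm_step(const VanillaLSTMParams& p, double x,
                            const CellState& prev) {
  const double i = sigmoid(p.W_ix * x + p.W_ih * prev.h + p.b_i);
  const double f = sigmoid(p.W_fx * x + p.W_fh * prev.h + p.b_f);
  const double o = sigmoid(p.W_ox * x + p.W_oh * prev.h + p.b_o);
  assert(f >= 0.0 && f <= 1.0);  // a gate always lies in [0, 1]
  const double c_tilde = std::tanh(p.W_cx * x + p.W_ch * prev.h + p.b_c);
  const double c = prev.c * f + c_tilde * i;
  return CellState{c, std::tanh(c) * o};
}
```

Since the forget gate has its own bias here, a large `b_f` together with a very negative `b_i` keeps the cell value essentially unchanged, which the coupled formulation expresses through a single gate.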

Inherits from dynet::RNNBuilder

Public Functions

VanillaLSTMBuilder()

Default Constructor.

VanillaLSTMBuilder(unsigned layers, unsigned input_dim, unsigned hidden_dim, Model &model)

Constructor for the VanillaLSTMBuilder.

Parameters
• layers: Number of layers
• input_dim: Dimension of the input $$x_t$$
• hidden_dim: Dimension of the hidden states $$h_t$$ and $$c_t$$
• model: Model holding the parameters

void set_dropout(float d)

Set all dropout rates to a single value.

This has the same effect as set_dropout(d,d_h), except that all the dropout rates are set to the same value.

Parameters
• d: Dropout rate to be applied on all of $$x,h$$

void set_dropout(float d, float d_r)

Set the dropout rates.

The dropout implemented here is the variational dropout with tied weights introduced in Gal, 2016. More specifically, dropout masks $$\mathbf{z_x}\sim \mathrm{Bernoulli}(1-d_x)$$, $$\mathbf{z_h}\sim \mathrm{Bernoulli}(1-d_h)$$ are sampled at the start of each sequence. The dynamics of the cell are then modified to:

$$\begin{split} i_t & =\sigma(W_{ix}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{ih}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_i)\\ f_t & = \sigma(W_{fx}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{fh}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_f)\\ o_t & = \sigma(W_{ox}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{oh}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_o)\\ \tilde{c_t} & = \tanh(W_{cx}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{ch}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_c)\\ c_t & = c_{t-1}\circ f_t + \tilde{c_t}\circ i_t\\ h_t & = \tanh(c_t)\circ o_t\\ \end{split}$$

For more detail on why the scaling is applied, see the “Unorthodox” section of the documentation.
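
As a brief sketch of the usual argument (a standard property of this kind of scaled dropout, stated here independently of the referenced section): each component of $$\mathbf{z_x}$$ equals 1 with probability $$1-d_x$$ and 0 otherwise, so the scaled mask leaves the input unchanged in expectation:

$$\mathbb{E}\left[\frac 1 {1-d_x}\mathbf{z_x} \circ x_t\right] = \frac 1 {1-d_x}\mathbb{E}[\mathbf{z_x}] \circ x_t = \frac {1-d_x} {1-d_x} x_t = x_t$$

The same holds for $$\mathbf{z_h}$$ with $$d_h$$, which is why no rescaling is needed once dropout is disabled.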

Parameters
• d: Dropout rate $$d_x$$ for the input $$x_t$$
• d_r: Dropout rate $$d_h$$ for the output $$h_t$$

void disable_dropout()

Set all dropout rates to 0.

This is equivalent to set_dropout(0) or set_dropout(0,0)

void set_dropout_masks(unsigned batch_size = 1)

Set dropout masks at the beginning of a sequence for a specific batch size.

If this function is not called on batched input, the same mask will be applied across all batch elements. Use this to apply different masks to each batch element.

Parameters
• batch_size: Batch size

struct dynet::RNNBuilder
#include <rnn.h>

Interface for constructing an RNN, LSTM, GRU, etc.

Subclassed by dynet::DeepLSTMBuilder, dynet::FastLSTMBuilder, dynet::GRUBuilder, dynet::LSTMBuilder, dynet::SimpleRNNBuilder, dynet::TreeLSTMBuilder, dynet::VanillaLSTMBuilder

Public Functions

RNNBuilder()

Default constructor.

RNNPointer state() const

Get pointer to the current state.

Return
Pointer to the current state

void new_graph(ComputationGraph &cg)

Initialize with new computation graph.

Call this to reset the builder when you are working with a newly created ComputationGraph object.

Parameters
• cg: Computation graph

void start_new_sequence(const std::vector<Expression> &h_0 = {})

Reset for new sequence.

Call this before add_input and after new_graph when starting a new sequence on the same computation graph.

Parameters
• h_0: Values used to initialize the hidden layers at timestep 0

Expression set_h(const RNNPointer &prev, const std::vector<Expression> &h_new = {})

Explicitly set the output state of a node.

Return
The hidden representation of the deepest layer
Parameters
• prev: Pointer to the previous state
• h_new: The new hidden state

Expression set_s(const RNNPointer &prev, const std::vector<Expression> &s_new = {})

Set the internal state of a node (for LSTMs/GRUs).

For RNNs without internal states (SimpleRNN, GRU...), this has the same behaviour as set_h.

Return
The hidden representation of the deepest layer
Parameters
• prev: Pointer to the previous state
• s_new: The new state. Can be {new_c[0],...,new_c[n]} or {new_c[0],...,new_c[n], new_h[0],...,new_h[n]}

Expression add_input(const Expression &x)

Add another timestep by reading in the variable x.

Return
The hidden representation of the deepest layer
Parameters
• x: Input variable

Expression add_input(const RNNPointer &prev, const Expression &x)

Add another timestep, with arbitrary recurrent connection.

This allows you to define a recurrent connection to prev rather than to head[cur]. This can be used to construct trees, implement beam search, etc.

Return
The hidden representation of the deepest layer
Parameters
• prev: Pointer to the previous state
• x: Input variable

void rewind_one_step()

Rewind the last timestep.

• This DOES NOT remove the variables from the computation graph; it just means the next time step will see a different previous state. You can rewind as many times as you want.

RNNPointer get_head(const RNNPointer &p)

Return the RNN state that is the parent of p.

• This can be used in implementing complex structures such as trees, etc.
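
The pointer bookkeeping behind add_input(prev, x), rewind_one_step and get_head can be pictured with a toy model. The class below is a hypothetical stand-in (`ToyHistory` is not a DyNet type): its "hidden state" is just a running sum, and it assumes each state records the index of its parent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the builder's bookkeeping; all names are hypothetical.
class ToyHistory {
 public:
  using Pointer = int;  // -1 denotes the state before any input

  Pointer state() const { return cur_; }

  // Add a timestep whose recurrent connection comes from `prev`.
  double add_input(Pointer prev, double x) {
    assert(prev >= -1 && prev < static_cast<Pointer>(head_.size()));
    const double h_prev = prev < 0 ? 0.0 : h_[static_cast<std::size_t>(prev)];
    head_.push_back(prev);
    h_.push_back(h_prev + x);  // toy "hidden state": a running sum
    cur_ = static_cast<Pointer>(head_.size()) - 1;
    return h_.back();
  }

  // Add a timestep connected to the current state.
  double add_input(double x) { return add_input(cur_, x); }

  // Step back to the parent; stored states are kept, not erased.
  void rewind_one_step() {
    assert(cur_ >= 0);
    cur_ = head_[static_cast<std::size_t>(cur_)];
  }

  // Parent of state p.
  Pointer get_head(Pointer p) const {
    return head_[static_cast<std::size_t>(p)];
  }

  std::size_t size() const { return head_.size(); }

 private:
  std::vector<Pointer> head_;  // head_[p] is the parent of state p
  std::vector<double> h_;      // toy hidden value of each state
  Pointer cur_ = -1;
};
```

Rewinding and then adding a new input creates a second child of the same parent, which is the branching pattern used for trees or beam search; the abandoned branch remains stored.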

void set_dropout(float d)

Set Dropout.

Parameters
• d: Dropout rate

void disable_dropout()

Disable Dropout.

In general, you should disable dropout at test time

virtual Expression back() const = 0

Returns node (index) of most recent output.

Return
Node (index) of most recent output

virtual std::vector<Expression> final_h() const = 0

Access the final output of each hidden layer.

Return
Final output of each hidden layer

virtual std::vector<Expression> get_h(RNNPointer i) const = 0

Access the output of any hidden layer.

Return
Output of each hidden layer at the given step
Parameters
• i: Pointer to the step whose output you want to access

virtual std::vector<Expression> final_s() const = 0

Access the final state of each hidden layer.

This returns the state of each hidden layer, in a format that can be used in start_new_sequence (i.e. including any internal cell for LSTMs and the like).

Return
vector containing, if it exists, the list of final internal states, followed by the list of final outputs for each layer

virtual std::vector<Expression> get_s(RNNPointer i) const = 0

Access the state of any hidden layer.

See final_s for details

Return
Internal state of each hidden layer at the given step
Parameters
• i: Pointer to the step whose state you want to access

virtual unsigned num_h0_components() const = 0

Number of components in h_0

Return
Number of components in h_0

virtual void copy(const RNNBuilder &params) = 0

Copy the parameters of another builder.

Parameters
• params: RNNBuilder you want to copy parameters from.

void save_parameters_pretraining(const std::string &fname) const

This function saves all the parameters associated with a particular RNNBuilder's derived class to a file.

This should not be used to serialize models; it should only be used to save parameters for pretraining. If you are interested in serializing models, use the boost serialization API against your model class.

Parameters
• fname: File you want to save your model to.

void load_parameters_pretraining(const std::string &fname)

Loads all the parameters associated with a particular RNNBuilder's derived class from a file.

This should not be used to serialize models; it should only be used to load parameters from pretraining. If you are interested in serializing models, use the boost serialization API against your model class.

Parameters
• fname: File you want to read your model from.

struct dynet::SimpleRNNBuilder
#include <rnn.h>

This provides a builder for the simplest RNN with tanh nonlinearity.

The equation for this RNN is : $$h_t=\tanh(W_x x_t + W_h h_{t-1} + b)$$
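
As a minimal sketch of this recurrence, the function below unrolls it over a sequence of scalar inputs. It assumes input and hidden dimension 1; `SimpleRNNParams` and `simple_rnn` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical scalar parameters (input_dim = hidden_dim = 1); not DyNet types.
struct SimpleRNNParams {
  double W_x, W_h, b;
};

// Apply h_t = tanh(W_x x_t + W_h h_{t-1} + b) to each input in turn and
// return every h_t; h0 is the state before the first input.
std::vector<double> simple_rnn(const SimpleRNNParams& p,
                               const std::vector<double>& xs,
                               double h0 = 0.0) {
  std::vector<double> hs;
  hs.reserve(xs.size());
  double h = h0;
  for (double x : xs) {
    h = std::tanh(p.W_x * x + p.W_h * h + p.b);
    assert(h >= -1.0 && h <= 1.0);  // tanh keeps the state bounded
    hs.push_back(h);
  }
  return hs;
}
```

Each output depends on the previous one only through the `W_h * h` term, so setting `W_h` to zero reduces the network to an independent tanh layer per time step.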

Inherits from dynet::RNNBuilder

Public Functions

SimpleRNNBuilder(unsigned layers, unsigned input_dim, unsigned hidden_dim, Model &model, bool support_lags = false)

Builds a simple RNN.

Parameters
• layers: Number of layers
• input_dim: Dimension of the input
• hidden_dim: Hidden layer (and output) size
• model: Model holding the parameters
• support_lags: Allow for auxiliary output?

Expression add_auxiliary_input(const Expression &x, const Expression &aux)

Returns $$h_t=\tanh(W_x x_t + W_h h_{t-1} + W_y y + b)$$, where $$y$$ is the auxiliary output passed as aux.

Parameters
• x: Input expression
• aux: Auxiliary output expression