Examples
Here are some simple models from the DyNet examples. Feel free to use and modify them.
Feedforward models
Although DyNet was built primarily for natural language processing, it can also be used to build feedforward networks. Here are some building blocks and examples for doing so.

enum ffbuilders::Activation
Common activation functions used in multilayer perceptrons.
Values:

SIGMOID: Sigmoid function \(x\longrightarrow \frac {1} {1+e^{-x}}\)
TANH: Tanh function \(x\longrightarrow \frac {1-e^{-2x}} {1+e^{-2x}}\)
RELU: Rectified linear unit \(x\longrightarrow \max(0,x)\)
LINEAR: Identity function \(x\longrightarrow x\)
SOFTMAX: Softmax function \(\textbf{x}=(x_i)_{i=1,\dots,n}\longrightarrow \left(\frac {e^{x_i}}{\sum_{j=1}^n e^{x_j}}\right)_{i=1,\dots,n}\)
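As a rough illustration of the values listed for this enum, the functions can be written with the standard library alone. This is a minimal sketch using the usual textbook definitions; the helper names (`sigmoid`, `tanh_act`, `relu`, `softmax`) are hypothetical and are not part of DyNet or mlp.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical standalone helpers; they are not part of DyNet or mlp.h.

// Sigmoid: 1 / (1 + e^{-x})
inline double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Tanh written as (1 - e^{-2x}) / (1 + e^{-2x}).
// Fine for moderate x; std::tanh is the safer choice in practice.
inline double tanh_act(double x) {
  const double e = std::exp(-2.0 * x);
  return (1.0 - e) / (1.0 + e);
}

// Rectified linear unit: max(0, x)
inline double relu(double x) { return std::max(0.0, x); }

// Softmax over a vector, shifted by the maximum for numerical stability.
inline std::vector<double> softmax(const std::vector<double>& x) {
  assert(!x.empty());
  const double m = *std::max_element(x.begin(), x.end());
  std::vector<double> out(x.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = std::exp(x[i] - m);
    sum += out[i];
  }
  for (double& v : out) v /= sum;
  return out;
}
```

Subtracting the maximum before exponentiating does not change the softmax result, it only avoids overflow for large inputs.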


struct Layer
#include <mlp.h>
Simple layer structure.
Contains all parameters defining a layer.
Public Functions

Layer(unsigned input_dim, unsigned output_dim, Activation activation, float dropout_rate)
Build a feed forward layer.
 Parameters
input_dim: Input dimension
output_dim: Output dimension
activation: Activation function
dropout_rate: Dropout rate


struct MLP
#include <mlp.h>
Simple multilayer perceptron.
Public Functions

MLP(Model &model)
Default constructor.
Don't forget to add layers!

MLP(Model &model, vector<Layer> layers)
Builds a multilayer perceptron.
Creates a feedforward multilayer perceptron based on a list of layer descriptions.
 Parameters
model: Model to contain parameters
layers: Layers description

void append(Model &model, Layer layer)
Append a layer at the end of the network.
 Parameters
model: Model to contain the parameters of the new layer
layer: Description of the layer to append

Expression run(Expression x, ComputationGraph &cg)
Run the MLP on an input vector/batch.
 Return
 Expression for the output of the network
 Parameters
x: Input expression (vector or batch)
cg: Computation graph

Expression get_nll(Expression x, vector<unsigned> labels, ComputationGraph &cg)
Return the negative log likelihood for the (batched) pair (x,y).
For a batched input \(\{x_i\}_{i=1,\dots,N}\), \(\{y_i\}_{i=1,\dots,N}\), this computes \(-\sum_{i=1}^N \log(P(y_i\vert x_i))\) where \(P(\textbf{y}\vert x_i)\) is modelled with \(\mathrm{softmax}(MLP(x_i))\)
 Return
 Expression for the negative log likelihood on the batch
 Parameters
x: Input batch
labels: Output labels
cg: Computation graph
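The quantity described for get_nll can be sketched outside DyNet as follows. This is only an illustration, assuming the network outputs are available as plain vectors of scores; `log_softmax_at` and `batch_nll` are hypothetical names, not functions of mlp.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helpers, not part of mlp.h. `scores[i]` stands for MLP(x_i),
// the unnormalized output of the network for example i.

// log softmax(scores)[label], computed with the log-sum-exp trick.
inline double log_softmax_at(const std::vector<double>& scores, unsigned label) {
  assert(label < scores.size());
  const double m = *std::max_element(scores.begin(), scores.end());
  double sum = 0.0;
  for (double s : scores) sum += std::exp(s - m);
  return scores[label] - m - std::log(sum);
}

// Negative log likelihood summed over the batch: -sum_i log P(y_i | x_i).
inline double batch_nll(const std::vector<std::vector<double>>& scores,
                        const std::vector<unsigned>& labels) {
  assert(scores.size() == labels.size());
  double nll = 0.0;
  for (std::size_t i = 0; i < scores.size(); ++i)
    nll -= log_softmax_at(scores[i], labels[i]);
  return nll;
}
```

With two classes and equal scores, each example contributes log 2 to the sum, which is a convenient sanity check.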

int predict(Expression x, ComputationGraph &cg)
Predict the most probable label.
Returns the argmax of the softmax of the network's output.
 Return
 Label index
 Parameters
x: Input
cg: Computation graph
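Because softmax is monotonic, the argmax of the softmax is the same index as the argmax of the raw scores. The sketch below shows this on a plain vector; `argmax_label` is a hypothetical helper and does not reflect how predict is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper, not part of mlp.h.
// Returns the index of the largest score, which is also the index of the
// largest softmax probability.
inline int argmax_label(const std::vector<double>& scores) {
  assert(!scores.empty());
  return static_cast<int>(std::distance(
      scores.begin(), std::max_element(scores.begin(), scores.end())));
}
```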

void enable_dropout()
Enable dropout.
This is meant to be used during training, or during testing if you want to sample outputs using Monte Carlo.

void disable_dropout()
Disable dropout.
Do this during testing if you want a deterministic network.

bool is_dropout_enabled()
Check whether dropout is enabled or not.
 Return
 Dropout state

Language models
Language modelling is one of the cornerstones of natural language processing. DyNet allows great flexibility in the creation of neural language models. Here are some examples.
template <class Builder> struct RNNBatchLanguageModel
#include <rnnlmbatch.h>
This structure wraps any RNN to train a language model with minibatching.
Recurrent neural network based language modelling maximizes the likelihood of a sentence \(\textbf s=(w_1,\dots,w_n)\) by modelling it as:
\(L(\textbf s)=p(w_1,\dots,w_n)=\prod_{i=1}^n p(w_i\vert w_1,\dots,w_{i-1})\)
where \(p(w_i\vert w_1,\dots,w_{i-1})\) is given by the output of the RNN at step \(i\).
When training with minibatching, all sentences in a minibatch must have the same length. This requires some preprocessing (see train_rnnlmbatch.cc for an example).
Reference: Mikolov et al., 2010
 Template Parameters
Builder: This can be any RNNBuilder
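One possible way to satisfy the same-length requirement is to sort the sentences by length and cut batches wherever the length changes. The sketch below is an assumption for illustration only; `make_same_length_batches` is a hypothetical helper and is not taken from train_rnnlmbatch.cc, which may prepare its batches differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical preprocessing helper, not taken from train_rnnlmbatch.cc.
// Sorts `sents` by length (in place) and returns (start index, batch size)
// pairs such that every batch contains sentences of a single length.
inline std::vector<std::pair<unsigned, unsigned>> make_same_length_batches(
    std::vector<std::vector<int>>& sents, unsigned max_bsize) {
  assert(max_bsize > 0);
  std::stable_sort(sents.begin(), sents.end(),
                   [](const std::vector<int>& a, const std::vector<int>& b) {
                     return a.size() < b.size();
                   });
  std::vector<std::pair<unsigned, unsigned>> batches;
  std::size_t start = 0;
  while (start < sents.size()) {
    std::size_t end = start + 1;
    // Extend the batch while the length matches and the batch is not full.
    while (end < sents.size() && end - start < max_bsize &&
           sents[end].size() == sents[start].size())
      ++end;
    batches.emplace_back(static_cast<unsigned>(start),
                         static_cast<unsigned>(end - start));
    start = end;
  }
  return batches;
}
```

Each returned pair has the shape of the (id, bsize) arguments used by the batched functions on this page, so the pairs could be iterated over in a training loop.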
Public Functions

RNNBatchLanguageModel(Model &model, unsigned LAYERS, unsigned INPUT_DIM, unsigned HIDDEN_DIM, unsigned VOCAB_SIZE)
Constructor for the batched RNN language model.
 Parameters
model: Model to hold all parameters for training
LAYERS: Number of layers of the RNN
INPUT_DIM: Embedding dimension for the words
HIDDEN_DIM: Dimension of the hidden states
VOCAB_SIZE: Size of the input vocabulary

Expression getNegLogProb(const vector<vector<int>> &sents, unsigned id, unsigned bsize, unsigned &tokens, ComputationGraph &cg)
Computes the negative log probability on a batch.
 Return
 Expression for \(-\sum_{s\in\mathrm{batch}}\log(p(s))\)
 Parameters
sents: Full training set
id: Start index of the batch
bsize: Batch size (id + bsize should be smaller than the size of the dataset)
tokens: Number of tokens processed by the model (used for loss per token computation)
cg: Computation graph
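The token count is what turns the summed negative log probability into a per-token loss. A minimal sketch of that step, with perplexity shown as one common way to report it; `loss_per_token` and `perplexity` are hypothetical helpers and are not part of rnnlmbatch.h.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers, not part of rnnlmbatch.h.
// `total_nll` is the summed negative log probability (natural log) and
// `tokens` the number of tokens it was accumulated over.
inline double loss_per_token(double total_nll, unsigned tokens) {
  assert(tokens > 0);
  return total_nll / tokens;
}

// Perplexity is the exponential of the average per-token loss.
inline double perplexity(double total_nll, unsigned tokens) {
  return std::exp(loss_per_token(total_nll, tokens));
}
```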

void RandomSample(const dynet::Dict &d, int max_len = 150, float temp = 1.0)
Samples a string of words/characters from the model.
This can be used to debug and/or have fun. Try it on new datasets!
 Parameters
d: Dictionary to use (should be the same as the one used for training)
max_len: Maximum number of tokens to generate
temp: Temperature for sampling (the softmax computed is \(\frac{e^{\frac{r_t^{(i)}}{T}}}{\sum_{j=1}^{\vert V\vert}e^{\frac{r_t^{(j)}}{T}}}\)). Intuitively, a lower temperature means less deviation from the distribution (i.e. more “standard” samples)
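The effect of the temperature can be seen on a plain vector of scores. The following is a sketch only, assuming the scores \(r_t^{(i)}\) are available as doubles; `softmax_with_temperature` and `sample_token` are hypothetical names and do not reproduce the code of RandomSample.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helpers, not the code of RandomSample.

// Softmax of scores / temp, shifted by the maximum for numerical stability.
inline std::vector<double> softmax_with_temperature(
    const std::vector<double>& scores, double temp) {
  assert(!scores.empty() && temp > 0.0);
  const double m = *std::max_element(scores.begin(), scores.end());
  std::vector<double> p(scores.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < scores.size(); ++i) {
    p[i] = std::exp((scores[i] - m) / temp);
    sum += p[i];
  }
  for (double& v : p) v /= sum;
  return p;
}

// Draw one token index from the tempered distribution.
inline unsigned sample_token(const std::vector<double>& scores, double temp,
                             std::mt19937& rng) {
  const std::vector<double> p = softmax_with_temperature(scores, temp);
  std::discrete_distribution<unsigned> dist(p.begin(), p.end());
  return dist(rng);
}
```

Lowering `temp` concentrates probability mass on the highest-scoring token, while raising it flattens the distribution.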
Sequence to sequence models
DyNet is well suited for the variety of sequence to sequence models used in modern NLP. Here are some precoded structs implementing the most common ones.
template <class Builder> struct EncoderDecoder
#include <encdec.h>
This structure is a “vanilla” encoder decoder model.
This sequence to sequence network models the conditional probability \(p(y_1,\dots,y_m\vert x_1,\dots,x_n)=\prod_{i=1}^m p(y_i\vert \textbf{e},y_1,\dots,y_{i-1})\) where \(\textbf{e}=ENC(x_1,\dots,x_n)\) is an encoding of the input sequence produced by a recurrent neural network.
Typically \(\textbf{e}\) is the concatenated cell and output vector of a (multilayer) LSTM.
Sequence to sequence models were introduced in Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation.
Our implementation is more akin to the one from Sequence to sequence learning with neural networks.
 Template Parameters
Builder: This can theoretically be any RNNBuilder. It has only been tested with an LSTM as of now
Public Functions

EncoderDecoder()
Default builder.

EncoderDecoder(Model &model, unsigned num_layers, unsigned input_dim, unsigned hidden_dim, bool bwd = false)
Creates an EncoderDecoder.
 Parameters
model: Model holding the parameters
num_layers: Number of layers (same in the encoder and decoder)
input_dim: Dimension of the word/char embeddings
hidden_dim: Dimension of the hidden states
bwd: Set to true to make the encoder bidirectional. This doubles the number of parameters in the encoder. This will also add parameters for an affine transformation from the bidirectional encodings (of size num_layers * 2 * hidden_dim) to encodings of size num_layers * hidden_dim compatible with the decoder

Expression encode(const vector<vector<int>> &isents, unsigned id, unsigned bsize, unsigned &chars, ComputationGraph &cg)
Batched encoding.
Encodes a batch of sentences of the same size (don’t forget to pad them).
 Return
 Expression for the (batched) encoding
 Parameters
isents: Whole dataset
id: Index of the start of the batch
bsize: Batch size
chars: Number of tokens processed (used to compute loss per character)
cg: Computation graph
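Padding a batch to a common length can be sketched as below. This is an assumption for illustration: `pad_batch` is a hypothetical helper, the padding token id is chosen by the caller, and padding is appended at the end, whereas a given model may expect it at the start or use a dedicated symbol.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of encdec.h.
// Pads sentences id .. id + bsize - 1 (in place) with `pad_id` so that they
// all have the length of the longest sentence in that range.
inline void pad_batch(std::vector<std::vector<int>>& sents, std::size_t id,
                      std::size_t bsize, int pad_id) {
  assert(id + bsize <= sents.size());
  std::size_t max_len = 0;
  for (std::size_t i = id; i < id + bsize; ++i)
    max_len = std::max(max_len, sents[i].size());
  for (std::size_t i = id; i < id + bsize; ++i)
    sents[i].resize(max_len, pad_id);
}
```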

Expression encode(const vector<int> &insent, ComputationGraph &cg)
Single sentence version of encode.
Note: this just creates a trivial dataset and feeds it to the batched version with batch_size 1. It’s not very efficient, so don’t use it for training.
 Return
 Expression of the encoding
 Parameters
insent: Input sentence
cg: Computation graph

Expression decode(const Expression i_nc, const vector<vector<int>> &osents, int id, int bsize, ComputationGraph &cg)
Batched decoding.
 Return
 Expression for the negative log likelihood
 Parameters
i_nc: Encoding (should be batched)
osents: Output sentences dataset
id: Start index of the batch
bsize: Batch size (should be consistent with the shape of i_nc)
cg: Computation graph

Expression decode(const Expression i_nc, const vector<int> &osent, ComputationGraph &cg)
Single sentence version of decode.
For similar reasons as encode, this is not really efficient. Use the batched version directly for training.
 Return
 Expression for the negative log likelihood
 Parameters
i_nc: Encoding
osent: Output sentence
cg: Computation graph

vector<int> generate(const vector<int> &insent, ComputationGraph &cg)
Generate a sentence from an input sentence.
Samples at each timestep during decoding. Possible variations are greedy decoding and beam search for better performance.
 Return
 Generated sentence (indices in the dictionary)
 Parameters
insent: Input sentence
cg: Computation graph
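Greedy decoding, mentioned above as a variation, picks the most probable token at each step instead of sampling. The sketch below shows the loop on a toy interface; `StepFn`, `greedy_decode`, and the start/end token ids are all hypothetical and are not part of encdec.h. In a real decoder the step function would also advance the RNN state.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical interface: given the previous token, return the probability
// distribution over the next token.
using StepFn = std::function<std::vector<double>(int)>;

// Greedy decoding: take the argmax at every step until `eos` is produced or
// `max_len` tokens have been generated. The `eos` token is not included.
inline std::vector<int> greedy_decode(const StepFn& next_probs, int sos,
                                      int eos, unsigned max_len) {
  std::vector<int> out;
  int prev = sos;
  while (out.size() < max_len) {
    const std::vector<double> p = next_probs(prev);
    assert(!p.empty());
    prev = static_cast<int>(
        std::distance(p.begin(), std::max_element(p.begin(), p.end())));
    if (prev == eos) break;
    out.push_back(prev);
  }
  return out;
}
```

Replacing the argmax with a draw from the distribution gives the sampling behaviour, and keeping several candidate prefixes per step instead of one leads to beam search.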

vector<int> generate(Expression i_nc, unsigned oslen, ComputationGraph &cg)
Generate a sentence from an encoding.
You can use this directly to generate random sentences.
 Return
 Generated sentence (indices in the dictionary)
 Parameters
i_nc: Input encoding
oslen: Maximum length of output
cg: Computation graph