Here are some simple models implemented in the DyNet examples. Feel free to use and modify them.

Feed-forward models

Although DyNet was primarily built for natural language processing, it can also be used to code feed-forward nets. Here are some building blocks and examples to do so.

enum ffbuilders::Activation

Common activation functions used in multilayer perceptrons



SIGMOID : Sigmoid function \(x\longrightarrow \frac {1} {1+e^{-x}}\)


TANH : Tanh function \(x\longrightarrow \frac {1-e^{-2x}} {1+e^{-2x}}\)


RELU : Rectified linear unit \(x\longrightarrow \max(0,x)\)


LINEAR : Identity function \(x\longrightarrow x\)


SOFTMAX : Softmax function \(\textbf{x}=(x_i)_{i=1,\dots,n}\longrightarrow \left(\frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}}\right)_{i=1,\dots,n}\)
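
To make the formulas above concrete, here is a minimal sketch of the activations in plain C++ (no DyNet required). The enum `Act` and the functions `activate` and `softmax` are hypothetical stand-ins written for illustration; they are not part of mlp.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Local stand-in mirroring the element-wise values of ffbuilders::Activation.
enum class Act { SIGMOID, TANH, RELU, LINEAR };

// Element-wise activations applied to a single scalar.
inline double activate(Act a, double x) {
  switch (a) {
    case Act::SIGMOID: return 1.0 / (1.0 + std::exp(-x));
    case Act::TANH:    return std::tanh(x);
    case Act::RELU:    return std::max(0.0, x);
    case Act::LINEAR:  return x;
  }
  return x;  // not reached for valid enum values
}

// Softmax acts on a whole vector. Subtracting the maximum before
// exponentiating leaves the result unchanged but avoids overflow.
inline std::vector<double> softmax(const std::vector<double>& x) {
  assert(!x.empty());
  const double m = *std::max_element(x.begin(), x.end());
  std::vector<double> out(x.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = std::exp(x[i] - m);
    sum += out[i];
  }
  for (double& v : out) v /= sum;
  return out;
}
```

Unlike the other four, softmax is not element-wise: each output depends on every input, which is why it takes and returns a vector.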

struct Layer
#include <mlp.h>

Simple layer structure.

Contains all parameters defining a layer

Public Functions

Layer(unsigned input_dim, unsigned output_dim, Activation activation, float dropout_rate)

Build a feed forward layer.

  • input_dim: Input dimension
  • output_dim: Output dimension
  • activation: Activation function
  • dropout_rate: Dropout rate

Public Members

unsigned input_dim

Input dimension

unsigned output_dim

Output dimension

Activation activation = LINEAR

Activation function

float dropout_rate = 0

Dropout rate

struct MLP
#include <mlp.h>

Simple multilayer perceptron.

Public Functions

MLP(ParameterCollection &model)

Default constructor.

Don't forget to add layers!

MLP(ParameterCollection &model, vector<Layer> layers)

Constructs a multilayer perceptron.

Creates a feedforward multilayer perceptron based on a list of layer descriptions

  • model: ParameterCollection to contain parameters
  • layers: Layers description

void append(ParameterCollection &model, Layer layer)

Append a layer at the end of the network.

  • model: ParameterCollection to contain the parameters of the new layer
  • layer: Layer description

Expression run(Expression x, ComputationGraph &cg)

Run the MLP on an input vector/batch.

  • x: Input expression (vector or batch)
  • cg: Computation graph

Expression get_nll(Expression x, vector<unsigned> labels, ComputationGraph &cg)

Return the negative log likelihood for the (batched) pair (x,y)

For a batched input \(\{x_i\}_{i=1,\dots,N}\), \(\{y_i\}_{i=1,\dots,N}\), this computes \(-\sum_{i=1}^N \log(P(y_i\vert x_i))\), where \(P(\textbf{y}\vert x_i)\) is modelled with \(\mathrm{softmax}(MLP(x_i))\).

Expression for the negative log likelihood on the batch
  • x: Input batch
  • labels: Output labels
  • cg: Computation graph
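
The quantity computed by get_nll can be sketched without DyNet as follows. This is an illustration under stated assumptions, not the code in mlp.h: `log_prob` and `batch_nll` are hypothetical names, and `batch_scores[i]` stands in for the pre-softmax output \(MLP(x_i)\).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Log-probability of class `label` under softmax(scores), computed with the
// log-sum-exp trick for numerical stability.
inline double log_prob(const std::vector<double>& scores, unsigned label) {
  assert(label < scores.size());
  const double m = *std::max_element(scores.begin(), scores.end());
  double sum = 0.0;
  for (double s : scores) sum += std::exp(s - m);
  return scores[label] - m - std::log(sum);
}

// Negative log likelihood summed over a batch: -sum_i log P(y_i | x_i).
inline double batch_nll(const std::vector<std::vector<double>>& batch_scores,
                        const std::vector<unsigned>& labels) {
  assert(batch_scores.size() == labels.size());
  double nll = 0.0;
  for (std::size_t i = 0; i < labels.size(); ++i)
    nll -= log_prob(batch_scores[i], labels[i]);
  return nll;
}
```

With two classes and equal scores, each example contributes \(\log 2\) to the sum, which is a quick sanity check for an untrained network.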

int predict(Expression x, ComputationGraph &cg)

Predict the most probable label.

Returns the argmax of the softmax of the network's output

Label index
  • x: Input
  • cg: Computation graph
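
As a small illustration of the prediction rule, the sketch below picks the index of the largest score. It relies on the fact that softmax preserves the ordering of its inputs, so the argmax of the softmax equals the argmax of the raw scores. The function name `argmax_label` is hypothetical, and whether mlp.h evaluates the softmax explicitly is not assumed here.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Index of the largest score; ties resolve to the first maximum.
inline int argmax_label(const std::vector<double>& scores) {
  assert(!scores.empty());
  return static_cast<int>(std::distance(
      scores.begin(), std::max_element(scores.begin(), scores.end())));
}
```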

void enable_dropout()

Enable dropout.

Use this during training, or during testing if you want to sample outputs using Monte Carlo methods

void disable_dropout()

Disable dropout.

Do this during testing if you want a deterministic network
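
The effect of the two switches can be sketched with a toy struct. This assumes the common "inverted dropout" convention (kept units are rescaled during training so that nothing needs rescaling at test time); that convention and all names below are assumptions made for illustration, not a description of mlp.h.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Toy dropout with an on/off switch, mimicking enable_dropout() and
// disable_dropout(). Hypothetical; not taken from the DyNet examples.
struct DropoutSketch {
  float rate = 0.f;      // probability of zeroing a unit
  bool enabled = false;  // off means the output is deterministic
  std::mt19937 rng{42};  // fixed seed so runs are reproducible

  std::vector<float> apply(const std::vector<float>& h) {
    assert(rate >= 0.f && rate < 1.f);
    if (!enabled || rate == 0.f) return h;  // disabled: identity
    std::bernoulli_distribution keep(1.0 - rate);
    std::vector<float> out(h.size());
    for (std::size_t i = 0; i < h.size(); ++i)
      // Kept units are scaled by 1/(1-rate) so the expected value is unchanged.
      out[i] = keep(rng) ? h[i] / (1.f - rate) : 0.f;
    return out;
  }
};
```

Leaving dropout enabled at test time and averaging several stochastic runs is the Monte Carlo use mentioned above.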

bool is_dropout_enabled()

Check whether dropout is enabled or not.

Dropout state

Language models

Language modelling is one of the cornerstones of natural language processing. DyNet allows great flexibility in the creation of neural language models. Here are some examples.

template <class Builder>
struct RNNBatchLanguageModel
#include <rnnlm-batch.h>

This structure wraps any RNN to train a language model with minibatching.

Recurrent neural network based language modelling maximizes the likelihood of a sentence \(\textbf s=(w_1,\dots,w_n)\) by modelling it as :

\(L(\textbf s)=p(w_1,\dots,w_n)=\prod_{i=1}^n p(w_i\vert w_1,\dots,w_{i-1})\)

Where \(p(w_i\vert w_1,\dots,w_{i-1})\) is given by the output of the RNN at step \(i\)

In the case of training with minibatching, the sentences must be of the same length in each minibatch. This requires some preprocessing (see train_rnnlm-batch.cc for example).
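
One possible form of this preprocessing is sketched below: sort the sentences by decreasing length, then pad each consecutive group of `bsize` sentences up to the length of its longest member. The function `sort_and_pad` and the `eos` padding id are hypothetical; refer to train_rnnlm-batch.cc for what the example actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Sentence = std::vector<int>;

// After this call, sentences [id, id + bsize) share one length for every
// id that is a multiple of bsize.
inline void sort_and_pad(std::vector<Sentence>& sents, std::size_t bsize,
                         int eos) {
  assert(bsize > 0);
  // Longest first, so sentences in one batch have similar lengths.
  std::stable_sort(sents.begin(), sents.end(),
                   [](const Sentence& a, const Sentence& b) {
                     return a.size() > b.size();
                   });
  for (std::size_t id = 0; id < sents.size(); id += bsize) {
    const std::size_t end = std::min(id + bsize, sents.size());
    const std::size_t len = sents[id].size();  // longest in this batch
    for (std::size_t j = id + 1; j < end; ++j) sents[j].resize(len, eos);
  }
}
```

Sorting first keeps the amount of padding small, because neighbouring sentences differ little in length.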

Reference : Mikolov et al., 2010

Template Parameters
  • Builder: This can be any RNNBuilder

Public Functions

RNNBatchLanguageModel(ParameterCollection &model, unsigned LAYERS, unsigned INPUT_DIM, unsigned HIDDEN_DIM, unsigned VOCAB_SIZE)

Constructor for the batched RNN language model.

  • model: ParameterCollection to hold all parameters for training
  • LAYERS: Number of layers of the RNN
  • INPUT_DIM: Embedding dimension for the words
  • HIDDEN_DIM: Dimension of the hidden states
  • VOCAB_SIZE: Size of the input vocabulary

Expression getNegLogProb(const vector<vector<int>> &sents, unsigned id, unsigned bsize, unsigned &tokens, ComputationGraph &cg)

Computes the negative log probability on a batch.

Expression for \(-\sum_{s\in\mathrm{batch}}\log(p(s))\)
  • sents: Full training set
  • id: Start index of the batch
  • bsize: Batch size (id + bsize should be smaller than the size of the dataset)
  • tokens: Number of tokens processed by the model (used for loss per token computation)
  • cg: Computation graph
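
The `tokens` counter is what turns the summed loss into a per-token figure. A minimal sketch, assuming the loss is a natural-log negative log likelihood; the function names are hypothetical, and perplexity is added here only as the usual way of reporting that figure.

```cpp
#include <cassert>
#include <cmath>

// Average negative log likelihood per token.
inline double loss_per_token(double total_nll, unsigned tokens) {
  assert(tokens > 0);
  return total_nll / tokens;
}

// Perplexity is the exponential of the per-token loss.
inline double perplexity(double total_nll, unsigned tokens) {
  return std::exp(loss_per_token(total_nll, tokens));
}
```

Dividing by the token count makes losses comparable across batches whose sentences have different lengths.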

void RandomSample(const dynet::Dict &d, int max_len = 150, float temp = 1.0)

Samples a string of words/characters from the model.

This can be used to debug and/or have fun. Try it on new datasets!

  • d: Dictionary to use (should be same as the one used for training)
  • max_len: Maximum number of tokens to generate
  • temp: Temperature \(T\) for sampling (the softmax computed is \(\frac{e^{\frac{r_t^{(i)}}{T}}}{\sum_{j=1}^{\vert V\vert}e^{\frac{r_t^{(j)}}{T}}}\)). Intuitively, a lower temperature concentrates the probability mass on the most likely tokens, so samples deviate less from them (= more “standard” samples)
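
The temperature formula can be tried out in isolation with the sketch below. It is a standalone illustration: `softmax_with_temperature` and `sample_token` are hypothetical names, and `r` plays the role of the scores \(r_t\) over the vocabulary.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Softmax of r / temp, stabilized by subtracting the maximum score.
inline std::vector<double> softmax_with_temperature(
    const std::vector<double>& r, double temp) {
  assert(!r.empty() && temp > 0.0);
  const double m = *std::max_element(r.begin(), r.end());
  std::vector<double> p(r.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < r.size(); ++i) {
    p[i] = std::exp((r[i] - m) / temp);
    sum += p[i];
  }
  for (double& v : p) v /= sum;
  return p;
}

// Draw one token index from the tempered distribution.
inline int sample_token(const std::vector<double>& r, double temp,
                        std::mt19937& rng) {
  const std::vector<double> p = softmax_with_temperature(r, temp);
  std::discrete_distribution<int> dist(p.begin(), p.end());
  return dist(rng);
}
```

As the temperature approaches zero the distribution approaches a one-hot vector on the best-scoring token, and large temperatures push it towards uniform.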

Sequence to sequence models

DyNet is well suited for the variety of sequence to sequence models used in modern NLP. Here are some pre-coded structs implementing the most common ones.

template <class Builder>
struct EncoderDecoder
#include <encdec.h>

This structure is a “vanilla” encoder decoder model.

This sequence to sequence network models the conditional probability \(p(y_1,\dots,y_m\vert x_1,\dots,x_n)=\prod_{i=1}^m p(y_i\vert \textbf{e},y_1,\dots,y_{i-1})\) where \(\textbf{e}=ENC(x_1,\dots,x_n)\) is an encoding of the input sequence produced by a recurrent neural network.

Typically \(\textbf{e}\) is the concatenated cell and output vector of a (multilayer) LSTM.

Sequence to sequence models were introduced in “Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation”.

Our implementation is more akin to the one from “Sequence to Sequence Learning with Neural Networks”.

Template Parameters
  • Builder: This can theoretically be any RNNBuilder. It has only been tested with an LSTM so far

Public Functions


Default builder.

EncoderDecoder(ParameterCollection &model, unsigned num_layers, unsigned input_dim, unsigned hidden_dim, bool bwd = false)

Creates an EncoderDecoder.

  • model: ParameterCollection holding the parameters
  • num_layers: Number of layers (same in the encoder and decoder)
  • input_dim: Dimension of the word/char embeddings
  • hidden_dim: Dimension of the hidden states
  • bwd: Set to true to make the encoder bidirectional. This doubles the number of parameters in the encoder. This will also add parameters for an affine transformation from the bidirectional encodings (of size num_layers * 2 * hidden_dim) to encodings of size num_layers * hidden_dim compatible with the decoder

Expression encode(const vector<vector<int>> &isents, unsigned id, unsigned bsize, unsigned &chars, ComputationGraph &cg)

Batched encoding.

Encodes a batch of sentences of the same size (don’t forget to pad them)

Expression for the (batched) encoding
  • isents: Whole dataset
  • id: Index of the start of the batch
  • bsize: Batch size
  • chars: Number of tokens processed (used to compute the loss per character)
  • cg: Computation graph

Expression encode(const vector<int> &insent, ComputationGraph &cg)

Single sentence version of encode

Note: this just creates a trivial dataset and feeds it to the batched version with batch_size 1. It’s not very efficient, so don’t use it for training.

Expression of the encoding
  • insent: Input sentence
  • cg: Computation graph

Expression decode(const Expression i_nc, const vector<vector<int>> &osents, int id, int bsize, ComputationGraph &cg)

Batched decoding.

Expression for the negative log likelihood
  • i_nc: Encoding (should be batched)
  • osents: Output sentences dataset
  • id: Start index of the batch
  • bsize: Batch size (should be consistent with the shape of i_nc)
  • cg: Computation graph

Expression decode(const Expression i_nc, const vector<int> &osent, ComputationGraph &cg)

Single sentence version of decode

For the same reasons as encode, this is not very efficient. Use the batched version directly for training

Expression for the negative log likelihood
  • i_nc: Encoding
  • osent: Output sentence
  • cg: Computation graph

vector<int> generate(const vector<int> &insent, ComputationGraph &cg)

Generate a sentence from an input sentence.

Samples at each timestep during decoding. Possible variations are greedy decoding and beam search for better performance

Generated sentence (indices in the dictionary)
  • insent: Input sentence
  • cg: Computation graph
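
Greedy decoding, one of the variations mentioned above, can be sketched independently of DyNet. Here `StepFn` is a hypothetical stand-in for one decoder step (it returns scores over the vocabulary given the previous token and may carry hidden state in its closure), and `sos` / `eos` are assumed start and end token ids; none of these names come from encdec.h.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <vector>

// One decoder step: previous token in, unnormalized vocabulary scores out.
using StepFn = std::function<std::vector<double>(int)>;

// Pick the highest-scoring token at every step instead of sampling.
// Stops at `eos` or after `max_len` tokens, whichever comes first.
inline std::vector<int> greedy_decode(const StepFn& next_scores, int sos,
                                      int eos, unsigned max_len) {
  std::vector<int> out;
  int prev = sos;
  while (out.size() < max_len) {
    const std::vector<double> scores = next_scores(prev);
    assert(!scores.empty());
    prev = static_cast<int>(std::distance(
        scores.begin(), std::max_element(scores.begin(), scores.end())));
    if (prev == eos) break;  // the end marker is not part of the output
    out.push_back(prev);
  }
  return out;
}
```

Greedy decoding is deterministic, whereas sampling produces a different sentence on each call; beam search keeps several candidate prefixes instead of only the best one.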

vector<int> generate(Expression i_nc, unsigned oslen, ComputationGraph &cg)

Generate a sentence from an encoding.

You can use this directly to generate random sentences

Generated sentence (indices in the dictionary)
  • i_nc: Input encoding
  • oslen: Maximum length of output
  • cg: Computation graph