Examples¶
Here are some simple models coded in the Dynet examples. Feel free to use and modify them.
Feed-forward models¶
Although Dynet was primarily built for natural language processing purposes, it is still possible to code feed-forward nets. Here are some building blocks and examples to do so.
- enum ffbuilders::Activation¶
  Common activation functions used in multilayer perceptrons.
  Values:
  - SIGMOID: Sigmoid function \(x\longrightarrow \frac{1}{1+e^{-x}}\)
  - TANH: Tanh function \(x\longrightarrow \frac{1-e^{-2x}}{1+e^{-2x}}\)
  - RELU: Rectified linear unit \(x\longrightarrow \max(0,x)\)
  - LINEAR: Identity function \(x\longrightarrow x\)
  - SOFTMAX: Softmax function \(\textbf{x}=(x_i)_{i=1,\dots,n}\longrightarrow \left(\frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}}\right)_{i=1,\dots,n}\)
- struct Layer¶
  #include <mlp.h>
  Simple layer structure.
  Contains all parameters defining a layer.
  Public Functions
  - Layer(unsigned input_dim, unsigned output_dim, Activation activation, float dropout_rate)¶
    Build a feed-forward layer.
    Parameters:
    - input_dim: Input dimension
    - output_dim: Output dimension
    - activation: Activation function
    - dropout_rate: Dropout rate
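For concreteness, a couple of layer descriptions might be declared like this. This is a small sketch: the dimensions and dropout rates are arbitrary, and we assume the enum values are reachable via a using-directive on ffbuilders.

```cpp
#include "mlp.h"

using namespace ffbuilders;

// Hidden layer: 784 inputs, 512 outputs, ReLU activation, 20% dropout
Layer hidden(784, 512, RELU, 0.2f);
// Output layer: 512 inputs, 10 outputs, raw scores (softmax is applied later, e.g. by MLP::get_nll)
Layer out(512, 10, LINEAR, 0.0f);
```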
- struct MLP¶
  #include <mlp.h>
  Simple multilayer perceptron.
  Public Functions
  - MLP(Model &model)¶
    Default constructor.
    Don't forget to add layers!
  - MLP(Model &model, vector<Layer> layers)¶
    Build a multilayer perceptron.
    Creates a feed-forward multilayer perceptron from a list of layer descriptions.
    Parameters:
    - model: Model to contain the parameters
    - layers: Layer descriptions
  - void append(Model &model, Layer layer)¶
    Append a layer at the end of the network.
    Parameters:
    - model: Model to contain the new layer's parameters
    - layer: Layer to append
  - Expression run(Expression x, ComputationGraph &cg)¶
    Run the MLP on an input vector/batch.
    Return: Expression for the output of the MLP
    Parameters:
    - x: Input expression (vector or batch)
    - cg: Computation graph
  - Expression get_nll(Expression x, vector<unsigned> labels, ComputationGraph &cg)¶
    Return the negative log likelihood for the (batched) pair (x, y).
    For a batched input \(\{x_i\}_{i=1,\dots,N}\) with labels \(\{y_i\}_{i=1,\dots,N}\), this computes \(-\sum_{i=1}^N \log(P(y_i\vert x_i))\), where \(P(\textbf{y}\vert x_i)\) is modelled with \(\mathrm{softmax}(MLP(x_i))\)
    Return: Expression for the negative log likelihood on the batch
    Parameters:
    - x: Input batch
    - labels: Output labels
    - cg: Computation graph
  - int predict(Expression x, ComputationGraph &cg)¶
    Predict the most probable label.
    Returns the argmax of the softmax of the network's output.
    Return: Label index
    Parameters:
    - x: Input
    - cg: Computation graph
  - void enable_dropout()¶
    Enable dropout.
    This is supposed to be used during training, or during testing if you want to sample outputs using Monte Carlo.
  - void disable_dropout()¶
    Disable dropout.
    Do this during testing if you want a deterministic network.
  - bool is_dropout_enabled()¶
    Check whether dropout is enabled or not.
    Return: Dropout state
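Putting the pieces above together, a training step and a prediction could look roughly like this. This is a hedged sketch rather than the shipped example code: the inputs and labels are dummy placeholder data, the dimensions are arbitrary, and minor API details (headers, namespaces, the exact signature of trainer.update) vary slightly between DyNet versions.

```cpp
#include <iostream>
#include <vector>
#include "dynet/dynet.h"
#include "dynet/expr.h"
#include "dynet/training.h"
#include "mlp.h"

using namespace std;
using namespace dynet;
using namespace ffbuilders;

int main(int argc, char** argv) {
  dynet::initialize(argc, argv);

  Model model;
  SimpleSGDTrainer trainer(model);

  // Two-layer perceptron: 784 -> 512 -> 10 (get_nll applies the final softmax)
  MLP nn(model, vector<Layer>({
      Layer(/*input_dim=*/784, /*output_dim=*/512, RELU,   /*dropout_rate=*/0.2),
      Layer(/*input_dim=*/512, /*output_dim=*/10,  LINEAR, /*dropout_rate=*/0.0)}));

  // One training step on a placeholder batch of 8 examples
  unsigned bsize = 8;
  vector<float> batch(784 * bsize, 0.5f);  // flattened input vectors (dummy data)
  vector<unsigned> labels(bsize, 0);       // gold labels (dummy data)

  nn.enable_dropout();
  ComputationGraph cg;
  Expression x = input(cg, Dim({784}, bsize), batch);
  Expression loss = nn.get_nll(x, labels, cg);
  cg.forward(loss);
  cg.backward(loss);
  trainer.update();

  // Deterministic prediction on a single input
  nn.disable_dropout();
  ComputationGraph cg_test;
  vector<float> test_vec(784, 0.5f);
  Expression x_test = input(cg_test, Dim({784}), test_vec);
  cout << "predicted label: " << nn.predict(x_test, cg_test) << endl;
  return 0;
}
```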
Language models¶
Language modelling is one of the cornerstones of natural language processing. Dynet allows great flexibility in the creation of neural language models. Here are some examples.
- template <class Builder>
  struct RNNBatchLanguageModel¶
  #include <rnnlm-batch.h>
  This structure wraps any RNN to train a language model with minibatching.
  Recurrent neural network based language modelling maximizes the likelihood of a sentence \(\textbf s=(w_1,\dots,w_n)\) by modelling it as:
  \(L(\textbf s)=p(w_1,\dots,w_n)=\prod_{i=1}^n p(w_i\vert w_1,\dots,w_{i-1})\)
  where \(p(w_i\vert w_1,\dots,w_{i-1})\) is given by the output of the RNN at step \(i\).
  When training with minibatching, the sentences in each minibatch must be of the same length; this requires some preprocessing (see train_rnnlm-batch.cc for an example).
  Reference: Mikolov et al., 2010
  Template Parameters:
  - Builder: This can be any RNNBuilder
  Public Functions
  - RNNBatchLanguageModel(Model &model, unsigned LAYERS, unsigned INPUT_DIM, unsigned HIDDEN_DIM, unsigned VOCAB_SIZE)¶
    Constructor for the batched RNN language model.
    Parameters:
    - model: Model to hold all parameters for training
    - LAYERS: Number of layers of the RNN
    - INPUT_DIM: Embedding dimension for the words
    - HIDDEN_DIM: Dimension of the hidden states
    - VOCAB_SIZE: Size of the input vocabulary
  - Expression getNegLogProb(const vector<vector<int>> &sents, unsigned id, unsigned bsize, unsigned &tokens, ComputationGraph &cg)¶
    Computes the negative log probability on a batch.
    Return: Expression for \(-\sum_{s\in\mathrm{batch}}\log(p(s))\)
    Parameters:
    - sents: Full training set
    - id: Start index of the batch
    - bsize: Batch size (id + bsize should be smaller than the size of the dataset)
    - tokens: Number of tokens processed by the model (used for loss per token computation)
    - cg: Computation graph
  - void RandomSample(const dynet::Dict &d, int max_len = 150, float temp = 1.0)¶
    Samples a string of words/characters from the model.
    This can be used to debug and/or have fun. Try it on new datasets!
    Parameters:
    - d: Dictionary to use (should be the same as the one used for training)
    - max_len: Maximum number of tokens to generate
    - temp: Temperature for sampling (the softmax computed is \(\frac{e^{\frac{r_t^{(i)}}{T}}}{\sum_{j=1}^{\vert V\vert}e^{\frac{r_t^{(j)}}{T}}}\)). Intuitively, a lower temperature means less deviation from the distribution (i.e. more “standard” samples)
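As an illustration, here is roughly how this struct might be used with an LSTM, in the spirit of train_rnnlm-batch.cc. This is a hedged sketch rather than the shipped example: the toy dictionary and corpus are placeholders, the hyperparameters are arbitrary, and minor API details (headers, Dict::convert, the signature of trainer.update) differ slightly across DyNet versions.

```cpp
#include <vector>
#include "dynet/dynet.h"
#include "dynet/lstm.h"
#include "dynet/training.h"
#include "rnnlm-batch.h"

using namespace std;
using namespace dynet;

int main(int argc, char** argv) {
  dynet::initialize(argc, argv);

  Model model;
  AdamTrainer trainer(model);

  // Toy vocabulary and corpus; in practice these come from the training data,
  // with sentences padded to the same length within each minibatch
  dynet::Dict d;
  vector<vector<int>> corpus = {
      {d.convert("<s>"), d.convert("hello"), d.convert("world"), d.convert("</s>")},
      {d.convert("<s>"), d.convert("world"), d.convert("hello"), d.convert("</s>")},
      {d.convert("<s>"), d.convert("hello"), d.convert("hello"), d.convert("</s>")},
      {d.convert("<s>"), d.convert("world"), d.convert("world"), d.convert("</s>")}};

  // 2-layer LSTM language model, 64-dim embeddings, 128-dim hidden states
  RNNBatchLanguageModel<LSTMBuilder> lm(model, 2, 64, 128, d.size());

  unsigned bsize = 2;
  for (unsigned id = 0; id + bsize <= corpus.size(); id += bsize) {
    ComputationGraph cg;
    unsigned tokens = 0;
    Expression nll = lm.getNegLogProb(corpus, id, bsize, tokens, cg);
    cg.forward(nll);
    cg.backward(nll);  // gradients of the batch loss
    trainer.update();
  }

  // Sample a sentence of at most 150 tokens at temperature 1.0
  lm.RandomSample(d, 150, 1.0);
  return 0;
}
```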
Sequence to sequence models¶
Dynet is well suited for the variety of sequence to sequence models used in modern NLP. Here are some pre-coded structs implementing the most common ones.
- template <class Builder>
  struct EncoderDecoder¶
  #include <encdec.h>
  This structure is a “vanilla” encoder-decoder model.
  This sequence to sequence network models the conditional probability \(p(y_1,\dots,y_m\vert x_1,\dots,x_n)=\prod_{i=1}^m p(y_i\vert \textbf{e},y_1,\dots,y_{i-1})\) where \(\textbf{e}=ENC(x_1,\dots,x_n)\) is an encoding of the input sequence produced by a recurrent neural network.
  Typically, \(\textbf{e}\) is the concatenated cell and output vectors of a (multilayer) LSTM.
  Sequence to sequence models were introduced in Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation.
  Our implementation is more akin to the one from Sequence to sequence learning with neural networks.
  Template Parameters:
  - Builder: This can theoretically be any RNNBuilder. It has only been tested with an LSTM as of now
  Public Functions
  - EncoderDecoder()¶
    Default constructor.
  - EncoderDecoder(Model &model, unsigned num_layers, unsigned input_dim, unsigned hidden_dim, bool bwd = false)¶
    Creates an EncoderDecoder.
    Parameters:
    - model: Model holding the parameters
    - num_layers: Number of layers (same in the encoder and decoder)
    - input_dim: Dimension of the word/char embeddings
    - hidden_dim: Dimension of the hidden states
    - bwd: Set to true to make the encoder bidirectional. This doubles the number of parameters in the encoder. It also adds parameters for an affine transformation from the bidirectional encodings (of size num_layers * 2 * hidden_dim) to encodings of size num_layers * hidden_dim compatible with the decoder
  - Expression encode(const vector<vector<int>> &isents, unsigned id, unsigned bsize, unsigned &chars, ComputationGraph &cg)¶
    Batched encoding.
    Encodes a batch of sentences of the same size (don’t forget to pad them).
    Return: Expression for the (batched) encoding
    Parameters:
    - isents: Whole dataset
    - id: Index of the start of the batch
    - bsize: Batch size
    - chars: Number of tokens processed (used to compute loss per character)
    - cg: Computation graph
  - Expression encode(const vector<int> &insent, ComputationGraph &cg)¶
    Single sentence version of encode.
    Note: this just creates a trivial dataset and feeds it to the batched version with batch size 1. It’s not very efficient, so don’t use it for training.
    Return: Expression for the encoding
    Parameters:
    - insent: Input sentence
    - cg: Computation graph
  - Expression decode(const Expression i_nc, const vector<vector<int>> &osents, int id, int bsize, ComputationGraph &cg)¶
    Batched decoding.
    Computes the negative log likelihood of a batch of output sentences given their (batched) encoding.
    Return: Expression for the negative log likelihood
    Parameters:
    - i_nc: Encoding (should be batched)
    - osents: Output sentences dataset
    - id: Start index of the batch
    - bsize: Batch size (should be consistent with the shape of i_nc)
    - cg: Computation graph
  - Expression decode(const Expression i_nc, const vector<int> &osent, ComputationGraph &cg)¶
    Single sentence version of decode.
    For similar reasons as encode, this is not really efficient. Use the batched version directly for training.
    Return: Expression for the negative log likelihood
    Parameters:
    - i_nc: Encoding
    - osent: Output sentence
    - cg: Computation graph
  - vector<int> generate(const vector<int> &insent, ComputationGraph &cg)¶
    Generate a sentence from an input sentence.
    Samples at each timestep during decoding. Possible variations are greedy decoding and beam search for better performance.
    Return: Generated sentence (indices in the dictionary)
    Parameters:
    - insent: Input sentence
    - cg: Computation graph
  - vector<int> generate(Expression i_nc, unsigned oslen, ComputationGraph &cg)¶
    Generate a sentence from an encoding.
    You can use this directly to generate random sentences.
    Return: Generated sentence (indices in the dictionary)
    Parameters:
    - i_nc: Input encoding
    - oslen: Maximum length of output
    - cg: Computation graph
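To show how these methods fit together, here is a rough usage sketch with an LSTM builder. It is hedged, not the shipped example: the parallel corpus is a placeholder with dummy token ids, vocabulary setup is assumed handled as in the original encdec example, and minor API details (headers, the signature of trainer.update) differ slightly across DyNet versions.

```cpp
#include <vector>
#include "dynet/dynet.h"
#include "dynet/lstm.h"
#include "dynet/training.h"
#include "encdec.h"

using namespace std;
using namespace dynet;

int main(int argc, char** argv) {
  dynet::initialize(argc, argv);

  Model model;
  AdamTrainer trainer(model);

  // 2-layer bidirectional LSTM encoder-decoder,
  // 64-dim embeddings, 128-dim hidden states
  EncoderDecoder<LSTMBuilder> encdec(model, 2, 64, 128, /*bwd=*/true);

  // Padded parallel corpus (placeholder token ids)
  vector<vector<int>> src = {{0, 1, 2, 3}, {0, 2, 1, 3}};
  vector<vector<int>> trg = {{0, 2, 1, 3}, {0, 1, 2, 3}};

  unsigned bsize = 2;
  for (unsigned id = 0; id + bsize <= src.size(); id += bsize) {
    ComputationGraph cg;
    unsigned chars = 0;
    // Encode the source batch, then score the target batch given the encoding
    Expression encoding = encdec.encode(src, id, bsize, chars, cg);
    Expression nll = encdec.decode(encoding, trg, id, bsize, cg);
    cg.forward(nll);
    cg.backward(nll);
    trainer.update();
  }

  // Decode a single source sentence into a target sentence
  ComputationGraph cg;
  vector<int> output = encdec.generate(src[0], cg);
  return 0;
}
```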