Examples
Here are some simple models implemented in the Dynet examples. Feel free to use and modify them.
Feed-forward models
Although Dynet was primarily built for natural language processing, it is also possible to code feed-forward networks with it. Here are some building blocks and examples for doing so.
- enum ffbuilders::Activation
Common activation functions used in multilayer perceptrons.
Values:
- ffbuilders::SIGMOID : Sigmoid function \(x\longrightarrow \frac{1}{1+e^{-x}}\)
- ffbuilders::TANH : Tanh function \(x\longrightarrow \frac{1-e^{-2x}}{1+e^{-2x}}\)
- ffbuilders::RELU : Rectified linear unit \(x\longrightarrow \max(0,x)\)
- ffbuilders::LINEAR : Identity function \(x\longrightarrow x\)
- ffbuilders::SOFTMAX : Softmax function \(\textbf{x}=(x_i)_{i=1,\dots,n}\longrightarrow \left(\frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}}\right)_{i=1,\dots,n}\)
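As a rough illustration of the formulas above, the following standalone C++ sketch applies each activation to a plain std::vector<double>. The `Activation` enum and the `activate` function are hypothetical, written for this example only; they do not use DyNet, whose MLP code computes these functions on DyNet expressions instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical mirror of ffbuilders::Activation, for illustration only.
enum class Activation { SIGMOID, TANH, RELU, LINEAR, SOFTMAX };

// Applies the activation element-wise (SOFTMAX normalises over the whole vector).
std::vector<double> activate(const std::vector<double>& x, Activation act) {
  std::vector<double> y(x.size());
  switch (act) {
    case Activation::SIGMOID:
      for (std::size_t i = 0; i < x.size(); ++i) y[i] = 1.0 / (1.0 + std::exp(-x[i]));
      break;
    case Activation::TANH:
      for (std::size_t i = 0; i < x.size(); ++i) y[i] = std::tanh(x[i]);
      break;
    case Activation::RELU:
      for (std::size_t i = 0; i < x.size(); ++i) y[i] = std::max(0.0, x[i]);
      break;
    case Activation::LINEAR:
      y = x;  // identity
      break;
    case Activation::SOFTMAX: {
      assert(!x.empty());
      // Subtracting the max leaves the result unchanged but avoids overflow in exp.
      double m = *std::max_element(x.begin(), x.end());
      double z = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = std::exp(x[i] - m);
        z += y[i];
      }
      for (double& v : y) v /= z;
      break;
    }
  }
  return y;
}
```

The softmax branch shifts the inputs by their maximum before exponentiating; mathematically this gives the same values as the formula above, but it stays finite for large inputs.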
- struct Layer
#include <mlp.h>
Simple layer structure.
Contains all parameters defining a layer.
Public Functions
- struct MLP
#include <mlp.h>
Simple multilayer perceptron.
Public Functions
- MLP::MLP(Model &model, vector<Layer> layers)
Constructs a multilayer perceptron.
Creates a feed-forward multilayer perceptron based on a list of layer descriptions.
- Parameters
model : Model to contain the parameters
layers : Layer descriptions
- void MLP::append(Model &model, Layer layer)
Append a layer at the end of the network.
- Parameters
model : Model to contain the parameters
layer : Description of the layer to append
- Expression MLP::run(Expression x, ComputationGraph &cg)
Run the MLP on an input vector/batch.
- Return
- Expression for the output of the network
- Parameters
x : Input expression (vector or batch)
cg : Computation graph
- Expression MLP::get_nll(Expression x, vector<unsigned> labels, ComputationGraph &cg)
Return the negative log likelihood for the (batched) pair (x, y).
For a batched input \(\{x_i\}_{i=1,\dots,N}\) with labels \(\{y_i\}_{i=1,\dots,N}\), this computes \(-\sum_{i=1}^N \log(P(y_i\vert x_i))\), where \(P(\textbf{y}\vert x_i)\) is modelled with \(\mathrm{softmax}(\mathrm{MLP}(x_i))\).
- Return
- Expression for the negative log likelihood on the batch
- Parameters
x : Input batch
labels : Output labels
cg : Computation graph
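To make the get_nll formula concrete, here is a minimal C++ sketch that computes the batched negative log likelihood from plain score vectors. It assumes `logits[i]` holds the network output \(\mathrm{MLP}(x_i)\) for the i-th example; `log_sum_exp` and `batch_nll` are hypothetical names, and this is not the code in mlp.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// log(sum_j exp(z_j)), computed stably by factoring out the maximum.
double log_sum_exp(const std::vector<double>& z) {
  assert(!z.empty());
  double m = *std::max_element(z.begin(), z.end());
  double s = 0.0;
  for (double v : z) s += std::exp(v - m);
  return m + std::log(s);
}

// -sum_i log softmax(logits[i])[labels[i]], using log softmax_y(z) = z_y - logsumexp(z).
double batch_nll(const std::vector<std::vector<double>>& logits,
                 const std::vector<unsigned>& labels) {
  assert(logits.size() == labels.size());
  double nll = 0.0;
  for (std::size_t i = 0; i < logits.size(); ++i) {
    assert(labels[i] < logits[i].size());
    nll -= logits[i][labels[i]] - log_sum_exp(logits[i]);
  }
  return nll;
}
```

Working with log-softmax directly avoids computing probabilities that may underflow to zero before the logarithm is taken.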
- int MLP::predict(Expression x, ComputationGraph &cg)
Predict the most probable label.
Returns the argmax of the softmax of the network's output.
- Return
- Label index
- Parameters
x : Input
cg : Computation graph
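Because softmax preserves the ordering of its inputs, the argmax of the raw network output is the same label as the argmax after the softmax. A minimal sketch of that shortcut, with a hypothetical `predict_label` function working on a plain score vector (whether MLP::predict itself skips the softmax is not stated here):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// exp is increasing and every softmax entry shares the same denominator,
// so argmax(softmax(z)) == argmax(z). Ties go to the lowest index.
int predict_label(const std::vector<double>& logits) {
  assert(!logits.empty());
  return static_cast<int>(std::distance(
      logits.begin(), std::max_element(logits.begin(), logits.end())));
}
```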
- void MLP::enable_dropout()
Enable dropout.
This should be used during training, or during testing if you want to sample outputs using Monte Carlo dropout.
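Dropout itself can be sketched as follows. This is generic inverted dropout (each unit is zeroed with probability p and the survivors are rescaled by 1/(1-p)), written as standalone C++ with a hypothetical `dropout` function; it is an assumption that this matches the scaling used by the example code.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)
// so the expected value of every unit is unchanged.
std::vector<double> dropout(const std::vector<double>& x, double p, std::mt19937& rng) {
  assert(p >= 0.0 && p < 1.0);
  std::bernoulli_distribution keep(1.0 - p);  // true with probability 1 - p
  std::vector<double> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) y[i] = keep(rng) ? x[i] / (1.0 - p) : 0.0;
  return y;
}
```

Calling it several times on the same input gives different outputs; averaging many such forward passes at test time is what Monte Carlo sampling relies on.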
Language models
Language modelling is one of the cornerstones of natural language processing. Dynet allows great flexibility in the creation of neural language models. Here are some examples.
- template <class Builder>
struct RNNBatchLanguageModel
#include <rnnlm-batch.h>
This structure wraps any RNN to train a language model with minibatching.
Recurrent neural network based language modelling maximizes the likelihood of a sentence \(\textbf s=(w_1,\dots,w_n)\) by modelling it as:
\(L(\textbf s)=p(w_1,\dots,w_n)=\prod_{i=1}^n p(w_i\vert w_1,\dots,w_{i-1})\)
where \(p(w_i\vert w_1,\dots,w_{i-1})\) is given by the output of the RNN at step \(i\).
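In practice the product above is evaluated in log space, \(\log L(\textbf s)=\sum_{i=1}^n \log p(w_i\vert w_1,\dots,w_{i-1})\). A minimal C++ sketch, assuming `step_probs[i]` is a hypothetical stand-in for the RNN's softmax output at step i and `sent[i]` is the index of \(w_i\):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// log L(s) = sum_i log p(w_i | w_1..w_{i-1}).
// step_probs[i] is a distribution over the vocabulary (stand-in for the RNN output).
double sentence_log_prob(const std::vector<std::vector<double>>& step_probs,
                         const std::vector<int>& sent) {
  assert(step_probs.size() == sent.size());
  double lp = 0.0;
  for (std::size_t i = 0; i < sent.size(); ++i) {
    assert(sent[i] >= 0 && static_cast<std::size_t>(sent[i]) < step_probs[i].size());
    // Summing logs instead of multiplying probabilities avoids underflow on long sentences.
    lp += std::log(step_probs[i][sent[i]]);
  }
  return lp;
}
```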
When training with minibatching, all sentences in a minibatch must have the same length. This requires some preprocessing (see train_rnnlm-batch.cc for an example).
Reference: Mikolov et al., 2010.
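One common way to do this preprocessing is to sort sentences by length and pad each minibatch up to its longest sentence. The sketch below is a hypothetical illustration: the `sort_and_pad` helper and the `pad_id` symbol are assumptions, not taken from train_rnnlm-batch.cc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Sentence = std::vector<int>;

// Sorts sentences by length, then pads every sentence in each block of `bsize`
// up to the longest sentence of that block. pad_id is a hypothetical padding index.
void sort_and_pad(std::vector<Sentence>& sents, unsigned bsize, int pad_id) {
  assert(bsize > 0);
  std::stable_sort(sents.begin(), sents.end(),
                   [](const Sentence& a, const Sentence& b) { return a.size() < b.size(); });
  for (std::size_t start = 0; start < sents.size(); start += bsize) {
    std::size_t end = std::min(sents.size(), start + bsize);
    // Sentences are sorted, so the last one in the batch is the longest.
    std::size_t max_len = sents[end - 1].size();
    for (std::size_t i = start; i < end; ++i) sents[i].resize(max_len, pad_id);
  }
}
```

Sorting first keeps the amount of padding small. Padded positions are usually excluded from the loss, or a real end-of-sentence symbol is used as padding; which option the example adopts is not shown here.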
- Template Parameters
Builder : This can be any RNNBuilder
Public Functions
- RNNBatchLanguageModel::RNNBatchLanguageModel(Model &model, unsigned LAYERS, unsigned INPUT_DIM, unsigned HIDDEN_DIM, unsigned VOCAB_SIZE)
Constructor for the batched RNN language model.
- Parameters
model : Model to hold all parameters for training
LAYERS : Number of layers of the RNN
INPUT_DIM : Embedding dimension for the words
HIDDEN_DIM : Dimension of the hidden states
VOCAB_SIZE : Size of the input vocabulary
- Expression RNNBatchLanguageModel::getNegLogProb(const vector<vector<int>> &sents, unsigned id, unsigned bsize, unsigned &tokens, ComputationGraph &cg)
Computes the negative log probability on a batch.
- Return
- Expression for \(-\sum_{s\in\mathrm{batch}}\log(p(s))\)
- Parameters
sents : Full training set
id : Start index of the batch
bsize : Batch size (id + bsize should not exceed the size of the dataset)
tokens : Number of tokens processed by the model (used for loss-per-token computation)
cg : Computation graph
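The `tokens` output exists so that the summed loss can be normalised. A small C++ sketch of that bookkeeping follows; it assumes one token per word index (the actual example may also count end-of-sentence symbols), and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Counts the tokens in the batch sents[id .. id+bsize).
// The bounds check mirrors the requirement that id + bsize not exceed the dataset size.
unsigned count_batch_tokens(const std::vector<std::vector<int>>& sents,
                            unsigned id, unsigned bsize) {
  assert(static_cast<std::size_t>(id) + bsize <= sents.size());
  unsigned tokens = 0;
  for (unsigned i = id; i < id + bsize; ++i) tokens += static_cast<unsigned>(sents[i].size());
  return tokens;
}

// Per-token loss and perplexity from a summed negative log probability (in nats).
double per_token_loss(double total_nll, unsigned tokens) { return total_nll / tokens; }
double perplexity(double total_nll, unsigned tokens) { return std::exp(total_nll / tokens); }
```

Perplexity is the exponential of the per-token loss and is a common way to report language model quality.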
- void RNNBatchLanguageModel::RandomSample(const dynet::Dict &d, int max_len = 150, float temp = 1.0)
Samples a string of words/characters from the model.
This can be used to debug and/or have fun. Try it on new datasets!
- Parameters
d : Dictionary to use (should be the same as the one used for training)
max_len : Maximum number of tokens to generate
temp : Temperature for sampling (the softmax computed is \(\frac{e^{\frac{r_t^{(i)}}{T}}}{\sum_{j=1}^{\vert V\vert}e^{\frac{r_t^{(j)}}{T}}}\), where \(r_t^{(i)}\) is the network's output score for token \(i\) at step \(t\) and \(V\) is the vocabulary). Intuitively, a lower temperature sharpens the distribution around the most probable tokens (more “standard” samples), while a higher temperature flattens it (more varied samples).
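The temperature formula can be sketched in standalone C++ as follows. `softmax_with_temperature` and `sample_token` are hypothetical names, and `r` stands for the score vector \(r_t\) at a single step.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// p_i = exp(r_i / T) / sum_j exp(r_j / T)
std::vector<double> softmax_with_temperature(const std::vector<double>& r, double temp) {
  assert(!r.empty() && temp > 0.0);
  double m = *std::max_element(r.begin(), r.end());
  std::vector<double> p(r.size());
  double z = 0.0;
  for (std::size_t i = 0; i < r.size(); ++i) {
    p[i] = std::exp((r[i] - m) / temp);  // shifting by the max does not change the result
    z += p[i];
  }
  for (double& v : p) v /= z;
  return p;
}

// Draws one token index from the tempered distribution.
int sample_token(const std::vector<double>& r, double temp, std::mt19937& rng) {
  std::vector<double> p = softmax_with_temperature(r, temp);
  std::discrete_distribution<int> dist(p.begin(), p.end());
  return dist(rng);
}
```

As T approaches 0 sampling approaches greedy (argmax) selection; T = 1 samples from the model's own distribution.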
Sequence to sequence models
Dynet is well suited to the variety of sequence to sequence models used in modern NLP. Here are some pre-coded structs implementing the most common ones.
- template <class Builder>
struct EncoderDecoder
#include <encdec.h>
This structure is a “vanilla” encoder-decoder model.
This sequence to sequence network models the conditional probability \(p(y_1,\dots,y_m\vert x_1,\dots,x_n)=\prod_{i=1}^m p(y_i\vert \textbf{e},y_1,\dots,y_{i-1})\) where \(\textbf{e}=ENC(x_1,\dots,x_n)\) is an encoding of the input sequence produced by a recurrent neural network.
Typically \(\textbf{e}\) is the concatenated cell and output vector of a (multilayer) LSTM.
Sequence to sequence models were introduced in Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation.
Our implementation is more akin to the one from Sequence to Sequence Learning with Neural Networks.
- Template Parameters
Builder : In principle this can be any RNNBuilder; so far it has only been tested with an LSTM.
Public Functions
- EncoderDecoder::EncoderDecoder()
Default constructor.
- EncoderDecoder::EncoderDecoder(Model &model, unsigned num_layers, unsigned input_dim, unsigned hidden_dim, bool bwd = false)
Creates an EncoderDecoder.
- Parameters
model : Model holding the parameters
num_layers : Number of layers (same in the encoder and decoder)
input_dim : Dimension of the word/char embeddings
hidden_dim : Dimension of the hidden states
bwd : Set to true to make the encoder bidirectional. This doubles the number of parameters in the encoder. It also adds parameters for an affine transformation from the bidirectional encodings (of size num_layers * 2 * hidden_dim) to encodings of size num_layers * hidden_dim compatible with the decoder.
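To make the sizes in the bwd description concrete, here is a plain C++ sketch of the extra affine map \(y = Wx + b\). The `AffineMap` struct is hypothetical and simply uses the sizes stated above; in the example itself these parameters live in the DyNet model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Maps a bidirectional encoding (num_layers * 2 * hidden_dim) to the size the
// decoder expects (num_layers * hidden_dim) via y = W x + b.
struct AffineMap {
  unsigned in_dim, out_dim;
  std::vector<double> W;  // out_dim x in_dim, row-major
  std::vector<double> b;  // out_dim

  // Zero-initialised only to keep the sketch deterministic; a real model initialises randomly.
  AffineMap(unsigned num_layers, unsigned hidden_dim)
      : in_dim(num_layers * 2 * hidden_dim), out_dim(num_layers * hidden_dim),
        W(static_cast<std::size_t>(out_dim) * in_dim, 0.0), b(out_dim, 0.0) {}

  // Number of extra parameters this projection adds.
  std::size_t num_params() const { return W.size() + b.size(); }

  std::vector<double> apply(const std::vector<double>& x) const {
    assert(x.size() == in_dim);
    std::vector<double> y(b);
    for (unsigned r = 0; r < out_dim; ++r)
      for (unsigned c = 0; c < in_dim; ++c)
        y[r] += W[static_cast<std::size_t>(r) * in_dim + c] * x[c];
    return y;
  }
};
```

For example, with num_layers = 2 and hidden_dim = 3 the map goes from 12 to 6 dimensions and adds 6 * 12 + 6 = 78 parameters.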
- Expression EncoderDecoder::encode(const vector<vector<int>> &isents, unsigned id, unsigned bsize, unsigned &chars, ComputationGraph &cg)
Batched encoding.
Encodes a batch of sentences of the same length (don’t forget to pad them).
- Return
- Expression for the (batched) encoding
- Parameters
isents : Whole dataset
id : Index of the start of the batch
bsize : Batch size
chars : Number of tokens processed (used to compute the loss per character)
cg : Computation graph
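A batched encoder processes one time step for the whole batch at once, which is why the sentences must share a length. The hypothetical `to_time_major` helper below shows how a padded batch can be rearranged into one vector of word ids per time step. It is a sketch, and it assumes that `chars` accumulates across calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// steps[t][b] = isents[id + b][t] for the batch isents[id .. id+bsize).
// chars is increased by the number of tokens read (assumed to accumulate).
std::vector<std::vector<int>> to_time_major(const std::vector<std::vector<int>>& isents,
                                            unsigned id, unsigned bsize, unsigned& chars) {
  assert(bsize > 0 && static_cast<std::size_t>(id) + bsize <= isents.size());
  const std::size_t len = isents[id].size();
  std::vector<std::vector<int>> steps(len, std::vector<int>(bsize));
  for (unsigned b = 0; b < bsize; ++b) {
    assert(isents[id + b].size() == len);  // sentences must be padded to the same length
    for (std::size_t t = 0; t < len; ++t) steps[t][b] = isents[id + b][t];
  }
  chars += static_cast<unsigned>(len * bsize);
  return steps;
}
```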
- Expression EncoderDecoder::encode(const vector<int> &insent, ComputationGraph &cg)
Single-sentence version of encode.
Note: this just creates a trivial dataset and feeds it to the batched version with batch size 1. It is not very efficient, so don’t use it for training.
- Return
- Expression of the encoding
- Parameters
insent : Input sentence
cg : Computation graph
- Expression EncoderDecoder::decode(const Expression i_nc, const vector<vector<int>> &osents, int id, int bsize, ComputationGraph &cg)
Batched decoding.
- Return
- Expression for the negative log likelihood
- Parameters
i_nc : Encoding (should be batched)
osents : Output sentences dataset
id : Start index of the batch
bsize : Batch size (should be consistent with the shape of i_nc)
cg : Computation graph
- Expression EncoderDecoder::decode(const Expression i_nc, const vector<int> &osent, ComputationGraph &cg)
Single-sentence version of decode.
For the same reasons as encode, this is not very efficient. Use the batched version directly for training.
- Return
- Expression for the negative log likelihood
- Parameters
i_nc : Encoding
osent : Output sentence
cg : Computation graph
- vector<int> EncoderDecoder::generate(const vector<int> &insent, ComputationGraph &cg)
Generate a sentence from an input sentence.
Samples at each time step during decoding. Possible variations for better performance are greedy decoding and beam search.
- Return
- Generated sentence (indices in the dictionary)
- Parameters
insent : Input sentence
cg : Computation graph
- vector<int> EncoderDecoder::generate(Expression i_nc, unsigned oslen, ComputationGraph &cg)
Generate a sentence from an encoding.
You can use this directly to generate random sentences.
- Return
- Generated sentence (indices in the dictionary)
- Parameters
i_nc : Input encoding
oslen : Maximum length of the output
cg : Computation graph