Examples¶
Here are some simple models coded in the Dynet examples. Feel free to use and modify them.
Feed-forward models¶
Although Dynet was primarily built for natural language processing purposes, it is still possible to code feed-forward nets. Here are some building blocks and examples to do so.
- enum ffbuilders::Activation¶
  Common activation functions used in multilayer perceptrons.
  Values:
  - SIGMOID: Sigmoid function \(x\longrightarrow \frac{1}{1+e^{-x}}\)
  - TANH: Tanh function \(x\longrightarrow \frac{1-e^{-2x}}{1+e^{-2x}}\)
  - RELU: Rectified linear unit \(x\longrightarrow \max(0,x)\)
  - LINEAR: Identity function \(x\longrightarrow x\)
  - SOFTMAX: Softmax function \(\textbf{x}=(x_i)_{i=1,\dots,n}\longrightarrow \left(\frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}}\right)_{i=1,\dots,n}\)
- struct Layer¶
  #include <mlp.h>
  Simple layer structure.
  Contains all parameters defining a layer.
  Public Functions
  - Layer(unsigned input_dim, unsigned output_dim, Activation activation, float dropout_rate)¶
    Build a feed-forward layer.
    Parameters:
    - input_dim: Input dimension
    - output_dim: Output dimension
    - activation: Activation function
    - dropout_rate: Dropout rate
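For concreteness, a couple of layer descriptions might be declared like this. This is a small sketch: the dimensions and dropout rates are arbitrary, and we assume the enum values are reachable via a using-directive on ffbuilders.

```cpp
#include "mlp.h"

using namespace ffbuilders;

// Hidden layer: 784 inputs, 512 outputs, ReLU activation, 20% dropout
Layer hidden(784, 512, RELU, 0.2f);
// Output layer: 512 inputs, 10 outputs, raw scores (softmax is applied later, e.g. by MLP::get_nll)
Layer out(512, 10, LINEAR, 0.0f);
```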
- struct MLP¶
  #include <mlp.h>
  Simple multilayer perceptron.
  Public Functions
  - MLP(Model &model)¶
    Default constructor.
    Don't forget to add layers!
  - MLP(Model &model, vector<Layer> layers)¶
    Build a multilayer perceptron.
    Creates a feed-forward multilayer perceptron from a list of layer descriptions.
    Parameters:
    - model: Model to contain the parameters
    - layers: Layer descriptions
  - void append(Model &model, Layer layer)¶
    Append a layer at the end of the network.
    Parameters:
    - model: Model to contain the new layer's parameters
    - layer: Layer to append
  - Expression run(Expression x, ComputationGraph &cg)¶
    Run the MLP on an input vector/batch.
    Return: Expression for the output of the MLP
    Parameters:
    - x: Input expression (vector or batch)
    - cg: Computation graph
  - Expression get_nll(Expression x, vector<unsigned> labels, ComputationGraph &cg)¶
    Return the negative log likelihood for the (batched) pair (x, y).
    For a batched input \(\{x_i\}_{i=1,\dots,N}\) with labels \(\{y_i\}_{i=1,\dots,N}\), this computes \(-\sum_{i=1}^N \log(P(y_i\vert x_i))\), where \(P(\textbf{y}\vert x_i)\) is modelled with \(\mathrm{softmax}(MLP(x_i))\)
    Return: Expression for the negative log likelihood on the batch
    Parameters:
    - x: Input batch
    - labels: Output labels
    - cg: Computation graph
  - int predict(Expression x, ComputationGraph &cg)¶
    Predict the most probable label.
    Returns the argmax of the softmax of the network's output.
    Return: Label index
    Parameters:
    - x: Input
    - cg: Computation graph
  - void enable_dropout()¶
    Enable dropout.
    This is supposed to be used during training, or during testing if you want to sample outputs using Monte Carlo.
  - void disable_dropout()¶
    Disable dropout.
    Do this during testing if you want a deterministic network.
  - bool is_dropout_enabled()¶
    Check whether dropout is enabled or not.
    Return: Dropout state
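Putting the pieces above together, a training step and a prediction could look roughly like this. This is a hedged sketch rather than the shipped example code: the inputs and labels are dummy placeholder data, the dimensions are arbitrary, and minor API details (headers, namespaces, the exact signature of trainer.update) vary slightly between DyNet versions.

```cpp
#include <iostream>
#include <vector>
#include "dynet/dynet.h"
#include "dynet/expr.h"
#include "dynet/training.h"
#include "mlp.h"

using namespace std;
using namespace dynet;
using namespace ffbuilders;

int main(int argc, char** argv) {
  dynet::initialize(argc, argv);

  Model model;
  SimpleSGDTrainer trainer(model);

  // Two-layer perceptron: 784 -> 512 -> 10 (get_nll applies the final softmax)
  MLP nn(model, vector<Layer>({
      Layer(/*input_dim=*/784, /*output_dim=*/512, RELU,   /*dropout_rate=*/0.2),
      Layer(/*input_dim=*/512, /*output_dim=*/10,  LINEAR, /*dropout_rate=*/0.0)}));

  // One training step on a placeholder batch of 8 examples
  unsigned bsize = 8;
  vector<float> batch(784 * bsize, 0.5f);  // flattened input vectors (dummy data)
  vector<unsigned> labels(bsize, 0);       // gold labels (dummy data)

  nn.enable_dropout();
  ComputationGraph cg;
  Expression x = input(cg, Dim({784}, bsize), batch);
  Expression loss = nn.get_nll(x, labels, cg);
  cg.forward(loss);
  cg.backward(loss);
  trainer.update();

  // Deterministic prediction on a single input
  nn.disable_dropout();
  ComputationGraph cg_test;
  vector<float> test_vec(784, 0.5f);
  Expression x_test = input(cg_test, Dim({784}), test_vec);
  cout << "predicted label: " << nn.predict(x_test, cg_test) << endl;
  return 0;
}
```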
Language models¶
Language modelling is one of the cornerstones of natural language processing. Dynet allows great flexibility in the creation of neural language models. Here are some examples.
- template <class Builder>
  struct RNNBatchLanguageModel¶
  #include <rnnlm-batch.h>
  This structure wraps any RNN to train a language model with minibatching.
  Recurrent neural network based language modelling maximizes the likelihood of a sentence \(\textbf s=(w_1,\dots,w_n)\) by modelling it as:
  \(L(\textbf s)=p(w_1,\dots,w_n)=\prod_{i=1}^n p(w_i\vert w_1,\dots,w_{i-1})\)
  where \(p(w_i\vert w_1,\dots,w_{i-1})\) is given by the output of the RNN at step \(i\).
  When training with minibatching, the sentences in each minibatch must be of the same length; this requires some preprocessing (see train_rnnlm-batch.cc for an example).
  Reference: Mikolov et al., 2010
  Template Parameters:
  - Builder: This can be any RNNBuilder
  Public Functions
  - RNNBatchLanguageModel(Model &model, unsigned LAYERS, unsigned INPUT_DIM, unsigned HIDDEN_DIM, unsigned VOCAB_SIZE)¶
    Constructor for the batched RNN language model.
    Parameters:
    - model: Model to hold all parameters for training
    - LAYERS: Number of layers of the RNN
    - INPUT_DIM: Embedding dimension for the words
    - HIDDEN_DIM: Dimension of the hidden states
    - VOCAB_SIZE: Size of the input vocabulary
  - Expression getNegLogProb(const vector<vector<int>> &sents, unsigned id, unsigned bsize, unsigned &tokens, ComputationGraph &cg)¶
    Computes the negative log probability on a batch.
    Return: Expression for \(-\sum_{s\in\mathrm{batch}}\log(p(s))\)
    Parameters:
    - sents: Full training set
    - id: Start index of the batch
    - bsize: Batch size (id + bsize should be smaller than the size of the dataset)
    - tokens: Number of tokens processed by the model (used for loss per token computation)
    - cg: Computation graph
  - void RandomSample(const dynet::Dict &d, int max_len = 150, float temp = 1.0)¶
    Samples a string of words/characters from the model.
    This can be used to debug and/or have fun. Try it on new datasets!
    Parameters:
    - d: Dictionary to use (should be the same as the one used for training)
    - max_len: Maximum number of tokens to generate
    - temp: Temperature for sampling (the softmax computed is \(\frac{e^{\frac{r_t^{(i)}}{T}}}{\sum_{j=1}^{\vert V\vert}e^{\frac{r_t^{(j)}}{T}}}\)). Intuitively, a lower temperature means less deviation from the distribution (i.e. more “standard” samples)
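As an illustration, here is roughly how this struct might be used with an LSTM, in the spirit of train_rnnlm-batch.cc. This is a hedged sketch rather than the shipped example: the toy dictionary and corpus are placeholders, the hyperparameters are arbitrary, and minor API details (headers, Dict::convert, the signature of trainer.update) differ slightly across DyNet versions.

```cpp
#include <vector>
#include "dynet/dynet.h"
#include "dynet/lstm.h"
#include "dynet/training.h"
#include "rnnlm-batch.h"

using namespace std;
using namespace dynet;

int main(int argc, char** argv) {
  dynet::initialize(argc, argv);

  Model model;
  AdamTrainer trainer(model);

  // Toy vocabulary and corpus; in practice these come from the training data,
  // with sentences padded to the same length within each minibatch
  dynet::Dict d;
  vector<vector<int>> corpus = {
      {d.convert("<s>"), d.convert("hello"), d.convert("world"), d.convert("</s>")},
      {d.convert("<s>"), d.convert("world"), d.convert("hello"), d.convert("</s>")},
      {d.convert("<s>"), d.convert("hello"), d.convert("hello"), d.convert("</s>")},
      {d.convert("<s>"), d.convert("world"), d.convert("world"), d.convert("</s>")}};

  // 2-layer LSTM language model, 64-dim embeddings, 128-dim hidden states
  RNNBatchLanguageModel<LSTMBuilder> lm(model, 2, 64, 128, d.size());

  unsigned bsize = 2;
  for (unsigned id = 0; id + bsize <= corpus.size(); id += bsize) {
    ComputationGraph cg;
    unsigned tokens = 0;
    Expression nll = lm.getNegLogProb(corpus, id, bsize, tokens, cg);
    cg.forward(nll);
    cg.backward(nll);  // gradients of the batch loss
    trainer.update();
  }

  // Sample a sentence of at most 150 tokens at temperature 1.0
  lm.RandomSample(d, 150, 1.0);
  return 0;
}
```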
Sequence to sequence models¶
Dynet is well suited for the variety of sequence to sequence models used in modern NLP. Here are some pre-coded structs implementing the most common ones.
- template <class Builder>
  struct EncoderDecoder¶
  #include <encdec.h>
  This structure is a “vanilla” encoder-decoder model.
  This sequence to sequence network models the conditional probability \(p(y_1,\dots,y_m\vert x_1,\dots,x_n)=\prod_{i=1}^m p(y_i\vert \textbf{e},y_1,\dots,y_{i-1})\) where \(\textbf{e}=ENC(x_1,\dots,x_n)\) is an encoding of the input sequence produced by a recurrent neural network.
  Typically, \(\textbf{e}\) is the concatenated cell and output vectors of a (multilayer) LSTM.
  Sequence to sequence models were introduced in Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation.
  Our implementation is more akin to the one from Sequence to sequence learning with neural networks.
  Template Parameters:
  - Builder: This can theoretically be any RNNBuilder. It has only been tested with an LSTM as of now
  Public Functions
  - EncoderDecoder()¶
    Default constructor.
  - EncoderDecoder(Model &model, unsigned num_layers, unsigned input_dim, unsigned hidden_dim, bool bwd = false)¶
    Creates an EncoderDecoder.
    Parameters:
    - model: Model holding the parameters
    - num_layers: Number of layers (same in the encoder and decoder)
    - input_dim: Dimension of the word/char embeddings
    - hidden_dim: Dimension of the hidden states
    - bwd: Set to true to make the encoder bidirectional. This doubles the number of parameters in the encoder. It also adds parameters for an affine transformation from the bidirectional encodings (of size num_layers * 2 * hidden_dim) to encodings of size num_layers * hidden_dim compatible with the decoder
  - Expression encode(const vector<vector<int>> &isents, unsigned id, unsigned bsize, unsigned &chars, ComputationGraph &cg)¶
    Batched encoding.
    Encodes a batch of sentences of the same size (don’t forget to pad them).
    Return: Expression for the (batched) encoding
    Parameters:
    - isents: Whole dataset
    - id: Index of the start of the batch
    - bsize: Batch size
    - chars: Number of tokens processed (used to compute loss per character)
    - cg: Computation graph
  - Expression encode(const vector<int> &insent, ComputationGraph &cg)¶
    Single sentence version of encode.
    Note: this just creates a trivial dataset and feeds it to the batched version with batch size 1. It’s not very efficient, so don’t use it for training.
    Return: Expression for the encoding
    Parameters:
    - insent: Input sentence
    - cg: Computation graph
  - Expression decode(const Expression i_nc, const vector<vector<int>> &osents, int id, int bsize, ComputationGraph &cg)¶
    Batched decoding.
    Computes the negative log likelihood of a batch of output sentences given their (batched) encoding.
    Return: Expression for the negative log likelihood
    Parameters:
    - i_nc: Encoding (should be batched)
    - osents: Output sentences dataset
    - id: Start index of the batch
    - bsize: Batch size (should be consistent with the shape of i_nc)
    - cg: Computation graph
  - Expression decode(const Expression i_nc, const vector<int> &osent, ComputationGraph &cg)¶
    Single sentence version of decode.
    For similar reasons as encode, this is not really efficient. Use the batched version directly for training.
    Return: Expression for the negative log likelihood
    Parameters:
    - i_nc: Encoding
    - osent: Output sentence
    - cg: Computation graph
  - vector<int> generate(const vector<int> &insent, ComputationGraph &cg)¶
    Generate a sentence from an input sentence.
    Samples at each timestep during decoding. Possible variations are greedy decoding and beam search for better performance.
    Return: Generated sentence (indices in the dictionary)
    Parameters:
    - insent: Input sentence
    - cg: Computation graph
  - vector<int> generate(Expression i_nc, unsigned oslen, ComputationGraph &cg)¶
    Generate a sentence from an encoding.
    You can use this directly to generate random sentences.
    Return: Generated sentence (indices in the dictionary)
    Parameters:
    - i_nc: Input encoding
    - oslen: Maximum length of output
    - cg: Computation graph
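To show how these methods fit together, here is a rough usage sketch with an LSTM builder. It is hedged, not the shipped example: the parallel corpus is a placeholder with dummy token ids, vocabulary setup is assumed handled as in the original encdec example, and minor API details (headers, the signature of trainer.update) differ slightly across DyNet versions.

```cpp
#include <vector>
#include "dynet/dynet.h"
#include "dynet/lstm.h"
#include "dynet/training.h"
#include "encdec.h"

using namespace std;
using namespace dynet;

int main(int argc, char** argv) {
  dynet::initialize(argc, argv);

  Model model;
  AdamTrainer trainer(model);

  // 2-layer bidirectional LSTM encoder-decoder,
  // 64-dim embeddings, 128-dim hidden states
  EncoderDecoder<LSTMBuilder> encdec(model, 2, 64, 128, /*bwd=*/true);

  // Padded parallel corpus (placeholder token ids)
  vector<vector<int>> src = {{0, 1, 2, 3}, {0, 2, 1, 3}};
  vector<vector<int>> trg = {{0, 2, 1, 3}, {0, 1, 2, 3}};

  unsigned bsize = 2;
  for (unsigned id = 0; id + bsize <= src.size(); id += bsize) {
    ComputationGraph cg;
    unsigned chars = 0;
    // Encode the source batch, then score the target batch given the encoding
    Expression encoding = encdec.encode(src, id, bsize, chars, cg);
    Expression nll = encdec.decode(encoding, trg, id, bsize, cg);
    cg.forward(nll);
    cg.backward(nll);
    trainer.update();
  }

  // Decode a single source sentence into a target sentence
  ComputationGraph cg;
  vector<int> output = encdec.generate(src[0], cg);
  return 0;
}
```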