# Python Reference Manual¶

## Dynet global parameters¶

### DynetParams¶

class dynet.DynetParams

This object holds the global parameters of Dynet

You should only need to use this object if you import DyNet as:

import _dynet / import _gdynet

See the documentation for more details

from_args(shared_parameters=None)

Gets parameters from the command line arguments

You can still modify the parameters after calling this. See the documentation about command line arguments for more details

Keyword Arguments:
shared_parameters (bool) – Whether to use shared parameters (default: None)
init()

Initialize dynet with the current dynetparams object.

This is a one-way operation: you cannot uninitialize DyNet afterwards

set_mem(mem)

Set the memory allocated to dynet

The unit is MB

Parameters: mem (number) – memory size in MB
set_random_seed(random_seed)

Set random seed for dynet

Parameters: random_seed (number) – Random seed
set_requested_gpus(requested_gpus)

Number of requested gpus

Currently only 1 is supported

Parameters: requested_gpus (number) – number of requested gpus
set_shared_parameters(shared_parameters)

Shared parameters

Parameters: shared_parameters (bool) – shared parameters
set_weight_decay(weight_decay)

Set weight decay parameter

Parameters: weight_decay (float) – weight decay parameter

### Initialization functions¶

dynet.init(shared_parameters=None)

Initialize dynet

Initializes DyNet from command line arguments. Do not use this after

import dynet

but only after

import _dynet / import _gdynet
Keyword Arguments:
shared_parameters (bool) – Whether to use shared parameters (default: None)
dynet.init_from_params(params)

Initialize from DynetParams

Same as

params.init()
Parameters: params (DynetParams) – dynet parameters
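Below is a minimal sketch (not part of the original reference) of explicit initialization using the methods documented above; the low-level module alias, the memory size and the seed are illustrative assumptions:

```python
import _dynet as dy          # use _gdynet for the GPU build (assumed alias)

params = dy.DynetParams()
params.from_args()           # pick up --dynet-* command line arguments
params.set_mem(512)          # 512 MB of memory (illustrative value)
params.set_random_seed(42)   # illustrative seed
params.init()                # same effect as dy.init_from_params(params)
```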

## Model and Parameters¶

### Model¶

class dynet.Model

A model holds Parameters. Use it to create, load and save parameters.

add_lookup_parameters(dim, init=None)

Add a lookup parameter to the model

Parameters: dim (tuple) – Shape of the parameter. The first dimension is the lookup dimension
Keyword Arguments: init (dynet.PyInitializer) – Initializer (default: GlorotInitializer)
Returns: Created LookupParameter
Return type: dynet.LookupParameters
add_parameters(dim, init=None)

Add a parameter to the model

Parameters: dim (tuple) – Shape of the parameter
Keyword Arguments: init (dynet.PyInitializer) – Initializer (default: GlorotInitializer)
Returns: Created Parameter
Return type: dynet.Parameters
from_file(fname)

Create model from file

Loads all parameters in file and returns model holding them

Parameters: fname (str) – File name
Returns: Created model
Return type: dynet.Model
load(fname)

Load a list of parameters from file

Parameters: fname (str) – File name
Returns: List of parameters loaded from file
Return type: list
load_all(fname)

Load all parameters in model from file

Parameters: fname (str) – File name
parameters_from_numpy(array)

Create parameter from numpy array

Parameters: array (np.ndarray) – Numpy array
Returns: Created Parameter
Return type: dynet.Parameters
save(fname, components=None)

Save a list of parameters to file

Parameters: fname (str) – File name
Keyword Arguments: components (list) – List of parameters to save (default: None)
save_all(fname)

Save all parameters in model to file

Parameters: fname (str) – File name
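A minimal sketch of creating a Model and its parameters with the methods above; the dimensions and the file name "my_model" are illustrative assumptions:

```python
import dynet as dy

m = dy.Model()
W = m.add_parameters((8, 16))                            # Glorot-initialized by default
b = m.add_parameters((8,), init=dy.ConstInitializer(0))
E = m.add_lookup_parameters((1000, 16))                  # 1000 embeddings of size 16

m.save_all("my_model")       # write every parameter to disk
m2 = dy.Model()
m2.load_all("my_model")      # restore them into a fresh model
```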

### Parameters and LookupParameters¶

class dynet.Parameters

Parameters class

Parameters are things that are optimized. In contrast to a system like Torch, where computational modules may have their own parameters, in DyNet parameters are just parameters.

as_array()

Return as a numpy array.

Returns: values of the parameter np.ndarray
clip_inplace(left, right)

Clip the values in the parameter to a fixed range [left, right] (in place)

Returns: None
expr(update=True)

Returns the parameter as an expression

This is the same as calling

dy.parameter(param)
Parameters: update (bool) – If this is set to False, the parameter won't be updated during the backward pass
Returns: Expression of the parameter
Return type: Expression
get_index()

Get parameter index

Returns: Index of the parameter unsigned
grad_as_array()

Return gradient as a numpy array.

Returns: values of the gradient w.r.t. this parameter np.ndarray
is_updated()

check whether the parameter is updated or not

Returns: Update status bool
load_array(arr)

Deprecated

scale(s)

Scales the parameter

Parameters: s (float) – Scale
set_updated(b)

Set parameter as “updated”

Parameters: b (bool) – updated status
shape()

Returns the shape of the parameter

Returns: Shape of the parameter
Return type: tuple
zero()

Set the parameter to zero

class dynet.LookupParameters
as_array()

Return as a numpy array.

grad_as_array()

Return gradients as a numpy array.

scale(s)

Scales the parameter

Parameters: s (float) – Scale
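A short sketch of inspecting and manipulating a Parameters object with the methods above (dimensions are illustrative assumptions):

```python
import dynet as dy

m = dy.Model()
W = m.add_parameters((2, 2))

print(W.shape())             # shape of the parameter
print(W.as_array())          # current values as a numpy array
W.clip_inplace(-0.5, 0.5)    # clip values in place
W.set_updated(False)         # freeze the parameter during training

dy.renew_cg()
w = W.expr()                 # load it into the current computation graph
```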

### Parameters initializers¶

class dynet.PyInitializer

Base class for parameter initializer

class dynet.NormalInitializer(mean=0, var=1)

Initialize the parameters with a gaussian distribution

Keyword Arguments:

• mean (number) – Mean of the distribution (default: 0)
• var (number) – Variance of the distribution (default: 1)
class dynet.UniformInitializer(scale)

Initialize the parameters with a uniform distribution

Parameters: scale (number) – Parameters are sampled from $$\mathcal U([-\texttt{scale},\texttt{scale}])$$
class dynet.ConstInitializer(c)

Initialize the parameters with a constant value

Parameters: c (number) – Value to initialize the parameters
class dynet.IdentityInitializer

Initialize the parameters as the identity

Only works with square matrices

class dynet.GlorotInitializer(is_lookup=False, gain=1.0)

Initializes the weights according to Glorot & Bengio (2011)

If the dimensions of the parameter matrix are $$m,n$$, the weights are sampled from $$\mathcal U([-g\sqrt{\frac{6}{m+n}},g\sqrt{\frac{6}{m+n}}])$$

The gain $$g$$ depends on the activation function :

• $$\text{tanh}$$ : 1.0
• $$\text{ReLU}$$ : 0.5
• $$\text{sigmoid}$$ : 4.0
• Any smooth function $$f$$ : $$\frac{1}{f'(0)}$$
Keyword Arguments:

• is_lookup (bool) – Whether the parameter is a lookup parameter (default: False)
• gain (number) – Gain (Depends on the activation function) (default: 1.0)
class dynet.SaxeInitializer(scale=1.0)

Initializes according to Saxe et al. (2014)

Initializes as a random orthonormal matrix (unimplemented for GPU)
Keyword Arguments:
scale (number): scale to apply to the orthonormal matrix
class dynet.FromFileInitializer(fname)

Initialize parameter from file

Parameters: fname (str) – File name
class dynet.NumpyInitializer(array)

Initialize from numpy array

Alternatively, use Model.parameters_from_numpy()

Parameters: array (np.ndarray) – Numpy array
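A sketch of passing initializers when creating parameters (dimensions are illustrative assumptions):

```python
import numpy as np
import dynet as dy

m = dy.Model()
I = m.add_parameters((4, 4), init=dy.IdentityInitializer())
U = m.add_parameters((4, 8), init=dy.UniformInitializer(0.1))
E = m.add_lookup_parameters((100, 8), init=dy.NormalInitializer(mean=0, var=1))
P = m.parameters_from_numpy(np.eye(4))    # same effect as NumpyInitializer
```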

## Computation Graph¶

dynet.renew_cg(immediate_compute=False, check_validity=False)

Renew the computation graph.

Call this before building any new computation graph

dynet.cg_version()

Version of the current computation graph

dynet.print_text_graphviz()
dynet.cg_checkpoint()

Saves the state of the computation graph

dynet.cg_revert()

Revert the computation graph state to the previous checkpoint

dynet.cg()

Get the current ComputationGraph

class dynet.ComputationGraph

Computation graph object

While the ComputationGraph is central to the inner workings of DyNet, from the user’s perspective, the only responsibility is to create a new computation graph for each training example.

parameters(params)

Same as dynet.parameters(params)

renew(immediate_compute=False, check_validity=False)

Same as dynet.renew_cg()

version()

Same as dynet.cg_version()
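A sketch of the per-example computation graph workflow, including checkpoint/revert (sizes are illustrative assumptions):

```python
import dynet as dy

m = dy.Model()
W = m.add_parameters((2, 2))

dy.renew_cg()                    # start a fresh graph for this example
x = dy.inputVector([1.0, 2.0])
dy.cg_checkpoint()               # save the current graph state
y = dy.parameter(W) * x          # assumes the usual operator overloads
dy.cg_revert()                   # discard everything added since the checkpoint
```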

## Operations¶

### Expressions¶

class dynet.Expression

Expressions are the building block of a Dynet computation graph.

Expressions are the main data types being manipulated in a DyNet program. Each expression represents a sub-computation in a computation graph.

backward(full=False)

Run the backward pass based on this expression

The parameter full specifies whether the gradients should be computed for all nodes (True) or only non-constant nodes (False).

By default, a node is constant unless

1. it is a parameter node
2. it depends on a non-constant node

Thus, functions of constants and inputs are considered as constants.

Turn full on if you want to retrieve gradients w.r.t. inputs for instance. By default this is turned off, so that the backward pass ignores nodes which have no influence on gradients w.r.t. parameters for efficiency.

Parameters: full (bool) – Whether to compute all gradients (including with respect to constant nodes).
dim()

Dimension of the expression

Returns a tuple (dims,batch_dim) where dims is the tuple of dimensions of each batch element

Returns: dimension tuple
forward(recalculate=False)

This runs incremental forward on the entire graph

May not be optimal in terms of efficiency. Prefer the value methods (value, npvalue, ...) instead

Keyword Arguments:
recalculate (bool) – Recalculate the computation graph (for static graphs with new inputs) (default: False)
gradient()

Returns the gradient of the expression as a numpy array

The last dimension is the batch size (if it’s > 1).

Make sure to call backward on a downstream expression before calling this.

If the Expression is a constant expression (meaning it's not a function of a parameter), dynet won't compute its gradient for the sake of efficiency. You need to manually force the gradient computation by adding the argument full=True to backward.

Returns: numpy array of values np.ndarray
npvalue(recalculate=False)

Returns the value of the expression as a numpy array

The last dimension is the batch size (if it’s > 1)

Keyword Arguments:
recalculate (bool) – Recalculate the computation graph (for static graphs with new inputs) (default: False)
Returns:numpy array of values
Return type:np.ndarray
scalar_value(recalculate=False)

Returns value of an expression as a scalar

This only works if the expression is a scalar

Keyword Arguments:
recalculate (bool) – Recalculate the computation graph (for static graphs with new inputs) (default: False)
Returns:Scalar value of the expression
Return type:float
tensor_value(recalculate=False)

Returns the value of the expression as a Tensor.

This is useful if you want to use the value for other on-device calculations that are not part of the computation graph, i.e. using argmax.

Keyword Arguments:
recalculate (bool) – Recalculate the computation graph (for static graphs with new inputs) (default: False)
Returns:a dynet Tensor object.
Return type:Tensor
value(recalculate=False)

Gets the value of the expression in the most relevant format

This returns the same thing as scalar_value, vec_value or npvalue, depending on whether the number of dimensions of the expression is 0, 1 or 2+.

Keyword Arguments:
recalculate (bool) – Recalculate the computation graph (for static graphs with new inputs) (default: False)
Returns:Value of the expression
Return type:float, list, np.ndarray
vec_value(recalculate=False)

Returns the value of the expression as a vector

In case of a multidimensional expression, the values are flattened according to a column major ordering

Keyword Arguments:
recalculate (bool) – Recalculate the computation graph (for static graphs with new inputs) (default: False)
Returns:Array of values
Return type:list
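A sketch of forward and backward computation on a tiny graph, using only the methods documented above (sizes are illustrative assumptions):

```python
import dynet as dy

m = dy.Model()
w = m.add_parameters((3,))

dy.renew_cg()
x = dy.inputVector([1.0, 2.0, 3.0])
y = dy.dot_product(w.expr(), x)   # scalar expression
print(y.scalar_value())           # forward value as a float
y.backward()                      # compute gradients w.r.t. parameters
print(w.grad_as_array())          # gradient of y w.r.t. w (equals x here)
```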

### Operations¶

Operations are used to build expressions

#### Input operations¶

dynet.parameter(p, update=True)

Load a parameter in the computation graph

Get the expression corresponding to a parameter

Parameters:

• p (Parameter, LookupParameter) – Parameter to load (can be a lookup parameter as well)
• update (bool) – If this is set to False, the parameter won't be updated during the backward pass

Returns: Parameter expression
Return type: Expression
Raises: NotImplementedError – Only works with parameters and lookup parameters
dynet.inputTensor(arr, batched=False)

Creates a tensor expression based on a numpy array or a list.

The dimension is inferred from the shape of the input. If batched=True, the last dimension is used as a batch dimension. If arr is a list of numpy ndarrays, this returns a batched expression where the batch elements are the elements of the list.

Parameters: arr (list, np.ndarray) – Values: numpy ndarray OR list of np.ndarray OR multidimensional list of floats
Keyword Arguments: batched (bool) – Whether to use the last dimension as a batch dimension (default: False)
Returns: Input expression
Return type: _vecInputExpression
Raises: TypeError – If the type is not respected
dynet.scalarInput(s)
dynet.vecInput(dim)

Input an empty vector

Parameters: dim (number) – Size Corresponding expression _vecInputExpression
dynet.inputVector(v)

Input a vector by values

Parameters: v (vector[float]) – Values Corresponding expression _vecInputExpression
dynet.matInput(d1, d2)

DEPRECATED : use inputTensor

TODO : remove this

Parameters: d1 (int) – [description] d2 (int) – [description] [description] dynet.Expression
dynet.inputMatrix(v, d)

DEPRECATED : use inputTensor

TODO : remove this

inputMatrix(vector[float] v, tuple d)

Create a matrix literal. First argument is a list of floats (or a flat numpy array). Second argument is a dimension. Returns: an expression. Usage example:

x = inputMatrix([1,2,3,4,5,6],(2,3))
x.npvalue()
-->
array([[ 1.,  3.,  5.],
[ 2.,  4.,  6.]])

dynet.lookup(p, index=0, update=True)

Pick an embedding from a lookup parameter and return it as an expression

Parameters: p (LookupParameters) – Lookup parameter to pick from
Keyword Arguments:

• index (number) – Lookup index (default: 0)
• update (bool) – Whether to update the lookup parameter (default: True)

Returns: Expression for the embedding
Return type: _lookupExpression

dynet.lookup_batch(p, indices, update=True)

Look up parameters.

The mini-batched version of lookup. The resulting expression will be a mini-batch of parameters, where the “i”th element of the batch corresponds to the parameters at the position specified by the “i”th element of “indices”

Parameters:

• p (LookupParameters) – Lookup parameter to pick from
• indices (list(int)) – Indices to look up for each batch element

Keyword Arguments: update (bool) – Whether to update the lookup parameter (default: True)
Returns: Expression for the batched embeddings
Return type: _lookupBatchExpression
dynet.zeroes(dim, batch_size=1)

Create an input full of zeros

Create an input full of zeros, sized according to dimensions dim

Parameters: Keyword Arguments: dim (tuple) – Dimension of the tensor batch_size (number) – Batch size of the tensor (default: (1)) A “d” dimensioned zero tensor dynet.Expression
dynet.random_normal(dim, batch_size=1)

Create a random normal vector

Create a vector distributed according to normal distribution with mean 0, variance 1.

Parameters: Keyword Arguments: dim (tuple) – Dimension of the tensor batch_size (number) – Batch size of the tensor (default: (1)) A “d” dimensioned normally distributed tensor dynet.Expression
dynet.random_bernoulli(dim, p, scale=1.0, batch_size=1)

Create a random bernoulli tensor

Create a tensor distributed according to bernoulli distribution with parameter $$p$$.

Parameters: Keyword Arguments: dim (tuple) – Dimension of the tensor p (number) – Parameter of the bernoulli distribution scale (number) – Scaling factor to apply to the sampled tensor (default: (1.0)) batch_size (number) – Batch size of the tensor (default: (1)) A “d” dimensioned bernoulli distributed tensor dynet.Expression
dynet.random_uniform(dim, left, right, batch_size=1)

Create a random uniform tensor

Create a tensor distributed according to uniform distribution with boundaries left and right.

Parameters: Keyword Arguments: dim (tuple) – Dimension of the tensor left (number) – Lower bound of the uniform distribution right (number) – Upper bound of the uniform distribution batch_size (number) – Batch size of the tensor (default: (1)) A “d” dimensioned uniform distributed tensor dynet.Expression
dynet.noise(x, stddev)

Add gaussian noise to an expression.

Parameters: x (dynet.Expression) – Input expression stddev (number) – The standard deviation of the gaussian $$y\sim\mathcal N(x,\texttt{stddev})$$ dynet.Expression
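A sketch of the main input operations above (dimensions and indices are illustrative assumptions):

```python
import numpy as np
import dynet as dy

m = dy.Model()
E = m.add_lookup_parameters((100, 16))

dy.renew_cg()
t = dy.inputTensor(np.ones((3, 4)))                          # 3x4 matrix input
b = dy.inputTensor([np.ones(5), np.zeros(5)], batched=True)  # batch of two 5-vectors
e = dy.lookup(E, 7)                                          # embedding for index 7
eb = dy.lookup_batch(E, [1, 2, 3])                           # batched embedding lookup
z = dy.zeroes((4, 5))                                        # 4x5 zero tensor
```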

#### Arithmetic operations¶

dynet.cdiv(x, y)

Componentwise division

Do a componentwise division where each value is equal to $$\frac{x_i}{y_i}$$

Parameters: x (dynet.Expression) – The first input expression y (dynet.Expression) – The second input expression An expression where the ith element is equal to $$\frac{x_i}{y_i}$$ dynet.Expression
dynet.cmult(x, y)

Componentwise multiplication

Do a componentwise multiplication where each value is equal to $$x_i\times y_i$$

Parameters: x (dynet.Expression) – The first input expression y (dynet.Expression) – The second input expression An expression where the ith element is equal to $$x_i\times y_i$$ dynet.Expression
dynet.colwise_add(x, y)

Add vector $$y$$ to each column of matrix $$x$$

Parameters: x (dynet.Expression) – An MxN matrix y (dynet.Expression) – A length M vector An expression where $$y$$ is added to each column of $$x$$ dynet.Expression
dynet.squared_norm(x)

Squared norm

The squared norm of the values of x: $$\Vert x\Vert_2^2=\sum_i x_i^2$$.

Parameters: x (dynet.Expression) – Input expression $$\Vert x\Vert_2^2=\sum_i x_i^2$$ dynet.Expression
dynet.tanh(x)

Hyperbolic tangent

Elementwise calculation of the hyperbolic tangent

Parameters: x (dynet.Expression) – Input expression $$\tanh(x)$$ dynet.Expression
dynet.exp(x)

Natural exponent

Calculate elementwise $$y_i = e^{x_i}$$

Parameters: x (dynet.Expression) – Input expression $$e^{x}$$ dynet.Expression
dynet.square(x)

Square

Calculate elementwise $$y_i = x_i^2$$

Parameters: x (dynet.Expression) – Input expression $$y = x^2$$ dynet.Expression
dynet.sqrt(x)

Square root

Calculate elementwise $$y_i = \sqrt{x_i}$$

Parameters: x (dynet.Expression) – Input expression $$y = \sqrt{x}$$ dynet.Expression
dynet.abs(x)

Absolute value

Calculate elementwise $$y_i = \vert x_i\vert$$

Parameters: x (dynet.Expression) – Input expression $$y = \vert x\vert$$ dynet.Expression
dynet.erf(x)

Gaussian error function

Elementwise calculation of the Gaussian error function $$y_i = \text{erf}(x_i)=\frac {1}{\sqrt{\pi}}\int_{-x_i}^{x_i}e^{-t^2}\mathrm{d}t$$

Parameters: x (dynet.Expression) – Input expression $$y_i = \text{erf}(x_i)$$ dynet.Expression
dynet.cube(x)

Calculate elementwise $$y_i = x_i^3$$

Parameters: x (dynet.Expression) – Input expression $$y = x^3$$ dynet.Expression
dynet.log(x)

Natural logarithm

Elementwise calculation of the natural logarithm $$y_i = \ln(x_i)$$

Parameters: x (dynet.Expression) – Input expression $$y_i = \ln(x_i)$$ dynet.Expression
dynet.lgamma(x)

Log gamma

Calculate elementwise log gamma function $$y_i = \ln(\Gamma(x_i))$$

Parameters: x (dynet.Expression) – Input expression $$y_i = \ln(\Gamma(x_i))$$ dynet.Expression
dynet.logistic(x)

Logistic sigmoid function

Calculate elementwise $$y_i = \frac{1}{1+e^{-x_i}}$$

Parameters: x (dynet.Expression) – Input expression $$y_i = \frac{1}{1+e^{-x_i}}$$ dynet.Expression
dynet.rectify(x)

Rectifier (or ReLU, Rectified Linear Unit)

Calculate elementwise recitifer (ReLU) function $$y_i = \max(x_i,0)$$

Parameters: x (dynet.Expression) – Input expression $$y_i = \max(x_i,0)$$ dynet.Expression
dynet.sparsemax(x)

Sparsemax

The sparsemax function (Martins et al. 2016), which is similar to softmax, but induces sparse solutions where most of the vector elements are zero. Note: This function is not yet implemented on GPU.

Parameters: x (dynet.Expression) – Input expression The sparsemax of the scores dynet.Expression
dynet.softsign(x)

Softsign function

Calculate elementwise the softsign function $$y_i = \frac{x_i}{1+\vert x_i\vert}$$

Parameters: x (dynet.Expression) – Input expression $$y_i = \frac{x_i}{1+\vert x_i\vert}$$ dynet.Expression
dynet.pow(x, y)

Power function

Calculate an output where the ith element is equal to $$x_i^{y_i}$$

Parameters: x (dynet.Expression) – The first input expression y (dynet.Expression) – The second input expression $$x_i^{y_i}$$ dynet.Expression
dynet.bmin(x, y)

Minimum

Calculate an output where the ith element is $$\min(x_i,y_i)$$

Parameters: x (dynet.Expression) – The first input expression y (dynet.Expression) – The second input expression $$\min(x_i,y_i)$$ dynet.Expression
dynet.bmax(x, y)

Maximum

Calculate an output where the ith element is $$\max(x_i,y_i)$$

Parameters: x (dynet.Expression) – The first input expression y (dynet.Expression) – The second input expression $$\max(x_i,y_i)$$ dynet.Expression
dynet.transpose(x, dims=[1, 0])

Transpose a matrix

Get the transpose of the matrix, or if dims is specified shuffle the dimensions arbitrarily.

Note: This is O(1) if either the row or column dimension is 1, and O(n) otherwise.

Parameters: x (dynet.Expression) – Input expression dims (list) – The dimensions to swap. The ith dimension of the output will be equal to the dims[i] dimension of the input. dims must have the same number of dimensions as x. $$x^T$$ / the shuffled expression dynet.Expression
dynet.sum_cols(x)

[summary]

[description]

Parameters: x (dynet.Expression) – dynet.Expression
dynet.sum_elems(x)

Sum all elements

Sum all the elements in an expression.

Parameters: x (dynet.Expression) – Input expression The sum of all of its elements dynet.Expression
dynet.sum_batches(x)

Sum over minibatches

Sum an expression that consists of multiple minibatches into one of equal dimension but with only a single minibatch. This is useful for summing loss functions at the end of minibatch training.

Parameters: x (dynet.Expression) – Input expression An expression with a single batch dynet.Expression
dynet.fold_rows(x, nrows=2)

[summary]

[description]

Parameters: x (dynet.Expression) – Input expression
Keyword Arguments: nrows (unsigned) – (default: 2)
Return type: dynet.Expression
dynet.esum(xs)

Sum

This performs an elementwise sum over all the expressions in xs

Parameters: xs (list) – A list of expression of same dimension An expression where the ith element is equal to $$\sum_{j=0}\texttt{xs[}j\texttt{][}i\texttt{]}$$ dynet.Expression
dynet.logsumexp(xs)

Log, sum, exp

The elementwise “logsumexp” function that calculates $$\ln(\sum_i e^{xs_i})$$, used in adding probabilities in the log domain.

Parameters: xs (list) – A list of expression of same dimension An expression where the ith element is equal to $$\ln\left(\sum_{j=0}e^{\texttt{xs[}j\texttt{][}i\texttt{]}}\right)$$ dynet.Expression
dynet.average(xs)

Average

This performs an elementwise average over all the expressions in xs

Parameters: xs (list) – A list of expression of same dimension An expression where the ith element is equal to $$\frac{1}{\texttt{len(xs)}}\sum_{j=0}\texttt{xs[}j\texttt{][}i\texttt{]}$$ dynet.Expression
dynet.emax(xs)

Max

This performs an elementwise max over all the expressions in xs

Parameters: xs (list) – A list of expression of same dimension An expression where the ith element is equal to $$\max_j\texttt{xs[}j\texttt{][}i\texttt{]}$$ dynet.Expression
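A sketch combining a few of the elementwise operations above (values are illustrative):

```python
import dynet as dy

dy.renew_cg()
x = dy.inputVector([1.0, -2.0, 3.0])
y = dy.inputVector([0.5, 0.5, 0.5])
z = dy.cmult(dy.rectify(x), dy.exp(y))    # elementwise relu(x) * exp(y)
s = dy.esum([x, y, z])                    # elementwise sum of a list of expressions
print(dy.sum_elems(s).scalar_value())     # reduce to a single scalar
```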

#### Loss/Probability operations¶

dynet.softmax(x)

Softmax

The softmax function normalizes each column to ensure that all values are between 0 and 1 and add up to one by applying $$\frac{e^{x_i}}{\sum_j e^{x_j}}$$.

Parameters: x (dynet.Expression) – Input expression $$\frac{e^{x_i}}{\sum_j e^{x_j}}$$ dynet.Expression
dynet.log_softmax(x, restrict=None)

Restricted log softmax

The log softmax function calculated over only a subset of the vector elements. The elements to be included are set by the restriction variable. All elements not included in restriction are set to negative infinity.

Parameters: x (dynet.Expression) – Input expression
Keyword Arguments: restrict (list) – List of indices over which to compute the log softmax (default: None)
Returns: A vector with the log softmax over the specified elements
Return type: dynet.Expression
dynet.pairwise_rank_loss(x, y, m=1.0)

Pairwise rank loss

A margin-based loss, where every margin violation for each pair of values is penalized: $$\sum_i \max(x_i-y_i+m, 0)$$

Parameters: Keyword Arguments: x (dynet.Expression) – The first input expression y (dynet.Expression) – The second input expression m (number) – The margin (default: (1.0)) The pairwise rank loss dynet.Expression
dynet.poisson_loss(x, y)

Poisson loss

The negative log probability of y according to a Poisson distribution with parameter x. Useful in Poisson regression, where we try to predict the parameters of a Poisson distribution to maximize the probability of data y.

Parameters: x (dynet.Expression) – The first input expression y (dynet.Expression) – The second input expression The Poisson loss dynet.Expression
dynet.huber_distance(x, y, c=1.345)

Huber distance

The huber distance between values of x and y parameterized by c, $$\sum_i L_c(x_i, y_i)$$ where:

$\begin{split}L_c(x, y) = \begin{cases} \frac{1}{2}(y - x)^2 & \textrm{for } \vert y - x\vert \le c, \\ c\,\vert y - x\vert - \frac{1}{2}c^2 & \textrm{otherwise.} \end{cases}\end{split}$
Parameters: Keyword Arguments: x (dynet.Expression) – The first input expression y (dynet.Expression) – The second input expression c (number) – The parameter of the Huber distance parameterizing the cutoff (default: (1.345)) The Huber distance dynet.Expression
dynet.pickneglogsoftmax(x, v)

Negative softmax log likelihood

This function takes in a vector of scores x, and performs a log softmax, takes the negative, and selects the likelihood corresponding to the element v. This is perhaps the most standard loss function for training neural networks to predict one out of a set of elements.

Parameters: x (dynet.Expression) – Input scores v (int) – True class $$-\log\left(\frac{e^{x_v}}{\sum_j e^{x_j}}\right)$$ dynet.Expression
dynet.pickneglogsoftmax_batch(x, vs)

Negative softmax log likelihood on a batch

This function takes in a batched vector of scores x, and performs a log softmax, takes the negative, and selects the likelihood corresponding to the elements vs. This is perhaps the most standard loss function for training neural networks to predict one out of a set of elements.

Parameters: x (dynet.Expression) – Input scores vs (list) – True classes $$-\sum_{v\in \texttt{vs}}\log\left(\frac{e^{x_v}}{\sum_j e^{x_j}}\right)$$ dynet.Expression
dynet.kmh_ngram(x, v)

[summary]

[description]

Parameters: x (dynet.Expression) – v (dynet.Expression) – dynet.Expression
dynet.squared_distance(x, y)

Squared distance

The squared distance between values of x and y: $$\Vert x-y\Vert_2^2=\sum_i (x_i-y_i)^2$$.

Parameters: x (dynet.Expression) – The first input expression y (dynet.Expression) – The second input expression $$\Vert x-y\Vert_2^2=\sum_i (x_i-y_i)^2$$ dynet.Expression
dynet.l1_distance(x, y)

L1 distance

L1 distance between values of x and y: $$\Vert x-y\Vert_1=\sum_i \vert x_i-y_i\vert$$.

Parameters: x (dynet.Expression) – The first input expression y (dynet.Expression) – The second input expression $$\Vert x-y\Vert_1=\sum_i \vert x_i-y_i\vert$$. dynet.Expression
dynet.binary_log_loss(x, y)

Binary log loss

The log loss of a binary decision according to the sigmoid function $$- \sum_i (y_i \ln(x_i) + (1-y_i) \ln(1-x_i))$$

Parameters: x (dynet.Expression) – The first input expression y (dynet.Expression) – The second input expression $$- \sum_i (y_i \ln(x_i) + (1-y_i) \ln(1-x_i))$$ dynet.Expression
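A sketch of the standard classification loss built from the operations above (dimensions and the label are illustrative assumptions):

```python
import dynet as dy

m = dy.Model()
W = m.add_parameters((3, 4))

dy.renew_cg()
x = dy.inputVector([0.1, 0.2, 0.3, 0.4])
scores = dy.parameter(W) * x                 # assumes the usual operator overloads
loss = dy.pickneglogsoftmax(scores, 1)       # negative log likelihood of class 1
probs = dy.softmax(scores).npvalue()         # normalized class probabilities
loss.backward()
```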

#### Flow/Shaping operations¶

dynet.pick(e, index=0, dim=0)

Pick element.

Pick a single element/row/column/sub-tensor from an expression. This will result in the dimension of the tensor being reduced by 1.

Parameters: Keyword Arguments: e (Expression) – Expression to pick from index (number) – Index to pick (default: 0) dim (number) – Dimension to pick from (default: 0) Picked expression _pickerExpression
dynet.pick_batch(e, indices, dim=0)

Batched pick.

Pick elements from multiple batches.

Parameters: e (Expression) – Expression to pick from indices (list) – Indices to pick dim (number) – Dimension to pick from (default: 0) Picked expression _pickerBatchExpression
dynet.pickrange(x, v, u)

Pick range of elements

Pick a range of elements from an expression.

Parameters: x (dynet.Expression) – input expression v (int) – Beginning index u (int) – End index The value of {x[v],...,x[u]} dynet.Expression
dynet.pick_batch_elem(x, v)

Pick batch element.

Pick batch element from a batched expression. For a Tensor with 3 batch elements:

$\begin{split}\begin{pmatrix} x_{1,1,1} & x_{1,1,2} \\ x_{1,2,1} & x_{1,2,2} \\ \end{pmatrix}\\ \begin{pmatrix} x_{2,1,1} & x_{2,1,2} \\ x_{2,2,1} & x_{2,2,2} \\ \end{pmatrix}\\ \begin{pmatrix} x_{3,1,1} & x_{3,1,2} \\ x_{3,2,1} & x_{3,2,2} \\ \end{pmatrix}\end{split}$

pick_batch_elem(t, 1) will return a Tensor of

$\begin{split}\begin{pmatrix} x_{2,1,1} & x_{2,1,2} \\ x_{2,2,1} & x_{2,2,2} \\ \end{pmatrix}\end{split}$
Parameters: x (dynet.Expression) – Input expression v (int) – The index of the batch element to be picked. The expression of the picked batch element. The picked element is a tensor whose batch dimension equals one. dynet.Expression
dynet.pick_batch_elems(x, vs)

Pick batch element.

Pick batch element from a batched expression. For a Tensor with 3 batch elements:

$\begin{split}\begin{pmatrix} x_{1,1,1} & x_{1,1,2} \\ x_{1,2,1} & x_{1,2,2} \\ \end{pmatrix}\\ \begin{pmatrix} x_{2,1,1} & x_{2,1,2} \\ x_{2,2,1} & x_{2,2,2} \\ \end{pmatrix}\\ \begin{pmatrix} x_{3,1,1} & x_{3,1,2} \\ x_{3,2,1} & x_{3,2,2} \\ \end{pmatrix}\end{split}$

pick_batch_elems(t, [2, 3]) will return a Tensor of

$\begin{split}\begin{pmatrix} x_{2,1,1} & x_{2,1,2} \\ x_{2,2,1} & x_{2,2,2} \\ \end{pmatrix}\\ \begin{pmatrix} x_{3,1,1} & x_{3,1,2} \\ x_{3,2,1} & x_{3,2,2} \\ \end{pmatrix}\end{split}$
Parameters: x (dynet.Expression) – Input expression vs (list) – A list of indices of the batch elements to be picked. The expression of the picked batch elements. The picked elements form a tensor whose batch dimension equals the size of the list vs. dynet.Expression
dynet.reshape(x, d, batch_size=1)

Reshape to another size

This node reshapes a tensor to another size, without changing the underlying layout of the data. The layout of the data in DyNet is column-major, so if we have a 3x4 matrix :

$\begin{split}\begin{pmatrix} x_{1,1} & x_{1,2} & x_{1,3} & x_{1,4} \\ x_{2,1} & x_{2,2} & x_{2,3} & x_{2,4} \\ x_{3,1} & x_{3,2} & x_{3,3} & x_{3,4} \\ \end{pmatrix}\end{split}$

and transform it into a 2x6 matrix, it will be rearranged as:

$\begin{split}\begin{pmatrix} x_{1,1} & x_{3,1} & x_{2,2} & x_{1,3} & x_{3,3} & x_{2,4} \\ x_{2,1} & x_{1,2} & x_{3,2} & x_{2,3} & x_{1,4} & x_{3,4} \\ \end{pmatrix}\end{split}$

Note: This is O(1) for forward, and O(n) for backward.

Parameters: Keyword Arguments: x (dynet.Expression) – Input expression d (tuple) – New dimension batch_size (int) – New batch size (default: (1)) The reshaped expression dynet.Expression
dynet.select_rows(x, rs)

Select rows

Select a subset of rows of a matrix.

Parameters: x (dynet.Expression) – Input expression rs (list) – The rows to extract An expression containing the selected rows dynet.Expression
dynet.select_cols(x, cs)

Select columns

Select a subset of columns of a matrix.

Parameters: x (dynet.Expression) – Input expression cs (list) – The columns to extract An expression containing the selected columns dynet.Expression
dynet.concatenate_cols(xs)

Concatenate columns

Perform a concatenation of the columns in multiple expressions. All expressions must have the same number of rows.

Parameters: xs (list) – A list of expressions The expression with the columns concatenated dynet.Expression
dynet.concatenate(xs, d=0)

Concatenate

Perform a concatenation of multiple expressions along a particular dimension. All expressions must have the same dimensions except for the dimension to be concatenated (rows by default).
Parameters: xs (list) – A list of expressions d – The dimension along which to perform the concatenation The expression concatenated along the particular dimension dynet.Expression
dynet.concatenate_to_batch(xs)

Concatenate list of expressions to a single batched expression

Perform a concatenation of several expressions along the batch dimension. All expressions must have the same shape except for the batch dimension.

Parameters: xs (list) – A list of expressions of same dimension (except batch size) The expression with the batch dimensions concatenated dynet.Expression
dynet.max_dim(x, d=0)

Max out through a dimension

Select out an element/row/column/sub-tensor from an expression, with maximum value along a given dimension. This will result in the dimension of the expression being reduced by 1.

Parameters: Keyword Arguments: x (dynet.Expression) – Input expression d (int) – Dimension on which to perform the maxout (default: (0)) An expression of sub-tensor with max value along dimension d dynet.Expression
dynet.min_dim(x, d=0)

Min out through a dimension

Select out an element/row/column/sub-tensor from an expression, with minimum value along a given dimension. This will result in the dimension of the expression being reduced by 1.

Parameters: Keyword Arguments: x (dynet.Expression) – Input expression d (int) – Dimension on which to perform the minout (default: (0)) An expression of sub-tensor with min value along dimension d dynet.Expression
dynet.nobackprop(x)

Prevent backprop

This node has no effect on the forward pass, but prevents gradients from flowing backward during the backward pass. This is useful when there’s a subgraph for which you don’t want loss passed back to the parameters.

Parameters: x (dynet.Expression) – Input expression An output expression equal to the input (it only affects the backward pass) dynet.Expression
dynet.flip_gradient(x)

Negative backprop

This node has no effect on the forward pass, but negates the gradient during the backward pass. This operation is widely used in adversarial networks.

Parameters: x (dynet.Expression) – Input expression An output expression equal to the input (it only affects the backward pass) dynet.Expression
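A sketch of a few shaping operations from this subsection (values are illustrative):

```python
import dynet as dy

dy.renew_cg()
x = dy.inputTensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])   # 2x3 matrix
y = dy.reshape(x, (3, 2))                                 # same data, 3x2, column-major
r = dy.pick(x, 0)                                         # pick index 0 along dimension 0
c = dy.concatenate([dy.inputVector([1.0, 2.0]),
                    dy.inputVector([3.0, 4.0])])          # 4-dimensional vector
```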

#### Noise operations¶

dynet.noise(x, stddev)

Add gaussian noise to an expression.

Parameters: x (dynet.Expression) – Input expression stddev (number) – The standard deviation of the gaussian $$y\sim\mathcal N(x,\texttt{stddev})$$ dynet.Expression
dynet.dropout(x, p)

Dropout

With a fixed probability p, drop out (set to zero) nodes in the input expression, and scale the remaining nodes by 1/(1-p). Note that there are two kinds of dropout:

• Regular dropout: where we perform dropout at training time and then scale outputs by the keep probability (1-p) at test time.
• Inverted dropout: where we perform dropout and scaling at training time, and do not need to do anything at test time.

DyNet implements the latter, so you only need to apply dropout at training time, and do not need to perform any scaling at test time.

Parameters: x (dynet.Expression) – Input expression p (dynet.Expression) – The dropout probability The dropped out expression $$y=\frac{1}{1-\texttt{p}}x\circ z, z\sim\text{Bernoulli}(1-\texttt{p})$$ dynet.Expression
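A sketch of applying inverted dropout at training time only; the training flag and the sizes are assumptions of user code, not part of the API:

```python
import dynet as dy

m = dy.Model()
W = m.add_parameters((4, 4))
training = True                   # hypothetical flag in user code

dy.renew_cg()
x = dy.inputVector([1.0, 2.0, 3.0, 4.0])
h = dy.tanh(dy.parameter(W) * x)
if training:
    h = dy.dropout(h, 0.5)        # inverted dropout: nothing to do at test time
```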
dynet.block_dropout(x, p)

Block dropout

Identical to the dropout operation, but either drops out all or no values in the expression, as opposed to making a decision about each value individually.

Parameters: x (dynet.Expression) – Input expression p (dynet.Expression) – The dropout probability The block dropout expression dynet.Expression

#### Linear algebra operations¶

dynet.affine_transform(exprs)

Affine transform

This performs an affine transform over an arbitrary (odd) number of expressions held in the input initializer list xs. The first expression is the “bias,” which is added to the expression as-is. The remaining expressions are multiplied together in pairs, then added. A very common usage case is the calculation of the score for a neural network layer (e.g. $$b + Wz$$) where b is the bias, W is the weight matrix, and z is the input. In this case xs[0] = b, xs[1] = W, and xs[2] = z.

Parameters: exprs (list) – A list containing an odd number of expressions An expression equal to: xs[0] + xs[1]*xs[2] + xs[3]*xs[4] + ... dynet.Expression
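A sketch of the common b + W*x case (dimensions are illustrative assumptions):

```python
import dynet as dy

m = dy.Model()
W = m.add_parameters((5, 10))
b = m.add_parameters((5,))

dy.renew_cg()
x = dy.inputVector([0.1] * 10)
h = dy.affine_transform([b.expr(), W.expr(), x])   # computes b + W*x
```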
dynet.dot_product(x, y)

Dot Product

Calculate the dot product $$x^Ty=\sum_i x_iy_i$$

Parameters: x (dynet.Expression) – The first input expression y (dynet.Expression) – The second input expression $$x^Ty=\sum_i x_iy_i$$ dynet.Expression
dynet.inverse(x)

Matrix Inverse

Takes the inverse of a matrix (not implemented on GPU yet, although contributions are welcome: issue). Note that back-propagating through an inverted matrix can also be the source of stability problems sometimes.

Parameters: x (dynet.Expression) – Input expression Inverse of x dynet.Expression
dynet.trace_of_product(x, y)

Trace of Matrix Product

Takes the trace of the product of matrices. (not implemented on GPU yet, although contributions are welcome: issue).

Parameters: x (dynet.Expression) – The first input expression y (Expression) – The second input expression $$\text{Tr}(xy)$$ dynet.Expression
dynet.logdet(x)

Log determinant

Takes the log of the determinant of a matrix. (not implemented on GPU yet, although contributions are welcome: issue).

Parameters: x (dynet.Expression) – Input expression $$\log(\vert x\vert)$$ dynet.Expression

#### Convolution/Pooling operations¶

dynet.conv2d(x, f, stride, is_valid=True)

2D convolution without bias

2D convolution operator without bias parameters. VALID and SAME convolutions are supported.

To understand the distinction, consider the case where the stride is 1:

• SAME: the output size is the same as the input size. To achieve this, the input is padded so that the filter can sweep outside of the input maps.
• VALID: output size shrinks by filter_size - 1, and the filters always sweep at valid positions inside the input maps. No padding needed.

In detail, assume

• Input feature maps: XH x XW x XC x N
• Filters: FH x FW x XC x FC
• Strides: strides[0] and strides[1] are row (h) and col (w) stride, respectively.

For the SAME convolution: the output height (YH) and width (YW) are computed as:

• YH = ceil(float(XH) / float(strides[0]))
• YW = ceil(float(XW) / float(strides[1]))

and the paddings are computed as:

• pad_along_height = max((YH - 1) * strides[0] + FH - XH, 0)
• pad_along_width = max((YW - 1) * strides[1] + FW - XW, 0)
• pad_top = pad_along_height / 2
• pad_bottom = pad_along_height - pad_top
• pad_left = pad_along_width / 2
• pad_right = pad_along_width - pad_left

For the VALID convolution: the output height (YH) and width (YW) are computed as:

• YH = ceil(float(XH - FH + 1) / float(strides[0]))
• YW = ceil(float(XW - FW + 1) / float(strides[1]))

and the paddings are always zeros.

Parameters: Keyword Arguments: x (dynet.Expression) – The input feature maps: (H x W x Ci) x N (ColMaj), 3D tensor with an optional batch dimension f (dynet.Expression) – 2D convolution filters: H x W x Ci x Co (ColMaj), 4D tensor stride (list) – the row and column strides is_valid (bool) – ‘VALID’ convolution or ‘SAME’ convolution, default is True (‘VALID’) (default: (True)) The output feature maps (H x W x Co) x N, 3D tensor with an optional batch dimension dynet.Expression
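A sketch of a VALID convolution with the layout described above; the image and filter sizes are illustrative assumptions:

```python
import numpy as np
import dynet as dy

m = dy.Model()
F = m.add_parameters((3, 3, 1, 8))        # 3x3 filters, 1 input channel, 8 output channels

dy.renew_cg()
img = dy.inputTensor(np.random.rand(28, 28, 1))         # H x W x Ci
out = dy.conv2d(img, F.expr(), [1, 1], is_valid=True)   # -> 26 x 26 x 8
```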
dynet.conv2d_bias(x, f, b, stride, is_valid=True)

2D convolution with bias

2D convolution operator with bias parameters. VALID and SAME convolutions are supported.

To understand the distinction, consider the case where the stride is 1:

• SAME: the output size is the same as the input size. To achieve this, the input is padded so that the filter can sweep outside of the input maps.
• VALID: output size shrinks by filter_size - 1, and the filters always sweep at valid positions inside the input maps. No padding needed.

In detail, assume

• Input feature maps: XH x XW x XC x N
• Filters: FH x FW x XC x FC
• Strides: strides[0] and strides[1] are row (h) and col (w) stride, respectively.

For the SAME convolution: the output height (YH) and width (YW) are computed as:

• YH = ceil(float(XH) / float(strides[0]))
• YW = ceil(float(XW) / float(strides[1]))

and the paddings are computed as:

• pad_along_height = max((YH - 1) * strides[0] + FH - XH, 0)
• pad_along_width = max((YW - 1) * strides[1] + FW - XW, 0)
• pad_top = pad_along_height / 2
• pad_bottom = pad_along_height - pad_top
• pad_left = pad_along_width / 2
• pad_right = pad_along_width - pad_left

For the VALID convolution: the output height (YH) and width (YW) are computed as:

• YH = ceil(float(XH - FH + 1) / float(strides[0]))
• YW = ceil(float(XW - FW + 1) / float(strides[1]))

and the paddings are always zeros.

Parameters: Keyword Arguments: x (dynet.Expression) – The input feature maps: (H x W x Ci) x N (ColMaj), 3D tensor with an optional batch dimension f (dynet.Expression) – 2D convolution filters: H x W x Ci x Co (ColMaj), 4D tensor b (dynet.Expression) – The bias (1D: Ci) stride (list) – the row and column strides is_valid (bool) – ‘VALID’ convolution or ‘SAME’ convolution, default is True (‘VALID’) (default: (True)) The output feature maps (H x W x Co) x N, 3D tensor with an optional batch dimension dynet.Expression
dynet.filter1d_narrow(x, y)

[summary]

[description]

Parameters: x (dynet.Expression) – The first input expression y (dynet.Expression) – The second input expression TODO dynet.Expression
dynet.kmax_pooling(x, k, d=1)

Kmax-pooling operation

Select out k maximum values along a given dimension, in the same order as they appear. This will result in the size of the given dimension being changed to k.

Parameters: x (dynet.Expression) – Input expression k (unsigned) – Number of maximum values to retrieve along the given dimension
Keyword Arguments: d (unsigned) – Dimension on which to perform kmax-pooling (default: 1)
Return type: dynet.Expression

#### Tensor operations¶

dynet.contract3d_1d(x, y)

Contracts a rank 3 tensor and a rank 1 tensor into a rank 2 tensor

The resulting tensor $$z$$ has coordinates $$z_{ij} = \sum_k x_{ijk} y_k$$

Parameters: x (dynet.Expression) – Rank 3 tensor y (dynet.Expression) – Vector Matrix dynet.Expression
dynet.contract3d_1d_bias(x, y, b)

Same as contract3d_1d with an additional bias parameter

The resulting tensor $$z$$ has coordinates $$z_{ij} = b_{ij}+\sum_k x_{ijk} y_k$$

Parameters: x (dynet.Expression) – Rank 3 tensor y (dynet.Expression) – Vector b (dynet.Expression) – Bias vector Matrix dynet.Expression
dynet.contract3d_1d_1d(x, y, z)

Contracts a rank 3 tensor and two rank 1 tensors into a rank 1 tensor

This is the equivalent of calling contract3d_1d and then performing a matrix vector multiplication.

The resulting tensor $$t$$ has coordinates $$t_i = \sum_{j,k} x_{ijk} y_k z_j$$

Parameters: x (dynet.Expression) – Rank 3 tensor y (dynet.Expression) – Vector z (dynet.Expression) – Vector Vector dynet.Expression
dynet.contract3d_1d_1d_bias(x, y, z, b)

Same as contract3d_1d_1d with an additional bias parameter

This is the equivalent of calling contract3d_1d and then performing an affine transform.

The resulting tensor $$t$$ has coordinates $$t_i = b_i + \sum_{j,k} x_{ijk} y_k z_j$$

Parameters: x (dynet.Expression) – Rank 3 tensor y (dynet.Expression) – Vector z (dynet.Expression) – Vector b (dynet.Expression) – Bias vector Vector dynet.Expression

#### Normalization operations¶

dynet.layer_norm(x, g, b)

Layer normalization

Performs layer normalization :

$\begin{split}\begin{split} \mu &= \frac 1 n \sum_{i=1}^n x_i\\ \sigma &= \sqrt{\frac 1 n \sum_{i=1}^n (x_i-\mu)^2}\\ y&=\frac {\boldsymbol{g}} \sigma \circ (\boldsymbol{x}-\mu) + \boldsymbol{b}\\ \end{split}\end{split}$

Reference : Ba et al., 2016

Parameters: x (dynet.Expression) – Input expression (possibly batched) g (dynet.Expression) – Gain (same dimension as x, no batch dimension) b (dynet.Expression) – Bias (same dimension as x, no batch dimension) An expression of the same dimension as x dynet.Expression
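A sketch of layer normalization over a hidden vector (dimensions are illustrative assumptions):

```python
import dynet as dy

m = dy.Model()
W = m.add_parameters((100, 100))
g = m.add_parameters((100,), init=dy.ConstInitializer(1))   # gain
b = m.add_parameters((100,), init=dy.ConstInitializer(0))   # bias

dy.renew_cg()
x = dy.random_normal((100,))
h = dy.layer_norm(W.expr() * x, g.expr(), b.expr())
```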

## Recurrent Neural Networks¶

### RNN Builders¶

class dynet._RNNBuilder
disable_dropout()

[summary]

[description]

initial_state(vecs=None)

Get a dynet.RNNState

This initializes a dynet.RNNState by loading the parameters in the computation graph

Parameters: vecs (list) – Initial hidden state for each layer as a list of dynet.Expression s (default: None)
Returns: dynet.RNNState used to feed inputs, transduce sequences, etc.
Return type: dynet.RNNState
initial_state_from_raw_vectors(vecs=None)

Get a dynet.RNNState

This initializes a dynet.RNNState by loading the parameters in the computation graph

Use this if you want to initialize the hidden states with values directly rather than expressions.

Parameters: vecs (list) – Initial hidden state for each layer as a list of numpy arrays (default: None)
Returns: dynet.RNNState used to feed inputs, transduce sequences, etc.
Return type: dynet.RNNState
set_dropout(f)

[summary]

[description]

Parameters: f (float) – [description]
class dynet.SimpleRNNBuilder

[summary]

[description]

get_parameter_expressions()

Retrieve the internal parameters expressions of the RNN

The output is a list with one item per layer. Each item is a list containing $$W_{hx},W_{hh},b_h$$

Returns: List of parameter expressions for each layer
Return type: list
Raises: ValueError – This raises an exception if initial_state hasn't been called, because it requires the parameters to be loaded in the computation graph. However, it prevents the parameters from being loaded twice in the computation graph (compared to dynet.parameter(rnn.get_parameters()[0][0]) for example).
get_parameters()

Retrieve the internal parameters of the RNN

The output is a list with one item per layer. Each item is a list containing $$W_{hx},W_{hh},b_h$$

Returns: List of parameters for each layer list
class dynet.GRUBuilder

[summary]

[description]

get_parameter_expressions()

Retrieve the internal parameters expressions of the GRU

The output is a list with one item per layer. Each item is a list containing $$W_{zx},W_{zh},b_z,W_{rx},W_{rh},b_r,W_{hx},W_{hh},b_h$$

Returns: List of parameter expressions for each layer
Return type: list
Raises: ValueError – This raises an exception if initial_state hasn't been called, because it requires the parameters to be loaded in the computation graph. However, it prevents the parameters from being loaded twice in the computation graph (compared to dynet.parameter(rnn.get_parameters()[0][0]) for example).
get_parameters()

Retrieve the internal parameters of the GRU

The output is a list with one item per layer. Each item is a list containing $$W_{zx},W_{zh},b_z,W_{rx},W_{rh},b_r,W_{hx},W_{hh},b_h$$

Returns: List of parameters for each layer list
class dynet.LSTMBuilder

[summary]

[description]

get_parameter_expressions()

Retrieve the internal parameters expressions of the LSTM

The output is a list with one item per layer. Each item is a list containing $$W_{ix},W_{ih},W_{ic},b_i,W_{ox},W_{oh},W_{oc},b_o,W_{cx},W_{ch},b_c$$

Returns: List of parameter expressions for each layer
Return type: list
Raises: ValueError – This raises an exception if initial_state hasn't been called, because it requires the parameters to be loaded in the computation graph. However, it prevents the parameters from being loaded twice in the computation graph (compared to dynet.parameter(rnn.get_parameters()[0][0]) for example).
get_parameters()

Retrieve the internal parameters of the LSTM

The output is a list with one item per layer. Each item is a list containing $$W_{ix},W_{ih},W_{ic},b_i,W_{ox},W_{oh},W_{oc},b_o,W_{cx},W_{ch},b_c$$

Returns: List of parameters for each layer list
class dynet.VanillaLSTMBuilder

VanillaLSTM allows you to create a “standard” LSTM, i.e. with decoupled input and forget gates and no peephole connections

This cell runs according to the following dynamics :

$\begin{split}\begin{split} i_t & =\sigma(W_{ix}x_t+W_{ih}h_{t-1}+b_i)\\ f_t & = \sigma(W_{fx}x_t+W_{fh}h_{t-1}+b_f+1)\\ o_t & = \sigma(W_{ox}x_t+W_{oh}h_{t-1}+b_o)\\ \tilde{c_t} & = \tanh(W_{cx}x_t+W_{ch}h_{t-1}+b_c)\\ c_t & = c_{t-1}\circ f_t + \tilde{c_t}\circ i_t\\ h_t & = \tanh(c_t)\circ o_t\\ \end{split}\end{split}$
Parameters:

• layers (int) – Number of layers
• input_dim (int) – Dimension of the input
• hidden_dim (int) – Dimension of the recurrent units
• model (dynet.Model) – Model to hold the parameters
• ln_lstm (bool) – Whether to use layer normalization
get_parameter_expressions()

Retrieve the internal parameters expressions of the VanillaLSTM

The output is a list with one item per layer. Each item is a list containing $$W_x,W_h,b$$ where $$W_x,W_h$$ are stacked versions of the individual gate matrices.

Returns: List of parameter expressions for each layer
Return type: list
Raises: ValueError – This raises an exception if initial_state hasn't been called, because it requires the parameters to be loaded in the computation graph. However, it prevents the parameters from being loaded twice in the computation graph (compared to dynet.parameter(rnn.get_parameters()[0][0]) for example).
get_parameters()

Retrieve the internal parameters of the VanillaLSTM

The output is a list with one item per layer. Each item is a list containing $$W_x,W_h,b$$ where $$W_x,W_h$$ are stacked versions of the individual gate matrices.

Returns: List of parameters for each layer list
set_dropout_masks(batch_size=1)

Set dropout masks at the beginning of a sequence for a specific batch size

If this function is not called on batched input, the same mask will be applied across all batch elements. Use this to apply different masks to each batch element

You need to call this __AFTER__ calling initial_state

Parameters: batch_size (int) – Batch size (default: {1})
set_dropouts(d, d_r)

Set the dropout rates

The dropout implemented here is the variational dropout with tied weights introduced in Gal, 2016

More specifically, dropout masks $$\mathbf{z_x}\sim \text{Bernoulli}(1-d_x)$$, $$\mathbf{z_h}\sim \text{Bernoulli}(1-d_h)$$ are sampled at the start of each sequence.

The dynamics of the cell are then modified to :

$\begin{split}\begin{split} i_t & =\sigma(W_{ix}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{ih}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_i)\\ f_t & = \sigma(W_{fx}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{fh}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_f)\\ o_t & = \sigma(W_{ox}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{oh}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_o)\\ \tilde{c_t} & = \tanh(W_{cx}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{ch}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_c)\\ c_t & = c_{t-1}\circ f_t + \tilde{c_t}\circ i_t\\ h_t & = \tanh(c_t)\circ o_t\\ \end{split}\end{split}$

For more detail as to why scaling is applied, see the “Unorthodox” section of the documentation

Parameters: d (number) – Dropout rate $$d_x$$ for the input $$x_t$$ d_r (number) – Dropout rate $$d_h$$ for the recurrent state $$h_{t-1}$$
class dynet.FastLSTMBuilder

[summary]

[description]

get_parameter_expressions()

Retrieve the internal parameters expressions of the FastLSTM

The output is a list with one item per layer. Each item is a list containing $$W_{ix},W_{ih},W_{ic},b_i,W_{ox},W_{oh},W_{oc},b_o,W_{cx},W_{ch},b_c$$

Returns: List of parameter expressions for each layer
Return type: list
Raises: ValueError – This raises an exception if initial_state hasn't been called, because it requires the parameters to be loaded in the computation graph. However, it prevents the parameters from being loaded twice in the computation graph (compared to dynet.parameter(rnn.get_parameters()[0][0]) for example).
get_parameters()

Retrieve the internal parameters of the FastLSTM

The output is a list with one item per layer. Each item is a list containing $$W_{ix},W_{ih},W_{ic},b_i,W_{ox},W_{oh},W_{oc},b_o,W_{cx},W_{ch},b_c$$

Returns: List of parameters for each layer list
class dynet.BiRNNBuilder(num_layers, input_dim, hidden_dim, model, rnn_builder_factory, builder_layers=None)

Bases: object

Builder for BiRNNs that delegates to regular RNNs and wires them together.

builder = BiRNNBuilder(1, 128, 100, model, LSTMBuilder)
[o1,o2,o3] = builder.transduce([i1,i2,i3])
add_inputs(es)

Returns the list of state pairs (stateF, stateB) obtained by adding inputs to both the forward (stateF) and backward (stateB) RNNs.

Parameters: es (list) – a list of Expression

.transduce(xs) is different from .add_inputs(xs) in the following way:

• .add_inputs(xs) returns a list of RNNState pairs. RNNState objects can be queried in various ways. In particular, they allow access to the previous state, as well as to the state-vectors (h() and s()).
• .transduce(xs) returns a list of Expression. These are just the output expressions. For many cases, this suffices. transduce is much more memory efficient than add_inputs.
transduce(es)

returns the list of output Expressions obtained by adding the given inputs to the current state, one by one, to both the forward and backward RNNs, and concatenating.

Parameters: es (list) – a list of Expression

.transduce(xs) is different from .add_inputs(xs) in the following way:

• .add_inputs(xs) returns a list of RNNState pairs. RNNState objects can be queried in various ways. In particular, they allow access to the previous state, as well as to the state-vectors (h() and s()).
• .transduce(xs) returns a list of Expression. These are just the output expressions. For many cases, this suffices. transduce is much more memory efficient than add_inputs.

### RNN state¶

class dynet.RNNState

This is the main class for working with RNNs / LSTMs / GRUs. Request an RNNState initial_state() from a builder, and then progress from there.

add_input(x)

This computes $$h_t = \text{RNN}(x_t)$$

Parameters: x (dynet.Expression) – Input expression New RNNState dynet.RNNState
add_inputs(xs)

Returns the list of states obtained by adding the given inputs to the current state, one by one.

see also transduce(xs)

.transduce(xs) is different from .add_inputs(xs) in the following way:

• .add_inputs(xs) returns a list of RNNState. RNNState objects can be
queried in various ways. In particular, they allow access to the previous state, as well as to the state-vectors (h() and s() )
• .transduce(xs) returns a list of Expression. These are just the output
expressions. For many cases, this suffices.

transduce is much more memory efficient than add_inputs.

Parameters: xs (list) – list of input expressions New RNNState dynet.RNNState
b()

Get the underlying RNNBuilder

In case you need to set dropout or other stuff.

Returns: Underlying RNNBuilder dynet.RNNBuilder
h()

tuple of expressions representing the output of each hidden layer of the current step. the actual output of the network is at h()[-1].

prev()

Gets previous RNNState

In case you need to rewind

s()

tuple of expressions representing the hidden state of the current step.

For SimpleRNN, s() is the same as h(). For LSTM, s() is a series of memory (cell) vectors, followed by the series returned by h().

set_h(es=None)

Manually set the output $$h_t$$

Parameters: es (list) – List of expressions, one for each layer (default: {None}) New RNNState dynet.RNNState
set_s(es=None)

Manually set the hidden states

This is different from set_h because, for LSTMs for instance this also sets the cell state. The format is [new_c[0],...,new_c[n],new_h[0],...,new_h[n]]

Parameters: es (list) – List of expressions, in this format : [new_c[0],...,new_c[n],new_h[0],...,new_h[n]] (default: {None}) New RNNState dynet.RNNState
transduce(xs)

returns the list of output Expressions obtained by adding the given inputs to the current state, one by one.

see also add_inputs(xs)

.transduce(xs) is different from .add_inputs(xs) in the following way:

• .add_inputs(xs) returns a list of RNNState. RNNState objects can be
queried in various ways. In particular, they allow access to the previous state, as well as to the state-vectors (h() and s() )
• .transduce(xs) returns a list of Expression. These are just the output
expressions. For many cases, this suffices.

transduce is much more memory efficient than add_inputs.

Parameters: xs (list) – list of input expressions New RNNState dynet.RNNState
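A sketch of running an LSTM over a short sequence with the builder and state APIs above (layer count and dimensions are illustrative assumptions):

```python
import dynet as dy

m = dy.Model()
lstm = dy.VanillaLSTMBuilder(1, 10, 20, m)   # 1 layer, input dim 10, hidden dim 20

dy.renew_cg()
s0 = lstm.initial_state()
xs = [dy.random_normal((10,)) for _ in range(5)]
hs = s0.transduce(xs)             # list of 5 output expressions (dim 20 each)
s1 = s0.add_input(xs[0])          # or step manually and inspect the state
print(s1.h()[-1].dim())
```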
class dynet.StackedRNNState(states, prev=None)
add_inputs(xs)

returns the list of states obtained by adding the given inputs to the current state, one by one.

## Optimizers¶

class dynet.Trainer

Generic trainer

get_clip_threshold()

Get clipping threshold

set_clip_threshold(thr)

Set clipping threshold

To deactivate clipping, set the threshold to be <=0

Parameters: thr (number) – Clipping threshold
set_sparse_updates(su)

DyNet trainers support two types of updates for lookup parameters, sparse and dense. Sparse updates are the default. They have the potential to be faster, as they only touch the parameters that have non-zero gradients. However, they may not always be faster (particularly on GPU with mini-batch training), and are not precisely numerically correct for some update rules such as MomentumTrainer and AdamTrainer. Thus, if you set this variable to false, the trainer will perform dense updates and be precisely correct, and maybe faster sometimes.

Parameters: su (bool) – Flag to activate/deactivate sparse updates

status()

Outputs information about the trainer to stderr

(number of updates since last call, number of clipped gradients, learning rate, etc...)

update(s=1.0)

Update the parameters

The update equation is different for each trainer, check the online c++ documentation for more details on what each trainer does

Keyword Arguments:
s (number) – Optional scaling factor to apply on the gradient. (default: 1.0)
update_epoch(r=1.0)

Update the trainer's hyper-parameters that depend on epochs

Basically learning rate decay.

Keyword Arguments:
r (number) – Number of epochs that passed (default: 1.0)
update_subset(updated_params, updated_lookups, s=1.0)

Update a subset of parameters

Only use this as a last resort; a more elegant way to update only a subset of parameters is to use the “update” keyword in dy.parameter or Parameter.expr() to specify which parameters need to be updated __during the creation of the computation graph__

Parameters: Keyword Arguments: updated_params (list) – Indices of parameters to update updated_lookups (list) – Indices of lookup parameters to update s (number) – Optional scaling factor to apply on the gradient. (default: 1.0)
class dynet.SimpleSGDTrainer

Bases: dynet.Trainer

This trainer performs stochastic gradient descent, the go-to optimization procedure for neural networks.

Parameters: Keyword Arguments: m (dynet.Model) – Model to be trained e0 (number) – Initial learning rate (default: 0.1) edecay (number) – Learning rate decay parameter (default: 0.0)
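A sketch of a minimal training loop with SimpleSGDTrainer; the toy data and dimensions are illustrative assumptions:

```python
import dynet as dy

m = dy.Model()
W = m.add_parameters((5, 10))
trainer = dy.SimpleSGDTrainer(m, e0=0.1)

data = [([0.1] * 10, 2), ([0.3] * 10, 4)]        # toy (vector, label) pairs
for x_vals, label in data:
    dy.renew_cg()                                # one graph per example
    scores = dy.parameter(W) * dy.inputVector(x_vals)
    loss = dy.pickneglogsoftmax(scores, label)
    loss.backward()
    trainer.update()
trainer.update_epoch()
```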
class dynet.CyclicalSGDTrainer

Bases: dynet.Trainer

This trainer performs stochastic gradient descent with a cyclical learning rate as proposed in Smith, 2015.

This uses a triangular function with optional exponential decay.

More specifically, at each update, the learning rate $$\eta$$ is updated according to :

$\begin{split} \begin{split} \text{cycle} &= \left\lfloor 1 + \frac{\texttt{it}}{2 \times\texttt{step_size}} \right\rfloor\\ x &= \left\vert \frac{\texttt{it}}{\texttt{step_size}} - 2 \times \text{cycle} + 1\right\vert\\ \eta &= \eta_{\text{min}} + (\eta_{\text{max}} - \eta_{\text{min}}) \times \max(0, 1 - x) \times \gamma^{\texttt{it}}\\ \end{split}\end{split}$
Parameters: Keyword Arguments: m (dynet.Model) – Model to be trained e0_min (number) – Lower learning rate (default: {0.01}) e0_max (number) – Upper learning rate (default: {0.1}) step_size (number) – Period of the triangular function in number of iterations (__not__ epochs). According to the original paper, this should be set around (2-8) x (training iterations in epoch) (default: {2000}) gamma (number) – Learning rate upper bound decay parameter (default: {0.0}) edecay (number) – Learning rate decay parameter. Ideally you shouldn’t use this with cyclical learning rate since decay is already handled by $$\gamma$$ (default: {0.0})
class dynet.MomentumSGDTrainer

Bases: dynet.Trainer

This is a modified version of the SGD algorithm with momentum to stabilize the gradient trajectory.

Parameters: Keyword Arguments: m (dynet.Model) – Model to be trained e0 (number) – Initial learning rate (default: 0.1) mom (number) – Momentum (default: 0.9) edecay (number) – Learning rate decay parameter (default: 0.0)
class dynet.AdagradTrainer

Bases: dynet.Trainer

Parameters: Keyword Arguments: m (dynet.Model) – Model to be trained e0 (number) – Initial learning rate (default: 0.1) eps (number) – Epsilon parameter to prevent numerical instability (default: 1e-20) edecay (number) – Learning rate decay parameter (default: 0.0)
class dynet.AdadeltaTrainer

Bases: dynet.Trainer

Parameters: Keyword Arguments: m (dynet.Model) – Model to be trained eps (number) – Epsilon parameter to prevent numerical instability (default: 1e-6) rho (number) – Update parameter for the moving average of updates in the numerator (default: 0.95) edecay (number) – Learning rate decay parameter (default: 0.0)
class dynet.RMSPropTrainer

Bases: dynet.Trainer

RMSProp optimizer

The RMSProp optimizer is a variant of Adagrad where the squared sum of previous gradients is replaced with a moving average with parameter rho.

Parameters: Keyword Arguments: m (dynet.Model) – Model to be trained e0 (number) – Initial learning rate (default: 0.001) eps (number) – Epsilon parameter to prevent numerical instability (default: 1e-8) rho (number) – Update parameter for the moving average (rho = 0 is equivalent to using Adagrad) (default: 0.9) edecay (number) – Learning rate decay parameter (default: 0.0)
class dynet.AdamTrainer

Bases: dynet.Trainer