Operations¶
Operation Interface¶
The following functions define DyNet “Expressions,” which are used as an interface to the various functions that can be used to build DyNet computation graphs. Expressions for each specific function are listed below.
-
struct
Expression
¶ - #include <expr.h>
Expressions are the building block of a Dynet computation graph.
[long description]
Public Functions
-
Expression
(ComputationGraph *pg, VariableIndex i)¶ Base expression constructor.
Used when creating operations
- Parameters
pg
: Pointer to the computation graphi
: Variable index
-
const Tensor &
value
() const¶ Get value of the expression.
Throws a tuntime_error exception if no computation graph is available
- Return
- Value of the expression as a tensor
-
const Tensor &
gradient
() const¶ Get gradient of the expression.
Throws a tuntime_error exception if no computation graph is available
Make sure to call
backward
on a downstream expression before calling this.If the expression is a constant expression (meaning it’s not a function of a parameter), dynet won’t compute it’s gradient for the sake of efficiency. You need to manually force the gradient computation by adding the agument
full=true
tobackward
- Return
- Value of the expression as a tensor
-
Input Operations¶
These operations allow you to input something into the computation graph, either simple scalar/vector/matrix inputs from floats, or parameter inputs from a DyNet parameter object. They all requre passing a computation graph as input so you know which graph is being used for this particular calculation.
-
Expression
dynet
::
input
(ComputationGraph &g, real s, Device *device = dynet::default_device)¶ Scalar input.
Create an expression that represents the scalar value s
- Return
- An expression representing s
- Parameters
g
: Computation graphs
: Real numberdevice
: The place device for the input value, default_device by default
-
Expression
dynet
::
input
(ComputationGraph &g, const real *ps, Device *device = dynet::default_device)¶ Modifiable scalar input.
Create an expression that represents the scalar value *ps. If *ps is changed and the computation graph recalculated, the next forward pass will reflect the new value.
- Return
- An expression representing *ps
- Parameters
g
: Computation graphps
: Real number pointerdevice
: The place device for the input value, default_device by default
-
Expression
dynet
::
input
(ComputationGraph &g, const Dim &d, const std::vector<float> &data, Device *device = dynet::default_device)¶ Vector/matrix/tensor input.
Create an expression that represents a vector, matrix, or tensor input. The dimensions of the input are defined by
d
. So for example >input(g,{50},data)
: will result in a 50-length vector >input(g,{50,30},data)
: will result in a 50x30 matrix and so on, for an arbitrary number of dimensions. This function can also be used to import minibatched inputs. For example, if we have 10 examples in a minibatch, each with size 50x30, then we call >input(g,Dim({50,30},10),data)
The data vector “data” will contain the values used to fill the input, in column-major format. The length must add to the product of all dimensions in d.- Return
- An expression representing data
- Parameters
g
: Computation graphd
: Dimension of the input matrixdata
: A vector of data pointsdevice
: The place device for the input value, default_device by default
-
Expression
dynet
::
input
(ComputationGraph &g, const Dim &d, const std::vector<float> *pdata, Device *device = dynet::default_device)¶ Updatable vector/matrix/tensor input.
Similarly to input that takes a vector reference, input a vector, matrix, or tensor input. Because we pass the pointer, the data can be updated.
- Return
- An expression representing *pdata
- Parameters
g
: Computation graphd
: Dimension of the input matrixpdata
: A pointer to an (updatable) vector of data pointsdevice
: The place device for the input value, default_device by default
-
Expression
dynet
::
input
(ComputationGraph &g, const Dim &d, const std::vector<unsigned int> &ids, const std::vector<float> &data, float defdata = 0.f, Device *device = dynet::default_device)¶ Sparse vector input.
This operation takes input as a sparse matrix of index/value pairs. It is exactly the same as the standard input via vector reference, but sets all non-specified values to “defdata” and resets all others to the appropriate input values.
- Return
- An expression representing data
- Parameters
g
: Computation graphd
: Dimension of the input matrixids
: The indexes of the data points to updatedata
: The data points corresponding to each indexdefdata
: The default data with which to set the unspecified data pointsdevice
: The place device for the input value, default_device by default
-
Expression
dynet
::
one_hot
(ComputationGraph &g, unsigned int d, unsigned int idx, Device *device = dynet::default_device)¶ One hot vector.
This operation creates a one hot vector.
- Return
- An expression representing data
- Parameters
g
: Computation graphd
: Dimension of the input vectoridx
: The index we want to set to 1device
: The place device for the input value, default_device by default
-
Expression
dynet
::
one_hot
(ComputationGraph &g, unsigned int d, const std::vector<unsigned int> &ids, Device *device = dynet::default_device)¶ Batched one hot vectors.
This operation creates a batch of one hot vectors. The length of
ids
determines the batch size- Return
- An expression representing data
- Parameters
g
: Computation graphd
: Dimension of the input vectorids
: The indices we want to set to 1, one per batch elementdevice
: The place device for the input value, default_device by default
-
Expression
dynet
::
parameter
(ComputationGraph &g, Parameter p)¶ Load parameter.
Load parameters into the computation graph.
- Return
- An expression representing p
- Parameters
g
: Computation graphp
: Parameter object to load
-
Expression
dynet
::
parameter
(ComputationGraph &g, LookupParameter lp)¶ Load lookup parameter.
Load a full tensor of lookup parameters into the computation graph. Normally lookup parameters are accessed by using the lookup() function to grab a single element. However, in some cases we’ll want to access all of the parameters in the entire set of lookup parameters for some reason. In this case you can use this function. In this case, the first dimensions in the returned tensor will be equivalent to the dimensions that we would get if we get calling the lookup() function, and the size of the final dimension will be equal to the size of the vocabulary.
- Return
- An expression representing lp
- Parameters
g
: Computation graphlp
: LookupParameter object to load
-
Expression
dynet
::
const_parameter
(ComputationGraph &g, Parameter p)¶ Load constant parameters.
Load parameters into the computation graph, but prevent them from being updated when performing parameter update.
- Return
- An expression representing the constant p
- Parameters
g
: Computation graphp
: Parameter object to load
-
Expression
dynet
::
const_parameter
(ComputationGraph &g, LookupParameter lp)¶ Load constant lookup parameters.
Load lookup parameters into the computation graph, but prevent them from being updated when performing parameter update.
- Return
- An expression representing the constant lp
- Parameters
g
: Computation graphlp
: LookupParameter object to load
-
Expression
dynet
::
lookup
(ComputationGraph &g, LookupParameter p, unsigned index)¶ Look up parameter.
Look up parameters according to an index, and load them into the computation graph.
- Return
- An expression representing p[index]
- Parameters
g
: Computation graphp
: LookupParameter object from which to loadindex
: Index of the parameters within p
-
Expression
dynet
::
lookup
(ComputationGraph &g, LookupParameter p, const unsigned *pindex)¶ Look up parameters with modifiable index.
Look up parameters according to the *pindex, and load them into the computation graph. When *pindex changes, on the next computation of forward() the values will change.
- Return
- An expression representing p[*pindex]
- Parameters
g
: Computation graphp
: LookupParameter object from which to loadpindex
: Pointer index of the parameters within p
-
Expression
dynet
::
const_lookup
(ComputationGraph &g, LookupParameter p, unsigned index)¶ Look up parameter.
Look up parameters according to an index, and load them into the computation graph. Do not perform gradient update on the parameters.
- Return
- A constant expression representing p[index]
- Parameters
g
: Computation graphp
: LookupParameter object from which to loadindex
: Index of the parameters within p
-
Expression
dynet
::
const_lookup
(ComputationGraph &g, LookupParameter p, const unsigned *pindex)¶ Constant lookup parameters with modifiable index.
Look up parameters according to the *pindex, and load them into the computation graph. When *pindex changes, on the next computation of forward() the values will change. However, gradient updates will not be performend.
- Return
- A constant expression representing p[*pindex]
- Parameters
g
: Computation graphp
: LookupParameter object from which to loadpindex
: Pointer index of the parameters within p
-
Expression
dynet
::
lookup
(ComputationGraph &g, LookupParameter p, const std::vector<unsigned> &indices)¶ Look up parameters.
The mini-batched version of lookup. The resulting expression will be a mini-batch of parameters, where the “i”th element of the batch corresponds to the parameters at the position specified by the “i”th element of “indices”
- Return
- An expression with the “i”th batch element representing p[indices[i]]
- Parameters
g
: Computation graphp
: LookupParameter object from which to loadindices
: Index of the parameters at each position in the batch
-
Expression
dynet
::
lookup
(ComputationGraph &g, LookupParameter p, const std::vector<unsigned> *pindices)¶ Look up parameters.
The mini-batched version of lookup with modifiable parameter indices.
- Return
- An expression with the “i”th batch element representing p[*pindices[i]]
- Parameters
g
: Computation graphp
: LookupParameter object from which to loadpindices
: Pointer to lookup indices
-
Expression
dynet
::
const_lookup
(ComputationGraph &g, LookupParameter p, const std::vector<unsigned> &indices)¶ Look up parameters.
Mini-batched lookup that will not update the parameters.
- Return
- A constant expression with the “i”th batch element representing p[indices[i]]
- Parameters
g
: Computation graphp
: LookupParameter object from which to loadindices
: Lookup indices
-
Expression
dynet
::
const_lookup
(ComputationGraph &g, LookupParameter p, const std::vector<unsigned> *pindices)¶ Look up parameters.
Mini-batched lookup that will not update the parameters, with modifiable indices.
- Return
- A constant expression with the “i”th batch element representing p[*pindices[i]]
- Parameters
g
: Computation graphp
: LookupParameter object from which to loadpindices
: Lookup index pointers.
-
Expression
dynet
::
zeros
(ComputationGraph &g, const Dim &d)¶ Create an input full of zeros.
Create an input full of zeros, sized according to dimensions d.
- Return
- A
d
dimensioned zero tensor - Parameters
g
: Computation graphd
: The dimensions of the input
-
Expression
dynet
::
ones
(ComputationGraph &g, const Dim &d)¶ Create an input full of ones.
Create an input full of ones, sized according to dimensions d.
- Return
- A
d
dimensioned tensor of ones - Parameters
g
: Computation graphd
: The dimensions of the input
-
Expression
dynet
::
constant
(ComputationGraph &g, const Dim &d, float val)¶ Create an input with one constant value.
Create an input full of
val
, sized according to dimensions d.- Return
- A
d
dimensioned tensor filled with valueval
- Parameters
g
: Computation graphd
: The dimensions of the inputval
: The value of the input
-
Expression
dynet
::
random_normal
(ComputationGraph &g, const Dim &d, float mean = 0.f, float stddev = 1.0)¶ Create a random normal vector.
Create a vector distributed according to normal distribution with specified mean and standard deviation.
- Return
- A “d” dimensioned normally distributed vector
- Parameters
g
: Computation graphd
: The dimensions of the inputmean
: The mean of the distribution (default: 0.0)stddev
: The standard deviation of the distribution (default: 1.0)
-
Expression
dynet
::
random_bernoulli
(ComputationGraph &g, const Dim &d, real p, real scale = 1.0f)¶ Create a random bernoulli vector.
Create a vector distributed according to bernoulli distribution with parameter p.
- Return
- A “d” dimensioned bernoulli distributed vector
- Parameters
g
: Computation graphd
: The dimensions of the inputp
: The bernoulli p parameterscale
: A scaling factor for the output (“active” elements will receive this value)
-
Expression
dynet
::
random_uniform
(ComputationGraph &g, const Dim &d, real left, real right)¶ Create a random uniform vector.
Create a vector distributed according to uniform distribution with boundaries left and right.
- Return
- A “d” dimensioned uniform distributed vector
- Parameters
g
: Computation graphd
: The dimensions of the inputleft
: The left boundaryright
: The right boundary
-
Expression
dynet
::
random_gumbel
(ComputationGraph &g, const Dim &d, real mu = 0.0, real beta = 1.0)¶ Create a random Gumbel sampled vector.
Create a vector distributed according to a Gumbel distribution with the specified parameters. (Currently only the defaults of mu=0.0 and beta=1.0 supported.
- Return
- A “d” dimensioned Gumbel distributed vector
- Parameters
g
: Computation graphd
: The dimensions of the inputmu
: The mu parameterbeta
: The beta parameter
Arithmetic Operations¶
These operations perform basic arithemetic over values in the graph.
-
Expression
dynet
::
operator-
(const Expression &x)¶ Negation.
Negate the passed argument.
- Return
- The negation of x
- Parameters
x
: An input expression
-
Expression
dynet
::
operator+
(const Expression &x, const Expression &y)¶ Expression addition.
Add two expressions of the same dimensions.
- Return
- The sum of x and y
- Parameters
x
: The first inputy
: The second input
-
Expression
dynet
::
operator+
(const Expression &x, real y)¶ Scalar addition.
Add a scalar to an expression
- Return
- An expression equal to x, with every component increased by y
- Parameters
x
: The expressiony
: The scalar
-
Expression
dynet
::
operator+
(real x, const Expression &y)¶ Scalar addition.
Add a scalar to an expression
- Return
- An expression equal to y, with every component increased by x
- Parameters
x
: The scalary
: The expression
-
Expression
dynet
::
operator-
(const Expression &x, const Expression &y)¶ Expression subtraction.
Subtract one expression from another.
- Return
- An expression where the ith element is x_i minus y_i
- Parameters
x
: The expression from which to subtracty
: The expression to subtract
-
Expression
dynet
::
operator-
(real x, const Expression &y)¶ Scalar subtraction.
Subtract an expression from a scalar
- Return
- An expression where the ith element is x_i minus y
- Parameters
x
: The scalar from which to subtracty
: The expression to subtract
-
Expression
dynet
::
operator-
(const Expression &x, real y)¶ Scalar subtraction.
Subtract a scalar from an expression
- Return
- An expression where the ith element is x_i minus y
- Parameters
x
: The expression from which to subtracty
: The scalar to subtract
-
Expression
dynet
::
operator*
(const Expression &x, const Expression &y)¶ Matrix multiplication.
Multiply two matrices together. Like standard matrix multiplication, the second dimension of x and the first dimension of y must match.
- Return
- An expression x times y
- Parameters
x
: The left-hand matrixy
: The right-hand matrix
-
Expression
dynet
::
operator*
(const Expression &x, float y)¶ Matrix-scalar multiplication.
Multiply an expression component-wise by a scalar.
- Return
- An expression where the ith element is x_i times y
- Parameters
x
: The matrixy
: The scalar
-
Expression
dynet
::
operator*
(float y, const Expression &x)¶ Matrix-scalar multiplication.
Multiply an expression component-wise by a scalar.
- Return
- An expression where the ith element is x_i times y
- Parameters
x
: The scalary
: The matrix
-
Expression
dynet
::
operator/
(const Expression &x, float y)¶ Matrix-scalar division.
Divide an expression component-wise by a scalar.
- Return
- An expression where the ith element is x_i divided by y
- Parameters
x
: The matrixy
: The scalar
-
Expression
dynet
::
operator/
(const Expression &x, const Expression &y)¶ Matrix division.
Divide an expression by another.
- Return
- An expression where the ith element is x_i divided by y_i
- Parameters
x
: The left-hand matrixy
: The right-hand matrix
-
Expression
dynet
::
affine_transform
(const std::initializer_list<Expression> &xs)¶ Affine transform.
This performs an affine transform over an arbitrary (odd) number of expressions held in the input initializer list xs. The first expression is the “bias,” which is added to the expression as-is. The remaining expressions are multiplied together in pairs, then added. A very common usage case is the calculation of the score for a neural network layer (e.g. b + Wz) where b is the bias, W is the weight matrix, and z is the input. In this case xs[0] = b, xs[1] = W, and xs[2] = z.
- Return
- An expression equal to: xs[0] + xs[1]*xs[2] + xs[3]*xs[4] + …
- Parameters
xs
: An initializer list containing an odd number of expressions
-
Expression
dynet
::
sum
(const std::initializer_list<Expression> &xs)¶ Sum.
This performs an elementwise sum over all the expressions in xs
- Return
- An expression where the ith element is equal to xs[0][i] + xs[1][i] + …
- Parameters
xs
: An initializer list containing expressions
-
Expression
dynet
::
sum_elems
(const Expression &x)¶ Sum all elements.
Sum all the elements in an expression.
- Return
- The sum of all of its elements
- Parameters
x
: The input expression
-
Expression
dynet
::
moment_elems
(const Expression &x, unsigned r)¶ Compute moment over all elements.
Compute the moment of order \(r\), \(\frac 1 n\sum_{i=1}^nx_i^r\) over all the elements in each batch of the expression
- Return
- A scalar expression (with a potential batch dimension)
- Parameters
x
: The input mini-batched expressionr
: Order of the moment
-
Expression
dynet
::
mean_elems
(const Expression &x)¶ Compute mean over all elements.
Computes \(\frac 1 n\sum_{i=1}^nx_i\) over all the elements in each batch of the expression
- Return
- A scalar expression (with a potential batch dimension)
- Parameters
x
: The input mini-batched expression
-
Expression
dynet
::
std_elems
(const Expression &x)¶ Compute Standard deviation over all elements.
Computes \(\frac 1 n\sum_{i=1}^n(x_i -\mu)^2\) where \(\mu=\frac 1 n\sum_{i=1}^nx_i\) over all the elements in each batch of the expression
- Return
- A scalar expression (with a potential batch dimension)
- Parameters
x
: The input mini-batched expression
-
Expression
dynet
::
sum_batches
(const Expression &x)¶ Sum over minibatches.
Sum an expression that consists of multiple minibatches into one of equal dimension but with only a single minibatch. This is useful for summing loss functions at the end of minibatch training.
- Return
- An expression with a single batch
- Parameters
x
: The input mini-batched expression
-
Expression
dynet
::
moment_batches
(const Expression &x, unsigned r)¶ Compute moment over minibatches.
Compute the moment of order \(r\), \(\frac 1 n\sum_{i=1}^nx_i^r\) along the batch dimension
- Return
- An expression with a single batch
- Parameters
x
: The input mini-batched expressionr
: Order of the moment
-
Expression
dynet
::
mean_batches
(const Expression &x)¶ Compute mean over minibatches.
Computes \(\frac 1 n\sum_{i=1}^nx_i\) along the batch dimension
- Return
- An expression with a single batch
- Parameters
x
: The input mini-batched expression
-
Expression
dynet
::
std_batches
(const Expression &x)¶ Compute standard deviation over minibatches.
Computes \(\frac 1 n\sum_{i=1}^n(x_i -\mu)^2\) where \(\mu=\frac 1 n\sum_{i=1}^nx_i\) along the batch dimension
- Return
- A scalar expression (with a potential batch dimension)
- Parameters
x
: The input mini-batched expression
-
Expression
dynet
::
sum_dim
(const Expression &x, const std::vector<unsigned> &dims, bool b = false)¶ Compute sum along a specific dimension or dimensions.
Compute the sum along a specific dimension or dimensions
- Return
- An expression with |d| less dimensions and possibly dropped batch dimension
- Parameters
x
: The input mini-batched expressiond
: Dimensions along which to reduceb
: Whether to include batch dimension (default: false)
-
Expression
dynet
::
cumsum
(const Expression &x, unsigned d)¶ Compute cumulative sum along a specific dimension.
Compute the cumulative sum along a specific dimension: \(y_i=\sum_{j\leq i}x_j\)
- Return
- An expression of the same shape as the input
- Parameters
x
: The input mini-batched expressiond
: Dimensions along which to compute the cumulative sum
-
Expression
dynet
::
moment_dim
(const Expression &x, const std::vector<unsigned> &dims, unsigned r, bool b = false, unsigned n = 0)¶ Compute moment along a specific dimension.
Compute the moment of order \(r\), \(\frac 1 n\sum_{i=1}^nx_i^r\) along a specific dimension
- Return
- An expression with |d| less dimensions and possibly dropped batch dimension
- Parameters
x
: The input mini-batched expressiond
: Dimensions along which to reducer
: Order of the momentb
: Whether to include batch dimension (default: false)n
: If > 0, overwrite the n in the equation by this value, useful for masking (default: 0)
-
Expression
dynet
::
mean_dim
(const Expression &x, const std::vector<unsigned> &dims, bool b = false, unsigned n = 0)¶ Compute mean along a specific dimension.
Computes \(\frac 1 n\sum_{i=1}^nx_i\) along a specific dimension
- Return
- An expression with |d| less dimensions and possibly dropped batch dimension
- Parameters
x
: The input mini-batched expressiond
: Dimensions along which to reduceb
: Whether to include batch dimension (default: false)n
: If > 0, overwrite the n in the equation by this value, useful for masking (default: 0)
-
Expression
dynet
::
std_dim
(const Expression &x, const std::vector<unsigned> &dims, bool b = false, unsigned n = 0)¶ Compute standard deviation along an arbitrary dimension.
Computes \(\frac 1 n\sum_{i=1}^n(x_i -\mu)^2\) where \(\mu=\frac 1 n\sum_{i=1}^nx_i\) along an arbitrary dimension
- Return
- An expression with |d| less dimensions and possibly dropped batch dimension
- Parameters
x
: The input mini-batched expressiond
: Dimensions along which to reduceb
: Whether to include batch dimension (default: false)n
: If > 0, overwrite the n in the equation by this value, useful for masking (default: 0)
-
Expression
dynet
::
average
(const std::initializer_list<Expression> &xs)¶ Average.
This performs an elementwise average over all the expressions in xs
- Return
- An expression where the ith element is equal to (xs[0][i] + xs[1][i] + …)/|xs|
- Parameters
xs
: An initializer list containing expressions
-
Expression
dynet
::
sqrt
(const Expression &x)¶ Square root.
Elementwise square root.
- Return
- An expression where the ith element is equal to \(\sqrt(x_i)\)
- Parameters
x
: The input expression
-
Expression
dynet
::
abs
(const Expression &x)¶ Absolute value.
Elementwise absolute value.
- Return
- An expression where the ith element is equal to \(\vert x_i\vert\)
- Parameters
x
: The input expression
-
Expression
dynet
::
erf
(const Expression &x)¶ Gaussian error function.
Elementwise calculation of the Gaussian error function
- Return
- An expression where the ith element is equal to erf(x_i)
- Parameters
x
: The input expression
-
Expression
dynet
::
asin
(const Expression &x)¶ Inverse sine.
Elementwise calculation of the inverse sine
- Return
- An expression where the ith element is equal to asin(x_i)
- Parameters
x
: The input expression
-
Expression
dynet
::
acos
(const Expression &x)¶ Inverse cosine.
Elementwise calculation of the inverse cosine
- Return
- An expression where the ith element is equal to acos(x_i)
- Parameters
x
: The input expression
-
Expression
dynet
::
atan
(const Expression &x)¶ Inverse tangent.
Elementwise calculation of the inverse tangent
- Return
- An expression where the ith element is equal to atan(x_i)
- Parameters
x
: The input expression
-
Expression
dynet
::
sin
(const Expression &x)¶ Sine.
Elementwise calculation of the sine
- Return
- An expression where the ith element is equal to sin(x_i)
- Parameters
x
: The input expression
-
Expression
dynet
::
cos
(const Expression &x)¶ Cosine.
Elementwise calculation of the cosine
- Return
- An expression where the ith element is equal to cos(x_i)
- Parameters
x
: The input expression
-
Expression
dynet
::
tan
(const Expression &x)¶ Tangent.
Elementwise calculation of the tangent
- Return
- An expression where the ith element is equal to tan(x_i)
- Parameters
x
: The input expression
-
Expression
dynet
::
sinh
(const Expression &x)¶ Hyperbolic sine.
Elementwise calculation of the hyperbolic sine
- Return
- An expression where the ith element is equal to sinh(x_i)
- Parameters
x
: The input expression
-
Expression
dynet
::
cosh
(const Expression &x)¶ Hyperbolic cosine.
Elementwise calculation of the hyperbolic cosine
- Return
- An expression where the ith element is equal to cosh(x_i)
- Parameters
x
: The input expression
-
Expression
dynet
::
tanh
(const Expression &x)¶ Hyperbolic tangent.
Elementwise calculation of the hyperbolic tangent
- Return
- An expression where the ith element is equal to tanh(x_i)
- Parameters
x
: The input expression
-
Expression
dynet
::
asinh
(const Expression &x)¶ Inverse hyperbolic sine.
Elementwise calculation of the inverse hyperbolic sine
- Return
- An expression where the ith element is equal to asinh(x_i)
- Parameters
x
: The input expression
-
Expression
dynet
::
acosh
(const Expression &x)¶ Inverse hyperbolic cosine.
Elementwise calculation of the inverse hyperbolic cosine
- Return
- An expression where the ith element is equal to acosh(x_i)
- Parameters
x
: The input expression
-
Expression
dynet
::
atanh
(const Expression &x)¶ Inverse hyperbolic tangent.
Elementwise calculation of the inverse hyperbolic tangent
- Return
- An expression where the ith element is equal to atanh(x_i)
- Parameters
x
: The input expression
-
Expression
dynet
::
exp
(const Expression &x)¶ Natural exponent.
Calculate elementwise y_i = e^{x_i}
- Return
- An expression where the ith element is equal to e^{x_i}
- Parameters
x
: The input expression
-
Expression
dynet
::
square
(const Expression &x)¶ Square.
Calculate elementwise y_i = x_i^2
- Return
- An expression where the ith element is equal to x_i^2
- Parameters
x
: The input expression
-
Expression
dynet
::
cube
(const Expression &x)¶ Cube.
Calculate elementwise y_i = x_i^3
- Return
- An expression where the ith element is equal to x_i^3
- Parameters
x
: The input expression
-
Expression
dynet
::
log_sigmoid
(const Expression &x)¶ Log sigmoid.
Calculate elementwise \(y_i = \ln(\frac{1}{1+e^{-x_i}})\) This is more numerically stable than
log(logistic(x))
- Return
- An expression where the ith element is equal to \(y_i = \ln(\frac{1}{1+e^{-x_i}})\)
- Parameters
x
: The input expression
-
Expression
dynet
::
lgamma
(const Expression &x)¶ Log gamma.
Calculate elementwise y_i = ln(gamma(x_i))
- Return
- An expression where the ith element is equal to ln(gamma(x_i))
- Parameters
x
: The input expression
-
Expression
dynet
::
log
(const Expression &x)¶ Logarithm.
Calculate the elementwise natural logarithm y_i = ln(x_i)
- Return
- An expression where the ith element is equal to ln(x_i)
- Parameters
x
: The input expression
-
Expression
dynet
::
logistic
(const Expression &x)¶ Logistic sigmoid function.
Calculate elementwise y_i = 1/(1+e^{-x_i})
- Return
- An expression where the ith element is equal to y_i = 1/(1+e^{-x_i})
- Parameters
x
: The input expression
-
Expression
dynet
::
rectify
(const Expression &x)¶ Rectifier.
Calculate elementwise the recitifer (ReLU) function y_i = max(x_i,0)
- Return
- An expression where the ith element is equal to max(x_i,0)
- Parameters
x
: The input expression
-
Expression
dynet
::
elu
(const Expression &x, float alpha = 1.f)¶ Exponential Linear Unit.
Calculate elementwise the function
\( y_i = \left\{\begin{array}{lr} x_i, & \text{if } x>0\\ \alpha\times(e^{x_i} - 1), & \text{if }x\leqslant 0\\ \end{array}\right. \)
Reference: Clevert et al., 2015
- Return
- An expression where the ith element is equal to \(\text{ELU}(x_i, \alpha)\)
- Parameters
x
: The input expression
-
Expression
dynet
::
selu
(const Expression &x)¶ Scaled Exponential Linear Unit (SELU)
Calculate elementwise the function
\( y_i = \lambda\times\left\{\begin{array}{lr} x_i, & \text{if } x>0\\ \alpha\times(e^{x_i} - 1), & \text{if }x\leqslant 0\\ \end{array}\right. \)
With \( \begin{split} \lambda &=\texttt{1.0507009873554804934193349852946}\\ \alpha &=\texttt{1.6732632423543772848170429916717}\\ \end{split} \)
Reference: Klambaouer et al., 2017
- Return
- An expression where the ith element is equal to \(\text{SELU}(x_i)\)
- Parameters
x
: The input expression
-
Expression
dynet
::
silu
(const Expression &x, float beta = 1.f)¶ SILU / SiL / Swish.
Calculate elementwise y_i = x_i / (1 + e^{-beta * x_i})
Reference: Hendrycks and Gimpel, 2016, Elfwing et al, 2017, and Ramachandran et al., 2017
- Return
- An expression where the ith element is equal to y_i = x_i / (1 + e^{-beta * x_i})
- Parameters
x
: The input expression
-
Expression
dynet
::
softsign
(const Expression &x)¶ Soft Sign.
Calculate elementwise the softsign function y_i = x_i/(1+|x_i|)
- Return
- An expression where the ith element is equal to x_i/(1+|x_i|)
- Parameters
x
: The input expression
-
Expression
dynet
::
pow
(const Expression &x, const Expression &y)¶ Power function.
Calculate an output where the ith element is equal to x_i^y
- Return
- An expression where the ith element is equal to x_i^y
- Parameters
x
: The input expressiony
: The exponent expression(scalar expression)
-
Expression
dynet
::
min
(const Expression &x, const Expression &y)¶ Minimum.
Calculate an output where the ith element is min(x_i,y_i)
- Return
- An expression where the ith element is equal to min(x_i,y_i)
- Parameters
x
: The first input expressiony
: The second input expression
-
Expression
dynet
::
max
(const Expression &x, const Expression &y)¶ Maximum.
Calculate an output where the ith element is max(x_i,y_i)
- Return
- An expression where the ith element is equal to max(x_i,y_i)
- Parameters
x
: The first input expressiony
: The second input expression
-
Expression
dynet
::
max
(const std::initializer_list<Expression> &xs)¶ Max.
This performs an elementwise max over all the expressions in xs
- Return
- An expression where the ith element is equal to max(xs[0][i], xs[1][i], …)
- Parameters
xs
: An initializer list containing expressions
-
Expression
dynet
::
dot_product
(const Expression &x, const Expression &y)¶ Dot Product.
Calculate the dot product sum_i x_i*y_i
- Return
- An expression equal to the dot product
- Parameters
x
: The input expressiony
: The input expression
-
Expression
dynet
::
circ_conv
(const Expression &u, const Expression &v)¶ Circular convolution.
Calculate the circular convolution
- Return
- An expression equal to the circular convolution
- Parameters
x
: The input expressiony
: The input expression
-
Expression
dynet
::
circ_corr
(const Expression &u, const Expression &v)¶ Circular correlation.
Calculate the circular correlation
- Return
- An expression equal to the circular correlation
- Parameters
x
: The input expressiony
: The input expression
-
Expression
dynet
::
cmult
(const Expression &x, const Expression &y)¶ Componentwise multiply.
Multiply two expressions component-wise, broadcasting dimensions if necessary as follows:
- When number of dimensions differ, we add dimensions of size 1 to make the number of dimensions match
- Now, every dimensions is required to have matching size, or one of the dimensions must equal 1 (in which case it will be broadcasted)
- In the same way, the batch dimension must match, or equal 1 in which case it will be broadcasted
- The resulting tensor’s dimensionality is thus determined as the max of both inputs at every position
- Return
- An expression where the ith element is equal to x_i*y_i
- Parameters
x
: The first input expressiony
: The second input expression
-
Expression
dynet
::
cdiv
(const Expression &x, const Expression &y)¶ Componentwise division.
Divide an expressions component-wise by another, broadcasting dimensions (currently only of the second expression!) if necessary as follows:
- When number of dimensions differ, we add dimensions of size 1 to make the number of dimensions match
- Now, every dimensions is required to have matching size, or the dim size of the right expression must equal 1 (in which case it will be broadcasted)
- In the same way, the batch sizes must match, or the batch size of the right expression must equal 1 in which case it will be broadcasted
- The resulting tensor’s dimensionality is thus determined as the max of both inputs at every position
- Return
- An expression where the ith element is equal to x_i/y_i
- Parameters
x
: The first input expressiony
: The second input expression
-
Expression
dynet
::
colwise_add
(const Expression &x, const Expression &bias)¶ Columnwise addition.
Add vector “bias” to each column of matrix “x”
- Return
- An expression where bias is added to each column of x
- Parameters
x
: An MxN matrixbias
: A length M vector
Probability/Loss Operations¶
These operations are used for calculating probabilities, or calculating loss functions for use in training.
-
Expression
dynet
::
softmax
(const Expression &x, unsigned d = 0)¶ Softmax.
The softmax function normalizes each column to ensure that all values are between 0 and 1 and add to one by applying the e^{x[i]}/{sum_j e^{x[j]}}.
- Return
- A vector or matrix after calculating the softmax
- Parameters
x
: A vector or matrixd
: dimension to normalize over (default: 0)
-
Expression
dynet
::
log_softmax
(const Expression &x)¶ Log softmax.
The log softmax function normalizes each column to ensure that all values are between 0 and 1 and add to one by applying the e^{x[i]}/{sum_j e^{x[j]}}, then takes the log
- Return
- A vector or matrix after calculating the log softmax
- Parameters
x
: A vector or matrix
-
Expression
dynet
::
log_softmax
(const Expression &x, const std::vector<unsigned> &restriction)¶ Restricted log softmax.
The log softmax function calculated over only a subset of the vector elements. The elements to be included are set by the
restriction
variable. All elements not included inrestriction
are set to negative infinity.- Return
- A vector with the log softmax over the specified elements
- Parameters
x
: A vector over which to calculate the softmaxrestriction
: The elements over which to calculate the softmax
-
Expression
dynet
::
logsumexp_dim
(const Expression &x, unsigned d)¶ Log, sum, exp by dimension.
The “logsumexp” function calculated over a particular dimension \(ln(\sum_i e^{xs_i})\), used in adding probabilities in the log domain.
- Return
- The result.
- Parameters
x
: Expression with respect to which to calculate the logsumexp.d
: The dimension along which to do the logsumexp.
-
Expression
dynet
::
logsumexp
(const std::initializer_list<Expression> &xs)¶ Log, sum, exp.
The elementwise “logsumexp” function that calculates \(ln(\sum_i e^{xs_i})\), used in adding probabilities in the log domain.
- Return
- The result.
- Parameters
xs
: Expressions with respect to which to calculate the logsumexp.
-
Expression
dynet
::
pickneglogsoftmax
(const Expression &x, unsigned v)¶ Negative softmax log likelihood.
This function takes in a vector of scores
x
, and performs a log softmax, takes the negative, and selects the likelihood corresponding to the elementv
. This is perhaps the most standard loss function for training neural networks to predict one out of a set of elements.- Return
- The negative log likelihood of element
v
after taking the softmax - Parameters
x
: A vector of scoresv
: The element with which to calculate the loss
-
Expression
dynet
::
pickneglogsoftmax
(const Expression &x, const unsigned *pv)¶ Modifiable negative softmax log likelihood.
This function calculates the negative log likelihood after the softmax with respect to index
*pv
. This computes the same value as the previous function that passes the indexv
by value, but instead passes by pointer so the value*pv
can be modified without re-constructing the computation graph. This can be used in situations where we want to create a computation graph once, then feed it different data points.- Return
- The negative log likelihood of element
*pv
after taking the softmax - Parameters
x
: A vector of scorespv
: A pointer to the index of the correct element
-
Expression
dynet
::
pickneglogsoftmax
(const Expression &x, const std::vector<unsigned> &v)¶ Batched negative softmax log likelihood.
This function is similar to standard pickneglogsoftmax, but calculates loss with respect to multiple batch elements. The input will be a mini-batch of score vectors where the number of batch elements is equal to the number of indices in
v
.- Return
- The negative log likelihoods over all the batch elements
- Parameters
x
: An expression with vectors of scores over N batch elementsv
: A size-N vector indicating the index with respect to all the batch elements
-
Expression
dynet
::
pickneglogsoftmax
(const Expression &x, const std::vector<unsigned> *pv)¶ Modifiable batched negative softmax log likelihood.
This function is a combination of modifiable pickneglogsoftmax and batched pickneglogsoftmax:
pv
can be modified without re-creating the computation graph.- Return
- The negative log likelihoods over all the batch elements
- Parameters
x
: An expression with vectors of scores over N batch elementspv
: A pointer to the indexes
-
Expression
dynet
::
hinge
(const Expression &x, unsigned index, float m = 1.0)¶ Hinge loss.
This expression calculates the hinge loss, formally expressed as: \( \text{hinge}(x,index,m) = \sum_{i \ne index} \max(0, m-x[index]+x[i]). \)
- Return
- The hinge loss of candidate
index
with respect to marginm
- Parameters
x
: A vector of scoresindex
: The index of the correct candidatem
: The margin
-
Expression
dynet
::
hinge
(const Expression &x, const unsigned *pindex, float m = 1.0)¶ Modifiable hinge loss.
This function calculates the hinge loss with with respect to index
*pindex
. This computes the same value as the previous function that passes the indexindex
by value, but instead passes by pointer so the value*pindex
can be modified without re-constructing the computation graph. This can be used in situations where we want to create a computation graph once, then feed it different data points.- Return
- The hinge loss of candidate
*pindex
with respect to marginm
- Parameters
x
: A vector of scorespindex
: A pointer to the index of the correct candidatem
: The margin
-
Expression
dynet
::
hinge
(const Expression &x, const std::vector<unsigned> &indices, float m = 1.0)¶ Batched hinge loss.
The same as hinge loss, but for the case where
x
is a mini-batched tensor withindices.size()
batch elements, andindices
is a vector indicating the index of each of the correct elements for these elements.- Return
- The hinge loss of each mini-batch
- Parameters
x
: A mini-batch of vectors withindices.size()
batch elementsindices
: The indices of the correct candidates for each batch elementm
: The margin
-
Expression
dynet
::
hinge
(const Expression &x, const std::vector<unsigned> *pindices, float m = 1.0)¶ Batched modifiable hinge loss.
A combination of the previous batched and modifiable hinge loss functions, where vector
*pindices
can be modified.- Return
- The hinge loss of each mini-batch
- Parameters
x
: A mini-batch of vectors withindices.size()
batch elementspindices
: Pointer to the indices of the correct candidates for each batch elementm
: The margin
-
Expression
dynet
::
hinge_dim
(const Expression &x, const std::vector<unsigned> &indices, unsigned d = 0, float m = 1.0)¶ Dimensionwise hinge loss.
This expression calculates the hinge loss over a particular dimension
d
.- Return
- A vector of hinge losses for each index in
indices
. - Parameters
x
: A matrix of scoresindices
: The indices of the correct candidate (equal in length to the dimension not specified by “d”)d
: The dimension over which to calculate the loss (0 or 1)m
: The margin
-
Expression
dynet
::
hinge_dim
(const Expression &x, const std::vector<unsigned> *pindex, unsigned d = 0, float m = 1.0)¶ Modifiable dimensionwise hinge loss.
This function calculates the modifiable version of dimensionwise hinge loss.
- Return
- A vector of hinge losses for each index in
indices
. - Parameters
x
: A vector of scorespindex
: A pointer to the index of the correct candidated
: The dimension over which to calculate the loss (0 or 1)m
: The margin
-
Expression
dynet
::
hinge_dim
(const Expression &x, const std::vector<std::vector<unsigned>> &indices, unsigned d = 0, float m = 1.0)¶ Batched dimensionwise hinge loss.
The same as dimensionwise hinge loss, but for the case where
x
is a mini-batched tensor withindices.size()
batch elements.- Return
- A vector of hinge losses for each mini-batch
- Parameters
x
: A mini-batch of vectors withindices.size()
batch elementsindices
: The indices of the correct candidates for each batch elementd
: The dimension over which to calculate the loss (0 or 1)m
: The margin
-
Expression
dynet
::
hinge_dim
(const Expression &x, const std::vector<std::vector<unsigned>> *pindices, unsigned d = 0, float m = 1.0)¶ Batched modifiable hinge loss.
A combination of the previous batched and modifiable hinge loss functions, where vector
*pindices
can be modified.- Return
- The hinge loss of each mini-batch
- Parameters
x
: A mini-batch of vectors withindices.size()
batch elementspindices
: Pointer to the indices of the correct candidates for each batch elementd
: The dimension over which to calculate the loss (0 or 1)m
: The margin
-
Expression
dynet
::
sparsemax
(const Expression &x)¶ Sparsemax.
The sparsemax function (Martins et al. 2016), which is similar to softmax, but induces sparse solutions where most of the vector elements are zero. Note: This function is not yet implemented on GPU.
- Return
- The sparsemax of the scores
- Parameters
x
: A vector of scores
-
Expression
dynet
::
sparsemax_loss
(const Expression &x, const std::vector<unsigned> &target_support)¶ Sparsemax loss.
The sparsemax loss function (Martins et al. 2016), which is similar to softmax loss, but induces sparse solutions where most of the vector elements are zero. It has a gradient similar to the sparsemax function and thus is useful for optimizing when the sparsemax will be used at test time. Note: This function is not yet implemented on GPU.
- Return
- The sparsemax loss of the labels
- Parameters
x
: A vector of scorestarget_support
: The target correct labels.
-
Expression
dynet
::
sparsemax_loss
(const Expression &x, const std::vector<unsigned> *ptarget_support)¶ Modifiable sparsemax loss.
Similar to the sparsemax loss, but with ptarget_support being a pointer to a vector, allowing it to be modified without re-creating the compuation graph. Note: This function is not yet implemented on GPU.
- Return
- The sparsemax loss of the labels
- Parameters
x
: A vector of scoresptarget_support
: A pointer to the target correct labels.
-
Expression
dynet
::
constrained_softmax
(const Expression &x, const Expression &y)¶ Constrained softmax.
The constrained softmax function. Note: This function is not yet implemented on GPU.
- Return
- The constrained softmax of the scores.
- Parameters
x
: A vector of scoresy
: A vector of upper bound constraints on probabilities
-
Expression
dynet
::
squared_norm
(const Expression &x)¶ Squared norm.
The squared L2 norm of the values of x: \(\sum_i x_i^2\).
- Return
- The squared L2 norm
- Parameters
x
: A vector of values
-
Expression
dynet
::
l2_norm
(const Expression &x)¶ L2 norm.
The L2 norm of the values of x: \(\sqrt{\sum_i x_i^2}\).
- Return
- The L2 norm
- Parameters
x
: A vector of values
-
Expression
dynet
::
squared_distance
(const Expression &x, const Expression &y)¶ Squared distance.
The squared distance between values of
x
andy
: \(\sum_i (x_i-y_i)^2\).- Return
- The squared distance
- Parameters
x
: A vector of valuesy
: Another vector of values
-
Expression
dynet
::
l1_distance
(const Expression &x, const Expression &y)¶ L1 distance.
The L1 distance between values of
x
andy
: \(\sum_i |x_i-y_i|\).- Return
- The squared distance
- Parameters
x
: A vector of valuesy
: Another vector of values
-
Expression
dynet
::
huber_distance
(const Expression &x, const Expression &y, float c = 1.345f)¶ Huber distance.
The huber distance between values of
x
andy
parameterized byc,
\(\sum_i L_c(x_i, y_i)\) where:\( L_c(x, y) = \begin{cases} \frac{1}{2}(y - x)^2 & \textrm{for } |y - f(x)| \le c, \\ c\, |y - f(x)| - \frac{1}{2}c^2 & \textrm{otherwise.} \end{cases} \)
- Return
- The huber distance
- Parameters
x
: A vector of valuesy
: Another vector of valuesc
: The parameter of the huber distance parameterizing the cuttoff
-
Expression
dynet
::
binary_log_loss
(const Expression &x, const Expression &y)¶ Binary log loss.
The log loss of a binary decision according to the sigmoid sigmoid function \(- \sum_i (y_i * ln(x_i) + (1-y_i) * ln(1-x_i)) \)
- Return
- The log loss of the sigmoid function
- Parameters
x
: A vector of valuesy
: A vector of true answers
-
Expression
dynet
::
pairwise_rank_loss
(const Expression &x, const Expression &y, real m = 1.0)¶ Pairwise rank loss.
A margin-based loss, where every margin violation for each pair of values is penalized: \(\sum_i max(m - x_i + y_i, 0)\)
- Return
- The pairwise rank loss
- Parameters
x
: A vector of valuesy
: A vector of true answersm
: The margin
-
Expression
dynet
::
poisson_loss
(const Expression &x, unsigned y)¶ Poisson loss.
The negative log probability of
y
according to a Poisson distribution with parameterx
. Useful in Poisson regression where, we try to predict the parameters of a Possion distribution to maximize the probability of datay
.- Return
- The Poisson loss
- Parameters
x
: The parameter of the Poisson distribution.y
: The target value
-
Expression
dynet
::
poisson_loss
(const Expression &x, const unsigned *py)¶ Modifiable Poisson loss.
Similar to Poisson loss, but with the target value passed by pointer so that it can be modified without re-constructing the computation graph.
- Return
- The Poisson loss
- Parameters
x
: The parameter of the Poisson distribution.py
: A pointer to the target value
Flow/Shaping Operations¶
These operations control the flow of information through the graph, or the shape of the vectors/tensors used in the graph.
-
enum
flowoperations
::
ArgmaxGradient
¶ Gradient modes for the argmax operation.
Values:
-
zero_gradient
¶
-
straight_through_gradient
¶
-
-
Expression
dynet
::
nobackprop
(const Expression &x)¶ Prevent backprop.
This node has no effect on the forward pass, but prevents gradients from flowing backward during the backward pass. This is useful when there’s a subgraph for which you don’t want loss passed back to the parameters.
- Return
- The new expression
- Parameters
x
: The input expression
-
Expression
dynet
::
flip_gradient
(const Expression &x)¶ Flip gradient.
This node has no effect on the forward pass, but inverts the gradient on backprop. This operation is widely used in adversarial networks.
- Return
- An output expression containing the same as input (only effects the backprop process)
- Parameters
x
: The input expression
-
Expression
dynet
::
scale_gradient
(const Expression &x, float lambd = 1.0f)¶ Scale gradient by constant.
This node has no effect on the forward pass, but scales the gradient by lambda on backprop
- Return
- An output expression containing the same as input (only effects the backprop process)
- Parameters
x
: The input expression
-
Expression
dynet
::
argmax
(const Expression &x, ArgmaxGradient gradient_mode)¶ Argmax.
This node takes an input vector \(x\) and returns a one hot vector \(y\) such that \(y_{\text{argmax} x}=1\)
There are two gradient modes for this operation:
argmax(x, zero_gradient)
is the standard argmax operation. Note that this almost everywhere differentiable and its gradient is 0. It will stop your gradient
argmax(x, straight_through_gradient)
This gradient mode implements the straight-through estimator (Bengio et al., 2013). Its forward pass is the same as the argmax operation, but its gradient is the same as the identity function. Note that this does not technically correspond to a differentiable function (hence the name “estimator”).
Tensors of order \(>1\) are not supported yet
- Return
- The one hot argmax vector
- Parameters
x
: The input vector (can be batched)gradient_mode
: Specify the gradient type (zero or straight-through)
-
Expression
dynet
::
reshape
(const Expression &x, const Dim &d)¶ Reshape to another size.
This node reshapes a tensor to another size, without changing the underlying layout of the data. The layout of the data in DyNet is column-major, so if we have a 3x4 matrix
\( \begin{pmatrix} x_{1,1} & x_{1,2} & x_{1,3} & x_{1,4} \\ x_{2,1} & x_{2,2} & x_{2,3} & x_{2,4} \\ x_{3,1} & x_{3,2} & x_{3,3} & x_{3,4} \\ \end{pmatrix} \)
and transform it into a 2x6 matrix, it will be rearranged as:
\( \begin{pmatrix} x_{1,1} & x_{3,1} & x_{2,2} & x_{1,3} & x_{3,3} & x_{2,4} \\ x_{2,1} & x_{1,2} & x_{3,2} & x_{2,3} & x_{1,4} & x_{3,4} \\ \end{pmatrix} \)
**Note:** This is O(1) for forward, and O(n) for backward.
- Return
- The reshaped expression
- Parameters
x
: The input expressiond
: The new dimensions
-
Expression dynet::transpose(const Expression & x, const std::vector< unsigned > & dims = {1, 0})
Transpose a matrix.
Transpose a matrix or tensor, or if dims is specified shuffle the dimensions arbitrarily. Note: This is O(1) if either the row or column dimension is 1, and O(n) otherwise.
- Return
- The transposed/shuffled expression
- Parameters
x
: The input expressiondims
: The dimensions to swap. The ith dimension of the output will be equal to the dims[i] dimension of the input. dims must have the same number of dimensions as x.
-
Expression
dynet
::
select_rows
(const Expression &x, const std::vector<unsigned> &rows)¶ Select rows.
Select a subset of rows of a matrix.
- Return
- An expression containing the selected rows
- Parameters
x
: The input expressionrows
: The rows to extract
-
Expression
dynet
::
select_rows
(const Expression &x, const std::vector<unsigned> *prows)¶ Modifiable select rows.
Select a subset of rows of a matrix, where the elements of prows can be modified without re-creating the computation graph.
- Return
- An expression containing the selected rows
- Parameters
x
: The input expressionprows
: The rows to extract
-
Expression
dynet
::
select_cols
(const Expression &x, const std::vector<unsigned> &cols)¶ Select columns.
Select a subset of columns of a matrix. select_cols is more efficient than select_rows since DyNet uses column-major order.
- Return
- An expression containing the selected columns
- Parameters
x
: The input expressioncolumns
: The columns to extract
-
Expression
dynet
::
select_cols
(const Expression &x, const std::vector<unsigned> *pcols)¶ Modifiable select columns.
Select a subset of columns of a matrix, where the elements of pcols can be modified without re-creating the computation graph.
- Return
- An expression containing the selected columns
- Parameters
x
: The input expressionpcolumns
: The columns to extract
-
Expression
dynet
::
pick
(const Expression &x, unsigned v, unsigned d = 0)¶ Pick element.
Pick a single element/row/column/sub-tensor from an expression. This will result in the dimension of the tensor being reduced by 1.
- Return
- The value of x[v] along dimension d
- Parameters
x
: The input expressionv
: The index of the element to selectd
: The dimension along which to choose the element
-
Expression
dynet
::
pick
(const Expression &x, const std::vector<unsigned> &v, unsigned d = 0)¶ Batched pick.
Pick elements from multiple batches.
- Return
- A mini-batched expression containing the picked elements
- Parameters
x
: The input expressionv
: A vector of indicies to choose, one for each batch in the input expression.d
: The dimension along which to choose the elements
-
Expression
dynet
::
pick
(const Expression &x, const unsigned *pv, unsigned d = 0)¶ Modifiable pick element.
Pick a single element from an expression, where the index is passed by pointer so we do not need to re-create the computation graph every time.
- Return
- The value of x[*pv]
- Parameters
x
: The input expressionpv
: Pointer to the index of the element to selectd
: The dimension along which to choose the elements
-
Expression
dynet
::
pick
(const Expression &x, const std::vector<unsigned> *pv, unsigned d = 0)¶ Modifiable batched pick element.
Pick multiple elements from an input expression, where the indices are passed by pointer so we do not need to re-create the computation graph every time.
- Return
- A mini-batched expression containing the picked elements
- Parameters
x
: The input expressionpv
: A pointer to vector of indicies to choosed
: The dimension along which to choose the elements
-
Expression
dynet
::
pick_range
(const Expression &x, unsigned s, unsigned e, unsigned d = 0)¶ Pick range of elements.
Pick a range of elements from an expression.
- Return
- The value of {x[v],…,x[u]}
- Parameters
x
: The input expressions
: The start indexe
: The end indexd
: The dimension along which to pick
-
Expression
dynet
::
pick_batch_elem
(const Expression &x, unsigned v)¶ (Modifiable) Pick batch element.
Pick batch element from a batched expression. For a Tensor with 3 batch elements:
\( \begin{pmatrix} x_{1,1,1} & x_{1,1,2} \\ x_{1,2,1} & x_{1,2,2} \\ \end{pmatrix} \begin{pmatrix} x_{2,1,1} & x_{2,1,2} \\ x_{2,2,1} & x_{2,2,2} \\ \end{pmatrix} \begin{pmatrix} x_{3,1,1} & x_{3,1,2} \\ x_{3,2,1} & x_{3,2,2} \\ \end{pmatrix} \)
pick_batch_elem(t, 1) will return a Tensor of
\( \begin{pmatrix} x_{2,1,1} & x_{2,1,2} \\ x_{2,2,1} & x_{2,2,2} \\ \end{pmatrix} \)
- Return
- The expression of picked batch element. The picked element is a tensor whose
bd
equals to one. - Parameters
x
: The input expressionv
: The index of the batch element to be picked.
-
Expression
dynet
::
pick_batch_elems
(const Expression &x, const std::vector<unsigned> &v)¶ (Modifiable) Pick batch elements.
Pick several batch elements from a batched expression. For a Tensor with 3 batch elements:
\( \begin{pmatrix} x_{1,1,1} & x_{1,1,2} \\ x_{1,2,1} & x_{1,2,2} \\ \end{pmatrix} \begin{pmatrix} x_{2,1,1} & x_{2,1,2} \\ x_{2,2,1} & x_{2,2,2} \\ \end{pmatrix} \begin{pmatrix} x_{3,1,1} & x_{3,1,2} \\ x_{3,2,1} & x_{3,2,2} \\ \end{pmatrix} \)
pick_batch_elems(t, {1, 2}) will return a Tensor of with 2 batch elements:
\( \begin{pmatrix} x_{2,1,1} & x_{2,1,2} \\ x_{2,2,1} & x_{2,2,2} \\ \end{pmatrix} \begin{pmatrix} x_{3,1,1} & x_{3,1,2} \\ x_{3,2,1} & x_{3,2,2} \\ \end{pmatrix} \)
- Return
- The expression of picked batch elements. The batch elements is a tensor whose
bd
equals to the size of vectorv
. - Parameters
x
: The input expressionv
: A vector of indicies of the batch elements to be picked.
-
Expression
dynet
::
pick_batch_elem
(const Expression &x, const unsigned *v)¶ Pick batch element.
Pick batch element from a batched expression.
- Return
- The expression of picked batch element. The picked element is a tensor whose
bd
equals to one. - Parameters
x
: The input expressionv
: A pointer to the index of the correct element to be picked.
-
Expression
dynet
::
pick_batch_elems
(const Expression &x, const std::vector<unsigned> *pv)¶ Pick batch elements.
Pick several batch elements from a batched expression.
- Return
- The expression of picked batch elements. The batch elements is a tensor whose
bd
equals to the size of vectorv
. - Parameters
x
: The input expressionv
: A pointer to the indexes
-
Expression
dynet
::
concatenate_to_batch
(const std::initializer_list<Expression> &xs)¶ Concatenate list of expressions to a single batched expression.
Perform a concatenation of several expressions along the batch dimension. All expressions must have the same shape except for the batch dimension.
- Return
- The expression with the batch dimensions concatenated
- Parameters
xs
: The input expressions
-
Expression
dynet
::
strided_select
(const Expression &x, const std::vector<int> &strides, const std::vector<int> &from = {}, const std::vector<int> &to = {})¶ Strided select in multiple dimensions.
Select a range and/or stride of elements from an expression.
- Return
- The value of x[from[0]:to[0]:strides[0],..] (as it would be in numpy syntax)
- Parameters
x
: The input expressionstrides
: List of strides for each dimension, must be >= 1. Dimensions not included default to 1. Batch dimension can be included as very last dimension.from
: List of 0-based offsets (inclusive) for each dimension, must be >= 0. Dimensions not included default to 0. Batch dimension can be included as very last dimension.to
: List of highest 0-based index to select (exclusive) for each dimension, must be >= 0. Dimensions not included default to the corresponding dim size. Batch dimension can be included as very last dimension.
-
Expression
dynet
::
concatenate_cols
(const std::initializer_list<Expression> &xs)¶ Concatenate columns.
Perform a concatenation of the columns in multiple expressions. All expressions must have the same number of rows.
- Return
- The expression with the columns concatenated
- Parameters
xs
: The input expressions
-
Expression
dynet
::
concatenate
(const std::initializer_list<Expression> &xs, unsigned d = 0)¶ Concatenate.
Perform a concatenation of multiple expressions along a particular dimension. All expressions must have the same dimensions except for the dimension to be concatenated (rows by default).
- Return
- The expression with the specified dimension concatenated
- Parameters
xs
: The input expressionsd
: The dimension along which to perform concatenation
-
Expression
dynet
::
max_dim
(const Expression &x, unsigned d = 0)¶ Max out through a dimension.
Select out a element/row/column/sub-tensor from an expression, with maximum value along a given dimension. This will result in the dimension of the tensor being reduced by 1.
- Return
- An expression of sub-tensor with max value along dimension d
- Parameters
x
: The input expressiond
: The dimension along which to choose the element
-
Expression
dynet
::
min_dim
(const Expression &x, unsigned d = 0)¶ Min out through a dimension.
Select out a element/row/column/sub-tensor from an expression, with minimum value along a given dimension. This will result in the dimension of the tensor being reduced by 1.
- Return
- An expression of sub-tensor with min value along dimension d
- Parameters
x
: The input expressiond
: The dimension along which to choose the element
Noise Operations¶
These operations are used to add noise to the graph for purposes of making learning more robust.
-
Expression
dynet
::
noise
(const Expression &x, real stddev)¶ Gaussian noise.
Add gaussian noise to an expression.
- Return
- The noised expression
- Parameters
x
: The input expressionstddev
: The standard deviation of the gaussian
-
Expression
dynet
::
dropout
(const Expression &x, real p)¶ Dropout.
With a fixed probability, drop out (set to zero) nodes in the input expression, and scale the remaining nodes by 1/p. Note that there are two kinds of dropout:
- Regular dropout: where we perform dropout at training time and then
scale outputs by p at test time.
- Inverted dropout: where we perform dropout and scaling at training
time, and do not need to do anything at test time. DyNet implements the latter, so you only need to apply dropout at training time, and do not need to perform scaling and test time.
- Return
- The dropped out expression
- Parameters
x
: The input expressionp
: The dropout probability
- Regular dropout: where we perform dropout at training time and then
-
Expression
dynet
::
dropout_dim
(const Expression &x, unsigned d, real p)¶ Dropout along a specific dimension.
Identical to the dropout operation except the dropout mask is the same across one dimension. Use this if you want to drop columns or lines in a matrix for example
For now this only supports tensors of order <= 3 (with or without batch dimension)
- Return
- The dropped out expression
- Parameters
x
: The input expressiond
: The dimension along which to dropp
: The dropout probability
-
Expression
dynet
::
dropout_batch
(const Expression &x, real p)¶ Dropout entire elements of a minibatch.
Identical to the dropout operation except entire batch elements are dropped
- Return
- The dropped out expression
- Parameters
x
: The input expressionp
: The dropout probability
-
Expression
dynet
::
block_dropout
(const Expression &x, real p)¶ Block dropout.
Identical to the dropout operation, but either drops out all or no values in the expression, as opposed to making a decision about each value individually.
- Return
- The block dropout expression
- Parameters
x
: The input expressionp
: The block dropout probability
Tensor Operations¶
These operations are used for performing operations on higher order tensors.
Remark: Compiling the contraction operations takes a lot of time with CUDA. For this reason, only the CPU implementation is compiled by default. If you need those operations, you need to un-comment this line in the source before compiling. TODO: make this simpler.
-
Expression
dynet
::
contract3d_1d
(const Expression &x, const Expression &y)¶ Contracts a rank 3 tensor and a rank 1 tensor into a rank 2 tensor.
The resulting tensor \(z\) has coordinates \(z_ij = \sum_k x_{ijk} y_k\)
- Return
- Matrix
- Parameters
x
: Rank 3 tensory
: Vector
-
Expression
dynet
::
contract3d_1d_1d
(const Expression &x, const Expression &y, const Expression &z)¶ Contracts a rank 3 tensor and two rank 1 tensor into a rank 1 tensor.
This is the equivalent of calling
contract3d_1d
and then performing a matrix vector multiplication.The resulting tensor \(t\) has coordinates \(t_i = \sum_{j,k} x_{ijk} y_k z_j\)
- Return
- Vector
- Parameters
x
: Rank 3 tensory
: Vectorz
: Vector
-
Expression
dynet
::
contract3d_1d_1d
(const Expression &x, const Expression &y, const Expression &z, const Expression &b)¶ Same as
contract3d_1d_1d
with an additional bias parameter.This is the equivalent of calling
contract3d_1d
and then performing an affine transform.The resulting tensor \(t\) has coordinates \(t_i = b_i + \sum_{j,k} x_{ijk} y_k z_j\)
- Return
- Vector
- Parameters
x
: Rank 3 tensory
: Vectorz
: Vectorb
: Bias vector
-
Expression
dynet
::
contract3d_1d
(const Expression &x, const Expression &y, const Expression &b)¶ Same as
contract3d_1d
with an additional bias parameter.The resulting tensor \(z\) has coordinates \(z_{ij} = b_{ij}+\sum_k x_{ijk} y_k\)
- Return
- Matrix
- Parameters
x
: Rank 3 tensory
: Vectorb
: Bias matrix
Linear Algebra Operations¶
These operations are used for performing various operations common in linear algebra.
-
Expression
dynet
::
inverse
(const Expression &x)¶ Matrix Inverse.
Takes the inverse of a matrix (not implemented on GPU yet, although contributions are welcome: https://github.com/clab/dynet/issues/158). Note that back-propagating through an inverted matrix can also be the source of stability problems sometimes.
- Return
- The inverse of the matrix
- Parameters
x
: A square matrix
-
Expression
dynet
::
logdet
(const Expression &x)¶ Log determinant.
Takes the log of the determinant of a matrix. (not implemented on GPU yet, although contributions are welcome: https://github.com/clab/dynet/issues/158).
- Return
- The log of its determinant
- Parameters
x
: A square matrix
-
Expression
dynet
::
trace_of_product
(const Expression &x, const Expression &y)¶ Trace of Matrix Product.
Takes the trace of the product of matrices. (not implemented on GPU yet, although contributions are welcome: https://github.com/clab/dynet/issues/158).
- Return
- trace(x1 * x2)
- Parameters
x1
: A matrixx2
: Another matrix
Convolution Operations¶
These operations are convolution-related.
-
Expression
dynet
::
conv2d
(const Expression &x, const Expression &f, const std::vector<unsigned> &stride, bool is_valid = true)¶ conv2d without bias
2D convolution operator without bias parameters. ‘VALID’ and ‘SAME’ convolutions are supported. Think about when stride is 1, the distinction:
- SAME: output size is the same with input size. To do so, one needs to pad the input so the filter can sweep outside of the input maps.
- VALID: output size shrinks by filter_size - 1, and the filters always sweep at valid positions inside the input maps. No padding needed.
In detail, assume:
- Input feature maps: (XH x XW x XC) x N
- Filters: FH x FW x XC x FC, 4D tensor
- Strides: strides[0] and strides[1] are row (h) and col (w) stride, respectively.
For the SAME convolution: the output height (YH) and width (YW) are computed as:
- YH = ceil(float(XH) / float(strides[0]))
- YW = ceil(float(XW) / float(strides[1])) and the paddings are computed as:
- pad_along_height = max((YH - 1) * strides[0] + FH - XH, 0)
- pad_along_width = max((YW - 1) * strides[1] + FW - XW, 0)
- pad_top = pad_along_height / 2
- pad_bottom = pad_along_height - pad_top
- pad_left = pad_along_width / 2
- pad_right = pad_along_width - pad_left
For the VALID convolution: the output height (YH) and width (YW) are computed as:
- YH = ceil(float(XH - FH + 1) / float(strides[0]))
- YW = ceil(float(XW - FW + 1) / float(strides[1])) and the paddings are always zeros.
- Return
- The output feature maps (H x W x Co) x N, 3D tensor with an optional batch dimension
- Parameters
x
: The input feature maps: (H x W x Ci) x N (ColMaj), 3D tensor with an optional batch dimensionf
: 2D convolution filters: H x W x Ci x Co (ColMaj), 4D tensorstride
: the row and column stridesis_valid
: ‘VALID’ convolution or ‘SAME’ convolution, default is True (‘VALID’)
-
Expression
dynet
::
conv2d
(const Expression &x, const Expression &f, const Expression &b, const std::vector<unsigned> &stride, bool is_valid = true)¶ conv2d with bias
2D convolution operator with bias parameters. ‘VALID’ and ‘SAME’ convolutions are supported. Think about when stride is 1, the distinction:
- SAME: output size is the same with input size. To do so, one needs to pad the input so the filter can sweep outside of the input maps.
- VALID: output size shrinks by filter_size - 1, and the filters always sweep at valid positions inside the input maps. No padding needed.
In detail, assume:
- Input feature maps: XH x XW x XC x N
- Filters: FH x FW x XC x FC
- Strides: strides[0] and strides[1] are row (h) and col (w) stride, respectively.
For the SAME convolution: the output height (YH) and width (YW) are computed as:
- YH = ceil(float(XH) / float(strides[0]))
- YW = ceil(float(XW) / float(strides[1])) and the paddings are computed as:
- pad_along_height = max((YH - 1) * strides[0] + FH - XH, 0)
- pad_along_width = max((YW - 1) * strides[1] + FW - XW, 0)
- pad_top = pad_along_height / 2
- pad_bottom = pad_along_height - pad_top
- pad_left = pad_along_width / 2
- pad_right = pad_along_width - pad_left
For the VALID convolution: the output height (YH) and width (YW) are computed as:
- YH = ceil(float(XH - FH + 1) / float(strides[0]))
- YW = ceil(float(XW - FW + 1) / float(strides[1])) and the paddings are always zeros.
- Return
- The output feature maps (H x W x Co) x N, 3D tensor with an optional batch dimension
- Parameters
x
: The input feature maps: (H x W x Ci) x N (ColMaj), 3D tensor with an optional batch dimensionf
: 2D convolution filters: H x W x Ci x Co (ColMaj), 4D tensorb
: The bias (1D: Ci)stride
: the row and column stridesis_valid
: ‘VALID’ convolution or ‘SAME’ convolution, default is True (‘VALID’)
-
Expression
dynet
::
maxpooling2d
(const Expression &x, const std::vector<unsigned> &ksize, const std::vector<unsigned> &stride, bool is_valid = true)¶ maxpooling2d
2D maxpooling operator.
- Return
- The output feature maps (H x W x Co) x N, 3D tensor with an optional batch dimension
- Parameters
x
: The input feature maps: (H x W x Ci) x N (ColMaj), 3D tensor with an optional batch dimensionksize
: the height and width of the maxpooling2d window or kernelstride
: the row and column stridesis_valid
: ‘VALID’ or ‘SAME’ (see comments for conv2d) , default is True (‘VALID’)
Normalization Operations¶
This includes batch normalization and the likes.
-
Expression
dynet
::
layer_norm
(const Expression &x, const Expression &g, const Expression &b)¶ Layer normalization.
Performs layer normalization :
\( \begin{split} \mu &= \frac 1 n \sum_{i=1}^n x_i\\ \sigma &= \sqrt{\frac 1 n \sum_{i=1}^n (x_i-\mu)^2}\\ y&=\frac {\boldsymbol{g}} \sigma \circ (\boldsymbol{x}-\mu) + \boldsymbol{b}\\ \end{split} \)
Reference : Ba et al., 2016
- Return
- An expression of the same dimension as
x
- Parameters
x
: Input expression (possibly batched)g
: Gain (same dimension as x, no batch dimension)b
: Bias (same dimension as x, no batch dimension)
-
Expression
dynet
::
weight_norm
(const Expression &w, const Expression &g)¶ Weight normalization.
Performs weight normalization :
\( \begin{split} \hat{w} &= g\frac{w}{\Vert w\Vert}\\ \end{split} \)
Reference : Salimans, Kingma 2016
- Return
- An expression of the same dimension as
w
- Parameters
w
: Input expression (weight parameter)g
: Gain (scalar expression, usually also a parameter)
Device operations¶
These operations are device-related.
-
Expression
dynet
::
to_device
(const Expression &x, Device *device)¶ Copy tensor between devices.
Copy tensor from x’s device to device
- Return
- An expression of x’s tensor in device
- Parameters
x
: Input expressiondevice
: Device to place return tensor