Operations¶
Operation Interface¶
The following functions define DyNet “Expressions,” which are used as an interface to the various functions that can be used to build DyNet computation graphs. Expressions for each specific function are listed below.
-
struct
dynet::expr::
Expression
¶ - #include <expr.h>
Expressions are the building block of a Dynet computation graph.
Public Functions
-
Expression
(ComputationGraph *pg, VariableIndex i)¶ Base expression constructor.
Used when creating operations
- Parameters
pg
: Pointer to the computation graph
i
: Variable index
-
const Tensor &
value
() const¶ Get value of the expression.
Throws a runtime_error exception if no computation graph is available
- Return
- Value of the expression as a tensor
-
const Tensor &
gradient
() const¶ Get gradient of the expression.
Throws a runtime_error exception if no computation graph is available
Make sure to call
backward
on a downstream expression before calling this. If the expression is a constant expression (meaning it is not a function of a parameter), DyNet won't compute its gradient for the sake of efficiency. You need to manually force the gradient computation by adding the argument
full=true
to backward
- Return
- Gradient of the expression as a tensor
-
Input Operations¶
These operations allow you to input something into the computation graph, either simple scalar/vector/matrix inputs from floats, or parameter inputs from a DyNet parameter object. They all require passing a computation graph as input so you know which graph is being used for this particular calculation.
-
Expression
dynet::expr::
input
(ComputationGraph &g, real s)¶ Scalar input.
Create an expression that represents the scalar value s
- Return
- An expression representing s
- Parameters
g
: Computation graph
s
: Real number
-
Expression
dynet::expr::
input
(ComputationGraph &g, const real *ps)¶ Modifiable scalar input.
Create an expression that represents the scalar value *ps. If *ps is changed and the computation graph recalculated, the next forward pass will reflect the new value.
- Return
- An expression representing *ps
- Parameters
g
: Computation graph
ps
: Real number pointer
-
Expression
dynet::expr::
input
(ComputationGraph &g, const Dim &d, const std::vector<float> &data)¶ Vector/matrix/tensor input.
Create an expression that represents a vector, matrix, or tensor input. The dimensions of the input are defined by
d
. So for example
input(g,{50},data)
: will result in a 50-length vector
input(g,{50,30},data)
: will result in a 50x30 matrix
and so on, for an arbitrary number of dimensions. This function can also be used to import minibatched inputs. For example, if we have 10 examples in a minibatch, each with size 50x30, then we call
input(g,Dim({50,30},10),data)
The data vector “data” will contain the values used to fill the input, in column-major format. The length must equal the product of all dimensions in d.
- Return
- An expression representing data
- Parameters
g
: Computation graph
d
: Dimension of the input matrix
data
: A vector of data points
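To make the column-major convention concrete, here is a small plain-C++ sketch (not DyNet code) of how a flat data vector maps onto matrix entries, as assumed by the input functions above:

```cpp
#include <cassert>
#include <vector>

// Column-major layout: element (i, j) of a rows x cols matrix lives at
// flat index i + j * rows in the data vector. This mirrors the layout
// the input() functions above expect for "data".
float colmajor_get(const std::vector<float>& data,
                   unsigned rows, unsigned i, unsigned j) {
    return data[i + j * rows];
}
```

So for input(g,{2,3},data), data must be ordered as the first column, then the second column, and so on.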
-
Expression
dynet::expr::
input
(ComputationGraph &g, const Dim &d, const std::vector<float> *pdata)¶ Updatable vector/matrix/tensor input.
Similarly to input that takes a vector reference, input a vector, matrix, or tensor input. Because we pass the pointer, the data can be updated.
- Return
- An expression representing *pdata
- Parameters
g
: Computation graph
d
: Dimension of the input matrix
pdata
: A pointer to an (updatable) vector of data points
-
Expression
dynet::expr::
input
(ComputationGraph &g, const Dim &d, const std::vector<unsigned int> &ids, const std::vector<float> &data, float defdata = 0.f)¶ Sparse vector input.
This operation takes input as a sparse matrix of index/value pairs. It is exactly the same as the standard input via vector reference, but sets all non-specified values to “defdata” and resets all others to the appropriate input values.
- Return
- An expression representing data
- Parameters
g
: Computation graph
d
: Dimension of the input matrix
ids
: The indexes of the data points to update
data
: The data points corresponding to each index
defdata
: The default data with which to set the unspecified data points
-
Expression
dynet::expr::
parameter
(ComputationGraph &g, Parameter p)¶ Load parameter.
Load parameters into the computation graph.
- Return
- An expression representing p
- Parameters
g
: Computation graph
p
: Parameter object to load
-
Expression
dynet::expr::
parameter
(ComputationGraph &g, LookupParameter lp)¶ Load lookup parameter.
Load a full tensor of lookup parameters into the computation graph. Normally lookup parameters are accessed by using the lookup() function to grab a single element. However, in some cases we’ll want to access all of the parameters in the entire set of lookup parameters; in that case you can use this function. The first dimensions in the returned tensor will be equivalent to the dimensions we would get if calling the lookup() function, and the size of the final dimension will be equal to the size of the vocabulary.
- Return
- An expression representing lp
- Parameters
g
: Computation graph
lp
: LookupParameter object to load
-
Expression
dynet::expr::
const_parameter
(ComputationGraph &g, Parameter p)¶ Load constant parameters.
Load parameters into the computation graph, but prevent them from being updated when performing parameter update.
- Return
- An expression representing the constant p
- Parameters
g
: Computation graph
p
: Parameter object to load
-
Expression
dynet::expr::
const_parameter
(ComputationGraph &g, LookupParameter lp)¶ Load constant lookup parameters.
Load lookup parameters into the computation graph, but prevent them from being updated when performing parameter update.
- Return
- An expression representing the constant lp
- Parameters
g
: Computation graph
lp
: LookupParameter object to load
-
Expression
dynet::expr::
lookup
(ComputationGraph &g, LookupParameter p, unsigned index)¶ Look up parameter.
Look up parameters according to an index, and load them into the computation graph.
- Return
- An expression representing p[index]
- Parameters
g
: Computation graph
p
: LookupParameter object from which to load
index
: Index of the parameters within p
-
Expression
dynet::expr::
lookup
(ComputationGraph &g, LookupParameter p, const unsigned *pindex)¶ Look up parameters with modifiable index.
Look up parameters according to the *pindex, and load them into the computation graph. When *pindex changes, on the next computation of forward() the values will change.
- Return
- An expression representing p[*pindex]
- Parameters
g
: Computation graph
p
: LookupParameter object from which to load
pindex
: Pointer to the index of the parameters within p
-
Expression
dynet::expr::
const_lookup
(ComputationGraph &g, LookupParameter p, unsigned index)¶ Look up parameter.
Look up parameters according to an index, and load them into the computation graph. Do not perform gradient update on the parameters.
- Return
- A constant expression representing p[index]
- Parameters
g
: Computation graph
p
: LookupParameter object from which to load
index
: Index of the parameters within p
-
Expression
dynet::expr::
const_lookup
(ComputationGraph &g, LookupParameter p, const unsigned *pindex)¶ Constant lookup parameters with modifiable index.
Look up parameters according to the *pindex, and load them into the computation graph. When *pindex changes, on the next computation of forward() the values will change. However, gradient updates will not be performed.
- Return
- A constant expression representing p[*pindex]
- Parameters
g
: Computation graph
p
: LookupParameter object from which to load
pindex
: Pointer to the index of the parameters within p
-
Expression
dynet::expr::
lookup
(ComputationGraph &g, LookupParameter p, const std::vector<unsigned> &indices)¶ Look up parameters.
The mini-batched version of lookup. The resulting expression will be a mini-batch of parameters, where the “i”th element of the batch corresponds to the parameters at the position specified by the “i”th element of “indices”
- Return
- An expression with the “i”th batch element representing p[indices[i]]
- Parameters
g
: Computation graph
p
: LookupParameter object from which to load
indices
: Index of the parameters at each position in the batch
-
Expression
dynet::expr::
lookup
(ComputationGraph &g, LookupParameter p, const std::vector<unsigned> *pindices)¶ Look up parameters.
The mini-batched version of lookup with modifiable parameter indices.
- Return
- An expression with the “i”th batch element representing p[*pindices[i]]
- Parameters
g
: Computation graph
p
: LookupParameter object from which to load
pindices
: Pointer to lookup indices
-
Expression
dynet::expr::
const_lookup
(ComputationGraph &g, LookupParameter p, const std::vector<unsigned> &indices)¶ Look up parameters.
Mini-batched lookup that will not update the parameters.
- Return
- A constant expression with the “i”th batch element representing p[indices[i]]
- Parameters
g
: Computation graph
p
: LookupParameter object from which to load
indices
: Lookup indices
-
Expression
dynet::expr::
const_lookup
(ComputationGraph &g, LookupParameter p, const std::vector<unsigned> *pindices)¶ Look up parameters.
Mini-batched lookup that will not update the parameters, with modifiable indices.
- Return
- A constant expression with the “i”th batch element representing p[*pindices[i]]
- Parameters
g
: Computation graph
p
: LookupParameter object from which to load
pindices
: Lookup index pointers.
-
Expression
dynet::expr::
zeroes
(ComputationGraph &g, const Dim &d)¶ Create an input full of zeros.
Create an input full of zeros, sized according to dimensions d.
- Return
- A “d” dimensioned zero vector
- Parameters
g
: Computation graph
d
: The dimensions of the input
-
Expression
dynet::expr::
random_normal
(ComputationGraph &g, const Dim &d)¶ Create a random normal vector.
Create a vector distributed according to normal distribution with mean 0, variance 1.
- Return
- A “d” dimensioned normally distributed vector
- Parameters
g
: Computation graph
d
: The dimensions of the input
-
Expression
dynet::expr::
random_bernoulli
(ComputationGraph &g, const Dim &d, real p, real scale = 1.0f)¶ Create a random bernoulli vector.
Create a vector distributed according to bernoulli distribution with parameter p.
- Return
- A “d” dimensioned bernoulli distributed vector
- Parameters
g
: Computation graph
d
: The dimensions of the input
p
: The bernoulli p parameter
scale
: A scaling factor for the output (“active” elements will receive this value)
-
Expression
dynet::expr::
random_uniform
(ComputationGraph &g, const Dim &d, real left, real right)¶ Create a random uniform vector.
Create a vector distributed according to uniform distribution with boundaries left and right.
- Return
- A “d” dimensioned uniform distributed vector
- Parameters
g
: Computation graph
d
: The dimensions of the input
left
: The left boundary
right
: The right boundary
-
Expression
dynet::expr::
random_gumbel
(ComputationGraph &g, const Dim &d, real mu = 0.0, real beta = 1.0)¶ Create a random Gumbel sampled vector.
Create a vector distributed according to a Gumbel distribution with the specified parameters. (Currently only the defaults of mu=0.0 and beta=1.0 are supported.)
- Return
- A “d” dimensioned Gumbel distributed vector
- Parameters
g
: Computation graph
d
: The dimensions of the input
mu
: The mu parameter
beta
: The beta parameter
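For reference, a Gumbel(mu, beta) sample can be obtained from a uniform sample by the standard inverse-CDF transform. This plain-C++ sketch (not DyNet code) shows the construction underlying such a sampler:

```cpp
#include <cassert>
#include <cmath>

// Inverse-CDF transform for the Gumbel distribution: if u ~ Uniform(0,1),
// then mu - beta * log(-log(u)) is Gumbel(mu, beta) distributed.
// random_gumbel currently supports only mu = 0.0, beta = 1.0.
double gumbel_from_uniform(double u, double mu = 0.0, double beta = 1.0) {
    return mu - beta * std::log(-std::log(u));
}
```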
Arithmetic Operations¶
These operations perform basic arithmetic over values in the graph.
-
Expression
dynet::expr::
operator-
(const Expression &x)¶ Negation.
Negate the passed argument.
- Return
- The negation of x
- Parameters
x
: An input expression
-
Expression
dynet::expr::
operator+
(const Expression &x, const Expression &y)¶ Expression addition.
Add two expressions of the same dimensions.
- Return
- The sum of x and y
- Parameters
x
: The first input
y
: The second input
-
Expression
dynet::expr::
operator+
(const Expression &x, real y)¶ Scalar addition.
Add a scalar to an expression
- Return
- An expression equal to x, with every component increased by y
- Parameters
x
: The expression
y
: The scalar
-
Expression
dynet::expr::
operator+
(real x, const Expression &y)¶ Scalar addition.
Add a scalar to an expression
- Return
- An expression equal to y, with every component increased by x
- Parameters
x
: The scalar
y
: The expression
-
Expression
dynet::expr::
operator-
(const Expression &x, const Expression &y)¶ Expression subtraction.
Subtract one expression from another.
- Return
- An expression where the ith element is x_i minus y_i
- Parameters
x
: The expression from which to subtract
y
: The expression to subtract
-
Expression
dynet::expr::
operator-
(real x, const Expression &y)¶ Scalar subtraction.
Subtract an expression from a scalar
- Return
- An expression where the ith element is x minus y_i
- Parameters
x
: The scalar from which to subtract
y
: The expression to subtract
-
Expression
dynet::expr::
operator-
(const Expression &x, real y)¶ Scalar subtraction.
Subtract a scalar from an expression
- Return
- An expression where the ith element is x_i minus y
- Parameters
x
: The expression from which to subtract
y
: The scalar to subtract
-
Expression
dynet::expr::
operator*
(const Expression &x, const Expression &y)¶ Matrix multiplication.
Multiply two matrices together. Like standard matrix multiplication, the second dimension of x and the first dimension of y must match.
- Return
- An expression x times y
- Parameters
x
: The left-hand matrix
y
: The right-hand matrix
-
Expression
dynet::expr::
operator*
(const Expression &x, float y)¶ Matrix-scalar multiplication.
Multiply an expression component-wise by a scalar.
- Return
- An expression where the ith element is x_i times y
- Parameters
x
: The matrix
y
: The scalar
-
Expression
dynet::expr::
operator*
(float y, const Expression &x)¶ Matrix-scalar multiplication.
Multiply an expression component-wise by a scalar.
- Return
- An expression where the ith element is x_i times y
- Parameters
y
: The scalar
x
: The matrix
-
Expression
dynet::expr::
operator/
(const Expression &x, float y)¶ Matrix-scalar division.
Divide an expression component-wise by a scalar.
- Return
- An expression where the ith element is x_i divided by y
- Parameters
x
: The matrix
y
: The scalar
-
Expression
dynet::expr::
affine_transform
(const std::initializer_list<Expression> &xs)¶ Affine transform.
This performs an affine transform over an arbitrary (odd) number of expressions held in the input initializer list xs. The first expression is the “bias,” which is added to the expression as-is. The remaining expressions are multiplied together in pairs, then added. A very common usage case is the calculation of the score for a neural network layer (e.g. b + Wz) where b is the bias, W is the weight matrix, and z is the input. In this case xs[0] = b, xs[1] = W, and xs[2] = z.
- Return
- An expression equal to: xs[0] + xs[1]*xs[2] + xs[3]*xs[4] + ...
- Parameters
xs
: An initializer list containing an odd number of expressions
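The b + Wz case can be sketched in plain C++ for small sizes (an illustration of the computation, not the Expression-based API; the helper name is hypothetical):

```cpp
#include <cassert>
#include <vector>

// What affine_transform({b, W, z}) computes for a bias vector b, a
// rows x cols matrix W stored column-major, and a vector z: out = b + W * z.
std::vector<float> affine(const std::vector<float>& b,
                          const std::vector<float>& W, unsigned rows,
                          const std::vector<float>& z) {
    std::vector<float> out = b;                       // start from the bias
    unsigned cols = static_cast<unsigned>(z.size());
    for (unsigned j = 0; j < cols; ++j)               // accumulate W * z
        for (unsigned i = 0; i < rows; ++i)
            out[i] += W[i + j * rows] * z[j];
    return out;
}
```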
-
Expression
dynet::expr::
sum
(const std::initializer_list<Expression> &xs)¶ Sum.
This performs an elementwise sum over all the expressions in xs
- Return
- An expression where the ith element is equal to xs[0][i] + xs[1][i] + ...
- Parameters
xs
: An initializer list containing expressions
-
Expression
dynet::expr::
sum_elems
(const Expression &x)¶ Sum all elements.
Sum all the elements in an expression.
- Return
- The sum of all of its elements
- Parameters
x
: The input expression
-
Expression
dynet::expr::
average
(const std::initializer_list<Expression> &xs)¶ Average.
This performs an elementwise average over all the expressions in xs
- Return
- An expression where the ith element is equal to (xs[0][i] + xs[1][i] + ...)/|xs|
- Parameters
xs
: An initializer list containing expressions
-
Expression
dynet::expr::
sqrt
(const Expression &x)¶ Square root.
Elementwise square root.
- Return
- An expression where the ith element is equal to \(\sqrt(x_i)\)
- Parameters
x
: The input expression
-
Expression
dynet::expr::
abs
(const Expression &x)¶ Absolute value.
Elementwise absolute value.
- Return
- An expression where the ith element is equal to \(\vert x_i\vert\)
- Parameters
x
: The input expression
-
Expression
dynet::expr::
erf
(const Expression &x)¶ Gaussian error function.
Elementwise calculation of the Gaussian error function
- Return
- An expression where the ith element is equal to erf(x_i)
- Parameters
x
: The input expression
-
Expression
dynet::expr::
tanh
(const Expression &x)¶ Hyperbolic tangent.
Elementwise calculation of the hyperbolic tangent
- Return
- An expression where the ith element is equal to tanh(x_i)
- Parameters
x
: The input expression
-
Expression
dynet::expr::
exp
(const Expression &x)¶ Natural exponent.
Calculate elementwise y_i = e^{x_i}
- Return
- An expression where the ith element is equal to e^{x_i}
- Parameters
x
: The input expression
-
Expression
dynet::expr::
square
(const Expression &x)¶ Square.
Calculate elementwise y_i = x_i^2
- Return
- An expression where the ith element is equal to x_i^2
- Parameters
x
: The input expression
-
Expression
dynet::expr::
cube
(const Expression &x)¶ Cube.
Calculate elementwise y_i = x_i^3
- Return
- An expression where the ith element is equal to x_i^3
- Parameters
x
: The input expression
-
Expression
dynet::expr::
lgamma
(const Expression &x)¶ Log gamma.
Calculate elementwise y_i = ln(gamma(x_i))
- Return
- An expression where the ith element is equal to ln(gamma(x_i))
- Parameters
x
: The input expression
-
Expression
dynet::expr::
log
(const Expression &x)¶ Logarithm.
Calculate the elementwise natural logarithm y_i = ln(x_i)
- Return
- An expression where the ith element is equal to ln(x_i)
- Parameters
x
: The input expression
-
Expression
dynet::expr::
logistic
(const Expression &x)¶ Logistic sigmoid function.
Calculate elementwise y_i = 1/(1+e^{-x_i})
- Return
- An expression where the ith element is equal to y_i = 1/(1+e^{-x_i})
- Parameters
x
: The input expression
-
Expression
dynet::expr::
rectify
(const Expression &x)¶ Rectifier.
Calculate elementwise the recitifer (ReLU) function y_i = max(x_i,0)
- Return
- An expression where the ith element is equal to max(x_i,0)
- Parameters
x
: The input expression
-
Expression
dynet::expr::
softsign
(const Expression &x)¶ Soft Sign.
Calculate elementwise the softsign function y_i = x_i/(1+|x_i|)
- Return
- An expression where the ith element is equal to x_i/(1+|x_i|)
- Parameters
x
: The input expression
-
Expression
dynet::expr::
pow
(const Expression &x, const Expression &y)¶ Power function.
Calculate an output where the ith element is equal to x_i^y_i
- Return
- An expression where the ith element is equal to x_i^y_i
- Parameters
x
: The input expression
y
: The exponent expression
-
Expression
dynet::expr::
min
(const Expression &x, const Expression &y)¶ Minimum.
Calculate an output where the ith element is min(x_i,y_i)
- Return
- An expression where the ith element is equal to min(x_i,y_i)
- Parameters
x
: The first input expression
y
: The second input expression
-
Expression
dynet::expr::
max
(const Expression &x, const Expression &y)¶ Maximum.
Calculate an output where the ith element is max(x_i,y_i)
- Return
- An expression where the ith element is equal to max(x_i,y_i)
- Parameters
x
: The first input expression
y
: The second input expression
-
Expression
dynet::expr::
max
(const std::initializer_list<Expression> &xs)¶ Max.
This performs an elementwise max over all the expressions in xs
- Return
- An expression where the ith element is equal to max(xs[0][i], xs[1][i], ...)
- Parameters
xs
: An initializer list containing expressions
-
Expression
dynet::expr::
dot_product
(const Expression &x, const Expression &y)¶ Dot Product.
Calculate the dot product sum_i x_i*y_i
- Return
- An expression equal to the dot product
- Parameters
x
: The first input expression
y
: The second input expression
-
Expression
dynet::expr::
cmult
(const Expression &x, const Expression &y)¶ Componentwise multiply.
Do a componentwise multiply where each value is equal to x_i*y_i. This function used to be called cwise_multiply.
- Return
- An expression where the ith element is equal to x_i*y_i
- Parameters
x
: The first input expression
y
: The second input expression
-
Expression
dynet::expr::
cdiv
(const Expression &x, const Expression &y)¶ Componentwise division.
Do a componentwise division where each value is equal to x_i/y_i
- Return
- An expression where the ith element is equal to x_i/y_i
- Parameters
x
: The first input expression
y
: The second input expression
-
Expression
dynet::expr::
colwise_add
(const Expression &x, const Expression &bias)¶ Columnwise addition.
Add vector “bias” to each column of matrix “x”
- Return
- An expression where bias is added to each column of x
- Parameters
x
: An MxN matrix
bias
: A length M vector
Probability/Loss Operations¶
These operations are used for calculating probabilities, or calculating loss functions for use in training.
-
Expression
dynet::expr::
softmax
(const Expression &x)¶ Softmax.
The softmax function normalizes each column to ensure that all values are between 0 and 1 and add to one by applying e^{x[i]}/sum_j e^{x[j]}.
- Return
- A vector or matrix after calculating the softmax
- Parameters
x
: A vector or matrix
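The per-column computation can be sketched in plain C++ (an illustration of the math, not the Expression-based API), including the usual max-subtraction trick so large scores do not overflow:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Reference softmax over one column: y_i = e^{x_i} / sum_j e^{x_j},
// computed with the maximum subtracted for numerical stability.
std::vector<double> softmax_ref(const std::vector<double>& x) {
    double mx = x[0];
    for (double v : x) if (v > mx) mx = v;
    double Z = 0.0;
    for (double v : x) Z += std::exp(v - mx);
    std::vector<double> y(x.size());
    for (size_t i = 0; i < x.size(); ++i) y[i] = std::exp(x[i] - mx) / Z;
    return y;
}
```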
-
Expression
dynet::expr::
log_softmax
(const Expression &x)¶ Log softmax.
The log softmax function normalizes each column to ensure that all values are between 0 and 1 and add to one by applying e^{x[i]}/sum_j e^{x[j]}, then takes the log
- Return
- A vector or matrix after calculating the log softmax
- Parameters
x
: A vector or matrix
-
Expression
dynet::expr::
log_softmax
(const Expression &x, const std::vector<unsigned> &restriction)¶ Restricted log softmax.
The log softmax function calculated over only a subset of the vector elements. The elements to be included are set by the
restriction
variable. All elements not included in
restriction
are set to negative infinity.
- Return
- A vector with the log softmax over the specified elements
- Parameters
x
: A vector over which to calculate the softmax
restriction
: The elements over which to calculate the softmax
-
Expression
dynet::expr::
logsumexp
(const std::initializer_list<Expression> &xs)¶ Log, sum, exp.
The elementwise “logsumexp” function that calculates \(ln(\sum_i e^{xs_i})\), used in adding probabilities in the log domain.
- Return
- The result.
- Parameters
xs
: Expressions with respect to which to calculate the logsumexp.
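A numerically stable version of this computation can be sketched in plain C++ (an illustration of the math, not the Expression-based API):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Stable logsumexp: ln(sum_i e^{x_i}), computed by factoring out the
// maximum so that large log-domain scores do not overflow.
double logsumexp_ref(const std::vector<double>& x) {
    double mx = x[0];
    for (double v : x) if (v > mx) mx = v;
    double s = 0.0;
    for (double v : x) s += std::exp(v - mx);
    return mx + std::log(s);
}
```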
-
Expression
dynet::expr::
pickneglogsoftmax
(const Expression &x, unsigned v)¶ Negative softmax log likelihood.
This function takes in a vector of scores
x
, performs a log softmax, takes the negative, and selects the likelihood corresponding to the element
v
. This is perhaps the most standard loss function for training neural networks to predict one out of a set of elements.
- Return
- The negative log likelihood of element
v
after taking the softmax
- Parameters
x
: A vector of scores
v
: The element with which to calculate the loss
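Because -log softmax(x)[v] simplifies to logsumexp(x) - x[v], the loss can be sketched in plain C++ as follows (an illustration of the identity, not the Expression-based API):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Reference pickneglogsoftmax: -log softmax(x)[v] = logsumexp(x) - x[v],
// with the maximum factored out of logsumexp for stability.
double pick_nll(const std::vector<double>& x, unsigned v) {
    double mx = x[0];
    for (double s : x) if (s > mx) mx = s;
    double z = 0.0;
    for (double s : x) z += std::exp(s - mx);
    return (mx + std::log(z)) - x[v];   // logsumexp(x) - x[v]
}
```

For a uniform score vector of length n, the loss is ln(n) regardless of which element is picked.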
-
Expression
dynet::expr::
pickneglogsoftmax
(const Expression &x, const unsigned *pv)¶ Modifiable negative softmax log likelihood.
This function calculates the negative log likelihood after the softmax with respect to index
*pv
. This computes the same value as the previous function that passes the index
v
by value, but instead passes by pointer so the value
*pv
can be modified without re-constructing the computation graph. This can be used in situations where we want to create a computation graph once, then feed it different data points.
- Return
- The negative log likelihood of element
*pv
after taking the softmax
- Parameters
x
: A vector of scores
pv
: A pointer to the index of the correct element
-
Expression
dynet::expr::
pickneglogsoftmax
(const Expression &x, const std::vector<unsigned> &v)¶ Batched negative softmax log likelihood.
This function is similar to standard pickneglogsoftmax, but calculates loss with respect to multiple batch elements. The input will be a mini-batch of score vectors where the number of batch elements is equal to the number of indices in
v
.
- Return
- The negative log likelihoods over all the batch elements
- Parameters
x
: An expression with vectors of scores over N batch elements
v
: A size-N vector indicating the index with respect to all the batch elements
-
Expression
dynet::expr::
pickneglogsoftmax
(const Expression &x, const std::vector<unsigned> *pv)¶ Modifiable batched negative softmax log likelihood.
This function is a combination of modifiable pickneglogsoftmax and batched pickneglogsoftmax:
pv
can be modified without re-creating the computation graph.
- Return
- The negative log likelihoods over all the batch elements
- Parameters
x
: An expression with vectors of scores over N batch elements
pv
: A pointer to the indexes
-
Expression
dynet::expr::
hinge
(const Expression &x, unsigned index, float m = 1.0)¶ Hinge loss.
This expression calculates the hinge loss, formally expressed as: \( \text{hinge}(x,index,m) = \sum_{i \ne index} \max(0, m-x[index]+x[i]). \)
- Return
- The hinge loss of candidate
index
with respect to margin
m
- Parameters
x
: A vector of scores
index
: The index of the correct candidate
m
: The margin
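The formula above can be sketched directly in plain C++ (an illustration of the math, not the Expression-based API):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Reference hinge loss: for each wrong candidate i, penalize any margin
// violation max(0, m - x[index] + x[i]).
float hinge_ref(const std::vector<float>& x, unsigned index, float m = 1.0f) {
    float loss = 0.f;
    for (unsigned i = 0; i < x.size(); ++i)
        if (i != index) loss += std::max(0.f, m - x[index] + x[i]);
    return loss;
}
```

When the correct candidate beats every other score by at least the margin, the loss is zero.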
-
Expression
dynet::expr::
hinge
(const Expression &x, const unsigned *pindex, float m = 1.0)¶ Modifiable hinge loss.
This function calculates the hinge loss with respect to index
*pindex
. This computes the same value as the previous function that passes the index
index
by value, but instead passes by pointer so the value
*pindex
can be modified without re-constructing the computation graph. This can be used in situations where we want to create a computation graph once, then feed it different data points.
- Return
- The hinge loss of candidate
*pindex
with respect to margin
m
- Parameters
x
: A vector of scores
pindex
: A pointer to the index of the correct candidate
m
: The margin
-
Expression
dynet::expr::
hinge
(const Expression &x, const std::vector<unsigned> &indices, float m = 1.0)¶ Batched hinge loss.
The same as hinge loss, but for the case where
x
is a mini-batched tensor with
indices.size()
batch elements, and
indices
is a vector indicating the index of each of the correct elements for these elements.
- Return
- The hinge loss of each mini-batch
- Parameters
x
: A mini-batch of vectors with
indices.size()
batch elements
indices
: The indices of the correct candidates for each batch element
m
: The margin
-
Expression
dynet::expr::
hinge
(const Expression &x, const std::vector<unsigned> *pindices, float m = 1.0)¶ Batched modifiable hinge loss.
A combination of the previous batched and modifiable hinge loss functions, where the vector
*pindices
can be modified.
- Return
- The hinge loss of each mini-batch
- Parameters
x
: A mini-batch of vectors with
indices.size()
batch elements
pindices
: Pointer to the indices of the correct candidates for each batch element
m
: The margin
-
Expression
dynet::expr::
sparsemax
(const Expression &x)¶ Sparsemax.
The sparsemax function (Martins et al. 2016), which is similar to softmax, but induces sparse solutions where most of the vector elements are zero. Note: This function is not yet implemented on GPU.
- Return
- The sparsemax of the scores
- Parameters
x
: A vector of scores
-
Expression
dynet::expr::
sparsemax_loss
(const Expression &x, const std::vector<unsigned> &target_support)¶ Sparsemax loss.
The sparsemax loss function (Martins et al. 2016), which is similar to softmax loss, but induces sparse solutions where most of the vector elements are zero. It has a gradient similar to the sparsemax function and thus is useful for optimizing when the sparsemax will be used at test time. Note: This function is not yet implemented on GPU.
- Return
- The sparsemax loss of the labels
- Parameters
x
: A vector of scores
target_support
: The target correct labels.
-
Expression
dynet::expr::
sparsemax_loss
(const Expression &x, const std::vector<unsigned> *ptarget_support)¶ Modifiable sparsemax loss.
Similar to the sparsemax loss, but with ptarget_support being a pointer to a vector, allowing it to be modified without re-creating the computation graph. Note: This function is not yet implemented on GPU.
- Return
- The sparsemax loss of the labels
- Parameters
x
: A vector of scores
ptarget_support
: A pointer to the target correct labels.
-
Expression
dynet::expr::
squared_norm
(const Expression &x)¶ Squared norm.
The squared norm of the values of x: \(\sum_i x_i^2\).
- Return
- The squared norm
- Parameters
x
: A vector of values
-
Expression
dynet::expr::
squared_distance
(const Expression &x, const Expression &y)¶ Squared distance.
The squared distance between values of
x
and
y
: \(\sum_i (x_i-y_i)^2\).
- Return
- The squared distance
- Parameters
x
: A vector of values
y
: Another vector of values
-
Expression
dynet::expr::
l1_distance
(const Expression &x, const Expression &y)¶ L1 distance.
The L1 distance between values of
x
and
y
: \(\sum_i |x_i-y_i|\).
- Return
- The L1 distance
- Parameters
x
: A vector of values
y
: Another vector of values
-
Expression
dynet::expr::
huber_distance
(const Expression &x, const Expression &y, float c = 1.345f)¶ Huber distance.
The Huber distance between values of
x
and
y
, parameterized by
c
: \(\sum_i L_c(x_i, y_i)\) where:
\( L_c(x, y) = \begin{cases} \frac{1}{2}(y - x)^2 & \textrm{for } |y - x| \le c, \\ c\, |y - x| - \frac{1}{2}c^2 & \textrm{otherwise.} \end{cases} \)
- Return
- The Huber distance
- Parameters
x
: A vector of values
y
: Another vector of values
c
: The parameter of the Huber distance parameterizing the cutoff
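A single term of the sum can be sketched in plain C++ (an illustration of the piecewise definition, not the Expression-based API):

```cpp
#include <cassert>
#include <cmath>

// One term L_c(x, y) of the Huber distance: quadratic for residuals up to
// the cutoff c, linear beyond it (so outliers are penalized less harshly
// than under a squared distance).
double huber_term(double x, double y, double c = 1.345) {
    double r = std::fabs(y - x);
    return r <= c ? 0.5 * r * r : c * r - 0.5 * c * c;
}
```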
-
Expression
dynet::expr::
binary_log_loss
(const Expression &x, const Expression &y)¶ Binary log loss.
The log loss of a binary decision according to the sigmoid function: \(- \sum_i (y_i * ln(x_i) + (1-y_i) * ln(1-x_i)) \)
- Return
- The log loss of the sigmoid function
- Parameters
x
: A vector of values
y
: A vector of true answers
-
Expression
dynet::expr::
pairwise_rank_loss
(const Expression &x, const Expression &y, real m = 1.0)¶ Pairwise rank loss.
A margin-based loss, where every margin violation for each pair of values is penalized: \(\sum_i max(x_i-y_i+m, 0)\)
- Return
- The pairwise rank loss
- Parameters
x
: A vector of values
y
: A vector of true answers
m
: The margin
-
Expression
dynet::expr::
poisson_loss
(const Expression &x, unsigned y)¶ Poisson loss.
The negative log probability of
y
according to a Poisson distribution with parameter
x
. Useful in Poisson regression, where we try to predict the parameters of a Poisson distribution to maximize the probability of data
y
.
- Return
- The Poisson loss
- Parameters
x
: The parameter of the Poisson distribution.
y
: The target value
-
Expression
dynet::expr::
poisson_loss
(const Expression &x, const unsigned *py)¶ Modifiable Poisson loss.
Similar to Poisson loss, but with the target value passed by pointer so that it can be modified without re-constructing the computation graph.
- Return
- The Poisson loss
- Parameters
x
: The parameter of the Poisson distribution.
py
: A pointer to the target value
Flow/Shaping Operations¶
These operations control the flow of information through the graph, or the shape of the vectors/tensors used in the graph.
-
Expression
dynet::expr::
nobackprop
(const Expression &x)¶ Prevent backprop.
This node has no effect on the forward pass, but prevents gradients from flowing backward during the backward pass. This is useful when there’s a subgraph for which you don’t want loss passed back to the parameters.
- Return
- The new expression
- Parameters
x
: The input expression
-
Expression
dynet::expr::
flip_gradient
(const Expression &x)¶ Negative backprop.
This node has no effect on the forward pass, but takes negative on backprop process. This operation is widely used in adversarial networks.
- Return
- An output expression with the same value as the input (it only affects the backward pass)
- Parameters
x
: The input expression
-
Expression
dynet::expr::
reshape
(const Expression &x, const Dim &d)¶ Reshape to another size.
This node reshapes a tensor to another size, without changing the underlying layout of the data. The layout of the data in DyNet is column-major, so if we have a 3x4 matrix
\( \begin{pmatrix} x_{1,1} & x_{1,2} & x_{1,3} & x_{1,4} \\ x_{2,1} & x_{2,2} & x_{2,3} & x_{2,4} \\ x_{3,1} & x_{3,2} & x_{3,3} & x_{3,4} \\ \end{pmatrix} \)
and transform it into a 2x6 matrix, it will be rearranged as:
\( \begin{pmatrix} x_{1,1} & x_{3,1} & x_{2,2} & x_{1,3} & x_{3,3} & x_{2,4} \\ x_{2,1} & x_{1,2} & x_{3,2} & x_{2,3} & x_{1,4} & x_{3,4} \\ \end{pmatrix} \)
Note: This is O(1) for forward, and O(n) for backward.
- Return
- The reshaped expression
- Parameters
x
: The input expressiond
: The new dimensions
-
Expression dynet::expr::transpose(const Expression & x, const std::vector< unsigned > & dims = {1, 0})
Transpose a matrix.
Transpose a matrix or tensor, or if dims is specified shuffle the dimensions arbitrarily. Note: This is O(1) if either the row or column dimension is 1, and O(n) otherwise.
- Return
- The transposed/shuffled expression
- Parameters
x
: The input expressiondims
: The dimensions to swap. The ith dimension of the output will be equal to the dims[i] dimension of the input. dims must have the same number of dimensions as x.
-
Expression
dynet::expr::
select_rows
(const Expression &x, const std::vector<unsigned> &rows)¶ Select rows.
Select a subset of rows of a matrix.
- Return
- An expression containing the selected rows
- Parameters
x
: The input expressionrows
: The rows to extract
-
Expression
dynet::expr::
select_rows
(const Expression &x, const std::vector<unsigned> *prows)¶ Modifiable select rows.
Select a subset of rows of a matrix, where the elements of prows can be modified without re-creating the computation graph.
- Return
- An expression containing the selected rows
- Parameters
x
: The input expressionprows
: The rows to extract
-
Expression
dynet::expr::
select_cols
(const Expression &x, const std::vector<unsigned> &cols)¶ Select columns.
Select a subset of columns of a matrix. select_cols is more efficient than select_rows since DyNet uses column-major order.
- Return
- An expression containing the selected columns
- Parameters
x
: The input expressioncols
: The columns to extract
-
Expression
dynet::expr::
select_cols
(const Expression &x, const std::vector<unsigned> *pcols)¶ Modifiable select columns.
Select a subset of columns of a matrix, where the elements of pcols can be modified without re-creating the computation graph.
- Return
- An expression containing the selected columns
- Parameters
x
: The input expressionpcols
: The columns to extract
-
Expression
dynet::expr::
sum_batches
(const Expression &x)¶ Sum over minibatches.
Sum an expression that consists of multiple minibatches into one of equal dimension but with only a single minibatch. This is useful for summing loss functions at the end of minibatch training.
- Return
- An expression with a single batch
- Parameters
x
: The input mini-batched expression
-
Expression
dynet::expr::
pick
(const Expression &x, unsigned v, unsigned d = 0)¶ Pick element.
Pick a single element/row/column/sub-tensor from an expression. This will result in the dimension of the tensor being reduced by 1.
- Return
- The value of x[v] along dimension d
- Parameters
x
: The input expressionv
: The index of the element to selectd
: The dimension along which to choose the element
-
Expression
dynet::expr::
pick
(const Expression &x, const std::vector<unsigned> &v, unsigned d = 0)¶ Batched pick.
Pick elements from multiple batches.
- Return
- A mini-batched expression containing the picked elements
- Parameters
x
: The input expressionv
: A vector of indices to choose, one for each batch in the input expression.d
: The dimension along which to choose the elements
-
Expression
dynet::expr::
pick
(const Expression &x, const unsigned *pv, unsigned d = 0)¶ Modifiable pick element.
Pick a single element from an expression, where the index is passed by pointer so we do not need to re-create the computation graph every time.
- Return
- The value of x[*pv]
- Parameters
x
: The input expressionpv
: Pointer to the index of the element to selectd
: The dimension along which to choose the elements
-
Expression
dynet::expr::
pick
(const Expression &x, const std::vector<unsigned> *pv, unsigned d = 0)¶ Modifiable batched pick element.
Pick multiple elements from an input expression, where the indices are passed by pointer so we do not need to re-create the computation graph every time.
- Return
- A mini-batched expression containing the picked elements
- Parameters
x
: The input expressionpv
: A pointer to a vector of indices to choosed
: The dimension along which to choose the elements
-
Expression
dynet::expr::
pickrange
(const Expression &x, unsigned v, unsigned u)¶ Pick range of elements.
Pick a range of elements from an expression.
- Return
- The value of {x[v],...,x[u-1]} (the end index is exclusive)
- Parameters
x
: The input expressionv
: The beginning indexu
: The end index
-
Expression
dynet::expr::
pick_batch_elem
(const Expression &x, unsigned v)¶ Pick batch element.
Pick batch element from a batched expression. For a Tensor with 3 batch elements:
\( \begin{pmatrix} x_{1,1,1} & x_{1,1,2} \\ x_{1,2,1} & x_{1,2,2} \\ \end{pmatrix} \begin{pmatrix} x_{2,1,1} & x_{2,1,2} \\ x_{2,2,1} & x_{2,2,2} \\ \end{pmatrix} \begin{pmatrix} x_{3,1,1} & x_{3,1,2} \\ x_{3,2,1} & x_{3,2,2} \\ \end{pmatrix} \)
pick_batch_elem(t, 1) will return a Tensor of
\( \begin{pmatrix} x_{2,1,1} & x_{2,1,2} \\ x_{2,2,1} & x_{2,2,2} \\ \end{pmatrix} \)
- Return
- The picked batch element: a tensor whose batch dimension
bd
equals 1. - Parameters
x
: The input expressionv
: The index of the batch element to be picked.
-
Expression
dynet::expr::
pick_batch_elems
(const Expression &x, const std::vector<unsigned> &v)¶ Pick batch elements.
Pick several batch elements from a batched expression. For a Tensor with 3 batch elements:
\( \begin{pmatrix} x_{1,1,1} & x_{1,1,2} \\ x_{1,2,1} & x_{1,2,2} \\ \end{pmatrix} \begin{pmatrix} x_{2,1,1} & x_{2,1,2} \\ x_{2,2,1} & x_{2,2,2} \\ \end{pmatrix} \begin{pmatrix} x_{3,1,1} & x_{3,1,2} \\ x_{3,2,1} & x_{3,2,2} \\ \end{pmatrix} \)
pick_batch_elems(t, {2, 3}) will return a Tensor with 2 batch elements:
\( \begin{pmatrix} x_{2,1,1} & x_{2,1,2} \\ x_{2,2,1} & x_{2,2,2} \\ \end{pmatrix} \begin{pmatrix} x_{3,1,1} & x_{3,1,2} \\ x_{3,2,1} & x_{3,2,2} \\ \end{pmatrix} \)
- Return
- The picked batch elements: a tensor whose batch dimension
bd
equals the size of the vectorv
. - Parameters
x
: The input expressionv
: A vector of indices of the batch elements to be picked.
-
Expression
dynet::expr::
pick_batch_elem
(const Expression &x, const unsigned *v)¶ Modifiable pick batch element.
Pick batch element from a batched expression.
- Return
- The picked batch element: a tensor whose batch dimension
bd
equals 1. - Parameters
x
: The input expressionv
: A pointer to the index of the batch element to be picked.
-
Expression
dynet::expr::
pick_batch_elems
(const Expression &x, const std::vector<unsigned> *pv)¶ Modifiable pick batch elements.
Pick several batch elements from a batched expression.
- Return
- The picked batch elements: a tensor whose batch dimension
bd
equals the size of the vectorv
. - Parameters
x
: The input expressionpv
: A pointer to the vector of indices
-
Expression
dynet::expr::
concatenate_to_batch
(const std::initializer_list<Expression> &xs)¶ Concatenate list of expressions to a single batched expression.
Perform a concatenation of several expressions along the batch dimension. All expressions must have the same shape except for the batch dimension.
- Return
- The expression with the batch dimensions concatenated
- Parameters
xs
: The input expressions
-
Expression
dynet::expr::
concatenate_cols
(const std::initializer_list<Expression> &xs)¶ Concatenate columns.
Perform a concatenation of the columns in multiple expressions. All expressions must have the same number of rows.
- Return
- The expression with the columns concatenated
- Parameters
xs
: The input expressions
-
Expression
dynet::expr::
concatenate
(const std::initializer_list<Expression> &xs, unsigned d = 0)¶ Concatenate.
Perform a concatenation of multiple expressions along a particular dimension. All expressions must have the same dimensions except for the dimension to be concatenated (rows by default).
- Return
- The expression with the specified dimension concatenated
- Parameters
xs
: The input expressionsd
: The dimension along which to perform concatenation
-
Expression
dynet::expr::
max_dim
(const Expression &x, unsigned d = 0)¶ Max out through a dimension.
Select an element/row/column/sub-tensor from an expression with the maximum value along a given dimension. This will result in the dimension of the tensor being reduced by 1.
- Return
- An expression of sub-tensor with max value along dimension d
- Parameters
x
: The input expressiond
: The dimension along which to choose the element
-
Expression
dynet::expr::
min_dim
(const Expression &x, unsigned d = 0)¶ Min out through a dimension.
Select an element/row/column/sub-tensor from an expression with the minimum value along a given dimension. This will result in the dimension of the tensor being reduced by 1.
- Return
- An expression of sub-tensor with min value along dimension d
- Parameters
x
: The input expressiond
: The dimension along which to choose the element
Noise Operations¶
These operations are used to add noise to the graph for purposes of making learning more robust.
-
Expression
dynet::expr::
noise
(const Expression &x, real stddev)¶ Gaussian noise.
Add gaussian noise to an expression.
- Return
- The noised expression
- Parameters
x
: The input expressionstddev
: The standard deviation of the gaussian
-
Expression
dynet::expr::
dropout
(const Expression &x, real p)¶ Dropout.
With a fixed probability p, drop out (set to zero) nodes in the input expression, and scale the remaining nodes by 1/(1-p). Note that there are two kinds of dropout:
- Regular dropout: dropout is performed at training time, and outputs are scaled by the keep probability 1-p at test time.
- Inverted dropout: dropout and scaling are both performed at training time, so nothing needs to be done at test time. DyNet implements the latter, so you only need to apply dropout at training time, and do not need to perform any scaling at test time.
- Return
- The dropped out expression
- Parameters
x
: The input expressionp
: The dropout probability
-
Expression
dynet::expr::
block_dropout
(const Expression &x, real p)¶ Block dropout.
Identical to the dropout operation, but either drops out all or no values in the expression, as opposed to making a decision about each value individually.
- Return
- The block dropout expression
- Parameters
x
: The input expressionp
: The block dropout probability
Tensor Operations¶
These operations are used for performing operations on higher order tensors.
-
Expression
dynet::expr::
contract3d_1d
(const Expression &x, const Expression &y)¶ Contracts a rank 3 tensor and a rank 1 tensor into a rank 2 tensor.
The resulting tensor \(z\) has coordinates \(z_{ij} = \sum_k x_{ijk} y_k\)
- Return
- Matrix
- Parameters
x
: Rank 3 tensory
: Vector
-
Expression
dynet::expr::
contract3d_1d_1d
(const Expression &x, const Expression &y, const Expression &z)¶ Contracts a rank 3 tensor and two rank 1 tensors into a rank 1 tensor.
This is the equivalent of calling
contract3d_1d
and then performing a matrix vector multiplication.The resulting tensor \(t\) has coordinates \(t_i = \sum_{j,k} x_{ijk} y_k z_j\)
- Return
- Vector
- Parameters
x
: Rank 3 tensory
: Vectorz
: Vector
-
Expression
dynet::expr::
contract3d_1d_1d
(const Expression &x, const Expression &y, const Expression &z, const Expression &b)¶ Same as
contract3d_1d_1d
with an additional bias parameter.This is the equivalent of calling
contract3d_1d
and then performing an affine transform.The resulting tensor \(t\) has coordinates \(t_i = b_i + \sum_{j,k} x_{ijk} y_k z_j\)
- Return
- Vector
- Parameters
x
: Rank 3 tensory
: Vectorz
: Vectorb
: Bias vector
-
Expression
dynet::expr::
contract3d_1d
(const Expression &x, const Expression &y, const Expression &b)¶ Same as
contract3d_1d
with an additional bias parameter.The resulting tensor \(z\) has coordinates \(z_{ij} = b_{ij}+\sum_k x_{ijk} y_k\)
- Return
- Matrix
- Parameters
x
: Rank 3 tensory
: Vectorb
: Bias matrix
Linear Algebra Operations¶
These operations are used for performing various operations common in linear algebra.
-
Expression
dynet::expr::
inverse
(const Expression &x)¶ Matrix Inverse.
Takes the inverse of a matrix (not implemented on GPU yet, although contributions are welcome: https://github.com/clab/dynet/issues/158). Note that back-propagating through an inverted matrix can sometimes be a source of numerical instability.
- Return
- The inverse of the matrix
- Parameters
x
: A square matrix
-
Expression
dynet::expr::
logdet
(const Expression &x)¶ Log determinant.
Takes the log of the determinant of a matrix. (not implemented on GPU yet, although contributions are welcome: https://github.com/clab/dynet/issues/158).
- Return
- The log of its determinant
- Parameters
x
: A square matrix
-
Expression
dynet::expr::
trace_of_product
(const Expression &x, const Expression &y)¶ Trace of Matrix Product.
Takes the trace of the product of matrices. (not implemented on GPU yet, although contributions are welcome: https://github.com/clab/dynet/issues/158).
- Return
- trace(x * y)
- Parameters
x
: A matrixy
: Another matrix
Convolution Operations¶
These operations are convolution-related.
-
Expression
dynet::expr::
conv2d
(const Expression &x, const Expression &f, const std::vector<unsigned> &stride, bool is_valid = true)¶ conv2d without bias
2D convolution operator without bias parameters. ‘VALID’ and ‘SAME’ convolutions are supported. To see the distinction, consider the case when the stride is 1:
- SAME: the output size is the same as the input size. To achieve this, the input is padded so the filter can sweep outside of the input maps.
- VALID: output size shrinks by filter_size - 1, and the filters always sweep at valid positions inside the input maps. No padding needed.
In detail, assume:
- Input feature maps: (XH x XW x XC) x N
- Filters: FH x FW x XC x FC, 4D tensor
- Strides: strides[0] and strides[1] are row (h) and col (w) stride, respectively.
For the SAME convolution: the output height (YH) and width (YW) are computed as:
- YH = ceil(float(XH) / float(strides[0]))
- YW = ceil(float(XW) / float(strides[1])) and the paddings are computed as:
- pad_along_height = max((YH - 1) * strides[0] + FH - XH, 0)
- pad_along_width = max((YW - 1) * strides[1] + FW - XW, 0)
- pad_top = pad_along_height / 2
- pad_bottom = pad_along_height - pad_top
- pad_left = pad_along_width / 2
- pad_right = pad_along_width - pad_left
For the VALID convolution: the output height (YH) and width (YW) are computed as:
- YH = ceil(float(XH - FH + 1) / float(strides[0]))
- YW = ceil(float(XW - FW + 1) / float(strides[1])) and the paddings are always zeros.
- Return
- The output feature maps (H x W x Co) x N, 3D tensor with an optional batch dimension
- Parameters
x
: The input feature maps: (H x W x Ci) x N (ColMaj), 3D tensor with an optional batch dimensionf
: 2D convolution filters: H x W x Ci x Co (ColMaj), 4D tensorstride
: the row and column stridesis_valid
: ‘VALID’ convolution or ‘SAME’ convolution, default is true (‘VALID’)
-
Expression
dynet::expr::
conv2d
(const Expression &x, const Expression &f, const Expression &b, const std::vector<unsigned> &stride, bool is_valid = true)¶ conv2d with bias
2D convolution operator with bias parameters. ‘VALID’ and ‘SAME’ convolutions are supported. To see the distinction, consider the case when the stride is 1:
- SAME: the output size is the same as the input size. To achieve this, the input is padded so the filter can sweep outside of the input maps.
- VALID: output size shrinks by filter_size - 1, and the filters always sweep at valid positions inside the input maps. No padding needed.
In detail, assume:
- Input feature maps: XH x XW x XC x N
- Filters: FH x FW x XC x FC
- Strides: strides[0] and strides[1] are row (h) and col (w) stride, respectively.
For the SAME convolution: the output height (YH) and width (YW) are computed as:
- YH = ceil(float(XH) / float(strides[0]))
- YW = ceil(float(XW) / float(strides[1])) and the paddings are computed as:
- pad_along_height = max((YH - 1) * strides[0] + FH - XH, 0)
- pad_along_width = max((YW - 1) * strides[1] + FW - XW, 0)
- pad_top = pad_along_height / 2
- pad_bottom = pad_along_height - pad_top
- pad_left = pad_along_width / 2
- pad_right = pad_along_width - pad_left
For the VALID convolution: the output height (YH) and width (YW) are computed as:
- YH = ceil(float(XH - FH + 1) / float(strides[0]))
- YW = ceil(float(XW - FW + 1) / float(strides[1])) and the paddings are always zeros.
- Return
- The output feature maps (H x W x Co) x N, 3D tensor with an optional batch dimension
- Parameters
x
: The input feature maps: (H x W x Ci) x N (ColMaj), 3D tensor with an optional batch dimensionf
: 2D convolution filters: H x W x Ci x Co (ColMaj), 4D tensorb
: The bias (1D: Co)stride
: the row and column stridesis_valid
: ‘VALID’ convolution or ‘SAME’ convolution, default is true (‘VALID’)
Normalization Operations¶
This includes batch normalization and the likes.
-
Expression
dynet::expr::
layer_norm
(const Expression &x, const Expression &g, const Expression &b)¶ Layer normalization.
Performs layer normalization:
\( \begin{split} \mu &= \frac 1 n \sum_{i=1}^n x_i\\ \sigma &= \sqrt{\frac 1 n \sum_{i=1}^n (x_i-\mu)^2}\\ y&=\frac {\boldsymbol{g}} \sigma \circ (\boldsymbol{x}-\mu) + \boldsymbol{b}\\ \end{split} \)
Reference: Ba et al., 2016
- Return
- An expression of the same dimension as
x
- Parameters
x
: Input expression (possibly batched)g
: Gain (same dimension as x, no batch dimension)b
: Bias (same dimension as x, no batch dimension)