# Core functionalities

## Computation Graph

The ComputationGraph is the workhorse of DyNet. From the DyNet technical report:

[The] computation graph represents symbolic computation, and the results of the computation are evaluated lazily: the computation is only performed once the user explicitly asks for it (at which point a “forward” computation is triggered). Expressions that evaluate to scalars (i.e. loss values) can also be used to trigger a “backward” computation, computing the gradients of the computation with respect to the parameters.

int dynet::get_number_of_active_graphs()

Gets the number of active graphs.

This is either 0 or 1, since you can’t create more than one graph at once.

Return
Number of active graphs

unsigned dynet::get_current_graph_id()

Get id of the current active graph.

This can help check whether a graph is stale

Return
Id of the current graph

struct ComputationGraph
#include <dynet.h>

Computation graph where nodes represent forward and backward intermediate values, and edges represent functions of multiple values.

To represent the fact that a function may have multiple arguments, edges have a single head and 0, 1, 2, or more tails. (Constants, inputs, and parameters are represented as functions of 0 parameters.) Example: given the function z = f(x, y), z, x, and y are nodes, and there is an edge representing f which points to the z node (i.e., its head), and x and y are the tails of the edge. You shouldn’t need to use most methods from the ComputationGraph except for backward, since most of them are available directly from the Expression class.
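The node/edge picture and the lazy evaluation described above can be illustrated with a toy graph. This is only a sketch under stated assumptions: `ToyGraph` and all of its members are hypothetical names invented for illustration, values are plain doubles rather than tensors, and nothing here reflects how DyNet actually stores or schedules its nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy model only: NOT DyNet's ComputationGraph. All names are hypothetical.
struct ToyGraph {
  using Fn = std::function<double(const std::vector<double>&)>;

  struct Node {
    std::vector<std::size_t> tails;  // indices of the argument nodes
    Fn fn;                           // the function carried by the edge
  };

  std::vector<Node> nodes;
  std::vector<double> values;   // cached forward values
  std::size_t evaluated = 0;    // nodes [0, evaluated) already have a value

  // Adding a function only records it; nothing is computed here.
  std::size_t add_function(std::vector<std::size_t> tails, Fn fn) {
    for (std::size_t t : tails) assert(t < nodes.size());  // tails must exist
    Node n;
    n.tails = std::move(tails);
    n.fn = std::move(fn);
    nodes.push_back(std::move(n));
    return nodes.size() - 1;
  }

  // An input is represented as a function of zero arguments.
  std::size_t add_input(double v) {
    return add_function({}, [v](const std::vector<double>&) { return v; });
  }

  // Evaluate every not-yet-evaluated node up to and including `last`.
  double incremental_forward(std::size_t last) {
    assert(last < nodes.size());
    values.resize(nodes.size());
    for (; evaluated <= last; ++evaluated) {
      std::vector<double> xs;
      for (std::size_t t : nodes[evaluated].tails) xs.push_back(values[t]);
      values[evaluated] = nodes[evaluated].fn(xs);
    }
    return values[last];
  }
};
```

In this sketch, building z = f(x, y) costs nothing until `incremental_forward(z)` is called, which mirrors the "computed only when asked" behaviour in spirit only.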

Public Functions

ComputationGraph()

Default constructor.

VariableIndex add_input(real s, Device *device)

The computational network will pull inputs in from the user’s data structures and make them available to the computation

Return
The index of the created variable
Parameters
• s: Real number
• device: The device to place input value

VariableIndex add_input(const real *ps, Device *device)

The computational network will pull inputs in from the user’s data structures and make them available to the computation

Return
The index of the created variable
Parameters
• ps: Pointer to a real number
• device: The device to place input value

VariableIndex add_input(const Dim &d, const std::vector<float> &data, Device *device)

The computational network will pull inputs in from the user’s data structures and make them available to the computation

Return
The index of the created variable
Parameters
• d: Desired shape of the input
• data: Input data (as a 1 dimensional array)
• device: The device to place input value

VariableIndex add_input(const Dim &d, const std::vector<float> *pdata, Device *device)

The computational network will pull inputs in from the user’s data structures and make them available to the computation

Return
The index of the created variable
Parameters
• d: Desired shape of the input
• pdata: Pointer to the input data (as a 1 dimensional array)
• device: The device to place input value

VariableIndex add_input(const Dim &d, const std::vector<unsigned int> &ids, const std::vector<float> &data, Device *device, float defdata = 0.f)

The computational network will pull inputs in from the user’s data structures and make them available to the computation. Represents specified (not learned) inputs to the network in sparse array format, with an optional default value.

Return
The index of the created variable
Parameters
• d: Desired shape of the input
• ids: The indexes of the data points to update
• data: The data points corresponding to each index
• device: The device to place input value
• defdata: The default data with which to set the unspecified data points
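To make the sparse input format concrete, the following sketch expands an (ids, data, defdata) triple into a dense buffer. The helper `densify` is hypothetical and not part of DyNet, and it assumes that each id indexes into the flattened array; check the DyNet source before relying on that layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of DyNet): expand a sparse (ids, data) pair
// into a dense buffer of `size` floats. Slots that are not listed in `ids`
// keep the default value `defdata`.
std::vector<float> densify(std::size_t size,
                           const std::vector<unsigned>& ids,
                           const std::vector<float>& data,
                           float defdata = 0.f) {
  assert(ids.size() == data.size());  // one data point per index
  std::vector<float> dense(size, defdata);
  for (std::size_t k = 0; k < ids.size(); ++k) {
    assert(ids[k] < size);            // index must lie inside the buffer
    dense[ids[k]] = data[k];
  }
  return dense;
}
```

For example, `densify(5, {1, 3}, {2.f, 4.f}, -1.f)` yields a buffer whose entries 1 and 3 hold 2 and 4, with every other entry set to -1.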

VariableIndex add_parameters(Parameter p)

Add a parameter to the computation graph.

Return
The index of the created variable
Parameters
• p: Parameter to add

VariableIndex add_parameters(LookupParameter p)

Add a full matrix of lookup parameters to the computation graph.

Return
The index of the created variable
Parameters
• p: Lookup parameter to add

VariableIndex add_const_parameters(Parameter p)

Add a parameter to the computation graph (but don’t update)

Return
The index of the created variable
Parameters
• p: Parameter to add

VariableIndex add_const_parameters(LookupParameter p)

Add a full matrix of lookup parameters to the computation graph (but don’t update)

Return
The index of the created variable
Parameters
• p: Lookup parameter to add

VariableIndex add_lookup(LookupParameter p, const unsigned *pindex)

Add a lookup parameter to the computation graph.

Use pindex to point to a memory location, owned by the caller, where the index will live.

Return
The index of the created variable
Parameters
• p: Lookup parameter from which to pick
• pindex: Pointer to the index to lookup

VariableIndex add_lookup(LookupParameter p, unsigned index)

Add a lookup parameter to the computation graph.

Return
The index of the created variable
Parameters
• p: Lookup parameter from which to pick
• index: Index to lookup

VariableIndex add_lookup(LookupParameter p, const std::vector<unsigned> *pindices)

Add lookup parameters to the computation graph.

Use pindices to point to a memory location, owned by the caller, where the indices will live.

Return
The index of the created variable
Parameters
• p: Lookup parameter from which to pick
• pindices: Pointer to the indices to lookup

VariableIndex add_lookup(LookupParameter p, const std::vector<unsigned> &indices)

Add lookup parameters to the computation graph.

Return
The index of the created variable
Parameters
• p: Lookup parameter from which to pick
• indices: Indices to lookup

VariableIndex add_const_lookup(LookupParameter p, const unsigned *pindex)

Add a lookup parameter to the computation graph.

Just like add_lookup, but don’t optimize the lookup parameters

Return
The index of the created variable
Parameters
• p: Lookup parameter from which to pick
• pindex: Pointer to the index to lookup

VariableIndex add_const_lookup(LookupParameter p, unsigned index)

Add a lookup parameter to the computation graph.

Just like add_lookup, but don’t optimize the lookup parameters

Return
The index of the created variable
Parameters
• p: Lookup parameter from which to pick
• index: Index to lookup

VariableIndex add_const_lookup(LookupParameter p, const std::vector<unsigned> *pindices)

Add lookup parameters to the computation graph.

Just like add_lookup, but don’t optimize the lookup parameters

Return
The index of the created variable
Parameters
• p: Lookup parameter from which to pick
• pindices: Pointer to the indices to lookup

VariableIndex add_const_lookup(LookupParameter p, const std::vector<unsigned> &indices)

Add lookup parameters to the computation graph.

Just like add_lookup, but don’t optimize the lookup parameters

Return
The index of the created variable
Parameters
• p: Lookup parameter from which to pick
• indices: Indices to lookup

template <class Function>
VariableIndex add_function(const std::initializer_list<VariableIndex> &arguments)

Add a function to the computation graph.

This is what is called when creating an expression

Return
The index of the output variable
Parameters
• arguments: List of the arguments indices
Template Parameters
• Function: Function to be applied

template <class Function, typename... Args>
VariableIndex add_function(const std::initializer_list<VariableIndex> &arguments, Args&&... side_information)

Add a function to the computation graph (with side information)

This is what is called when creating an expression

Return
The index of the output variable
Parameters
• arguments: List of the arguments indices
• side_information: Side information that is needed to compute the function
Template Parameters
• Function: Function to be applied

void clear()

Reset ComputationGraph to a newly created state.

void checkpoint()

Set a checkpoint.

void revert()

Revert to last checkpoint.

Dim &get_dimension(VariableIndex index) const

Get dimension of a node.

Return
Dimension
Parameters
• index: Variable index of the node

const Tensor &forward(const Expression &last)

Run complete forward pass from first node to given one, ignoring all precomputed values.

Return
Value of the last Expression after execution
Parameters
• last: Expression up to which the forward pass must be computed

const Tensor &forward(VariableIndex i)

Run complete forward pass from first node to given one, ignoring all precomputed values.

Return
Value of the end Node after execution
Parameters
• i: Variable index of the node up to which the forward pass must be computed

const Tensor &incremental_forward(const Expression &last)

Run forward pass from the last computed node to given one.

Useful if you want to add nodes and evaluate just the new parts.

Return
Value of the last Expression after execution
Parameters
• last: Expression up to which the forward pass must be computed

const Tensor &incremental_forward(VariableIndex i)

Run forward pass from the last computed node to given one.

Useful if you want to add nodes and evaluate just the new parts.

Return
Value of the end Node after execution
Parameters
• i: Variable index of the node up to which the forward pass must be computed

const Tensor &get_value(VariableIndex i)

Get forward value for node at index i.

Performs forward evaluation if not available (may compute more than strictly what is needed).

Return
Requested value
Parameters
• i: Index of the variable from which you want the value

const Tensor &get_value(const Expression &e)

Get forward value for the given expression.

Performs forward evaluation if not available (may compute more than strictly what is needed).

Return
Requested value
Parameters
• e: Expression from which you want the value

const Tensor &get_gradient(VariableIndex i)

Get gradient for node at index i.

Performs backward pass if not available (may compute more than strictly what is needed).

Return
Requested gradient
Parameters
• i: Index of the variable from which you want the gradient

const Tensor &get_gradient(const Expression &e)

Get gradient for the given expression.

Performs backward pass if not available (may compute more than strictly what is needed).

Return
Requested gradient
Parameters
• e: Expression from which you want the gradient

void invalidate()

Clears forward caches (for get_value etc).

void backward(const Expression &last, bool full = false)

Computes backward gradients from the front-most evaluated node.

The parameter full specifies whether the gradients should be computed for all nodes (true) or only non-constant nodes.

By default, a node is constant unless

1. it is a parameter node
2. it depends on a non-constant node

Thus, functions of constants and inputs are considered as constants.

Turn full on if you want to retrieve gradients w.r.t. inputs for instance. By default this is turned off, so that the backward pass ignores nodes which have no influence on gradients w.r.t. parameters for efficiency.

Parameters
• last: Expression from which to compute the gradient
• full: Whether to compute all gradients (including with respect to constant nodes).
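The two rules above amount to a single forward sweep over the nodes. The sketch below is a toy illustration, not DyNet code: `ToyNode` and `needs_gradient` are hypothetical names, nodes are assumed to be stored in topological order, and the `full` case is simplified to "every node".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy description of a node; hypothetical, not DyNet's Node class.
struct ToyNode {
  bool is_parameter;              // true for (non-const) parameter nodes
  std::vector<std::size_t> args;  // indices of earlier nodes it depends on
};

// For each node, report whether a backward pass would compute its gradient.
// Nodes are assumed to be in topological order (arguments precede users).
std::vector<bool> needs_gradient(const std::vector<ToyNode>& nodes, bool full) {
  std::vector<bool> needs(nodes.size(), full);
  if (full) return needs;  // full == true: gradients for all nodes
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    bool non_constant = nodes[i].is_parameter;  // rule 1: parameter node
    for (std::size_t a : nodes[i].args) {
      assert(a < i);
      if (needs[a]) non_constant = true;        // rule 2: non-constant argument
    }
    needs[i] = non_constant;
  }
  return needs;
}
```

With an input x, a parameter W, a node f(x) and a node g(W, f(x)), only W and g are marked when `full` is false; x and f(x) are treated as constants and skipped.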

void backward(VariableIndex i, bool full = false)

The parameter full specifies whether the gradients should be computed for all nodes (true) or only non-constant nodes.

By default, a node is constant unless

1. it is a parameter node
2. it depends on a non-constant node

Thus, functions of constants and inputs are considered as constants.

Turn full on if you want to retrieve gradients w.r.t. inputs for instance. By default this is turned off, so that the backward pass ignores nodes which have no influence on gradients w.r.t. parameters for efficiency.

Parameters
• i: Index of the node from which to compute the gradient
• full: Whether to compute all gradients (including with respect to constant nodes).

void print_graphviz() const

Used for debugging.

unsigned get_id() const

Get the unique graph ID.

This ID is incremented by 1 each time a computation graph is created

Return
Graph ID
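One way to picture how an ID that increases with every new graph helps detect stale objects is the toy sketch below. Every name in it (`toy_graph_counter`, `ToyGraphId`, `ToyHandle`) is hypothetical, and it is not a description of how DyNet's expressions perform this check.

```cpp
#include <cassert>

// Toy model, not DyNet code: a global counter hands out graph IDs.
static unsigned toy_graph_counter = 0;

struct ToyGraphId {
  unsigned id;
  ToyGraphId() : id(++toy_graph_counter) {}  // incremented once per new graph
};

// A handle remembers the ID of the graph it was created in.
struct ToyHandle {
  unsigned graph_id;
  // The handle is stale when it refers to a graph other than the current one.
  bool is_stale(unsigned current_graph_id) const {
    return graph_id != current_graph_id;
  }
};
```

A handle created under one graph compares unequal to the ID of any graph created later, so using it after the graph has been replaced can be caught early.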

## Nodes

Nodes are constituents of the computation graph. The end user doesn’t interact with Nodes but with Expressions.

However, implementing a new operation requires creating a new subclass of the Node class described below.

struct Node
#include <dynet.h>

Represents an SSA variable.

Contains information on the computation node: arguments, output value and gradient of the output with respect to the function. This class must be inherited to implement any new operation. See nodes.cc for examples. An operation on expressions can then be created from the new Node; see expr.h/expr.cc for examples.

Subclassed by dynet::Abs, dynet::Acos, dynet::Acosh, dynet::AddVectorToAllColumns, dynet::AffineTransform, dynet::Argmax, dynet::Asin, dynet::Asinh, dynet::Atan, dynet::Atanh, dynet::Average, dynet::BinaryLogLoss, dynet::BlockDropout, dynet::Ceil, dynet::CircularConvolution, dynet::CircularCorrelation, dynet::Concatenate, dynet::ConcatenateToBatch, dynet::Constant, dynet::ConstantMinusX, dynet::ConstantPlusX, dynet::ConstParameterNode, dynet::ConstrainedSoftmax, dynet::ConstScalarMultiply, dynet::Conv2D, dynet::Cos, dynet::Cosh, dynet::Cube, dynet::CumulativeSum, dynet::CwiseMultiply, dynet::CwiseQuotient, dynet::CwiseSum, dynet::DotProduct, dynet::Dropout, dynet::DropoutBatch, dynet::DropoutDim, dynet::Erf, dynet::Exp, dynet::ExponentialLinearUnit, dynet::Filter1DNarrow, dynet::Floor, dynet::FoldRows, dynet::GaussianNoise, dynet::Hinge, dynet::HingeDim, dynet::HuberDistance, dynet::Identity, dynet::InnerProduct3D_1D, dynet::InnerProduct3D_1D_1D, dynet::InputNode, dynet::KMaxPooling, dynet::KMHNGram, dynet::L1Distance, dynet::L2Norm, dynet::Log, dynet::LogDet, dynet::LogGamma, dynet::LogisticSigmoid, dynet::LogSigmoid, dynet::LogSoftmax, dynet::LogSumExp, dynet::LogSumExpDimension, dynet::MatrixInverse, dynet::MatrixMultiply, dynet::Max, dynet::MaxDimension, dynet::MaxPooling1D, dynet::MaxPooling2D, dynet::Min, dynet::MinDimension, dynet::MomentBatches, dynet::MomentDimension, dynet::MomentElements, dynet::Negate, dynet::NoBackprop, dynet::PairwiseRankLoss, dynet::ParameterNodeBase, dynet::PickBatchElements, dynet::PickElement, dynet::PickNegLogSoftmax, dynet::PickRange, dynet::PoissonRegressionLoss, dynet::Pow, dynet::RandomBernoulli, dynet::RandomGumbel, dynet::RandomNormal, dynet::RandomUniform, dynet::Rectify, dynet::Reshape, dynet::RestrictedLogSoftmax, dynet::Round, dynet::ScalarInputNode, dynet::ScaleGradient, dynet::SelectCols, dynet::SelectRows, dynet::SigmoidLinearUnit, dynet::Sin, dynet::Sinh, dynet::Softmax, dynet::SoftSign, dynet::SparseInputNode, 
dynet::Sparsemax, dynet::SparsemaxLoss, dynet::Sqrt, dynet::Square, dynet::SquaredEuclideanDistance, dynet::SquaredNorm, dynet::StdBatches, dynet::StdDimension, dynet::StdElements, dynet::StridedSelect, dynet::Sum, dynet::SumDimension, dynet::SumElements, dynet::Tan, dynet::Tanh, dynet::ToDevice, dynet::TraceOfProduct, dynet::Transpose, dynet::VanillaLSTMC, dynet::VanillaLSTMGates, dynet::VanillaLSTMH, dynet::WeightNormalization

Public Types

enum INPLACE_TYPE

Type for the in-place operations: NOPE (not in-place), READ (no changes to the memory), WRITE (possibly unrecoverable changes)

Values:

NOPE
READ
WRITE

Public Functions

virtual Dim dim_forward(const std::vector<Dim> &xs) const = 0

Compute dimensions of result for given dimensions of inputs.

Also checks to make sure inputs are compatible with each other

Return
Dimension of the output
Parameters
• xs: Vector containing the dimensions of the inputs

virtual std::string as_string(const std::vector<std::string> &args) const = 0

Returns important information for debugging.

See nodes-conv.cc for examples

Return
String description of the node
Parameters
• args: String descriptions of the arguments

size_t aux_storage_size() const

Size of the auxiliary storage.

In general, this will return 0, but if a component needs to store extra information in the forward pass for use in the backward pass, it can request the memory here (N.B.: you could put it on the Node object, but in general, edges should not allocate tensor memory since memory is managed centrally for the entire computation graph).

Return
Size

virtual void forward_impl(const std::vector<const Tensor *> &xs, Tensor &fx) const = 0

Forward computation.

This function contains the logic for the forward pass. Some implementation remarks from nodes.cc:

1. fx can be understood as a pointer to the (preallocated) location for the result of forward to be stored
2. fx is not initialized, so after calling forward fx must point to the correct answer
3. fx can be repointed to an input, if forward(x) evaluates to x (e.g., in reshaping)
4. scalar results of forward are placed in fx.v[0]
5. DYNET manages its own memory, not Eigen, and it is configured with the EIGEN_NO_MALLOC option. If you get an error about Eigen attempting to allocate memory, it is (probably) because of an implicit creation of a temporary variable. To tell Eigen this is not necessary, the noalias() method is available. If you really do need a temporary variable, its capacity must be requested by Node::aux_storage_size

Note on debugging problems with differentiable components

• fx is uninitialized when forward is called: are you relying on it being 0?

Parameters
• xs: Pointers to the inputs
• fx: Pointer to the (preallocated) location for the result of forward to be stored

virtual void backward_impl(const std::vector<const Tensor *> &xs, const Tensor &fx, const Tensor &dEdf, unsigned i, Tensor &dEdxi) const = 0

Accumulates the derivative of E with respect to the ith argument to f, that is, xs[i].

This function contains the logic for the backward pass. Some implementation remarks from nodes.cc:

1. dEdxi MUST ACCUMULATE a result since multiple calls to forward may depend on the same x_i. Even, e.g., Identity must be implemented as dEdx1 += dEdf. THIS IS EXTREMELY IMPORTANT
2. scalar results of forward are placed in fx.v[0]
3. DYNET manages its own memory, not Eigen, and it is configured with the EIGEN_NO_MALLOC option. If you get an error about Eigen attempting to allocate memory, it is (probably) because of an implicit creation of a temporary variable. To tell Eigen this is not necessary, the noalias() method is available. If you really do need a temporary variable, its capacity must be requested by Node::aux_storage_size

Note on debugging problems with differentiable components

• dEdxi must accumulate (see point 1 above!)

Parameters
• xs: Pointers to inputs
• fx: Output
• dEdf: Gradient of the objective w.r.t the output of the node
• i: Index of the input w.r.t which we take the derivative
• dEdxi: Gradient of the objective w.r.t the input of the node
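The accumulation requirement is easiest to see on the identity function. The sketch below is a toy under stated assumptions: `ToyTensor` is a plain `std::vector<float>` standing in for a tensor, and `identity_backward` is a hypothetical free function, not the signature of `backward_impl`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a tensor; hypothetical, not DyNet's Tensor.
using ToyTensor = std::vector<float>;

// Backward of f(x) = x. The incoming gradient is ADDED to dEdxi, never
// assigned, because other users of x may already have contributed to it.
void identity_backward(const ToyTensor& dEdf, ToyTensor& dEdxi) {
  assert(dEdf.size() == dEdxi.size());
  for (std::size_t k = 0; k < dEdf.size(); ++k) dEdxi[k] += dEdf[k];
}
```

If x feeds two identity nodes whose output gradients are (1, 1) and (2, 2), two calls on the same zero-initialized `dEdxi` leave it at (3, 3). Writing `=` instead of `+=` would silently keep only the last contribution.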

virtual bool supports_multibatch() const

Whether this node supports computing multiple batches in one call.

If true, forward and backward will be called once with a multi-batch tensor. If false, forward and backward will be called multiple times for each item.

Return
Support for multibatch

virtual bool supports_multidevice() const

Whether this node supports processing inputs/outputs on multiple devices.

DyNet will throw an error if you try to process inputs and outputs on different devices unless this is activated.

Return
Support for multi-device

void forward(const std::vector<const Tensor *> &xs, Tensor &fx) const

Perform the forward pass in one or multiple calls.

Parameters
• xs: Pointers to the inputs
• fx: Pointer to the (preallocated) location for the result of forward to be stored

void backward(const std::vector<const Tensor *> &xs, const Tensor &fx, const Tensor &dEdf, unsigned i, Tensor &dEdxi) const

Perform the backward pass in one or multiple calls.

Parameters
• xs: Pointers to inputs
• fx: Output
• dEdf: Gradient of the objective w.r.t the output of the node
• i: Index of the input w.r.t which we take the derivative
• dEdxi: Gradient of the objective w.r.t the input of the node

virtual int autobatch_sig(const ComputationGraph &cg, SigMap &sm) const

Signature for automatic batching. This will be equal only for nodes that can be combined. Returns 0 for unbatchable functions.

virtual std::vector<int> autobatch_concat(const ComputationGraph &cg) const

Which inputs can be batched. This will be true for inputs that should be concatenated when autobatching, and false for inputs that should be shared among all batches.

virtual Node *autobatch_pseudo_node(const ComputationGraph &cg, const std::vector<VariableIndex> &batch_ids) const

Create a pseudonode for autobatching. This will combine together multiple nodes into one big node for the automatic batching functionality. When a node representing one component of the mini-batch can be used as-is, it is OK to just return the null pointer; otherwise we should make the appropriate changes and return a new node.

virtual void autobatch_reshape(const ComputationGraph &cg, const std::vector<VariableIndex> &batch_ids, const std::vector<int> &concat, std::vector<const Tensor *> &xs, Tensor &fx) const

Reshape the tensors for autobatching. Takes in info, and reshapes the dimensions of xs (for which “concat” is true), and fx. By default, no reshaping is done, which is OK for componentwise operations.

void autobatch_reshape_concatonly(const ComputationGraph &cg, const std::vector<VariableIndex> &batch_ids, const std::vector<int> &concat, std::vector<const Tensor *> &xs, Tensor &fx) const

Reshape the tensors for autobatching. Takes in info, and reshapes the dimensions of xs (for which “concat” is true) and fx by concatenating their batches.

unsigned arity() const

Number of arguments to the function.

Return
Arity of the function

Public Members

std::vector<VariableIndex> args

Dependency structure

Dim dim

Will be .size() = 0 initially; filled in by forward(). TODO: fix this

void *aux_mem

This will usually be null, but if your node needs to store intermediate values between forward and backward, you can store them here. Request the number of bytes you need from aux_storage_size(). Note: this memory will be on the CPU or GPU, depending on your computation backend.

## Parameters and Model

Parameters are things that are optimized. In contrast to a system like Torch, where computational modules may have their own parameters, in DyNet parameters are just parameters.

To deal with sparse updates, there are two parameter classes:

• Parameters represents a vector, matrix, or (eventually) higher-order tensor of parameters. These are densely updated.
• LookupParameters represents a table of vectors that are used to embed a set of discrete objects. These are sparsely updated.

struct ParameterStorageBase
#include <model.h>

This is the base class for ParameterStorage and LookupParameterStorage, the objects handling the actual parameters.

You can access the storage from any Parameter (resp. LookupParameter) class; use it only for low-level manipulations.

Subclassed by dynet::LookupParameterStorage, dynet::ParameterStorage

Public Functions

virtual void scale_parameters(float a) = 0

Scale the parameters.

Parameters
• a: scale factor

virtual void scale_gradient(float a) = 0

Scale the gradient.

Parameters
• a: scale factor

virtual void zero() = 0

Set the parameters to 0.

virtual void squared_l2norm(float *sqnorm) const = 0

Get the parameter squared l2 norm.

Parameters
• sqnorm: Pointer to the float holding the result

virtual void g_squared_l2norm(float *sqnorm) const = 0

Get the squared l2 norm of the gradient w.r.t. these parameters.

Parameters
• sqnorm: Pointer to the float holding the result

virtual bool is_updated() const = 0

Check whether the parameter is updated.

virtual bool has_grad() const = 0

Check whether the gradient is zero or not (true if gradient is non-zero)

virtual size_t size() const = 0

Get the size (number of scalar parameters)

Return
Number of scalar parameters

struct ParameterStorage : public dynet::ParameterStorageBase
#include <model.h>

Storage class for Parameters.

Subclassed by dynet::ParameterStorageCreator

Public Functions

void copy(const ParameterStorage &val)

Copy from another ParameterStorage.

Parameters
• val: ParameterStorage to copy from

void accumulate_grad(const Tensor &g)

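Add a tensor to the gradient. After this method gets called, the stored gradient (the member g) has been incremented by the tensor passed as argument.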

Parameters
• g: Tensor to add

void clear()

Clear the gradient (set it to 0)

void clip(float left, float right)

Clip the values to the range [left, right].

Public Members

std::string name

Name of this parameter

Dim dim

Dimensions of the parameter tensor

Tensor values

Values of the parameter

Tensor g

Values of the gradient w.r.t. this parameter

bool updated

Whether this is updated

bool nonzero_grad

ParameterCollection *owner

Pointer to the collection that “owns” this parameter

struct LookupParameterStorage : public dynet::ParameterStorageBase
#include <model.h>

Storage class for LookupParameters.

Subclassed by dynet::LookupParameterStorageCreator

Public Functions

void initialize(unsigned index, const std::vector<float> &val)

Initialize one particular lookup.

Parameters
• index: Index of the lookup to initialize
• val: Values

void copy(const LookupParameterStorage &val)

Copy from another LookupParameterStorage.

Parameters
• val: LookupParameterStorage to copy from

void accumulate_grad(const Tensor &g)

After this method gets called, grads <- grads + g

Parameters
• g: Tensor to add to the gradients

void accumulate_grad(unsigned index, const Tensor &g)

After this method gets called, grads[index] <- grads[index] + g

Parameters
• index: Index of the lookup whose gradient is updated
• g: Tensor to add to that gradient

void accumulate_grads(unsigned n, const unsigned *ids_host, const unsigned *ids_dev, float *g)

After this method gets called, grads[ids_host[i]] <- grads[ids_host[i]] + g[i*dim.size():(i+1)*dim.size()]

Parameters
• n: size of ids_host
• ids_host: Indices of the gradients to update
• ids_dev: [To be documented] (only for GPU)
• g: Values
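The update rule for accumulate_grads can be written out as a plain loop. This is a CPU-only toy sketch with hypothetical names (`accumulate_rows`, `row_size`): it models grads as one row of `row_size` floats per lookup index and g as n rows packed back to back, and it ignores the device-side index array entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy sketch (CPU only, hypothetical names): grads holds one row of
// `row_size` floats per lookup index; g holds ids.size() rows back to back.
void accumulate_rows(std::vector<std::vector<float>>& grads,
                     const std::vector<unsigned>& ids,
                     const std::vector<float>& g,
                     std::size_t row_size) {
  assert(g.size() == ids.size() * row_size);
  for (std::size_t i = 0; i < ids.size(); ++i) {
    std::vector<float>& row = grads.at(ids[i]);  // throws on a bad index
    assert(row.size() == row_size);
    // grads[ids[i]] += g[i*row_size : (i+1)*row_size]
    for (std::size_t k = 0; k < row_size; ++k) row[k] += g[i * row_size + k];
  }
}
```

Because the slices are added rather than assigned, an index that appears several times in `ids` receives the sum of all its slices.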

Public Members

std::string name

Name of this parameter

Dim all_dim

Total dimension

Tensor all_values

Values for all dimensions at once

Tensor all_grads

Gradient values for all dimensions at once

Dim dim

Dimension for one lookup

std::vector<Tensor> values

List of values for each lookup

std::vector<Tensor> grads

List of gradient values for each lookup

std::unordered_set<unsigned> non_zero_grads

Gradients are sparse, so track which components are nonzero

bool updated

Whether this lookup parameter should be updated

bool nonzero_grad

Whether the gradient is nonzero

ParameterCollection *owner

Pointer to the collection that “owns” this parameter

struct Parameter
#include <model.h>

Object representing a trainable parameter.

This object acts as a high level component linking the actual parameter values (ParameterStorage) and the ParameterCollection. As long as you don’t want to do low level hacks at the ParameterStorage level, this is what you will use.

Public Functions

Parameter()

Default constructor.

Parameter(std::shared_ptr<ParameterStorage> p)

Constructor.

This is called by the model; you shouldn’t need to use it.

Parameters
• p: Shared pointer to the parameter storage

ParameterStorage &get_storage() const

Get underlying ParameterStorage object.

Return
ParameterStorage holding the parameter values

string get_fullname() const

Get the full name of the ParameterStorage object.

void zero()

Zero the parameters.

Dim dim() const

Shape of the parameter.

Return
Shape as a Dim object

Tensor *values()

Values of the parameter.

Return
Values as a Tensor object

Tensor *gradients()

Gradients of the parameter.

Return
Gradients as a Tensor object

float current_weight_decay() const

Get the current weight decay for the parameters.

void set_updated(bool b)

Set the parameter as updated.

Parameters
• b: Update status

void scale(float s)

Scales the parameter (multiplies by s)

Parameters
• s: scale

void scale_gradient(float s)

Scales the gradient (multiplies by s)

Parameters
• s: scale

bool is_updated() const

Check the update status.

Return
Update status

void clip_inplace(float left, float right)

Clip the values of the parameter to the range [left, right] (in place)

void set_value(const std::vector<float> &val)

Set the values of the parameter.

Public Members

std::shared_ptr<ParameterStorage> p

Pointer to the storage for this Parameter

struct LookupParameter
#include <model.h>

Object representing a trainable lookup parameter.

Public Functions

LookupParameterStorage &get_storage() const

Get underlying LookupParameterStorage object.

Return
LookupParameterStorage holding the parameter values

void initialize(unsigned index, const std::vector<float> &val) const

Initialize one particular column.

Parameters
• index: Index of the column to be initialized
• val: Values to initialize the column with

void zero()

Zero the parameters.

string get_fullname() const

Get the full name of the LookupParameterStorage object.

Dim dim() const

Shape of the lookup parameter.

Return
Shape as a Dim object

std::vector<Tensor> *values()

Values of the lookup parameter.

Return
Values as a vector of Tensor objects

float current_weight_decay() const

Get the current weight decay for the parameters.

void scale(float s)

Scales the parameter (multiplies by s)

Parameters
• s: scale

void scale_gradient(float s)

Scales the gradient (multiplies by s)

Parameters
• s: scale

void set_updated(bool b)

Set the parameter as updated.

Parameters
• b: Update status

bool is_updated() const

Check the update status.

Return
Update status

Public Members

std::shared_ptr<LookupParameterStorage> p

Pointer to the storage for this Parameter

class ParameterCollection
#include <model.h>

This is a collection of parameters.

If you need a matrix of parameters or a lookup table, ask an instance of this class. It knows how to serialize itself. Parameters know how to track their gradients, but any extra information (like velocity) will live here.

Subclassed by dynet::Model

Public Functions

ParameterCollection()

Constructor.

Weight-decay value is taken from commandline option.

ParameterCollection(float weight_decay_lambda)

Constructor.

Parameters
• weight_decay_lambda: Default weight-decay value for this collection.

float gradient_l2_norm() const

Use this to look for gradient vanishing/exploding

Return
L2 norm of the gradient

void reset_gradient()

Parameter add_parameters(const Dim &d, float scale = 0.0f, const std::string &name = "", Device *device = dynet::default_device)

Adds parameters to the model and returns a Parameter object.

Creates a ParameterStorage object holding a tensor of dimension d and returns a Parameter object (to be used as input in the computation graph). The coefficients are sampled according to the scale parameter.

Return
Parameter object to be used in the computation graph
Parameters
• d: Shape of the parameter
• scale: If scale is non-zero, initializes according to $$\mathcal U([-\mathrm{scale},+\mathrm{scale}])$$, otherwise uses Glorot initialization
• name: Name of the parameter
• device: Device placement for the parameter
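The behaviour of the scale argument can be sketched for a matrix-shaped parameter as follows. `init_matrix` is a hypothetical helper, not DyNet's initializer, and the Glorot branch assumes the commonly used uniform bound sqrt(6 / (rows + cols)); DyNet's exact formula should be checked in its source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper, not DyNet's initializer. For a rows x cols matrix,
// each coefficient is drawn uniformly between -bound and +bound.
// A non-zero scale is used as the bound directly; scale == 0 falls back to
// the common Glorot uniform bound sqrt(6 / (rows + cols)) (an assumption).
std::vector<float> init_matrix(std::size_t rows, std::size_t cols, float scale,
                               std::mt19937& rng) {
  assert(rows > 0 && cols > 0);
  const float bound = (scale != 0.0f)
      ? std::fabs(scale)
      : std::sqrt(6.0f / static_cast<float>(rows + cols));
  std::uniform_real_distribution<float> dist(-bound, bound);
  std::vector<float> w(rows * cols);
  for (float& x : w) x = dist(rng);  // independent draw per coefficient
  return w;
}
```

Under these assumptions, a 2 x 4 matrix with scale 0 is drawn with bound sqrt(6 / 6) = 1, while scale 0.5 gives values between -0.5 and 0.5 regardless of shape.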

Parameter add_parameters(const Dim &d, Device *device)

Adds parameters to the model and returns a Parameter object.

Creates a ParameterStorage object holding a tensor of dimension d and returns a Parameter object (to be used as input in the computation graph).

Return
Parameter object to be used in the computation graph
Parameters
• d: Shape of the parameter
• device: Device placement for the parameter

Parameter add_parameters(const Dim &d, const std::string &name, Device *device = dynet::default_device)

Adds parameters to the model and returns a Parameter object.

Creates a ParameterStorage object holding a tensor of dimension d and returns a Parameter object (to be used as input in the computation graph).

Return
Parameter object to be used in the computation graph
Parameters
• d: Shape of the parameter
• name: Name of the parameter
• device: Device placement for the parameter

Parameter add_parameters(const Dim &d, const ParameterInit &init, const std::string &name = "", Device *device = dynet::default_device)

Return
Parameter object to be used in the computation graph
Parameters
• d: Shape of the parameter
• init: Custom initializer
• name: Name of the parameter
• device: Device placement for the parameter

std::vector<std::shared_ptr<ParameterStorageBase>> get_parameter_storages_base() const

Get parameters base in current model.

Return
List of pointers to ParameterStorageBase objects

std::shared_ptr<ParameterStorage> get_parameter_storage(const std::string &pname)

Get parameter in current model.

It is not recommended to use this

Return
the pointer to the Parameter object

std::vector<std::shared_ptr<ParameterStorage>> get_parameter_storages() const

Get parameters in current model.

Return
List of pointers to ParameterStorage objects

LookupParameter add_lookup_parameters(unsigned n, const Dim &d, const std::string &name = "", Device *device = dynet::default_device)

Same as add_parameters. Initializes with Glorot

Return
LookupParameter object to be used in the computation graph
Parameters
• n: Number of lookup indices
• d: Dimension of each embedding
• name: Name of the parameter
• device: Device placement for the parameter

LookupParameter add_lookup_parameters(unsigned n, const Dim &d, const ParameterInit &init, const std::string &name = "", Device *device = dynet::default_device)

Add lookup parameter with custom initializer.

Return
LookupParameter object to be used in the computation graph
Parameters
• n: Number of lookup indices
• d: Dimension of each embedding
• init: Custom initializer
• name: Name of the parameter
• device: Device placement for the parameter

std::shared_ptr<LookupParameterStorage> get_lookup_parameter_storage(const std::string &lookup_pname)

Get lookup parameter in current model.

It is not recommended to use this

Return
the pointer to the LookupParameter object

std::vector<std::shared_ptr<LookupParameterStorage>> get_lookup_parameter_storages() const

Get lookup parameters in current model.

Return
List of pointers to LookupParameterStorage objects

void project_weights(float radius = 1.0f)

Project weights so that their L2 norm equals radius.

NOTE (Paul): I am not sure this is doing anything currently. The argument doesn’t seem to be used anywhere… If you need this, raise an issue on GitHub.

Parameters
• radius: Target norm

void set_weight_decay_lambda(float lambda)

Set the weight decay coefficient.

Parameters
• lambda: Weight decay coefficient

const std::vector<std::shared_ptr<ParameterStorage>> &parameters_list() const

Returns a list of shared pointers to ParameterStorages.

You shouldn’t need to use this.

Return
List of shared pointers to ParameterStorages

const std::vector<std::shared_ptr<LookupParameterStorage>> &lookup_parameters_list() const

Returns a list of shared pointers to LookupParameterStorages.

You shouldn’t need to use this.

Return
List of shared pointers to LookupParameterStorages

size_t parameter_count() const

Returns the total number of tunable parameters (i.e. scalars) contained within this model.

That is to say, a 2x2 matrix counts as four parameters.

Return
Number of parameters
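
The counting rule can be sketched without DyNet as a product over each shape, summed over all parameters. The helper `count_scalars` and the representation of shapes as vectors of unsigned integers are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of DyNet): each shape contributes the
// product of its dimensions, so a 2x2 matrix contributes four scalars.
std::size_t count_scalars(const std::vector<std::vector<unsigned>> &shapes) {
  std::size_t total = 0;
  for (const auto &shape : shapes) {
    assert(!shape.empty());
    std::size_t n = 1;
    for (unsigned dim : shape) n *= dim;
    total += n;
  }
  return total;
}
```

With a single 2x2 matrix the count is 4; adding a vector of length 3 brings it to 7.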

size_t updated_parameter_count() const

Returns total number of (scalar) parameters updated.

Return
number of updated parameters

void set_updated_param(const Parameter *p, bool status)

Set whether a parameter is updated.

Parameters
• p: Parameter whose update status is set
• status: Whether the parameter should be updated

void set_updated_lookup_param(const LookupParameter *p, bool status)

Set whether a lookup parameter is updated.

Parameters
• p: Lookup parameter whose update status is set
• status: Whether the lookup parameter should be updated

bool is_updated_param(const Parameter *p)

Check whether a parameter is updated.

Return
Update status of the parameter
Parameters
• p: Parameter to query

bool is_updated_lookup_param(const LookupParameter *p)

Check whether a lookup parameter is updated.

Return
Update status of the lookup parameter
Parameters
• p: Lookup parameter to query

ParameterCollection add_subcollection(const std::string &name = "", float weight_decay_lambda = -1)

This will allow you to add a ParameterCollection that is a (possibly named) subset of the original collection. This is useful if you want to save/load/update only part of the parameters in the model.

Return
The subcollection
Parameters
• name: Name of the subcollection (optional)
• weight_decay_lambda: If negative or omitted, inherited from the parent collection

size_t size()

Get size.

Get the number of parameters in the ParameterCollection

std::string get_fullname() const

Get the namespace of the current ParameterCollection object (it ends with a slash).

L2WeightDecay &get_weight_decay()

Get the weight decay object.

float get_weight_decay_lambda() const

Get the weight decay lambda value.

struct ParameterInit
#include <param-init.h>

Initializers for parameters.

Allows for custom parameter initialization

Public Functions

ParameterInit()

Default constructor.

virtual void initialize_params(Tensor &values) const = 0

Function called upon initialization.

Whenever you inherit from this struct to implement your own custom initializer, this is the function you want to override to implement your logic.

Parameters
• values: The tensor to be initialized. You should modify it in-place. See dynet/model.cc for some examples

struct ParameterInitNormal : public dynet::ParameterInit
#include <param-init.h>

Initialize parameters with samples from a normal distribution.

Public Functions

ParameterInitNormal(float m = 0.0f, float v = 1.0f)

Constructor.

Parameters
• m: Mean of the gaussian distribution
• v: Variance of the gaussian distribution (reminder : the variance is the square of the standard deviation)
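
Because the constructor takes a variance while `std::normal_distribution` takes a standard deviation, a standalone sampler has to convert between the two. The sketch below shows that conversion; the helper names and the `seed` argument are assumptions for illustration, and this is not DyNet's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: the standard deviation is the square root of the variance.
float stddev_from_variance(float variance) {
  assert(variance > 0.0f);
  return std::sqrt(variance);
}

// Hypothetical helper: draw n values from a normal distribution described
// by its mean and variance (std::normal_distribution expects a stddev).
std::vector<float> sample_normal_init(std::size_t n, float mean, float variance,
                                      unsigned seed) {
  std::mt19937 rng(seed);
  std::normal_distribution<float> dist(mean, stddev_from_variance(variance));
  std::vector<float> values(n);
  for (float &v : values) v = dist(rng);
  return values;
}
```

For example, a variance of 4 corresponds to a standard deviation of 2.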

struct ParameterInitUniform : public dynet::ParameterInit
#include <param-init.h>

Initialize parameters with samples from a uniform distribution.

Public Functions

ParameterInitUniform(float scale)

Constructor for uniform distribution centered on 0.

Samples parameters from $$\mathcal U([-\mathrm{scale},+\mathrm{scale}])$$

Parameters
• scale: Scale of the distribution

ParameterInitUniform(float l, float r)

Constructor for uniform distribution in a specific interval.

Parameters
• l: Lower bound of the interval
• r: Upper bound of the interval

struct ParameterInitConst : public dynet::ParameterInit
#include <param-init.h>

Initialize parameters with a constant value.

Public Functions

ParameterInitConst(float c)

Constructor.

Parameters
• c: Constant value

struct ParameterInitIdentity : public dynet::ParameterInit
#include <param-init.h>

Initialize as the identity.

This will raise an exception if used on non-square matrices.

Public Functions

ParameterInitIdentity()

Constructor.

struct ParameterInitGlorot : public dynet::ParameterInit
#include <param-init.h>

Initialize with the methods described in Glorot, 2010

In order to preserve the variance of the forward and backward flow across layers, the parameters $$\theta$$ are initialized such that $$\mathrm{Var}(\theta)=\frac 2 {n_1+n_2}$$ where $$n_1,n_2$$ are the input and output dim.

In the case of 4d tensors (common in convolutional networks) of shape $$XH,XW,XC,N$$ the weights are sampled from $$\mathcal U([-g\sqrt{\frac 6 {d}},g\sqrt{ \frac 6 {d}}])$$ where $$d = XC * (XH * XW) + N * (XH * XW)$$. Important note: the underlying distribution is uniform (not Gaussian).

Note: This is also known as Xavier initialization
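
The two formulas above can be checked numerically with a small standalone sketch. A uniform distribution on $$[-a,a]$$ has variance $$a^2/3$$, so a bound of $$\sqrt{6/(n_1+n_2)}$$ gives the stated variance $$2/(n_1+n_2)$$ when the gain is 1. The function names are assumptions made for this example, and applying the gain to the 2d case is an assumption by analogy with the 4d formula.

```cpp
#include <cassert>
#include <cmath>

// Bound a of U([-a, a]) for a parameter with n1 inputs and n2 outputs.
// Applying `gain` here is an assumption by analogy with the 4d formula.
float glorot_bound(unsigned n1, unsigned n2, float gain) {
  assert(n1 + n2 > 0);
  return gain * std::sqrt(6.0f / static_cast<float>(n1 + n2));
}

// Bound for a 4d tensor of shape (XH, XW, XC, N), using
// d = XC * (XH * XW) + N * (XH * XW) as stated above.
float glorot_bound_4d(unsigned xh, unsigned xw, unsigned xc, unsigned n, float gain) {
  const unsigned d = xc * (xh * xw) + n * (xh * xw);
  assert(d > 0);
  return gain * std::sqrt(6.0f / static_cast<float>(d));
}

// Variance of the uniform distribution on [-a, a].
float uniform_variance(float a) { return a * a / 3.0f; }
```

With 4 inputs and 6 outputs, `uniform_variance(glorot_bound(4, 6, 1.0f))` is approximately 0.2, which matches $$2/(4+6)$$.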

Public Functions

ParameterInitGlorot(bool is_lookup = false, float gain = 1.f)

Constructor.

Parameters
• is_lookup: Boolean value identifying the parameter as a LookupParameter
• gain: Scaling parameter. In order for the Glorot initialization to be correct, you should set this equal to $$\frac 1 {f'(0)}$$ where $$f$$ is your activation function

struct ParameterInitSaxe : public dynet::ParameterInit
#include <param-init.h>

Initializes according to Saxe et al., 2014

Initializes as a random orthogonal matrix (unimplemented for GPU)

Public Functions

ParameterInitSaxe(float gain = 1.0)

Constructor.

struct ParameterInitFromFile : public dynet::ParameterInit
#include <param-init.h>

Initializes from a file.

Useful for reusing weights, etc…

Public Functions

ParameterInitFromFile(std::string f)

Constructor.

Parameters
• f: File name (format should just be a list of values)

struct ParameterInitFromVector : public dynet::ParameterInit
#include <param-init.h>

Initializes from a std::vector of floats.

Public Functions

ParameterInitFromVector(std::vector<float> v)

Constructor.

Parameters
• v: Vector of values to be used

## Tensor¶

Tensor objects provide a bridge between C++ data structures and Eigen Tensors for multidimensional data.

Concretely, as an end user you will obtain a tensor object after calling .value() on an expression. You can then use the functions described below to convert these tensors to floats or arrays of floats, to save and load the values, etc.

Conversely, when implementing low level nodes (e.g. for new operations), you will need to retrieve Eigen tensors from DyNet tensors in order to perform efficient computation.

vector<Eigen::DenseIndex> dynet::as_vector(const IndexTensor &v)

Get the array of indices in an index tensor.

For higher order tensors this returns the flattened value

Return
Index values
Parameters
• v: Input index tensor

std::ostream &dynet::operator<<(std::ostream &os, const Tensor &t)

You can use cout<<tensor; for debugging or saving.

Parameters
• os: output stream
• t: Tensor

real dynet::as_scalar(const Tensor &t)

Get a scalar value from an order 0 tensor.

Throws a runtime_error exception if the tensor has more than one element.

TODO : Change for custom invalid dimension exception maybe?

Return
Scalar value
Parameters
• t: Input tensor

std::vector<real> dynet::as_vector(const Tensor &v)

Get the array of values in the tensor.

For higher order tensors this returns the flattened value

Return
Values
Parameters
• v: Input tensor

std::vector<real> dynet::as_scale_vector(const Tensor &v, float a)

Get the array of values in the scaled tensor.

For higher order tensors this returns the flattened value

Return
Values
Parameters
• v: Input tensor
• a: Scale factor

real dynet::rand01()

This is a helper function to sample uniformly in $$[0,1]$$.

Return
$$x\sim\mathcal U([0,1])$$

int dynet::rand0n(int n)

This is a helper function to sample uniformly in $$\{0,\dots,n-1\}$$.

Return
$$x\sim\mathcal U(\{0,\dots,n-1\})$$
Parameters
• n: Upper bound (excluded)

real dynet::rand_normal()

This is a helper function to sample from a normalized gaussian distribution.

Return
$$x\sim\mathcal N(0,1)$$

struct IndexTensor
#include <index-tensor.h>

Represents a tensor of indices.

This holds indices to locations within a dimension or tensor.

Public Functions

IndexTensor()

Create an empty tensor.

IndexTensor(const Dim &d, Eigen::DenseIndex *v, Device *dev, DeviceMempool mem)

Creates a tensor.

Parameters
• d: Shape of the tensor
• v: Pointer to the values
• dev: Device
• mem: Memory pool

Public Members

Dim d

Shape of tensor

Eigen::DenseIndex *v

Pointer to memory

struct Tensor
#include <tensor.h>

Represents a tensor of any order.

This provides a bridge between classic C++ types and Eigen tensors.

Public Functions

Tensor()

Create an empty tensor.

Tensor(const Dim &d, float *v, Device *dev, DeviceMempool mem)

Creates a tensor.

Parameters
• d: Shape of the tensor
• v: Pointer to the values
• dev: Device
• mem: Memory pool

float *batch_ptr(unsigned bid)

Get the pointer for a particular batch.

Automatically broadcasting if the size is zero

Return
Pointer to the memory where the batch values are located
Parameters
• bid: Batch id requested

bool is_valid() const

Check for NaNs and infinite values.

This is very slow: use sparingly (it’s linear in the number of elements). This raises a std::runtime_error exception if the Tensor is on GPU because it’s not implemented yet

Return
Whether the tensor is valid, i.e. contains no NaN or infinite value

Tensor batch_elem(unsigned b) const

Get a Tensor object representing a single batch.

If this tensor only has a single batch, then broadcast. Otherwise, check to make sure that the requested batch is smaller than the number of batches.

TODO: This is a bit wasteful, as it re-calculates bs.batch_size() every time.

Return
Sub tensor at batch b
Parameters
• b: Batch id
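
The broadcasting rule described above can be sketched as an offset computation into a contiguous buffer. This is a standalone illustration under the assumption that batch elements are stored one after another; the helper `batch_offset` is hypothetical and not part of DyNet.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: offset (in elements) of batch element `bid` in a
// buffer holding `num_batches` elements of `batch_size` values each.
// With a single batch element, every request maps to offset 0 (broadcast).
std::size_t batch_offset(unsigned bid, unsigned num_batches, std::size_t batch_size) {
  assert(num_batches >= 1);
  assert(num_batches == 1 || bid < num_batches);  // bounds check when batched
  return static_cast<std::size_t>(bid % num_batches) * batch_size;
}
```

For a tensor with one batch element, requesting batch 3 yields offset 0; with four batch elements of six values each, batch 2 starts at offset 12.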

std::vector<Tensor> batch_elems() const

Get tensors for all batches.

Return
List of the tensors in each batch

Public Members

Dim d

Shape of tensor

float *v

Pointer to memory

struct TensorTools
#include <tensor.h>

Provides tools for creating, accessing, copying and modifying tensors (in-place)

Public Static Functions

void clip(Tensor &d, float left, float right)

Clip the values in the tensor to a fixed range.

Parameters
• d: Tensor to modify
• left: Target minimum value
• right: Target maximum value
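
A minimal standalone sketch of in-place clipping on a plain `std::vector<float>` rather than a DyNet Tensor. The name `clip_in_place` is an assumption for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: clamp every value into [left, right] in place.
void clip_in_place(std::vector<float> &values, float left, float right) {
  assert(left <= right);
  for (float &v : values) v = std::min(std::max(v, left), right);
}
```

Values already inside the range are left unchanged; values outside it are moved to the nearest bound.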

void scale(Tensor &x, float left, float right)

Do an elementwise linear transform of values a*x + b, where a is passed as left and b as right.

Parameters
• x: Tensor to modify
• left: The value to multiply by (a)
• right: The value to add (b)

void uniform_to_bernoulli(Tensor &x, float p)

Take a tensor of Uniform(0,1) sampled variables and turn them into Bernoulli(p) variables.

Parameters
• x: Tensor to modify
• p: The bernoulli probability
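
One common way to turn Uniform(0,1) samples into Bernoulli(p) samples is to threshold them at p. The sketch below assumes the convention that a sample u becomes 1 when u < p and 0 otherwise; whether DyNet uses exactly this convention is not stated here, and the helper name is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: threshold uniform samples in place.
// Assumed convention: u < p maps to 1, anything else maps to 0,
// so each output is 1 with probability p.
void uniform_to_bernoulli_sketch(std::vector<float> &x, float p) {
  assert(p >= 0.0f && p <= 1.0f);
  for (float &u : x) u = (u < p) ? 1.0f : 0.0f;
}
```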

void constant(Tensor &d, float c)

Fills the tensor with a constant value.

Parameters
• d: Tensor to modify
• c: Target value

void zero(Tensor &d)

Fills a tensor with zeros.

Parameters
• d: Input tensor

void identity(Tensor &val)

Set the (order 2) tensor as the identity matrix.

This throws a runtime_error exception if the tensor isn’t a square matrix.

Parameters
• val: Input tensor

void randomize_bernoulli(Tensor &val, real p, real scale = 1.0f)

Fill the tensor with bernoulli random variables and scale them by scale.

Parameters
• val: Input tensor
• p: Parameter of the bernoulli distribution
• scale: Scale of the random variables

void randomize_normal(Tensor &val, real mean = 0.0f, real stddev = 1.0f)

Fill the tensor with gaussian random variables.

Parameters
• val: Input tensor
• mean: Mean
• stddev: Standard deviation

void randomize_uniform(Tensor &val, real left = 0.0f, real right = 1.0f)

Fill the tensor with uniform random variables.

Parameters
• val: Input tensor
• left: Left bound of the interval
• right: Right bound of the interval

void randomize_orthonormal(Tensor &val, real scale = 1.0f)

Takes a square matrix tensor and sets it as a random orthonormal matrix.

More specifically, this samples a random matrix with randomize_uniform, performs an SVD, and returns the left orthonormal matrix of the decomposition, scaled by scale.

Parameters
• val: Input tensor
• scale: Value to which the resulting orthonormal matrix will be scaled

float access_element(const Tensor &v, int index)

Access element of the tensor by index in the values array.

access_element and set_element are potentially very slow; use them sparingly.

Return
v.v[index]
Parameters
• v: Tensor
• index: Index in the memory

float access_element(const Tensor &v, const Dim &index)

Access element of the tensor by indices in the various dimensions.

This only works for matrix-shaped tensors (+ batch dimension). access_element and set_element are potentially very slow; use them sparingly.

Return
(*v)(index[0], index[1])
Parameters
• v: Tensor
• index: Indices in the tensor

void set_element(const Tensor &v, int index, float value)

Set element of the tensor by index in the values array.

access_element and set_element are potentially very slow; use them sparingly.

Parameters
• v: Tensor
• index: Index in the memory
• value: Desired value

void copy_element(const Tensor &l, int lindex, Tensor &r, int rindex)

Copy element from one tensor to another (by index in the values array)

Parameters
• l: Source tensor
• lindex: Source index
• r: Target tensor
• rindex: Target index

void set_elements(const Tensor &v, const std::vector<float> &vec)

Set the elements of a tensor with an array of values.

(This uses memcpy so be careful)

Parameters
• v: Input Tensor
• vec: Values

void copy_elements(Tensor &v, const Tensor &v_src)

Copy one tensor into another.

Parameters
• v: Target tensor
• v_src: Source tensor

void accumulate(Tensor &v, const Tensor &v_src)

Accumulate the values of one tensor into another.

Parameters
• v: Target tensor
• v_src: Source tensor

void logsumexp(const Tensor &x, Tensor &m, Tensor &z, unsigned d = 0)

Calculate the logsumexp function over all columns of the tensor.

Parameters
• x: The input tensor
• m: A tensor of scratch memory to hold the maximum values of each column
• z: The output tensor

IndexTensor argmax(const Tensor &v, unsigned dim = 0, unsigned num = 1)

Calculate the index of the maximum value.

Return
A newly allocated IndexTensor consisting of argmax IDs. The length of the dimension “dim” will be “num”, consisting of the appropriate IDs.
Parameters
• v: A tensor where each row represents a probability distribution
• dim: Which dimension to take the argmax over
• num: The number of kmax values

IndexTensor categorical_sample_log_prob(const Tensor &v, unsigned dim = 0, unsigned num = 1)

Calculate samples from a log probability.

Return
A newly allocated IndexTensor consisting of sampled IDs. The length of the dimension “dim” will be “num”, consisting of the appropriate IDs.
Parameters
• v: A tensor where each row represents a log probability distribution
• dim: Which dimension to take the sample over
• num: The number of samples for each row
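
Sampling from log probabilities can be sketched for a single row with inverse-CDF sampling. This is one possible technique chosen for illustration and is not claimed to be the method DyNet uses; the function name is hypothetical, and the uniform draw u is passed in explicitly so the behaviour is deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: pick an index according to exp(log_p), given a
// uniform draw u in [0, 1). Log probabilities need not be normalized.
std::size_t sample_from_log_probs(const std::vector<double> &log_p, double u) {
  assert(!log_p.empty());
  assert(u >= 0.0 && u < 1.0);
  // Subtract the maximum before exponentiating for numerical stability.
  const double m = *std::max_element(log_p.begin(), log_p.end());
  double total = 0.0;
  for (double lp : log_p) total += std::exp(lp - m);
  double cumulative = 0.0;
  for (std::size_t i = 0; i < log_p.size(); ++i) {
    cumulative += std::exp(log_p[i] - m) / total;
    if (u < cumulative) return i;
  }
  return log_p.size() - 1;  // guard against rounding in the last step
}
```

With probabilities 0.2 and 0.8, a draw of u = 0.1 selects index 0 and a draw of u = 0.5 selects index 1.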

std::pair<Tensor, IndexTensor> topk(const Tensor &v, unsigned dim = 0, unsigned num = 1)

Calculate the k-max values and their indexes.

Return
A newly allocated pair<Tensor, IndexTensor> consisting of the k-max values and their IDs. The length of the dimension “dim” will be “num”, consisting of the appropriate values/IDs.
Parameters
• v: A tensor where each row represents a probability distribution
• dim: Which dimension to take the kmax over
• num: The number of kmax values
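
For a single row, the k-max values and their indices can be sketched with `std::partial_sort` over an index array. This standalone example works on a `std::vector<float>` and is not DyNet's implementation; `topk_sketch` is a hypothetical name. Setting k to 1 gives the argmax.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: return the k largest values of v (in decreasing
// order) together with their indices in v.
std::pair<std::vector<float>, std::vector<std::size_t>>
topk_sketch(const std::vector<float> &v, std::size_t k) {
  assert(k <= v.size());
  std::vector<std::size_t> ids(v.size());
  std::iota(ids.begin(), ids.end(), std::size_t{0});
  // Order only the first k indices by decreasing value.
  std::partial_sort(ids.begin(), ids.begin() + static_cast<std::ptrdiff_t>(k),
                    ids.end(),
                    [&v](std::size_t a, std::size_t b) { return v[a] > v[b]; });
  ids.resize(k);
  std::vector<float> vals;
  vals.reserve(k);
  for (std::size_t id : ids) vals.push_back(v[id]);
  return std::make_pair(vals, ids);
}
```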

## Dimensions¶

The Dim class holds information on the shape of a tensor. As explained in Unorthodox Design, in DyNet the dimensions are represented as the standard dimension + the batch dimension, which makes batched computation transparent.

DYNET_MAX_TENSOR_DIM

Maximum number of dimensions supported by DyNet: 7

struct Dim
#include <dim.h>

The Dim struct stores information about the dimensionality of expressions.

Batch dimension is treated separately from standard dimension.

Public Functions

Dim()

Default constructor.

Dim(std::initializer_list<unsigned int> x)

Initialize from a list of dimensions.

The batch dimension is 1 in this case (non-batched expression)

Parameters
• x: List of dimensions

Dim(std::initializer_list<unsigned int> x, unsigned int b)

Initialize from a list of dimensions and a batch size.

Parameters
• x: List of dimensions
• b: Batch size

Dim(const std::vector<long> &x)

Initialize from a vector of dimensions.

The batch dimension is 1 in this case (non-batched expression)

Parameters
• x: Array of dimensions

Dim(const std::vector<long> &x, unsigned int b)

Initialize from a vector of dimensions and a batch size.

Parameters
• x: Vector of dimensions
• b: Batch size

unsigned int size() const

Total size of the tensor, including the batch dimension.

Return
Number of batch elements * size of a batch

unsigned int batch_size() const

Size of a batch (product of all dimensions)

Return
Size of a batch
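
The relationship between the two sizes can be sketched with a small standalone struct. `ShapeSketch` and its members are assumptions for illustration and only mimic the names used by Dim.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Dim: standard dimensions plus a batch dimension.
struct ShapeSketch {
  std::vector<unsigned> dims;  // standard dimensions
  unsigned batch;              // number of batch elements

  // Product of the standard dimensions (size of one batch element).
  unsigned batch_size() const {
    unsigned n = 1;
    for (unsigned d : dims) n *= d;
    return n;
  }

  // Total number of values, counting every batch element.
  unsigned size() const {
    assert(batch >= 1);
    return batch * batch_size();
  }
};
```

A 2x3 matrix with 4 batch elements has a batch_size of 6 and a size of 24.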

unsigned int sum_dims() const

Sum of all dimensions within a batch.

Return
Sum of the dimensions within a batch

Dim truncate() const

Remove trailing dimensions of 1.

Iterates over all the dimensions of Dim and truncates after the last dimension that is not 1.

Return
truncated dimension
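
A standalone sketch of removing trailing dimensions of 1 from a shape stored as a vector. It assumes that at least one dimension is always kept, even if it equals 1; the helper name is hypothetical and this is not DyNet's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: drop trailing dimensions equal to 1.
// Assumption: the first dimension is always kept.
std::vector<unsigned> truncate_sketch(const std::vector<unsigned> &dims) {
  assert(!dims.empty());
  std::size_t keep = 1;
  for (std::size_t i = 1; i < dims.size(); ++i)
    if (dims[i] > 1) keep = i + 1;  // remember the last non-one dimension
  return std::vector<unsigned>(dims.begin(),
                               dims.begin() + static_cast<std::ptrdiff_t>(keep));
}
```

Under these assumptions, the shape {3, 1, 1} becomes {3}, while {1, 4, 1} becomes {1, 4} because only trailing ones are removed.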

Dim single_batch() const

Set the batch dimension to 1.

Return
1-batch version of this instance

void resize(unsigned int i)

Change the number of dimensions.

Parameters
• i: New number of dimensions

unsigned int ndims() const

Get number of dimensions.

Return
Number of dimensions

unsigned int rows() const

Size of the first dimension.

Return
Size of the first dimension

unsigned int num_nonone_dims() const

Number of non-one dimensions.

Return
Number of non-one dimensions

unsigned int cols() const

Size of the second dimension (or 1 if only one dimension)

Return
Size of the second dimension (or 1 if only one dimension)

unsigned int batch_elems() const

Batch dimension.

Return
Batch dimension

void set(unsigned int i, unsigned int s)

Set specific dimension.

Set the value of a specific dimension to an arbitrary value

Parameters
• i: Dimension index
• s: Dimension size

unsigned int operator[](unsigned int i) const

Access a specific dimension as you would access an array element.

Return
Size of dimension i
Parameters
• i: Dimension index

unsigned int size(unsigned int i) const

Size of dimension i.

Return
Size of dimension i
Parameters
• i: Dimension index

void delete_dim(unsigned int i)

Remove one of the dimensions.

Parameters
• i: index of the dimension to be removed

void delete_dims(std::vector<unsigned int> dims, bool reduce_batch)

Remove multiple dimensions.

Parameters
• dims: dimensions to be removed
• reduce_batch: reduce the batch dimension or not

void add_dim(unsigned int n)

Append a dimension at the end.

Parameters
• n: the size of the new dimension

void insert_dim(unsigned int i, unsigned int n)

Insert a dimension.

Parameters
• i: the index to insert the new dimension
• n: the size of the new dimension

Dim transpose() const

Transpose a vector or a matrix.

This raises an invalid_argument exception on tensors with more than 2 dimensions

Return
The transposed Dim structure
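
The shape logic of a transpose can be sketched on a plain vector of dimensions. This standalone example assumes that a vector of length n is treated as an n x 1 matrix and therefore becomes 1 x n; the helper name is hypothetical and the batch dimension is ignored.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical helper: transpose the shape of a vector or a matrix.
// Assumption: a vector {n} is treated as n x 1 and becomes {1, n}.
std::vector<unsigned> transpose_shape(const std::vector<unsigned> &dims) {
  if (dims.size() == 1) return std::vector<unsigned>{1u, dims[0]};
  if (dims.size() == 2) return std::vector<unsigned>{dims[1], dims[0]};
  throw std::invalid_argument("transpose_shape: expected 1 or 2 dimensions");
}
```

A 2x3 shape becomes 3x2, and a shape with three dimensions triggers the invalid_argument exception.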

void print_profile(std::ostream &out) const

Print the unbatched profile as a string.

Public Members

unsigned int d[DYNET_MAX_TENSOR_DIM]

Array of dimensions

unsigned int nd

Number of dimensions

unsigned int bd

Batch dimension