Python Reference Manual¶
DyNet global parameters¶
DynetParams¶
-
class dynet.DynetParams¶
This object holds the global parameters of DyNet.
This is useful if you want to specify the global dynet parameters (memory, random seed…) programmatically, for example in a notebook.
import _dynet
You can then declare and use a DynetParams object:
# Declare a DynetParams object
dyparams = dy.DynetParams()
# Fetch the command line arguments (optional)
dyparams.from_args()
# Set some parameters manually (see the command line arguments documentation)
dyparams.set_mem(2048)
dyparams.set_random_seed(666)
# Initialize with the given parameters
dyparams.init()  # or init_from_params(dyparams)
You can also use the dynet_config module in your script to specify the device usage and the global dynet parameters (memory, random seed…) before import dynet:
import dynet_config
# Declare GPU as the default device type
dynet_config.set_gpu()
# Set some parameters manually
dynet_config.set(mem=4, random_seed=9)
# Initialize dynet import using above configuration in the current scope
import dynet
Don’t forget to initialize with dyparams.init(), otherwise dynet will raise an error.
-
from_args
(shared_parameters=None)¶ Gets parameters from the command line arguments
You can still modify the parameters after calling this. See the documentation about command line arguments for more details
Keyword Arguments: shared_parameters ([type]) – [description] (default: None)
-
from_config
(conf)¶ Set parameters from config object:
- Attributes of conf object:
- mem, seed, autobatch, profiling, weight_decay, shared_params, requested_gpus, gpu_mask
-
init
()¶ Initialize dynet with the current DynetParams object.
This is a one-way operation: you can’t uninitialize dynet afterwards.
-
set_autobatch
(autobatch)¶ Activate autobatching
Parameters: autobatch (bool) – Set to True
to activate autobatching
-
set_mem
(mem)¶ Set the memory allocated to dynet
The unit is MB
Parameters: mem (number) – memory size in MB
-
set_profiling
(profiling)¶ Activate profiling
Parameters: profiling (int) – Set to a value > 0 to activate profiling
-
set_random_seed
(random_seed)¶ Set random seed for dynet
Parameters: random_seed (number) – Random seed
-
set_requested_gpus
(requested_gpus)¶ Number of requested gpus
Parameters: requested_gpus (number) – number of requested gpus
-
set_shared_parameters
(shared_parameters)¶ Shared parameters
Parameters: shared_parameters (bool) – shared parameters
-
set_weight_decay
(weight_decay)¶ Set weight decay parameter
Parameters: weight_decay (float) – weight decay parameter
-
Initialization functions¶
-
dynet.
init
(shared_parameters=None)¶ Initialize dynet
Initializes dynet from command line arguments. Do not use after import dynet
Keyword Arguments: shared_parameters (bool) – [description] (default: None)
-
dynet.
init_from_params
(params)¶ Initialize from DynetParams
Same as params.init()
Parameters: params (DynetParams) – dynet parameters
-
dynet.
reset_random_seed
(seed)¶ Resets the random seed and the random number generator
Parameters: seed (int) – The new random seed
ParameterCollection and Parameters¶
ParameterCollection¶
-
class
dynet.
ParameterCollection
(parent=None)¶ A ParameterCollection holds Parameters. Use it to create, load and save parameters.
(It used to be called Model in previous versions of DyNet, and Model is still an alias for ParameterCollection.)
A ParameterCollection is a container for Parameters and LookupParameters.
dynet.Trainer objects take ParameterCollection objects that define which parameters are being trained.
The values of the parameters in a collection can be persisted to and loaded from files.
- Hierarchy:
- The parameter collections can be nested, where each collection can hold zero or more sub-collections, which are also ParameterCollection objects. Each (sub-)collection contains the parameters in it and in all the (sub-)collections below it.
- Naming:
Parameters, LookupParameters and ParameterCollections have associated string names. The names can be accessed using the .name() method.
The names are used for identifying the parameters and the collection hierarchy when loading from disk, and in particular when loading only a subset of the objects in a saved file.
The name of a parameter, lookup parameter or sub-collection is unique within a ParameterCollection, and reflects the hierarchy structure.
One can supply an optional informative name when creating the parameter or sub-collection. The supplied names are then appended with a running index to avoid name clashes. The .name() method returns the full name of an object, including the appended index and its location within the collection hierarchy. The user-supplied names cannot include the character / (which is used as a hierarchy separator) or _ (which is used as an index separator).
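A short sketch of the naming behavior (the exact appended indices depend on creation order and are shown here only as an assumption):
import dynet as dy

pc = dy.ParameterCollection()
W = pc.add_parameters((3, 5), name="W")
sub = pc.add_subcollection("encoder")
b = sub.add_parameters((3,), name="b")

# Full names reflect the hierarchy and the appended running index,
# e.g. something like "/W__0" and "/encoder__0/b__0" (exact form may differ).
print(W.name())
print(sub.name())
print(b.name())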
-
add_lookup_parameters
(dim, init=None, name='', device='', scale=1.0, mean=0.0, std=1.0)¶ Add a lookup parameter to the ParameterCollection with a given initializer
lp = m.add_lookup_parameters((3,5), init=0)                        # Creates 3 vectors of dimension 5 filled with zeros
lp = m.add_lookup_parameters((3,5), init='uniform', scale=a)       # Creates 3 vectors of dimension 5 initialized with U([-a,a])
lp = m.add_lookup_parameters((3,5), init='normal', mean=a, std=b)  # Creates 3 vectors of dimension 5 initialized with N(a, b)
lp = m.add_lookup_parameters((3,5), init='glorot')                 # Creates 3 vectors of dimension 5 with glorot init
lp = m.add_lookup_parameters((3,5), init='he')                     # Creates 3 vectors of dimension 5 with he init
arr = np.zeros((3, 5))
lp = m.add_lookup_parameters(arr.shape, init=arr)                  # Creates 3 vectors of dimension 5 from a numpy array (first dimension is the lookup dimension)
lp = m.add_lookup_parameters((3,5), init=dy.PyInitializer())       # Any parameter initializer
Parameters: dim (tuple, np.ndarray) – Shape of the parameter. The first dimension is the lookup dimension (number of records in the lookup table).
Keyword Arguments: - init (number, string, dynet.PyInitializer, np.ndarray) – Initializer, see description for details (default: GlorotInitializer)
- name (string) – Optional name for this parameter (default: “”)
- device (string) – Optional device name for this parameter (default: “”, default device)
- scale (number) – Scale for uniform initialization
- mean (number) – Mean for normal initialization
- std (number) – Standard deviation for normal initialization
Returns: Created LookupParameter
Return type:
-
add_parameters
(dim, init=None, name='', device='', scale=1.0, mean=0.0, std=1.0)¶ Add a parameter to the ParameterCollection with a given initializer. There are different ways of specifying an initializer:
p = m.add_parameters((3,5), init=0)                        # Creates 3x5 matrix filled with 0 (or any other float)
p = m.add_parameters((3,5), init='uniform', scale=a)       # Creates 3x5 matrix initialized with U([-a,a])
p = m.add_parameters((3,5), init='normal', mean=a, std=b)  # Creates 3x5 matrix initialized with N(a, b)
p = m.add_parameters((5,5), init='identity')               # Creates 5x5 identity matrix
p = m.add_parameters((5,5), init='saxe')                   # Creates 5x5 orthogonal matrix (NOT SUPPORTED YET)
p = m.add_parameters((3,5), init='glorot')                 # Creates 3x5 matrix with glorot init
p = m.add_parameters((3,5), init='he')                     # Creates 3x5 matrix with he init
arr = np.zeros((3, 5))
p = m.add_parameters(arr.shape, init=arr)                  # Creates 3x5 matrix from a numpy array
p = m.add_parameters((3,5), init=dy.PyInitializer())       # Any parameter initializer
Parameters: dim (tuple, np.ndarray) – Shape of the parameter.
Keyword Arguments: - init (number, string, dynet.PyInitializer, np.ndarray) – Initializer, see description for details (default: GlorotInitializer)
- name (string) – Optional name for this parameter (default: “”)
- device (string) – Optional device name for this parameter (default: “”, default device)
- scale (number) – Scale for uniform initialization
- mean (number) – Mean for normal initialization
- std (number) – Standard deviation for normal initialization
Returns: Created Parameter
Return type:
-
add_subcollection
(name=None)¶ Creates a sub-collection of the current collection, and returns it.
A sub-collection is simply a ParameterCollection object which is tied to a parent collection. ParameterCollections can be nested to arbitrary depth.
Sub-collections are used for grouping parameters: for example, if one wants to train only a subset of the parameters, one can add them to a sub-collection and pass the sub-collection to a trainer. Similarly, for saving (or loading) only some of the parameters, one can save/populate a sub-collection.
Sub-collections are used inside builder objects (such as the LSTMBuilder): the builder creates a local sub-collection and adds parameters to it instead of to the global collection that is passed to it in the constructor. This way, the parameters participating in the builder are logically grouped, and can be saved/loaded/trained separately if needed.
Keyword Arguments: name (string) – Optional name for this sub-collection (default: “”)
Returns: (dynet.ParameterCollection) a parameter collection.
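A brief sketch of grouping parameters with a sub-collection (the trainer choice and dimensions are arbitrary):
import dynet as dy

pc = dy.ParameterCollection()
shared = pc.add_parameters((10, 10))

# Group the classifier parameters in their own sub-collection
clf = pc.add_subcollection("classifier")
W = clf.add_parameters((2, 10))
b = clf.add_parameters((2,))

# Train only the classifier parameters by handing the trainer the sub-collection
trainer = dy.SimpleSGDTrainer(clf)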
-
get_weight_decay
()¶ Get the weight decay lambda value.
-
load_lookup_param
(fname, key)¶ Loads a named lookup-parameter from a file, adds it to the collection, and returns the loaded parameter.
Parameters: - fname (string) – the file name to read from.
- key (string) – the full-name of the lookup parameter to read.
Returns: (dynet.LookupParameters) The LookupParameters object.
-
load_param
(fname, key)¶ Loads a named parameter from a file, adds it to the collection, and returns the loaded parameter.
Parameters: - fname (string) – the file name to read from.
- key (string) – the full-name of the parameter to read.
Returns: (dynet.Parameters) The Parameters object.
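A brief sketch (the file name and parameter keys below are hypothetical placeholders; the key must be the full name of a parameter stored in that file):
import dynet as dy

pc = dy.ParameterCollection()
# "my_model" and the keys below are placeholder names, not real files/parameters
W = pc.load_param("my_model", "/mlp/W__0")
E = pc.load_lookup_param("my_model", "/embeddings__0")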
-
lookup_parameters_from_numpy
(array, name='', device='')¶ Create LookupParameters from numpy array
Parameters: - array (np.ndarray) – Numpy array. rows: vocab_size, cols: dims.
- name (string) – optional name for this parameter.
- device (string) – Optional device name for this parameter (default: “”, default device)
Returns: LookupParameter
Return type:
-
lookup_parameters_list
()¶ Returns a list of all lookup parameters in the collection
Returns: All dy.LookupParameters in the collection Return type: (list)
-
name
()¶ Return the full name of this collection.
-
parameters_from_numpy
(array, name='', device='')¶ Create parameter from numpy array
Parameters: - array (np.ndarray) – Numpy array
- name (string) – optional name for this parameter.
- device (string) – Optional device name for this parameter (default: “”, default device)
Returns: Parameter
Return type:
-
parameters_list
()¶ Returns list of all parameters in the collection
Returns: All dy.Parameters in the collection Return type: (list)
-
populate
(fname, key='')¶ Populate the values of all parameters in this collection from file.
This only populates the values of existing parameters, and does not add parameters to the collection. Thus, the content of the file and the parameters in this collection must match. One should make sure to add the same parameters to the collection (and in the same order) before calling populate as were added before calling save.
Parameters: fname (string) – file name to read parameter values from.
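A minimal save/populate round trip, assuming the same parameters are re-created in the same order (the file name is arbitrary):
import dynet as dy

pc = dy.ParameterCollection()
W = pc.add_parameters((10, 10))
b = pc.add_parameters((10,))
pc.save("/tmp/my_model")           # writes the values of all parameters

pc2 = dy.ParameterCollection()
W2 = pc2.add_parameters((10, 10))  # same parameters, added in the same order
b2 = pc2.add_parameters((10,))
pc2.populate("/tmp/my_model")      # fills W2 and b2 with the saved values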
-
save
(fname, name='', append=False)¶ Save the values of all parameters in this collection to file.
Parameters: fname (string) – file name to save into.
-
set_weight_decay
(lam)¶ Set the weight decay coefficient.
Parameters: lam (float) – Weight decay coefficient
-
set_weight_decay_lambda
(lam)¶ Set the weight decay coefficient. (alias to set_weight_decay)
Parameters: lam (float) – Weight decay coefficient
Parameters and LookupParameters¶
-
class
dynet.
Parameters
¶ Bases:
dynet.Expression
Parameters class
Parameters are things that are optimized. In contrast to a system like Torch, where computational modules may have their own parameters, in DyNet parameters are just parameters.
-
as_array
()¶ Return as a numpy array.
Returns: values of the parameter Return type: np.ndarray
-
clip_inplace
(left, right)¶ Clip the values in the parameter to a fixed range [left, right] (in place)
Parameters: - left (number) – Lower bound of the range
- right (number) – Upper bound of the range
-
expr
(update=False)¶ Returns the parameter as an expression.
This is useful if you want to return a constant version of the parameter by setting update=False. More precisely,
W.expr(update)
will return the same thing as
W if update else dy.const_parameter(W)
Parameters: update (bool) – If this is set to False, the parameter won’t be updated during the backward pass
Returns: Expression of the parameter
Return type: Expression
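For example, a short sketch of using a parameter as a constant in one computation (dimensions are arbitrary):
import dynet as dy

pc = dy.ParameterCollection()
W = pc.add_parameters((5, 5))

dy.renew_cg()
w_const = W.expr(update=False)  # behaves like dy.const_parameter(W): no updates through this expression
w_train = W.expr(update=True)   # gradients flow into W through this expression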
-
grad_as_array
()¶ Return gradient as a numpy array.
Returns: values of the gradient w.r.t. this parameter Return type: np.ndarray
-
is_updated
()¶ check whether the parameter is updated or not
Returns: Update status Return type: bool
-
name
()¶ Return the full name of this parameter.
-
populate
(fname, key)¶ Populate the values of this Parameters object from the parameter named key in the file fname. The sizes of saved parameters and this object must match.
Parameters: - fname (string) – the name of a file to load from.
- key (string) – the parameter to read from the file.
-
scale
(s)¶ Scales the parameter
Parameters: s (float) – Scale
-
scale_gradient
(s)¶ Scales the gradient
Parameters: s (float) – Scale
-
set_updated
(b)¶ Set parameter as “updated”
Parameters: b (bool) – updated status
-
set_value
(arr)¶ Set value of the parameter
-
shape
()¶ Returns shape of the parameter
Returns: Shape of the parameter Return type: tuple
-
zero
()¶ Set the parameter to zero
-
-
class
dynet.
LookupParameters
¶ Bases:
dynet.Expression
LookupParameters represents a table of parameters.
They are used to embed a set of discrete objects (e.g. word embeddings). These are sparsely updated.
-
as_array
()¶ Return as a numpy array.
The first dimension is the lookup dimension
Returns: Values Return type: np.array
-
batch
(i)¶ Returns a batched expression based on looked up indices
This does the same as
dynet.lookup_batch
Parameters: i (list) – list of indices
Returns: Batched expression of batch dimension len(i)
Return type: dynet.Expression
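A small usage sketch (vocabulary size and dimensions are arbitrary):
import dynet as dy

pc = dy.ParameterCollection()
E = pc.add_lookup_parameters((1000, 50))  # 1000 embeddings of dimension 50

dy.renew_cg()
e5 = E[5]                # expression for the embedding at index 5
eb = E.batch([1, 2, 3])  # batched expression with batch dimension 3
print(eb.dim())          # ((50,), 3)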
-
grad_as_array
()¶ Return gradients as a numpy array.
The first dimension is the lookup dimension
Returns: gradient values Return type: np.array
-
init_from_array
(arr)¶ Initializes the values according to a numpy array
Prefer using ParameterCollection.lookup_parameters_from_numpy when possible
Parameters: arr (np.array) – numpy array of shape (num_lookups,...)
-
init_row
(i, row)¶ Initialize one row with values
Parameters: - i (int) – index
- row (list) – values
-
name
()¶ Return the full name of this lookup parameter.
-
populate
(fname, key='')¶ Populate the values of this LookupParameters object from the parameter named key in the file fname. The sizes of saved parameters and this object must match.
Parameters: - fname (string) – the name of a file to load from.
- key (string) – the parameter to read from the file.
-
row_as_array
(row)¶ Return row as a numpy array.
Parameters: row (int) – row to return Returns: Values Return type: np.array
-
row_grad_as_array
(row)¶ Return row gradient as a numpy array.
Parameters: row (int) – row to return Returns: Values Return type: np.array
-
rows_as_array
(rows)¶ Return rows as a numpy array.
The first dimension is the lookup dimension
Parameters: rows (list) – rows to return Returns: Values Return type: np.array
-
rows_grad_as_array
(rows)¶ Return rows gradients as a numpy array.
The first dimension is the lookup dimension
Parameters: rows (list) – rows to return Returns: Values Return type: np.array
-
save
(fname, key='', append=False)¶ Save the values of this LookupParameters object to a particular file.
TODO: more docs. Refer to the tutorial for more info for now
Parameters: - fname (string) – the name of a file to save to.
- key (string) – TODO
-
scale
(s)¶ Scales the parameter
Parameters: s (float) – Scale
-
scale_gradient
(s)¶ Scales the gradient
Parameters: s (float) – Scale
-
shape
()¶ Returns shape of the lookup parameter
The first dimension is the lookup dimension
Returns: Shape of the parameter Return type: tuple
-
zero
()¶ Set all values to zero
-
Parameters initializers¶
-
class
dynet.
PyInitializer
¶ Base class for parameter initializer
-
class
dynet.
NormalInitializer
(mean=0, var=1)¶ Bases:
dynet.PyInitializer
Initialize the parameters with a gaussian distribution
Keyword Arguments: - mean (number) – Mean of the distribution (default: 0)
- var (number) – Variance of the distribution (default: 1)
-
class
dynet.
UniformInitializer
(scale)¶ Bases:
dynet.PyInitializer
Initialize the parameters with a uniform distribution
Parameters: scale (number) – Parameters are sampled from \(\mathcal U([-\texttt{scale},\texttt{scale}])\)
-
class
dynet.
ConstInitializer
(c)¶ Bases:
dynet.PyInitializer
Initialize the parameters with a constant value
Parameters: c (number) – Value to initialize the parameters
-
class
dynet.
IdentityInitializer
¶ Bases:
dynet.PyInitializer
Initialize the parameters as the identity
Only works with square matrices
-
class
dynet.
GlorotInitializer
(is_lookup=False, gain=1.0)¶ Bases:
dynet.PyInitializer
Initializes the weights according to Glorot & Bengio (2011)
If the dimensions of the parameter matrix are \(m,n\), the weights are sampled from \(\mathcal U([-g\sqrt{\frac{6}{m+n}},g\sqrt{\frac{6}{m+n}}])\)
In the case of 4d tensors (common in convolutional networks) of shape \(XH,XW,XC,N\) the weights are sampled from \(\mathcal U([-g\sqrt{\frac{6}{d}},g\sqrt{\frac{6}{d}}])\) where \(d = XC * (XH * XW) + N * (XH * XW)\)
The gain \(g\) depends on the activation function :
- \(\text{tanh}\) : 1.0
- \(\text{ReLU}\) : 0.5
- \(\text{sigmoid}\) : 4.0
- Any smooth function \(f\) : \(\frac{1}{f'(0)}\)
Note: This is also known as Xavier initialization
Keyword Arguments: - is_lookup (bool) – Whether the parameter is a lookup parameter (default: False)
- gain (number) – Gain (Depends on the activation function) (default: 1.0)
-
class
dynet.
SaxeInitializer
(scale=1.0)¶ Bases:
dynet.PyInitializer
Initializes according to Saxe et al. (2014)
Initializes as a random orthonormal matrix (unimplemented for GPU)
Keyword Arguments: scale (number) – scale to apply to the orthonormal matrix
-
class
dynet.
FromFileInitializer
(fname)¶ Bases:
dynet.PyInitializer
Initialize parameter from file
Parameters: fname (str) – File name
-
class
dynet.
NumpyInitializer
(array)¶ Bases:
dynet.PyInitializer
Initialize from numpy array
Alternatively, use
ParameterCollection.parameters_from_numpy()
Parameters: array (np.ndarray) – Numpy array
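As a brief sketch, any of these initializers can be passed via the init argument of add_parameters / add_lookup_parameters:
import numpy as np
import dynet as dy

pc = dy.ParameterCollection()
p1 = pc.add_parameters((3, 5), init=dy.NormalInitializer(mean=0, var=0.01))
p2 = pc.add_parameters((3, 5), init=dy.UniformInitializer(0.1))
lp = pc.add_lookup_parameters((100, 5), init=dy.GlorotInitializer(is_lookup=True))
p3 = pc.add_parameters((2, 2), init=dy.NumpyInitializer(np.eye(2)))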
High level saving/loading¶
-
dynet.
save
(basename, objects)¶ Saves a list of parameters, lookup parameters and builder objects to disk.
Parameters: - basename (string) – The base-name of the files to save. Two files will be created: basename.data and basename.meta.
- objects (iterable) – An iterable of objects to save (see below).
Example
import dynet as dy
pc = dy.ParameterCollection()
W = pc.add_parameters((100,50))
E = pc.add_lookup_parameters((1000,50))
builder = dy.LSTMBuilder(2, 50, 50, pc)
dy.save("model", [E, builder, W])
# then, when loading:
pc = dy.ParameterCollection()
E2, builder2, W2 = dy.load("model", pc)
- What can be saved:
Each object in objects must be one of the following:
- Parameter
- LookupParameter
- one of the built-in types (CompactVanillaLSTMBuilder, VanillaLSTMBuilder, LSTMBuilder, GRUBuilder, SimpleRNNBuilder, BiRNNBuilder)
- a type adhering to the following interface:
- has a param_collection() method returning a ParameterCollection object with the parameters in the object.
- has a .spec property with picklable items describing the object
- has a .from_spec(spec, model) static method that will create and return a new instance with the needed parameters/etc. in the model.
Note, the built-in types in (3) above can be saved/loaded this way simply because they support this interface.
behind the scenes:
- for each item, we write to .meta:
- if it is a Parameters/ParameterCollection:
- its type and full name.
- if it is a builder:
- its class, its spec, the full name of its parameters collection.
- the associated parameters/sub-collection is then saved to .data
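A minimal sketch of a user-defined type adhering to this interface (the class, names and dimensions are illustrative, not part of the DyNet API):
import dynet as dy

class Affine(object):
    def __init__(self, pc, n_in, n_out):
        self.spec = (n_in, n_out)                 # picklable description of the object
        self.pc = pc.add_subcollection("affine")  # local sub-collection holding the parameters
        self.W = self.pc.add_parameters((n_out, n_in))
        self.b = self.pc.add_parameters((n_out,))

    def __call__(self, x):
        return self.W * x + self.b

    def param_collection(self):
        return self.pc

    @staticmethod
    def from_spec(spec, model):
        n_in, n_out = spec
        return Affine(model, n_in, n_out)
An instance of such a class can then be passed to dynet.save together with plain parameters, and recovered with dynet.load.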
-
dynet.
load
(basename, params)¶ Loads a list of parameters, lookup parameters and builder objects from disk. The loaded objects are added to the supplied parameter collection, and returned.
Parameters: - basename (string) – The basename to read from. This is the same string that was used when saving the objects.
- params (dynet.ParameterCollection) – A ParameterCollection to add the loaded objects to.
Returns: A list of parameters, lookup parameters and builder objects, in the same order they were passed to the save function.
Example
import dynet as dy
pc = dy.ParameterCollection()
W = pc.add_parameters((100,50))
E = pc.add_lookup_parameters((1000,50))
builder = dy.LSTMBuilder(2, 50, 50, pc)
dy.save("model", [E, builder, W])
# then, when loading:
pc = dy.ParameterCollection()
E2, builder2, W2 = dy.load("model", pc)
Computation Graph¶
-
dynet.
renew_cg
(immediate_compute=False, check_validity=False, autobatching=None)¶ Renew the computation graph.
Call this before building any new computation graph
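Typical usage is to renew the graph at the start of each training example, as in this sketch (the data, model and loss are placeholders):
import dynet as dy

pc = dy.ParameterCollection()
W = pc.add_parameters((1, 10))
trainer = dy.SimpleSGDTrainer(pc)

for x_np, y in data:        # `data` is assumed to yield (10-dim feature vector, target) pairs
    dy.renew_cg()           # start a fresh computation graph for this example
    x = dy.inputTensor(x_np)
    loss = dy.squared_distance(W * x, dy.scalarInput(y))
    loss.forward()
    loss.backward()
    trainer.update()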
-
dynet.
cg_version
()¶ Version of the current computation graph
-
dynet.
print_text_graphviz
()¶
-
dynet.
cg_checkpoint
()¶ Saves the state of the computation graph
-
dynet.
cg_revert
()¶ Revert the computation graph state to the previous checkpoint
-
dynet.
cg
()¶ Get the current ComputationGraph
-
class
dynet.
ComputationGraph
¶ Computation graph object
While the ComputationGraph is central to the inner workings of DyNet, from the user’s perspective, the only responsibility is to create a new computation graph for each training example.
-
renew
(immediate_compute=False, check_validity=False, autobatching=None)¶ Same as
dynet.renew_cg()
-
version
()¶ Same as
dynet.cg_version()
-
Operations¶
Expressions¶
-
class
dynet.
Expression
¶ Expressions are the building block of a Dynet computation graph.
Expressions are the main data types being manipulated in a DyNet program. Each expression represents a sub-computation in a computation graph.
-
backward
(full=False)¶ Run the backward pass based on this expression
The parameter full specifies whether the gradients should be computed for all nodes (True) or only non-constant nodes (False).
By default, a node is constant unless
- it is a parameter node
- it depends on a non-constant node
Thus, functions of constants and inputs are considered as constants.
Turn full on if you want to retrieve gradients w.r.t. inputs, for instance. By default this is turned off, so that for efficiency the backward pass ignores nodes which have no influence on gradients w.r.t. parameters.
Parameters: full (bool) – Whether to compute all gradients (including with respect to constant nodes).
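For instance, a minimal sketch of retrieving the gradient with respect to an input:
import dynet as dy

dy.renew_cg()
x = dy.inputTensor([1.0, 2.0, 3.0])
y = dy.squared_norm(x)   # scalar expression: sum of squares

y.forward()
y.backward(full=True)    # full=True so gradients w.r.t. the (constant) input are computed
print(x.gradient())      # expected: [2., 4., 6.]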
-
dim
()¶ Dimension of the expression
Returns a tuple (dims,batch_dim) where dims is the tuple of dimensions of each batch element
Returns: dimension Return type: tuple
-
forward
(recalculate=False)¶ This runs incremental forward on the entire graph
May not be optimal in terms of efficiency. Prefer
values
Keyword Arguments: recalculate (bool) – Recalculate the computation graph (for static graphs with new inputs) (default: False)
-
gradient
()¶ Returns the gradient of the expression as a numpy array
The last dimension is the batch size (if it’s > 1).
Make sure to call backward on a downstream expression before calling this.
If the Expression is a constant expression (meaning it’s not a function of a parameter), dynet won’t compute its gradient for the sake of efficiency. You need to manually force the gradient computation by adding the argument full=True to backward.
Returns: numpy array of gradient values
Return type: np.ndarray
-
npvalue
(recalculate=False)¶ Returns the value of the expression as a numpy array
The last dimension is the batch size (if it’s > 1)
Keyword Arguments: recalculate (bool) – Recalculate the computation graph (for static graphs with new inputs) (default: False) Returns: numpy array of values Return type: np.ndarray
-
scalar_value
(recalculate=False)¶ Returns value of an expression as a scalar
This only works if the expression is a scalar
Keyword Arguments: recalculate (bool) – Recalculate the computation graph (for static graphs with new inputs) (default: False) Returns: Scalar value of the expression Return type: float
-
tensor_value
(recalculate=False)¶ Returns the value of the expression as a Tensor.
This is useful if you want to use the value for other on-device calculations that are not part of the computation graph, e.g. argmax.
Keyword Arguments: recalculate (bool) – Recalculate the computation graph (for static graphs with new inputs) (default: False) Returns: a dynet Tensor object. Return type: Tensor
-
value
(recalculate=False)¶ Gets the value of the expression in the most relevant format
This returns the same thing as scalar_value, vec_value or npvalue depending on whether the number of dimensions of the expression is 0, 1 or 2+.
Keyword Arguments: recalculate (bool) – Recalculate the computation graph (for static graphs with new inputs) (default: False)
Returns: Value of the expression
Return type: float, list, np.ndarray
-
vec_value
(recalculate=False)¶ Returns the value of the expression as a vector
In case of a multidimensional expression, the values are flattened according to a column major ordering
Keyword Arguments: recalculate (bool) – Recalculate the computation graph (for static graphs with new inputs) (default: False) Returns: Array of values Return type: list
-
Operations¶
Operations are used to build expressions
Input operations¶
-
dynet.
inputTensor
(arr, batched=False, device='', reusable_expr=False)¶ Creates a tensor expression based on a numpy array or a list.
The dimension is inferred from the shape of the input. If batched=True, the last dimension is used as a batch dimension. If arr is a list of numpy ndarrays, this returns a batched expression where the batch elements are the elements of the list.
Parameters: arr (list,np.ndarray) – Values : numpy ndarray OR list of np.ndarray OR multidimensional list of floats
Keyword Arguments: - batched (bool) – Whether to use the last dimension as a batch dimension (default: False)
- device (string) – Optional, device on which to create the expression.
Returns: Input expression
Return type: _vecInputExpression
Raises: TypeError – If the type is not respected
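A few illustrative calls (a sketch; the values are arbitrary):
import numpy as np
import dynet as dy

dy.renew_cg()
x = dy.inputTensor([[1, 2], [3, 4]])                    # a 2x2 matrix expression
xb = dy.inputTensor(np.ones((3, 4, 8)), batched=True)   # 3x4 tensor with batch size 8
xs = dy.inputTensor([np.zeros(5), np.ones(5)])          # batch of two 5-dim vectors
print(x.dim())                                          # ((2, 2), 1)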
-
dynet.
sparse_inputTensor
(idxs, values, shape, batched=False, defval=0, device='')¶ Creates a tensor expression based on indices and values
The dimensions are given by the shape argument. If batched=True, the last dimension is used as a batch dimension.
Parameters: - idxs (tuple, list) – A tuple/list of integer arrays, one array for each dimension (including the batch dimension)
- values (list,np.ndarray) – A 1D array/list of values
- shape – The desired shape
Keyword Arguments: - batched (bool) – Whether to use the last dimension as a batch dimension (default: False). For example if
shape=(3, 3, 3)
andbatched=True
the resulting expression will be a batch of 3 3x3 matrices - defval (number) – The default value for all non specified coordinates (default: 0)
- device (string) – Optional, device on which to create the expression.
Returns: Input expression
Return type: _vecInputExpression
Raises: - TypeError – If the type is not respected
- ValueError – If the number of dimensions don’t match
-
dynet.
parameter
(*args)¶ Add parameters to the computation graph.
Get the expression objects corresponding to parameters. Gradients for parameters will be computed and used by Optimizers to update.
Parameters: args – Parameter and LookupParameter objects to add to the computation graph.
Returns: one expression for each input parameter.
Return type:
Raises: NotImplementedError – Only works with Parameters and LookupParameters.
-
dynet.
const_parameter
(*args)¶ Add constant parameters to the computation graph.
Get the expression objects corresponding to parameters. Gradients for parameters will be NOT computed or used by Optimizers to update. To access parameters that should be updated (which is usually what you want), use parameter() instead.
Parameters: args – Parameter and LookupParameter objects to add to the computation graph.
Returns: one expression for each input parameter.
Return type:
Raises: NotImplementedError – Only works with Parameters and LookupParameters.
-
dynet.
scalarInput
(s, device='')¶
-
dynet.
vecInput
(dim, device='')¶ Input an empty vector
Parameters: - dim (number) – Size
- device (string) – Optional, device on which to create the expression.
Returns: Corresponding expression
Return type: _vecInputExpression
-
dynet.
inputVector
(v, device='')¶ Input a vector by values
Parameters: - v (vector[float]) – Values
- device (string) – Optional, device on which to create the expression.
Returns: Corresponding expression
Return type: _vecInputExpression
-
dynet.
matInput
(d1, d2)¶ DEPRECATED : use inputTensor
TODO : remove this
Parameters: - d1 (int) – [description]
- d2 (int) – [description]
Returns: [description]
Return type:
-
dynet.
inputMatrix
(v, d)¶ DEPRECATED : use inputTensor
TODO : remove this
inputMatrix(vector[float] v, tuple d)
Create a matrix literal. First argument is a list of floats (or a flat numpy array). Second argument is a dimension. Returns: an expression. Usage example:
x = inputMatrix([1,2,3,4,5,6], (2,3))
x.npvalue()
--> array([[ 1., 3., 5.],
           [ 2., 4., 6.]])
-
dynet.
lookup
(p, index=0, update=True)¶ Pick an embedding from a lookup parameter and return it as an expression
Parameters: p (LookupParameters) – Lookup parameter to pick from
Keyword Arguments: - index (number) – Lookup index (default: 0)
- update (bool) – Whether to update the lookup parameter (default: True)
Returns: Expression for the embedding
Return type: _lookupExpression
-
dynet.
lookup_batch
(p, indices, update=True)¶ Look up parameters.
The mini-batched version of lookup. The resulting expression will be a mini-batch of parameters, where the “i”th element of the batch corresponds to the parameters at the position specified by the “i”th element of “indices”
Parameters: - p (LookupParameters) – Lookup parameter to pick from
- indices (list(int)) – Indices to look up for each batch element
Keyword Arguments: update (bool) – Whether to update the lookup parameter (default: True)
Returns: Expression for the batched embeddings
Return type: _lookupBatchExpression
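A small sketch (vocabulary size and indices are arbitrary):
import dynet as dy

pc = dy.ParameterCollection()
E = pc.add_lookup_parameters((100, 16))

dy.renew_cg()
embs = dy.lookup_batch(E, [4, 7, 9])  # batch of 3 elements, each a 16-dim embedding
print(embs.dim())                     # ((16,), 3)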
-
dynet.
zeros
(dim, batch_size=1)¶ Create an input full of zeros
Create an input full of zeros, sized according to dimensions
dim
Parameters: dim (tuple, int) – Dimension of the tensor
Keyword Arguments: batch_size (number) – Batch size of the tensor (default: (1))
Returns: A "d" dimensioned zero tensor
Return type: dynet.Expression
-
dynet.
ones
(dim, batch_size=1)¶ Create an input full of ones
Create an input full of ones, sized according to dimensions
dim
Parameters: dim (tuple, int) – Dimension of the tensor
Keyword Arguments: batch_size (number) – Batch size of the tensor (default: (1))
Returns: A "d" dimensioned tensor of ones
Return type: dynet.Expression
-
dynet.
constant
(dim, val, batch_size=1)¶ Create an input full of val
Create an input full of val, sized according to dimensions dim
Parameters: - dim (tuple, int) – Dimension of the tensor
- val (number) – Value
Keyword Arguments: batch_size (number) – Batch size of the tensor (default: (1))
Returns: A "d" dimensioned tensor filled with value val
Return type:
-
dynet.
random_normal
(dim, mean=0.0, stddev=1.0, batch_size=1)¶ Create a random normal vector
Create a vector distributed according to a normal distribution with the given mean and standard deviation.
Parameters: dim (tuple, int) – Dimension of the tensor
Keyword Arguments: - mean (float) – mean of the distribution (default: 0.0)
- stddev (float) – standard deviation of distribution (default: 1.0)
- batch_size (number) – Batch size of the tensor (default: (1))
Returns: A “d” dimensioned normally distributed tensor
Return type:
-
dynet.
random_bernoulli
(dim, p, scale=1.0, batch_size=1)¶ Create a random bernoulli tensor
Create a tensor distributed according to bernoulli distribution with parameter \(p\).
Parameters: - dim (tuple, int) – Dimension of the tensor
- p (number) – Parameter of the bernoulli distribution
Keyword Arguments: - scale (number) – Scaling factor to apply to the sampled tensor (default: (1.0))
- batch_size (number) – Batch size of the tensor (default: (1))
Returns: A “d” dimensioned bernoulli distributed tensor
Return type:
-
dynet.
random_uniform
(dim, left, right, batch_size=1)¶ Create a random uniform tensor
Create a tensor distributed according to uniform distribution with boundaries left and right.
Parameters: - dim (tuple, int) – Dimension of the tensor
- left (number) – Lower bound of the uniform distribution
- right (number) – Upper bound of the uniform distribution
Keyword Arguments: batch_size (number) – Batch size of the tensor (default: (1))
Returns: A “d” dimensioned uniform distributed tensor
Return type:
-
dynet.
random_gumbel
(dim, mu=0.0, beta=1.0, batch_size=1)¶ Create a random Gumbel sampled vector
Create a vector distributed according to a Gumbel distribution with the specified parameters. (Currently only the defaults of mu=0.0 and beta=1.0 are supported.)
Parameters: dim (tuple, int) – Dimension of the tensor
Keyword Arguments: - mu (number) – The \(\mu\) parameter (default: (0.0))
- beta (number) – The \(\beta\) parameter (default: (1.0))
- batch_size (number) – Batch size of the tensor (default: (1))
Returns: “d” dimensioned Gumbel distributed tensor
Return type:
-
dynet.
noise
(x, stddev)¶ Additive gaussian noise
Add gaussian noise to an expression.
Parameters: - x (dynet.Expression) – Input expression
- stddev (number) – The standard deviation of the gaussian
Returns: \(y\sim\mathcal N(x,\texttt{stddev})\)
Return type:
Arithmetic operations¶
-
dynet.
cdiv
(x, y)¶ Componentwise division
- Divide an expression component-wise by another, broadcasting dimensions (currently only of the second expression!) if necessary as follows:
- When the numbers of dimensions differ, we add dimensions of size 1 to make the numbers of dimensions match
- Every dimension is then required to have matching size, or the dimension size of the right expression must equal 1 (in which case it will be broadcasted)
- In the same way, the batch sizes must match, or the batch size of the right expression must equal 1, in which case it will be broadcasted
- The resulting tensor’s dimensionality is thus determined as the max of both inputs at every position
Parameters: - x (dynet.Expression) – The first input expression
- y (dynet.Expression) – The second input expression
Returns: An expression where the ith element is equal to \(\frac{x_i}{y_i}\)
Return type:
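A small sketch of the broadcasting behavior (values are arbitrary):
import dynet as dy

dy.renew_cg()
x = dy.inputTensor([[2.0, 4.0], [6.0, 8.0]])  # 2x2 matrix
y = dy.inputTensor([[2.0], [2.0]])            # 2x1 column, broadcast over the second dimension
z = dy.cdiv(x, y)
print(z.npvalue())                            # [[1., 2.], [3., 4.]]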
-
dynet.
cmult
(x, y)¶ Componentwise multiplication
- Multiply two expressions component-wise, broadcasting dimensions if necessary as follows:
- When the numbers of dimensions differ, we add dimensions of size 1 to make the numbers of dimensions match
- Every dimension is then required to have matching size, or one of the dimensions must equal 1 (in which case it will be broadcasted)
- In the same way, the batch dimensions must match, or one of them must equal 1, in which case it will be broadcasted
- The resulting tensor’s dimensionality is thus determined as the max of both inputs at every position
Parameters: - x (dynet.Expression) – The first input expression
- y (dynet.Expression) – The second input expression
Returns: An expression where the ith element is equal to \(x_i\times y_i\)
Return type:
-
dynet.
colwise_add
(x, y)¶ Columnwise addition
Add vector \(y\) to each column of matrix \(x\)
Parameters: - x (dynet.Expression) – An MxN matrix
- y (dynet.Expression) – A length M vector
Returns: An expression where \(y\) is added to each column of \(x\)
Return type:
-
dynet.
squared_norm
(x)¶ Squared norm
The squared norm of the values of
x
: \(\Vert x\Vert_2^2=\sum_i x_i^2\).Parameters: x (dynet.Expression) – Input expression Returns: \(\Vert x\Vert_2^2=\sum_i x_i^2\) Return type: dynet.Expression
-
dynet.
l2_norm
(x)¶ L2 norm
The l2 norm of the values of
x
: \(\Vert x\Vert_2=\sqrt{\sum_i x_i^2}\).Parameters: x (dynet.Expression) – Input expression Returns: \(\Vert x\Vert_2=\sqrt{\sum_i x_i^2}\) Return type: dynet.Expression
-
dynet.
exp
(x)¶ Natural exponent
Calculate elementwise \(y_i = e^{x_i}\)
Parameters: x (dynet.Expression) – Input expression Returns: \(e^{x}\) Return type: dynet.Expression
-
dynet.
square
(x)¶ Square
Calculate elementwise \(y_i = x_i^2\)
Parameters: x (dynet.Expression) – Input expression Returns: \(y = x^2\) Return type: dynet.Expression
-
dynet.
sqrt
(x)¶ Square root
Calculate elementwise \(y_i = \sqrt{x_i}\)
Parameters: x (dynet.Expression) – Input expression Returns: \(y = \sqrt{x}\) Return type: dynet.Expression
-
dynet.
abs
(x)¶ Absolute value
Calculate elementwise \(y_i = \vert x_i\vert\)
Parameters: x (dynet.Expression) – Input expression Returns: \(y = \vert x\vert\) Return type: dynet.Expression
-
dynet.
erf
(x)¶ Gaussian error function
Elementwise calculation of the Gaussian error function \(y_i = \text{erf}(x_i)=\frac {1}{\sqrt{\pi}}\int_{-x_i}^{x_i}e^{-t^2}\mathrm{d}t\)
Parameters: x (dynet.Expression) – Input expression Returns: \(y_i = \text{erf}(x_i)\) Return type: dynet.Expression
-
dynet.
cube
(x)¶ Calculate elementwise \(y_i = x_i^3\)
Parameters: x (dynet.Expression) – Input expression Returns: \(y = x^3\) Return type: dynet.Expression
-
dynet.
log
(x)¶ Natural logarithm
Elementwise calculation of the natural logarithm \(y_i = \ln(x_i)\)
Parameters: x (dynet.Expression) – Input expression Returns: \(y_i = \ln(x_i)\) Return type: dynet.Expression
-
dynet.
log_sigmoid
(x)¶ Log sigmoid
Calculate elementwise the log sigmoid function \(y_i = \ln(\frac{1}{1+e^{-x_i}})\). This is more numerically stable than log(logistic(x))
Parameters: x (dynet.Expression) – Input expression Returns: \(y_i = \ln(\frac{1}{1+e^{-x_i}})\) Return type: dynet.Expression
-
dynet.
lgamma
(x)¶ Log gamma
Calculate elementwise log gamma function \(y_i = \ln(\Gamma(x_i))\)
Parameters: x (dynet.Expression) – Input expression Returns: \(y_i = \ln(\Gamma(x_i))\) Return type: dynet.Expression
-
dynet.
sin
(x)¶ Sine
Elementwise calculation of the sine
Parameters: x (dynet.Expression) – Input expression Returns: \(\sin(x)\) Return type: dynet.Expression
-
dynet.
cos
(x)¶ Cosine
Elementwise calculation of the cosine
Parameters: x (dynet.Expression) – Input expression Returns: \(\cos(x)\) Return type: dynet.Expression
-
dynet.
tan
(x)¶ Tangent
Elementwise calculation of the tangent
Parameters: x (dynet.Expression) – Input expression Returns: \(\tan(x)\) Return type: dynet.Expression
-
dynet.
asin
(x)¶ Inverse sine
Elementwise calculation of the inverse sine
Parameters: x (dynet.Expression) – Input expression Returns: \(\sin^{-1}(x)\) Return type: dynet.Expression
-
dynet.
acos
(x)¶ Inverse cosine
Elementwise calculation of the inverse cosine
Parameters: x (dynet.Expression) – Input expression Returns: \(\cos^{-1}(x)\) Return type: dynet.Expression
-
dynet.
atan
(x)¶ Inverse tangent
Elementwise calculation of the inverse tangent
Parameters: x (dynet.Expression) – Input expression Returns: \(\tan^{-1}(x)\) Return type: dynet.Expression
-
dynet.
sinh
(x)¶ Hyperbolic sine
Elementwise calculation of the hyperbolic sine
Parameters: x (dynet.Expression) – Input expression Returns: \(\sinh(x)\) Return type: dynet.Expression
-
dynet.
cosh
(x)¶ Hyperbolic cosine
Elementwise calculation of the hyperbolic cosine
Parameters: x (dynet.Expression) – Input expression Returns: \(\cosh(x)\) Return type: dynet.Expression
-
dynet.
tanh
(x)¶ Hyperbolic tangent
Elementwise calculation of the hyperbolic tangent
Parameters: x (dynet.Expression) – Input expression Returns: \(\tanh(x)\) Return type: dynet.Expression
-
dynet.
asinh
(x)¶ Inverse hyperbolic sine
Elementwise calculation of the inverse hyperbolic sine
Parameters: x (dynet.Expression) – Input expression Returns: \(\sinh^{-1}(x)\) Return type: dynet.Expression
-
dynet.
acosh
(x)¶ Inverse hyperbolic cosine
Elementwise calculation of the inverse hyperbolic cosine
Parameters: x (dynet.Expression) – Input expression Returns: \(\cosh^{-1}(x)\) Return type: dynet.Expression
-
dynet.
atanh
(x)¶ Inverse hyperbolic tangent
Elementwise calculation of the inverse hyperbolic tangent
Parameters: x (dynet.Expression) – Input expression Returns: \(\tanh^{-1}(x)\) Return type: dynet.Expression
-
dynet.
logistic
(x)¶ Logistic sigmoid function
Calculate elementwise \(y_i = \frac{1}{1+e^{-x_i}}\)
Parameters: x (dynet.Expression) – Input expression Returns: \(y_i = \frac{1}{1+e^{-x_i}}\) Return type: dynet.Expression
-
dynet.
rectify
(x)¶ Rectifier (or ReLU, Rectified Linear Unit)
Calculate elementwise the rectifier (ReLU) function \(y_i = \max(x_i,0)\)
Parameters: x (dynet.Expression) – Input expression Returns: \(y_i = \max(x_i,0)\) Return type: dynet.Expression
-
dynet.
elu
(x, alpha=1.0)¶ Exponential Linear Unit (ELU)
Calculate elementwise the function
\[\begin{split}y_i = \left\{\begin{array}{lr} x_i, & \text{if } x>0\\ \alpha\times(e^{x_i} - 1), & \text{if }x\leqslant 0 \end{array}\right.\end{split}\]Reference: Clevert et al., 2015
Parameters: - x (dynet.Expression) – Input expression
- alpha (number) – \(\alpha\) parameter
Returns: \(\text{ELU}(x_i, \alpha)\)
Return type:
-
dynet.
selu
(x)¶ Scaled Exponential Linear Unit (SELU)
Calculate elementwise the function
\[\begin{split}y_i = \lambda\times\left\{ \begin{array}{lr} x_i, & \text{if } x>0\\ \alpha\times(e^{x_i} - 1), & \text{if }x\leqslant 0\\ \end{array}\right.\end{split}\]With
\[\begin{split}\begin{split} \lambda &=\texttt{1.0507009873554804934193349852946}\\ \alpha &=\texttt{1.6732632423543772848170429916717}\\ \end{split}\end{split}\]Reference: Klambauer et al., 2017
Parameters: x (dynet.Expression) – Input expression Returns: \(\text{SELU}(x_i)\) Return type: dynet.Expression
-
dynet.
sparsemax
(x)¶ Sparsemax
The sparsemax function (Martins et al. 2016), which is similar to softmax, but induces sparse solutions where most of the vector elements are zero. Note: This function is not yet implemented on GPU.
Parameters: x (dynet.Expression) – Input expression Returns: The sparsemax of the scores Return type: dynet.Expression
-
dynet.
softsign
(x)¶ Softsign function
Calculate elementwise the softsign function \(y_i = \frac{x_i}{1+\vert x_i\vert}\)
Parameters: x (dynet.Expression) – Input expression Returns: \(y_i = \frac{x_i}{1+\vert x_i\vert}\) Return type: dynet.Expression
-
dynet.
pow
(x, y)¶ Power function
Calculate an output where the ith element is equal to \(x_i^{y}\)
Parameters: - x (dynet.Expression) – The first input expression
- y (dynet.Expression) – The second input expression(scalar expression)
Returns: \(x_i^{y}\)
Return type:
-
dynet.
bmin
(x, y)¶ Minimum
Calculate an output where the ith element is \(\min(x_i,y_i)\)
Parameters: - x (dynet.Expression) – The first input expression
- y (dynet.Expression) – The second input expression
Returns: \(\min(x_i,y_i)\)
Return type:
-
dynet.
bmax
(x, y)¶ Maximum
Calculate an output where the ith element is \(\max(x_i,y_i)\)
Parameters: - x (dynet.Expression) – The first input expression
- y (dynet.Expression) – The second input expression
Returns: \(\max(x_i,y_i)\)
Return type:
-
dynet.
cumsum
(x, d=0)¶ Cumulative sum along an arbitrary dimension
Computes the cumulative sum \(y_i=\sum_{j\leq i}x_j\) along an arbitrary dimension.
Parameters: - x (dynet.Expression) – Input expression
- d (int) – Dimension along which to compute the cumulative sums (default: 0)
Returns: An expression with the same dimension as the input
Return type:
Reduction/moment operations¶
-
dynet.
sum_elems
(x)¶ Sum all elements
Sum all the elements in an expression.
Parameters: x (dynet.Expression) – Input expression Returns: The sum of all of its elements Return type: dynet.Expression
-
dynet.
moment_elems
(x, r)¶ Statistical moment of elements of the tensor
Computes the statistical moment of order \(r\), \(\frac 1 n \sum_ix_i^r\), of all the elements of each minibatch.
Parameters: - x (dynet.Expression) – Input expression
- r (int) – Moment order
Returns: A scalar expression (minibatched)
Return type: dynet.Expression
-
dynet.
mean_elems
(x)¶ Mean of elements of the tensor
Computes the mean \(\frac 1 n \sum_ix_i\) of all the elements of each minibatch.
Parameters: x (dynet.Expression) – Input expression
Returns: A scalar expression (minibatched)
Return type: dynet.Expression
-
dynet.
std_elems
(x)¶ Standard deviation of elements of the tensor
Computes the standard deviation \(\sigma=\sqrt{\frac 1 n \sum_i(x_i-\mu)^2}\) of all the elements of each minibatch.
Parameters: x (dynet.Expression) – Input expression
Returns: A scalar expression (minibatched)
Return type: dynet.Expression
-
dynet.
sum_dim
(x, d, b=False, n=0)¶ Sum along an arbitrary dimension
Computes the sum \(\sum_ix_i\) along an arbitrary dimension or dimensions.
Parameters: - x (dynet.Expression) – Input expression
- d (list) – Dimensions along which to reduce
- b (bool) – Whether to include batch dimension
Returns: An expression with |d| less dimensions and possibly dropped batch dimension
Return type:
-
dynet.
moment_dim
(x, d, r, b, n=0)¶ Statistical moment along an arbitrary dimension
Computes the statistical moment of order \(r\), \(\frac 1 n \sum_ix_i^r\) along an arbitrary dimension.
Parameters: - x (dynet.Expression) – Input expression
- d (list) – Dimensions along which to reduce
- r (int) – Moment order
- b (bool) – Whether to include batch dimension
- n (int) – If > 0, overwrite the n in the equation by this value, useful for masking
Returns: An expression with |d| less dimensions and possibly dropped batch dimension
Return type:
-
dynet.
mean_dim
(x, d, b, n=0)¶ Mean along an arbitrary dimension
Computes the mean \(\frac 1 n \sum_ix_i\) along an arbitrary dimension.
Parameters: - x (dynet.Expression) – Input expression
- d (list) – Dimensions along which to reduce
- b (bool) – Whether to include batch dimension
- n (int) – If > 0, overwrite the n in the equation by this value, useful for masking
Returns: An expression with |d| less dimensions and possibly dropped batch dimension
Return type:
-
dynet.
std_dim
(x, d, b, n=0)¶ Standard deviation along an arbitrary dimension
Computes the standard deviation \(\sigma=\sqrt{\frac 1 n \sum_i(x_i-\mu)^2}\) along arbitrary dimensions.
Parameters: - x (dynet.Expression) – Input expression
- d (int) – Dimensions along which to reduce
- b (bool) – Whether to include batch dimension
- n (int) – If > 0, overwrite the n in the equation by this value, useful for masking
Returns: An expression with |d| less dimensions and possibly dropped batch dimension
Return type:
-
dynet.
max_dim
(x, d=0)¶ Max out through a dimension
Select out an element/row/column/sub-tensor from an expression, with maximum value along a given dimension. This will result in the dimension of the expression being reduced by 1.
Parameters: x (dynet.Expression) – Input expression Keyword Arguments: d (int) – Dimension on which to perform the maxout (default: (0)) Returns: An expression of sub-tensor with max value along dimension d
Return type: dynet.Expression
-
dynet.
min_dim
(x, d=0)¶ Min out through a dimension
Select out an element/row/column/sub-tensor from an expression, with minimum value along a given dimension. This will result in the dimension of the expression being reduced by 1.
Parameters: x (dynet.Expression) – Input expression Keyword Arguments: d (int) – Dimension on which to perform the minout (default: (0)) Returns: An expression of sub-tensor with min value along dimension d
Return type: dynet.Expression
-
dynet.
sum_batches
(x)¶ Sum over minibatches
Sum an expression that consists of multiple minibatches into one of equal dimension but with only a single minibatch. This is useful for summing loss functions at the end of minibatch training.
Parameters: x (dynet.Expression) – Input expression Returns: An expression with a single batch Return type: dynet.Expression
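A sketch of the typical pattern, combining a batched loss with sum_batches (dimensions and labels are arbitrary):
import numpy as np
import dynet as dy

pc = dy.ParameterCollection()
W = pc.add_parameters((5, 10))

dy.renew_cg()
xs = dy.inputTensor(np.random.rand(10, 4), batched=True)  # minibatch of 4 input vectors
scores = W * xs                                            # batched scores, one 5-vector per element
losses = dy.pickneglogsoftmax_batch(scores, [0, 2, 1, 4])  # one loss per batch element
total = dy.sum_batches(losses)                             # single scalar loss for the minibatch
total.forward()
total.backward()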
-
dynet.
moment_batches
(x, r)¶ Statistical moment along the batch dimension
Computes the statistical moment of order \(r\), \(\frac 1 n \sum_ix_i^r\), along the batch dimension.
Parameters: - x (dynet.Expression) – Input expression
- r (int) – Moment order
Returns: An expression with a single batch
Return type: dynet.Expression
-
dynet.
mean_batches
(x)¶ Mean along the batch dimension
Computes the mean \(\frac 1 n \sum_ix_i\) along the batch dimension.
Parameters: x (dynet.Expression) – Input expression
Returns: An expression with a single batch
Return type: dynet.Expression
-
dynet.
std_batches
(x)¶ Standard deviation along the batch dimension
Computes the standard deviation \(\sigma=\sqrt{\frac 1 n \sum_i(x_i-\mu)^2}\) along the batch dimension.
Parameters: x (dynet.Expression) – Input expression
Returns: An expression with a single batch
Return type: dynet.Expression
-
dynet.
fold_rows
(x, nrows=2)¶ [summary]
[description]
Parameters: x (dynet.Expression) – Keyword Arguments: nrows {number} (unsigned) – (default: (2)) Returns: Return type: dynet.Expression
-
dynet.
esum
(xs)¶ Sum
This performs an elementwise sum over all the expressions in
xs
Parameters: xs (list) – A list of expression of same dimension Returns: An expression where the ith element is equal to \(\sum_{j=0}\texttt{xs[}j\texttt{][}i\texttt{]}\) Return type: dynet.Expression
-
dynet.
emax
(xs)¶ Max
This performs an elementwise max over all the expressions in
xs
Parameters: xs (list) – A list of expression of same dimension Returns: An expression where the ith element is equal to \(\max_j\texttt{xs[}j\texttt{][}i\texttt{]}\) Return type: dynet.Expression
-
dynet.
logsumexp
(xs)¶ Log, sum, exp
The elementwise “logsumexp” function that calculates \(\ln(\sum_i e^{xs_i})\), used in adding probabilities in the log domain.
Parameters: xs (list) – A list of expression of same dimension Returns: An expression where the ith element is equal to \(\ln\left(\sum_{j=0}e^{\texttt{xs[}j\texttt{][}i\texttt{]}}\right)\) Return type: dynet.Expression
-
dynet.
average
(xs)¶ Average
This performs an elementwise average over all the expressions in
xs
Parameters: xs (list) – A list of expression of same dimension Returns: An expression where the ith element is equal to \(\frac{1}{\texttt{len(xs)}}\sum_{j=0}\texttt{xs[}j\texttt{][}i\texttt{]}\) Return type: dynet.Expression
Loss/Probability operations¶
-
dynet.
softmax
(x, d=0)¶ Softmax
The softmax function normalizes each column to ensure that all values are between 0 and 1 and add to one by applying \(\frac{e^{x_i}}{\sum_j e^{x_j}}\).
Parameters: - x (dynet.Expression) – Input expression
- d (int) – Dimension to normalize over
Returns: \(\frac{e^{x_i}}{\sum_j e^{x_j}}\)
Return type: dynet.Expression
-
dynet.
log_softmax
(x, restrict=None)¶ Restricted log softmax
The log softmax function calculated over only a subset of the vector elements. The elements to be included are given by the restrict argument. All elements not included in restrict are set to negative infinity.
Parameters: x (dynet.Expression) – Input expression
Keyword Arguments: restrict (list) – List of indices over which to compute the log softmax (default: (None))
Returns: A vector with the log softmax over the specified elements
Return type: dynet.Expression
-
dynet.
pairwise_rank_loss
(x, y, m=1.0)¶ Pairwise rank loss
A margin-based loss, where every margin violation for each pair of values is penalized: \(\sum_i \max(m - x_i + y_i, 0)\)
Parameters: - x (dynet.Expression) – The first input expression
- y (dynet.Expression) – The second input expression
Keyword Arguments: m (number) – The margin (default: (1.0))
Returns: The pairwise rank loss
Return type:
-
dynet.
poisson_loss
(log_lambda, x)¶ Poisson loss
The negative log probability of x according to a Poisson distribution with parameter \(\exp(\texttt{log\_lambda})\). Useful in Poisson regression, where we try to predict the parameters of a Poisson distribution to maximize the probability of the data x.
Parameters: - log_lambda (dynet.Expression) – The log of the Poisson distribution’s lambda
- x (int) – The target value
Returns: The Poisson loss
Return type:
-
dynet.
huber_distance
(x, y, c=1.345)¶ Huber distance
The huber distance between values of
x
andy
parameterized byc
, \(\sum_i L_c(x_i, y_i)\) where:\[\begin{split}L_c(x, y) = \begin{cases} \frac{1}{2}(y - x)^2 & \textrm{for } \vert y - f(x)\vert \le c, \\ c\, \vert y - f(x)\vert - \frac{1}{2}c^2 & \textrm{otherwise.} \end{cases}\end{split}\]Parameters: - x (dynet.Expression) – The first input expression
- y (dynet.Expression) – The second input expression
Keyword Arguments: c (number) – The parameter of the huber distance parameterizing the cutoff (default: (1.345))
Returns: The huber distance
Return type:
-
dynet.
pickneglogsoftmax
(x, v)¶ Negative softmax log likelihood
This function takes in a vector of scores
x
, and performs a log softmax, takes the negative, and selects the likelihood corresponding to the elementv
. This is perhaps the most standard loss function for training neural networks to predict one out of a set of elements.Parameters: - x (dynet.Expression) – Input scores
- v (int) – True class
Returns: \(-\log\left(\frac{e^{x_v}}{\sum_j e^{x_j}}\right)\)
Return type:
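A typical training-loop sketch built around this loss (training_data is a placeholder iterable of (feature vector, class id) pairs):
import dynet as dy

pc = dy.ParameterCollection()
W = pc.add_parameters((5, 20))      # 5 classes, 20 input features
trainer = dy.SimpleSGDTrainer(pc)

for features, label in training_data:
    dy.renew_cg()
    scores = W * dy.inputTensor(features)
    loss = dy.pickneglogsoftmax(scores, label)
    loss.forward()
    loss.backward()
    trainer.update()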
-
dynet.
pickneglogsoftmax_batch
(x, vs)¶ Negative softmax log likelihood on a batch
This function takes in a batched vector of scores
x
, and performs a log softmax, takes the negative, and selects the likelihood corresponding to the elementsvs
. This is perhaps the most standard loss function for training neural networks to predict one out of a set of elements.Parameters: - x (dynet.Expression) – Input scores
- v (list) – True classes
Returns: \(-\sum_{v\in \texttt{vs}}\log\left(\frac{e^{x_v}}{\sum_j e^{x_j}}\right)\)
Return type:
-
dynet.
hinge
(x, v, m=1.0)¶ Hinge loss
This function takes in a vector of scores
x
, and calculates a hinge loss such that the elementv
must be greater than all other elements by at leastm
, otherwise a loss is incurred.Parameters: - x (dynet.Expression) – Input scores
- v (int) – True class
- m (float) – The margin
Returns: \(\sum_{\tilde{v} \neq v} \max(x_{\tilde{v}} - x_v + m, 0)\)
Return type:
-
dynet.
hinge_batch
(x, vs, m=1.0)¶ Hinge loss on a batch
This function takes in a batched vector of scores
xs
, and calculates a hinge loss such that the elementsvs
must be greater than all other elements by at leastm
, otherwise a loss is incurred.Parameters: - x (dynet.Expression) – Input scores
- v (list) – True classes
- m (float) – The margin
Returns: The batched hinge loss function
Return type:
-
dynet.
kmh_ngram
(x, v)¶ [summary]
[description]
Parameters: - x (dynet.Expression) –
- v (dynet.Expression) –
Returns: Return type:
-
dynet.
squared_distance
(x, y)¶ Squared distance
The squared distance between values of
x
andy
: \(\Vert x-y\Vert_2^2=\sum_i (x_i-y_i)^2\).Parameters: - x (dynet.Expression) – The first input expression
- y (dynet.Expression) – The second input expression
Returns: \(\Vert x-y\Vert_2^2=\sum_i (x_i-y_i)^2\)
Return type:
-
dynet.
l1_distance
(x, y)¶ L1 distance
L1 distance between values of
x
andy
: \(\Vert x-y\Vert_1=\sum_i \vert x_i-y_i\vert\).Parameters: - x (dynet.Expression) – The first input expression
- y (dynet.Expression) – The second input expression
Returns: \(\Vert x-y\Vert_1=\sum_i \vert x_i-y_i\vert\).
Return type:
-
dynet.
binary_log_loss
(x, y)¶ Binary log loss
The log loss of a binary decision according to the sigmoid function \(- \sum_i (y_i \ln(x_i) + (1-y_i) \ln(1-x_i))\)
Parameters: - x (dynet.Expression) – The first input expression
- y (dynet.Expression) – The second input expression
Returns: \(- \sum_i (y_i \ln(x_i) + (1-y_i) \ln(1-x_i))\)
Return type:
Flow/Shaping operations¶
-
dynet.
pick
(e, index=0, dim=0)¶ Pick element.
Pick a single element/row/column/sub-tensor from an expression. This will result in the dimension of the tensor being reduced by 1.
Parameters: e (Expression) – Expression to pick from
Keyword Arguments: - index (number) – Index to pick (default: 0)
- dim (number) – Dimension to pick from (default: 0)
Returns: Picked expression
Return type: _pickerExpression
-
dynet.
pick_batch
(e, indices, dim=0)¶ Batched pick.
Pick elements from multiple batches.
Parameters: - e (Expression) – Expression to pick from
- indices (list) – Indices to pick
- dim (number) – Dimension to pick from (default: 0)
Returns: Picked expression
Return type: _pickerBatchExpression
-
dynet.
pickrange
(x, s, e)¶
-
dynet.
pick_batch_elem
(x, v)¶ Pick batch element.
Pick batch element from a batched expression. For a Tensor with 3 batch elements:
\[\begin{split}\begin{pmatrix} x_{1,1,1} & x_{1,1,2} \\ x_{1,2,1} & x_{1,2,2} \\ \end{pmatrix}\\ \begin{pmatrix} x_{2,1,1} & x_{2,1,2} \\ x_{2,2,1} & x_{2,2,2} \\ \end{pmatrix}\\ \begin{pmatrix} x_{3,1,1} & x_{3,1,2} \\ x_{3,2,1} & x_{3,2,2} \\ \end{pmatrix}\end{split}\]pick_batch_elem(t, 1)
will return a Tensor of\[\begin{split}\begin{pmatrix} x_{2,1,1} & x_{2,1,2} \\ x_{2,2,1} & x_{2,2,2} \\ \end{pmatrix}\end{split}\]Parameters: - x (dynet.Expression) – Input expression
- v (int) – The index of the batch element to be picked.
Returns: The expression of the picked batch element. The picked element is a tensor whose batch dimension equals one.
Return type:
-
dynet.
pick_batch_elems
(x, vs)¶ Pick batch element.
Pick batch element from a batched expression. For a Tensor with 3 batch elements:
\[\begin{split}\begin{pmatrix} x_{1,1,1} & x_{1,1,2} \\ x_{1,2,1} & x_{1,2,2} \\ \end{pmatrix}\\ \begin{pmatrix} x_{2,1,1} & x_{2,1,2} \\ x_{2,2,1} & x_{2,2,2} \\ \end{pmatrix}\\ \begin{pmatrix} x_{3,1,1} & x_{3,1,2} \\ x_{3,2,1} & x_{3,2,2} \\ \end{pmatrix}\end{split}\]pick_batch_elems(t, [2, 3])
will return a Tensor of\[\begin{split}\begin{pmatrix} x_{2,1,1} & x_{2,1,2} \\ x_{2,2,1} & x_{2,2,2} \\ \end{pmatrix}\\ \begin{pmatrix} x_{3,1,1} & x_{3,1,2} \\ x_{3,2,1} & x_{3,2,2} \\ \end{pmatrix}\end{split}\]Parameters: - x (dynet.Expression) – Input expression
- vs (list) – A list of indices of the batch elements to be picked.
Returns: The expression of the picked batch elements. The result is a tensor whose batch dimension equals the size of the list vs.
Return type:
-
dynet.
reshape
(x, d, batch_size=1)¶ Reshape to another size
This node reshapes a tensor to another size, without changing the underlying layout of the data. The layout of the data in DyNet is column-major, so if we have a 3x4 matrix :
\[\begin{split}\begin{pmatrix} x_{1,1} & x_{1,2} & x_{1,3} & x_{1,4} \\ x_{2,1} & x_{2,2} & x_{2,3} & x_{2,4} \\ x_{3,1} & x_{3,2} & x_{3,3} & x_{3,4} \\ \end{pmatrix}\end{split}\]and transform it into a 2x6 matrix, it will be rearranged as:
\[\begin{split}\begin{pmatrix} x_{1,1} & x_{3,1} & x_{2,2} & x_{1,3} & x_{3,3} & x_{2,4} \\ x_{2,1} & x_{1,2} & x_{3,2} & x_{2,3} & x_{1,4} & x_{3,4} \\ \end{pmatrix}\end{split}\]Note: This is O(1) for forward, and O(n) for backward.
Parameters: - x (dynet.Expression) – Input expression
- d (tuple) – New dimension
Keyword Arguments: batch_size (int) – New batch size (default: (1))
Returns: The reshaped expression
Return type:
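A minimal sketch of reshape on a toy matrix (assuming dynet as dy and numpy as np; the values are irrelevant, only the dimensions matter):
import dynet as dy
import numpy as np
dy.renew_cg()
x = dy.inputTensor(np.random.randn(3, 4))   # 3x4 matrix
y = dy.reshape(x, (2, 6))                   # same column-major data viewed as 2x6
print(x.dim(), y.dim())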
-
dynet.
select_rows
(x, rs)¶ Select rows
Select a subset of rows of a matrix.
Parameters: - x (dynet.Expression) – Input expression
- rs (list) – The rows to extract
Returns: An expression containing the selected rows
Return type:
-
dynet.
select_cols
(x, cs)¶ Select columns
Select a subset of columns of a matrix.
Parameters: - x (dynet.Expression) – Input expression
- cs (list) – The columns to extract
Returns: An expression containing the selected columns
Return type:
-
dynet.
concatenate_cols
(xs)¶ Concatenate columns
Perform a concatenation of the columns in multiple expressions. All expressions must have the same number of rows.
Parameters: xs (list) – A list of expressions Returns: The expression with the columns concatenated Return type: dynet.Expression
-
dynet.
concatenate
(xs, d=0)¶ Concatenate
Perform a concatenation of multiple expressions along a particular dimension. All expressions must have the same dimensions except for the dimension to be concatenated (rows by default).Parameters: - xs (list) – A list of expressions
- d – The dimension along which to perform the concatenation
Returns: The expression concatenated along the particular dimension
Return type:
-
dynet.
concatenate_to_batch
(xs)¶ Concatenate list of expressions to a single batched expression
Perform a concatenation of several expressions along the batch dimension. All expressions must have the same shape except for the batch dimension.
Parameters: xs (list) – A list of expressions of same dimension (except batch size) Returns: The expression with the batch dimensions concatenated Return type: dynet.Expression
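A short sketch of the three concatenation variants on toy vectors (assuming dynet imported as dy):
import dynet as dy
dy.renew_cg()
a = dy.inputVector([1, 2])
b = dy.inputVector([3, 4])
v = dy.concatenate([a, b])             # 4-dimensional vector (concatenation along dim 0)
m = dy.concatenate_cols([a, b])        # 2x2 matrix, one input per column
bb = dy.concatenate_to_batch([a, b])   # dim (2,), batch size 2
print(v.dim(), m.dim(), bb.dim())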
-
dynet.
nobackprop
(x)¶ Prevent backprop
This node has no effect on the forward pass, but prevents gradients from flowing backward during the backward pass. This is useful when there’s a subgraph for which you don’t want loss passed back to the parameters.
Parameters: x (dynet.Expression) – Input expression Returns: An output expression containing the same as input (only effects on backprop process) Return type: dynet.Expression
-
dynet.
flip_gradient
(x)¶ Flip gradient
This node has no effect on the forward pass, but takes negative on backprop process. This operation is widely used in adversarial networks.
Parameters: x (dynet.Expression) – Input expression Returns: An output expression containing the same as input (only effects on backprop process) Return type: dynet.Expression
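A hedged sketch of how these two nodes might be used, e.g. to freeze part of a network or to reverse gradients for adversarial training (the parameter shapes here are made up; assumes dynet as dy):
import dynet as dy
pc = dy.ParameterCollection()
pW = pc.add_parameters((2, 2))
dy.renew_cg()
x = dy.inputVector([1.0, -1.0])
h = dy.parameter(pW) * x
frozen = dy.nobackprop(h)             # no gradient flows into pW through this path
reversed_grad = dy.flip_gradient(h)   # gradient is negated on the way back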
-
dynet.
argmax
(x, gradient_mode)¶ Argmax
This node takes an input vector \(x\) and returns a one-hot vector \(y\) such that \(y_{\text{argmax} x}=1\). There are two gradient modes for this operation:
argmax(x, gradient_mode="zero_gradient")
is the standard argmax operation. Note that this is differentiable almost everywhere and its gradient is 0, so it will stop your gradient.
argmax(x, gradient_mode="straight_through_gradient")
This gradient mode implements the straight-through estimator (Bengio et al., 2013). Its forward pass is the same as the argmax operation, but its gradient is the same as the identity function. Note that this does not technically correspond to a differentiable function (hence the name “estimator”). Tensors of order \(>1\) are not supported yet. If you really need to use this operation on matrices, tensors, etc… feel free to open an issue on github.
Parameters: - x (dynet.Expression) – The input vector (can be batched)
- gradient_mode (str) – Gradient mode for the backward pass (one of
"zero_gradient"
or"straight_through_gradient"
Returns: The one hot argmax vector
Return type:
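A small sketch of the straight-through variant (toy scores; assumes dynet as dy):
import dynet as dy
dy.renew_cg()
scores = dy.inputVector([0.1, 2.0, -1.0])
hard = dy.argmax(scores, gradient_mode="straight_through_gradient")
print(hard.npvalue())   # one-hot vector selecting index 1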
Noise operations¶
-
dynet.
noise
(x, stddev) Additive gaussian noise
Add gaussian noise to an expression.
Parameters: - x (dynet.Expression) – Input expression
- stddev (number) – The standard deviation of the gaussian
Returns: \(y\sim\mathcal N(x,\texttt{stddev})\)
Return type:
-
dynet.
dropout
(x, p)¶ Dropout
With a fixed probability p, drop out (set to zero) nodes in the input expression, and scale the remaining nodes by 1/(1-p). Note that there are two kinds of dropout:
- Regular dropout: dropout is applied at training time, and the outputs are scaled by (1-p) at test time.
- Inverted dropout: dropout and scaling are both applied at training time, and nothing needs to be done at test time.
DyNet implements the latter, so you only need to apply dropout at training time, and do not need to perform any scaling at test time.
Parameters: - x (dynet.Expression) – Input expression
- p (number) – The dropout probability
Returns: The dropped out expression \(y=\frac{1}{1-\texttt{p}}x\circ z, z\sim\text{Bernoulli}(1-\texttt{p})\)
Return type:
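A minimal sketch of applying dropout only at training time (the training flag and shapes are illustrative; assumes dynet as dy):
import dynet as dy
dy.renew_cg()
h = dy.inputVector([0.5, -1.2, 3.0, 0.7])
p = 0.5
training = True
if training:
    h = dy.dropout(h, p)   # inverted dropout: nothing to do at test time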
-
dynet.
dropout_dim
(x, d, p)¶ Dropout along one dimension
Identical to the dropout operation except the dropout mask is the same across one dimension. Use this if you want to drop entire columns or rows of a matrix, for example.
For now this only supports tensors of order <= 3 (with or without batch dimension)
Parameters: - x (dynet.Expression) – Input expression
- d (int) – Dimension along which to drop
- p (number) – The dropout probability
Returns: The dropped expression
Return type:
-
dynet.
dropout_batch
(x, p)¶ Dropout entire elements of a minibatch
Identical to the dropout operation except entire batch elements are dropped
Parameters: - x (dynet.Expression) – Input expression
- p (number) – The dropout probability
Returns: The dropped expression
Return type:
-
dynet.
block_dropout
(x, p)¶ Block dropout
Identical to the dropout operation, but either drops out all or no values in the expression, as opposed to making a decision about each value individually.
Parameters: - x (dynet.Expression) – Input expression
- p (number) – The dropout probability
Returns: The block dropout expression
Return type:
Linear algebra operations¶
-
dynet.
affine_transform
(exprs)¶ Affine transform
This performs an affine transform over an arbitrary (odd) number of expressions held in the input initializer list xs. The first expression is the “bias,” which is added to the expression as-is. The remaining expressions are multiplied together in pairs, then added. A very common usage case is the calculation of the score for a neural network layer (e.g. \(b + Wz\)) where b is the bias, W is the weight matrix, and z is the input. In this case
xs[0] = b
,xs[1] = W
, andxs[2] = z
.Parameters: exprs (list) – A list containing an odd number of expressions Returns: An expression equal to: xs[0] + xs[1]*xs[2] + xs[3]*xs[4] + ...
Return type: dynet.Expression
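For illustration, a sketch of the common b + W*x use case (parameter shapes are arbitrary; assumes dynet as dy):
import dynet as dy
pc = dy.ParameterCollection()
pW = pc.add_parameters((4, 3))
pb = pc.add_parameters((4,))
dy.renew_cg()
x = dy.inputVector([1.0, 2.0, 3.0])
W, b = dy.parameter(pW), dy.parameter(pb)
y = dy.affine_transform([b, W, x])   # equivalent to b + W * x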
-
dynet.
dot_product
(x, y)¶ Dot Product
Calculate the dot product \(x^Ty=\sum_i x_iy_i\)
Parameters: - x (dynet.Expression) – The first input expression
- y (dynet.Expression) – The second input expression
Returns: \(x^Ty=\sum_i x_iy_i\)
Return type:
-
dynet.
transpose
(x, dims=[1, 0])¶ Transpose a matrix
Get the transpose of the matrix, or if dims is specified shuffle the dimensions arbitrarily.
Note: This is O(1) if either the row or column dimension is 1, and O(n) otherwise.
Parameters: - x (dynet.Expression) – Input expression
- dims (list) – The dimensions to swap. The ith dimension of the output will be equal to the dims[i] dimension of the input. dims must have the same number of dimensions as x.
Returns: \(x^T\) / the shuffled expression
Return type:
-
dynet.
inverse
(x)¶ Matrix Inverse
Takes the inverse of a matrix (not implemented on GPU yet, although contributions are welcome: issue). Note that back-propagating through an inverted matrix can sometimes be a source of numerical instability.
Parameters: x (dynet.Expression) – Input expression Returns: Inverse of x Return type: dynet.Expression
-
dynet.
trace_of_product
(x, y)¶ Trace of Matrix Product
Takes the trace of the product of matrices. (not implemented on GPU yet, although contributions are welcome: issue).
Parameters: - x (dynet.Expression) – The first input expression
- y (Expression) – The second input expression
Returns: \(\text{Tr}(xy)\)
Return type:
-
dynet.
logdet
(x)¶ Log determinant
Takes the log of the determinant of a matrix. (not implemented on GPU yet, although contributions are welcome: issue).
Parameters: x (dynet.Expression) – Input expression Returns: \(\log(\vert x\vert)\) Return type: dynet.Expression
Convolution/Pooling operations¶
-
dynet.
conv2d
(x, f, stride, is_valid=True)¶ 2D convolution without bias
2D convolution operator without bias parameters.
VALID
and SAME
convolutions are supported. When the stride is 1, the distinction is:
SAME
: the output size is the same as the input size. To achieve this, the input is padded so the filter can sweep outside of the input maps.VALID
: the output size shrinks by filter_size - 1
, and the filters always sweep at valid positions inside the input maps. No padding needed.
In detail, assume
- Input feature maps:
XH x XW x XC x N
- Filters:
FH x FW x XC x FC
- Strides:
strides[0]
andstrides[1]
are row (h
) and col (w
) stride, respectively.
For the
SAME
convolution: the output height (YH
) and width (YW
) are computed as:YH = ceil(float(XH) / float(strides[0]))
YW = ceil(float(XW) / float(strides[1]))
and the paddings are computed as:
pad_along_height = max((YH - 1) * strides[0] + FH - XH, 0)
pad_along_width = max((YW - 1) * strides[1] + FW - XW, 0)
pad_top = pad_along_height / 2
pad_bottom = pad_along_height - pad_top
pad_left = pad_along_width / 2
pad_right = pad_along_width - pad_left
For the
VALID
convolution: the output height (YH) and width (YW
) are computed as:YH = ceil(float(XH - FH + 1) / float(strides[0]))
YW = ceil(float(XW - FW + 1) / float(strides[1]))
and the paddings are always zeros.
Parameters: - x (dynet.Expression) – The input feature maps: (H x W x Ci) x N (ColMaj), 3D tensor with an optional batch dimension
- f (dynet.Expression) – 2D convolution filters: H x W x Ci x Co (ColMaj), 4D tensor
- stride (list) – the row and column strides
Keyword Arguments: is_valid (bool) – ‘VALID’ convolution or ‘SAME’ convolution, default is True (‘VALID’) (default: (True))
Returns: The output feature maps (H x W x Co) x N, 3D tensor with an optional batch dimension
Return type:
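A rough sketch of a VALID vs SAME convolution followed by max-pooling (the shapes and values are made up for the example; assumes dynet as dy and numpy as np):
import dynet as dy
import numpy as np
dy.renew_cg()
x = dy.inputTensor(np.random.randn(8, 8, 1))       # H x W x Ci feature map
f = dy.inputTensor(np.random.randn(3, 3, 1, 4))    # FH x FW x Ci x Co filters
y_valid = dy.conv2d(x, f, [1, 1], is_valid=True)   # 6 x 6 x 4 output
y_same = dy.conv2d(x, f, [1, 1], is_valid=False)   # 8 x 8 x 4 output
pooled = dy.maxpooling2d(y_same, [2, 2], [2, 2])   # 4 x 4 x 4 output
print(y_valid.dim(), y_same.dim(), pooled.dim())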
-
dynet.
conv2d_bias
(x, f, b, stride, is_valid=True)¶ 2D convolution with bias
2D convolution operator with bias parameters.
VALID
and SAME
convolutions are supported. When the stride is 1, the distinction is:
SAME
: the output size is the same as the input size. To achieve this, the input is padded so the filter can sweep outside of the input maps.VALID
: the output size shrinks by filter_size - 1
, and the filters always sweep at valid positions inside the input maps. No padding needed.
In detail, assume
- Input feature maps:
XH x XW x XC x N
- Filters:
FH x FW x XC x FC
- Strides:
strides[0]
andstrides[1]
are row (h
) and col (w
) stride, respectively.
For the
SAME
convolution: the output height (YH
) and width (YW
) are computed as:YH = ceil(float(XH) / float(strides[0]))
YW = ceil(float(XW) / float(strides[1]))
and the paddings are computed as:
pad_along_height = max((YH - 1) * strides[0] + FH - XH, 0)
pad_along_width = max((YW - 1) * strides[1] + FW - XW, 0)
pad_top = pad_along_height / 2
pad_bottom = pad_along_height - pad_top
pad_left = pad_along_width / 2
pad_right = pad_along_width - pad_left
For the
VALID
convolution: the output height (YH) and width (YW
) are computed as:YH = ceil(float(XH - FH + 1) / float(strides[0]))
YW = ceil(float(XW - FW + 1) / float(strides[1]))
and the paddings are always zeros.
Parameters: - x (dynet.Expression) – The input feature maps: (H x W x Ci) x N (ColMaj), 3D tensor with an optional batch dimension
- f (dynet.Expression) – 2D convolution filters: H x W x Ci x Co (ColMaj), 4D tensor
- b (dynet.Expression) – The bias (1D: Ci)
- stride (list) – the row and column strides
Keyword Arguments: is_valid (bool) – ‘VALID’ convolution or ‘SAME’ convolution, default is True (‘VALID’) (default: (True))
Returns: The output feature maps (H x W x Co) x N, 3D tensor with an optional batch dimension
Return type:
-
dynet.
maxpooling2d
(x, ksize, stride, is_valid=True)¶ 2D maxpooling
2D maxpooling operator.
VALID
and SAME
maxpooling are supported.Parameters: - x (dynet.Expression) – The input feature maps: (H x W x Ci) x N (ColMaj), 3D tensor with an optional batch dimension
- ksize (list) – the max pooling 2d window size
- stride (list) – the row and column strides
Keyword Arguments: is_valid (bool) – ‘VALID’ or ‘SAME’, default is True (‘VALID’) (default: (True))
Returns: The output feature maps (H x W x Co) x N, 3D tensor with an optional batch dimension
Return type:
-
dynet.
filter1d_narrow
(x, y)¶ [summary]
[description]
Parameters: - x (dynet.Expression) – The first input expression
- y (dynet.Expression) – The second input expression
Returns: TODO
Return type:
-
dynet.
kmax_pooling
(x, k, d=1)¶ Kmax-pooling operation
Select out k maximum values along a given dimension, in the same order as they appear. This will result in the size of the given dimension being changed to k.
Parameters: - x (dynet.Expression) –
- k (unsigned) – Number of maximum values to retrieve along the given dimension
Keyword Arguments: d (unsigned) – Dimension on which to perform kmax-pooling (default: (1))
Returns: Return type:
-
dynet.
circ_conv
(u, v)¶ Circular convolution
Calculate the circular convolution \([u * v]_k=\sum_i u_iv_{(k-i) \mod d}\)
Parameters: - u (dynet.Expression) – The first input expression
- v (dynet.Expression) – The second input expression
Returns: \(u * v\)
Return type:
-
dynet.
circ_corr
(u, v)¶ Circular correlation
Calculate the circular correlation \([u \star v]_k=\sum_i u_iv_{(i + k) \mod d}\)
Parameters: - u (dynet.Expression) – The first input expression
- v (dynet.Expression) – The second input expression
Returns: \(u \star v\)
Return type:
Tensor operations¶
Remark: Compiling the contraction operations takes a lot of time with CUDA. For this reason, only the CPU implementation is compiled by default. If you need those operations, you need to un-comment this line in the source before compiling. TODO: make this simpler.
-
dynet.
contract3d_1d
(x, y)¶ Contracts a rank 3 tensor and a rank 1 tensor into a rank 2 tensor
The resulting tensor \(z\) has coordinates \(z_{ij} = \sum_k x_{ijk} y_k\)
Parameters: - x (dynet.Expression) – Rank 3 tensor
- y (dynet.Expression) – Vector
Returns: Matrix dynet.Expression
-
dynet.
contract3d_1d_bias
(x, y, b)¶ Same as
contract3d_1d
with an additional bias parameterThe resulting tensor \(z\) has coordinates \(z_{ij} = b_{ij}+\sum_k x_{ijk} y_k\)
Parameters: - x (dynet.Expression) – Rank 3 tensor
- y (dynet.Expression) – Vector
- b (dynet.Expression) – Bias vector
Returns: Matrix dynet.Expression
-
dynet.
contract3d_1d_1d
(x, y, z)¶ Contracts a rank 3 tensor and two rank 1 tensors into a rank 1 tensor
This is the equivalent of calling
contract3d_1d
and then performing a matrix vector multiplication.The resulting tensor \(t\) has coordinates \(t_i = \sum_{j,k} x_{ijk} y_k z_j\)
Parameters: - x (dynet.Expression) – Rank 3 tensor
- y (dynet.Expression) – Vector
- z (dynet.Expression) – Vector
Returns: Vector dynet.Expression
-
dynet.
contract3d_1d_1d_bias
(x, y, z, b)¶ Same as
contract3d_1d_1d
with an additional bias parameterThis is the equivalent of calling
contract3d_1d
and then performing an affine transform.The resulting tensor \(t\) has coordinates \(t_i = b_i + \sum_{j,k} x_{ijk} y_k z_j\)
Parameters: - x (dynet.Expression) – Rank 3 tensor
- y (dynet.Expression) – Vector
- z (dynet.Expression) – Vector
- b (dynet.Expression) – Bias vector
Returns: Vector dynet.Expression
Normalization operations¶
-
dynet.
layer_norm
(x, g, b)¶ Layer normalization
Performs layer normalization :
\[\begin{split}\begin{split} \mu &= \frac 1 n \sum_{i=1}^n x_i\\ \sigma &= \sqrt{\frac 1 n \sum_{i=1}^n (x_i-\mu)^2}\\ y&=\frac {\boldsymbol{g}} \sigma \circ (\boldsymbol{x}-\mu) + \boldsymbol{b}\\ \end{split}\end{split}\]Reference : Ba et al., 2016
Parameters: - x (dynet.Expression) – Input expression (possibly batched)
- g (dynet.Expression) – Gain (same dimension as x, no batch dimension)
- b (dynet.Expression) – Bias (same dimension as x, no batch dimension)
Returns: An expression of the same dimension as
x
dynet.Expression
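A small sketch of layer normalization with learned gain and bias parameters (the dimension and initial values are arbitrary; assumes dynet as dy):
import dynet as dy
pc = dy.ParameterCollection()
d = 16
pg = pc.add_parameters((d,), init=dy.ConstInitializer(1.0))   # gain g
pb = pc.add_parameters((d,), init=dy.ConstInitializer(0.0))   # bias b
dy.renew_cg()
x = dy.inputVector([0.5] * d)
y = dy.layer_norm(x, dy.parameter(pg), dy.parameter(pb))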
-
dynet.
weight_norm
(w, g)¶ Weight normalization
Performs weight normalization :
\[\begin{split}\begin{split} \hat{w} &= g\frac{w}{\Vert w\Vert}\\ \end{split}\end{split}\]Reference : Salimans, Kingma 2016
Parameters: - w (dynet.Expression) – Input expression (weight parameter)
- g (dynet.Expression) – Gain (scalar expression, usually also a parameter)
Returns: An expression of the same dimension as
w
dynet.Expression
Recurrent Neural Networks¶
RNN Builders¶
-
class
dynet.
_RNNBuilder
¶ -
disable_dropout
()¶ [summary]
[description]
-
initial_state
(vecs=None, update=True)¶ Get a
dynet.RNNState
This initializes a
dynet.RNNState
by loading the parameters in the computation graphParameters: - vecs (list) – Initial hidden state for each layer as a list of
dynet.Expression
s (default: {None}) - update (bool) – trainer updates internal parameters (default: {True}) NOTE: subsequent calls without calling dynet.renew_cg() will not change the update behavior.
Returns: dynet.RNNState
used to feed inputs/transduces sequences, etc… dynet.RNNState- vecs (list) – Initial hidden state for each layer as a list of
-
initial_state_from_raw_vectors
(vecs, update=True)¶ Get a
dynet.RNNState
This initializes a
dynet.RNNState
by loading the parameters in the computation graphUse this if you want to initialize the hidden states with values directly rather than expressions.
Parameters: - vecs (list) – Initial hidden state for each layer as a list of numpy arrays (default: {None})
- update (bool) – trainer updates internal parameters (default: {True}) NOTE: subsequent calls without calling dynet.renew_cg() will not change the update behavior.
Returns: dynet.RNNState
used to feed inputs/transduces sequences, etc… dynet.RNNState
-
param_collection
()¶
-
set_dropout
(f)¶ [summary]
[description]
Parameters: f (float) – [description]
-
-
class
dynet.
SimpleRNNBuilder
¶ Bases:
dynet._RNNBuilder
Simple RNNBuilder with tanh as the activation. This cell runs according to the following dynamics :
\[\begin{split}\begin{split} h_t & = \tanh(W_{x}x_t+W_{h}h_{t-1}+b)\\ \end{split}\end{split}\]Parameters: - layers (int) – Number of layers
- input_dim (int) – Dimension of the input
- hidden_dim (int) – Dimension of the recurrent units
- model (dynet.ParameterCollection) – ParameterCollection to hold the parameters
-
get_parameter_expressions
()¶ Retrieve the internal parameters expressions of the RNN
The output is a list with one item per layer. Each item is a list containing \(W_{hx},W_{hh},b_h\)
Returns: List of parameter expressions for each layer list Raises: ValueError – This raises an exception if initial_state hasn’t been called, because it requires the parameters to be loaded in the computation graph. However, it prevents the parameters from being loaded twice in the computation graph (compared to dynet.parameter(rnn.get_parameters()[0][0])
for example).
-
get_parameters
()¶ Retrieve the internal parameters of the RNN
The output is a list with one item per layer. Each item is a list containing \(W_{hx},W_{hh},b_h\)
Returns: List of parameters for each layer list
-
set_dropout_masks
(batch_size=1)¶ Set dropout masks at the beginning of a sequence for a specific batch size
If this function is not called on batched input, the same mask will be applied across all batch elements. Use this to apply different masks to each batch element
You need to call this __AFTER__ calling initial_state
Parameters: batch_size (int) – Batch size (default: {1})
-
set_dropouts
(d, d_h)¶ Set the dropout rates
The dropout implemented here is the variational dropout introduced in Gal, 2016
More specifically, dropout masks \(\mathbf{z_x}\sim \text{Bernoulli}(1-d)\), \(\mathbf{z_h}\sim \text{Bernoulli}(1-d_h)\) are sampled at the start of each sequence.
The dynamics of the cell are then modified to :
\[\begin{split} h_t & =\tanh(W_{x}(\frac 1 {1-d}\mathbf{z_x} \circ x_t)+W_{h}(\frac 1 {1-d}\mathbf{z_h} \circ h_{t-1})+b) \end{split}\]For more detail as to why scaling is applied, see the “Unorthodox” section of the documentation
Parameters: - d (number) – Dropout rate \(d\) for the input.
- d_h (number) – Dropout rate \(d_h\) for the hidden unit \(h_t\)
-
class
dynet.
GRUBuilder
¶ Bases:
dynet._RNNBuilder
[summary]
[description]
-
get_parameter_expressions
()¶ Retrieve the internal parameters expressions of the GRU
The output is a list with one item per layer. Each item is a list containing \(W_{zx},W_{zh},b_z,W_{rx},W_{rh},b_r,W_{hx},W_{hh},b_h\)
Returns: List of parameter expressions for each layer list Raises: ValueError – This raises an exception if initial_state hasn’t been called, because it requires the parameters to be loaded in the computation graph. However, it prevents the parameters from being loaded twice in the computation graph (compared to dynet.parameter(rnn.get_parameters()[0][0])
for example).
-
get_parameters
()¶ Retrieve the internal parameters of the GRU
The output is a list with one item per layer. Each item is a list containing \(W_{zx},W_{zh},b_z,W_{rx},W_{rh},b_r,W_{hx},W_{hh},b_h\)
Returns: List of parameters for each layer list
-
-
class
dynet.
VanillaLSTMBuilder
(layers, input_dim, hidden_dim, model, ln_lstm=False, forget_bias=1.0)¶ Bases:
dynet._RNNBuilder
VanillaLSTM allows you to create a “standard” LSTM, i.e. with decoupled input and forget gates and no peephole connections
This cell runs according to the following dynamics :
\[\begin{split}\begin{split} i_t & =\sigma(W_{ix}x_t+W_{ih}h_{t-1}+b_i)\\ f_t & = \sigma(W_{fx}x_t+W_{fh}h_{t-1}+b_f+1)\\ o_t & = \sigma(W_{ox}x_t+W_{oh}h_{t-1}+b_o)\\ \tilde{c_t} & = \tanh(W_{cx}x_t+W_{ch}h_{t-1}+b_c)\\ c_t & = c_{t-1}\circ f_t + \tilde{c_t}\circ i_t\\ h_t & = \tanh(c_t)\circ o_t\\ \end{split}\end{split}\]The parameters are initialized as follows:
- \(W_{*x}\) (input connections): sampled from \(\mathcal U\left([\sqrt{\frac{6}{4d_h + d_x}}]\right)\)
- \(W_{*h}\) (recurrent connections): sampled from \(\mathcal U\left([\sqrt{\frac{6}{4d_h + d_h}}]\right)\)
- \(b_{h}\) (biases): set to \(0\), except for the forget gate bias \(b_f\) which is set to \(1\)
Parameters: - layers (int) – Number of layers
- input_dim (int) – Dimension of the input
- hidden_dim (int) – Dimension of the recurrent units
- model (dynet.ParameterCollection) – ParameterCollection to hold the parameters
- ln_lstm (bool) – Whether to use layer normalization
- forget_bias (float) – Value to use as the forget gate bias (default: 1.0)
-
get_parameter_expressions
()¶ Retrieve the internal parameters expressions of the VanillaLSTM
The output is a list with one item per layer. Each item is a list containing \(W_x,W_h,b\) where \(W_x,W_h\) are stacked version of the individual gates matrices:
h/x +------+ | | i | | +------+ | | f | | +------+ | | o | | +------+ | | c | | +------+
Returns: List of parameter expressions for each layer list Raises: ValueError – This raises an exception if initial_state hasn’t been called, because it requires the parameters to be loaded in the computation graph. However, it prevents the parameters from being loaded twice in the computation graph (compared to dynet.parameter(rnn.get_parameters()[0][0])
for example).
-
get_parameters
()¶ Retrieve the internal parameters of the VanillaLSTM
The output is a list with one item per layer. Each item is a list containing \(W_x,W_h,b\) where \(W_x,W_h\) are stacked version of the individual gates matrices:
h/x +------+ | | i | | +------+ | | f | | +------+ | | o | | +------+ | | c | | +------+
Returns: List of parameters for each layer list
-
set_dropout_masks
(batch_size=1)¶ Set dropout masks at the beginning of a sequence for a specific batch size
If this function is not called on batched input, the same mask will be applied across all batch elements. Use this to apply different masks to each batch element
You need to call this __AFTER__ calling initial_state
Parameters: batch_size (int) – Batch size (default: {1})
-
set_dropouts
(d, d_r)¶ Set the dropout rates
The dropout implemented here is the variational dropout with tied weights introduced in Gal, 2016
More specifically, dropout masks \(\mathbf{z_x}\sim \text{Bernoulli}(1-d_x)\), \(\mathbf{z_h}\sim \text{Bernoulli}(1-d_h)\) are sampled at the start of each sequence.
The dynamics of the cell are then modified to :
\[\begin{split}\begin{split} i_t & =\sigma(W_{ix}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{ih}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_i)\\ f_t & = \sigma(W_{fx}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{fh}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_f)\\ o_t & = \sigma(W_{ox}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{oh}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_o)\\ \tilde{c_t} & = \tanh(W_{cx}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{ch}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_c)\\ c_t & = c_{t-1}\circ f_t + \tilde{c_t}\circ i_t\\ h_t & = \tanh(c_t)\circ o_t\\ \end{split}\end{split}\]For more detail as to why scaling is applied, see the “Unorthodox” section of the documentation
Parameters: - d (number) – Dropout rate \(d_x\) for the input \(x_t\)
- d_r (number) – Dropout rate \(d_h\) for the recurrent unit \(h_t\)
-
class
dynet.
CompactVanillaLSTMBuilder
(layers, input_dim, hidden_dim, model)¶ Bases:
dynet._RNNBuilder
CompactVanillaLSTM allows you to create a “standard” LSTM, i.e. with decoupled input and forget gates and no peephole connections
This cell runs according to the following dynamics :
\[\begin{split}\begin{split} i_t & =\sigma(W_{ix}x_t+W_{ih}h_{t-1}+b_i)\\ f_t & = \sigma(W_{fx}x_t+W_{fh}h_{t-1}+b_f+1)\\ o_t & = \sigma(W_{ox}x_t+W_{oh}h_{t-1}+b_o)\\ \tilde{c_t} & = \tanh(W_{cx}x_t+W_{ch}h_{t-1}+b_c)\\ c_t & = c_{t-1}\circ f_t + \tilde{c_t}\circ i_t\\ h_t & = \tanh(c_t)\circ o_t\\ \end{split}\end{split}\]Parameters: - layers (int) – Number of layers
- input_dim (int) – Dimension of the input
- hidden_dim (int) – Dimension of the recurrent units
- model (dynet.ParameterCollection) – ParameterCollection to hold the parameters
-
get_parameter_expressions
()¶ Retrieve the internal parameters expressions of the CompactVanillaLSTM
The output is a list with one item per layer. Each item is a list containing \(W_x,W_h,b\) where \(W_x,W_h\) are stacked version of the individual gates matrices:
h/x +------+ | | i | | +------+ | | f | | +------+ | | o | | +------+ | | c | | +------+
Returns: List of parameter expressions for each layer list Raises: ValueError – This raises an exception if initial_state hasn’t been called, because it requires the parameters to be loaded in the computation graph. However, it prevents the parameters from being loaded twice in the computation graph (compared to dynet.parameter(rnn.get_parameters()[0][0])
for example).
-
get_parameters
()¶ Retrieve the internal parameters of the CompactVanillaLSTM
The output is a list with one item per layer. Each item is a list containing \(W_x,W_h,b\) where \(W_x,W_h\) are stacked version of the individual gates matrices:
h/x +------+ | | i | | +------+ | | f | | +------+ | | o | | +------+ | | c | | +------+
Returns: List of parameters for each layer list
-
set_dropout_masks
(batch_size=1)¶ Set dropout masks at the beginning of a sequence for a specific batch size
If this function is not called on batched input, the same mask will be applied across all batch elements. Use this to apply different masks to each batch element
You need to call this __AFTER__ calling initial_state
Parameters: batch_size (int) – Batch size (default: {1})
-
set_dropouts
(d, d_r)¶ Set the dropout rates
The dropout implemented here is the variational dropout with tied weights introduced in Gal, 2016
More specifically, dropout masks \(\mathbf{z_x}\sim \text{Bernoulli}(1-d_x)\), \(\mathbf{z_h}\sim \text{Bernoulli}(1-d_h)\) are sampled at the start of each sequence.
The dynamics of the cell are then modified to :
\[\begin{split}\begin{split} i_t & =\sigma(W_{ix}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{ih}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_i)\\ f_t & = \sigma(W_{fx}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{fh}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_f)\\ o_t & = \sigma(W_{ox}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{oh}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_o)\\ \tilde{c_t} & = \tanh(W_{cx}(\frac 1 {1-d_x}\mathbf{z_x} \circ x_t)+W_{ch}(\frac 1 {1-d_h}\mathbf{z_h} \circ h_{t-1})+b_c)\\ c_t & = c_{t-1}\circ f_t + \tilde{c_t}\circ i_t\\ h_t & = \tanh(c_t)\circ o_t\\ \end{split}\end{split}\]For more detail as to why scaling is applied, see the “Unorthodox” section of the documentation
Parameters: - d (number) – Dropout rate \(d_x\) for the input \(x_t\)
- d_r (number) – Dropout rate \(d_h\) for the recurrent unit \(h_t\)
-
set_weightnoise
(std)¶ Set the gaussian weight noise
Parameters: std (number) – Standard deviation of weight noise
-
class
dynet.
CoupledLSTMBuilder
¶ Bases:
dynet._RNNBuilder
CoupledLSTMBuilder creates an LSTM unit with coupled input and forget gates as well as peephole connections.
More specifically, here are the equations for the dynamics of this cell :
\[\begin{split}\begin{split} i_t & =\sigma(W_{ix}x_t+W_{ih}h_{t-1}+W_{ic}c_{t-1}+b_i)\\ \tilde{c_t} & = \tanh(W_{cx}x_t+W_{ch}h_{t-1}+b_c)\\ c_t & = c_{t-1}\circ (1-i_t) + \tilde{c_t}\circ i_t\\ & = c_{t-1} + (\tilde{c_t}-c_{t-1})\circ i_t\\ o_t & = \sigma(W_{ox}x_t+W_{oh}h_{t-1}+W_{oc}c_{t}+b_o)\\ h_t & = \tanh(c_t)\circ o_t\\ \end{split}\end{split}\]-
get_parameter_expressions
()¶ Retrieve the internal parameters expressions of the LSTM
The output is a list with one item per layer. Each item is a list containing \(W_{ix},W_{ih},W_{ic},b_i,W_{ox},W_{oh},W_{oc},b_o,W_{cx},W_{ch},b_c\)
Returns: List of parameter expressions for each layer list Raises: ValueError – This raises an exception if initial_state hasn’t been called, because it requires the parameters to be loaded in the computation graph. However, it prevents the parameters from being loaded twice in the computation graph (compared to dynet.parameter(rnn.get_parameters()[0][0])
for example).
-
get_parameters
()¶ Retrieve the internal parameters of the LSTM
The output is a list with one item per layer. Each item is a list containing \(W_{ix},W_{ih},W_{ic},b_i,W_{ox},W_{oh},W_{oc},b_o,W_{cx},W_{ch},b_c\)
Returns: List of parameters for each layer list
-
-
class
dynet.
FastLSTMBuilder
¶ Bases:
dynet._RNNBuilder
[summary]
[description]
-
get_parameter_expressions
()¶ Retrieve the internal parameters expressions of the FastLSTM
The output is a list with one item per layer. Each item is a list containing \(W_{ix},W_{ih},W_{ic},b_i,W_{ox},W_{oh},W_{oc},b_o,W_{cx},W_{ch},b_c\)
Returns: List of parameter expressions for each layer list Raises: ValueError – This raises an exception if initial_state hasn’t been called, because it requires the parameters to be loaded in the computation graph. However, it prevents the parameters from being loaded twice in the computation graph (compared to dynet.parameter(rnn.get_parameters()[0][0])
for example).
-
get_parameters
()¶ Retrieve the internal parameters of the FastLSTM
The output is a list with one item per layer. Each item is a list containing \(W_{ix},W_{ih},W_{ic},b_i,W_{ox},W_{oh},W_{oc},b_o,W_{cx},W_{ch},b_c\)
Returns: List of parameters for each layer list
-
-
class
dynet.
BiRNNBuilder
(num_layers, input_dim, hidden_dim, model, rnn_builder_factory, builder_layers=None)¶ Bases:
object
Builder for BiRNNs that delegates to regular RNNs and wires them together.
builder = BiRNNBuilder(1, 128, 100, model, LSTMBuilder)
[o1,o2,o3] = builder.transduce([i1,i2,i3])
-
add_inputs
(es)¶ returns the list of state pairs (stateF, stateB) obtained by adding inputs to both the forward (stateF) and backward (stateB) RNNs. Does not preserve the internal state after adding the inputs.
Parameters: es (list) – a list of Expression
see also transduce(xs)
.transduce(xs) is different from .add_inputs(xs) in the following way:
- .add_inputs(xs) returns a list of RNNState pairs. RNNState objects can be queried in various ways. In particular, they allow access to the previous state, as well as to the state-vectors (h() and s()).
- .transduce(xs) returns a list of Expression. These are just the output expressions. For many cases, this suffices. transduce is much more memory efficient than add_inputs.
-
transduce
(es)¶ returns the list of output Expressions obtained by adding the given inputs to the current state, one by one, to both the forward and backward RNNs, and concatenating.
Parameters: es (list) – a list of Expression
see also add_inputs(xs)
.transduce(xs) is different from .add_inputs(xs) in the following way:
- .add_inputs(xs) returns a list of RNNState pairs. RNNState objects can be queried in various ways. In particular, they allow access to the previous state, as well as to the state-vectors (h() and s()).
- .transduce(xs) returns a list of Expression. These are just the output expressions. For many cases, this suffices. transduce is much more memory efficient than add_inputs.
-
RNN state¶
-
class
dynet.
RNNState
¶ This is the main class for working with RNNs / LSTMs / GRUs. Request an RNNState initial_state() from a builder, and then progress from there.
-
add_input
(x)¶ This computes \(h_t = \text{RNN}(x_t)\)
Parameters: x (dynet.Expression) – Input expression Returns: New RNNState dynet.RNNState
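For illustration, a sketch of stepping an RNN state by state and of transducing a whole sequence (the builder type, dimensions, and input values are arbitrary; assumes dynet as dy):
import dynet as dy
pc = dy.ParameterCollection()
lstm = dy.VanillaLSTMBuilder(1, 10, 20, pc)   # 1 layer, input dim 10, hidden dim 20
dy.renew_cg()
xs = [dy.inputVector([0.1] * 10) for _ in range(5)]
s = lstm.initial_state()
for x in xs:
    s = s.add_input(x)    # step the RNN and get a new state
    h_t = s.h()[-1]       # output of the top layer at this step
# or get all outputs at once:
hs = lstm.initial_state().transduce(xs)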
-
add_inputs
(xs)¶ Returns the list of states obtained by adding the given inputs to the current state, one by one.
see also
transduce(xs)
.transduce(xs)
is different from.add_inputs(xs)
in the following way:.add_inputs(xs)
returns a list of RNNState. RNNState objects can be- queried in various ways. In particular, they allow access to the previous
state, as well as to the state-vectors (
h()
ands()
)
.transduce(xs)
returns a list of Expression. These are just the output- expressions. For many cases, this suffices.
transduce
is much more memory efficient thanadd_inputs
.Parameters: xs (list) – list of input expressions Returns: New RNNState dynet.RNNState
-
b
()¶ Get the underlying RNNBuilder
In case you need to set dropout or other stuff.
Returns: Underlying RNNBuilder dynet.RNNBuilder
-
h
()¶ tuple of expressions representing the output of each hidden layer of the current step. the actual output of the network is at h()[-1].
-
prev
()¶ Gets previous RNNState
In case you need to rewind
-
s
()¶ tuple of expressions representing the hidden state of the current step.
For SimpleRNN, s() is the same as h(). For LSTM, s() is the series of memory (cell) vectors, followed by the series returned by h():
(c[1],...,c[num_layers], h[1],...,h[num_layers])
-
set_h
(es=None)¶ Manually set the output \(h_t\)
Parameters: es (list) – List of expressions, one for each layer (default: {None}) Returns: New RNNState dynet.RNNState
-
set_s
(es=None)¶ Manually set the hidden states
This is different from
set_h
because, for LSTMs for instance this also sets the cell state. The format is[new_c[0],...,new_c[n],new_h[0],...,new_h[n]]
Parameters: es (list) – List of expressions, in this format : [new_c[0],...,new_c[n],new_h[0],...,new_h[n]]
(default: {None})Returns: New RNNState dynet.RNNState
-
transduce
(xs)¶ returns the list of output Expressions obtained by adding the given inputs to the current state, one by one.
see also
add_inputs(xs)
.transduce(xs)
is different from.add_inputs(xs)
in the following way:.add_inputs(xs)
returns a list of RNNState. RNNState objects can be- queried in various ways. In particular, they allow access to the previous
state, as well as to the state-vectors (
h()
ands()
)
.transduce(xs)
returns a list of Expression. These are just the output- expressions. For many cases, this suffices.
transduce
is much more memory efficient thanadd_inputs
.Parameters: xs (list) – list of input expressions Returns: New RNNState dynet.RNNState
-
Softmax Builders¶
-
class
dynet.
SoftmaxBuilder
¶ Interface for building softmax layers
A softmax layer returns a probability distribution over \(C\) classes given a vector \(h\in\mathbb R^d\), with
\[p(c)\propto \exp(W_i^Th + b_i)\ \forall i\in\{1\ldots C\}\]Where \(W\in \mathbb R^{C\times d}, b \in \mathbb R^C\)
-
full_log_distribution
(x, update=True)¶ Returns an Expression representing a vector whose size is the number of classes.
The ith dimension gives \(\log p(c_i | x)\). This function may be SLOW. Avoid if possible.
Parameters: - x (dynet.Expression) – Input vector
- update (bool) – Whether to update the parameters or not (default: {True})
Returns: Vector of \(\log(p(c\mid x))\) dynet.Expression
-
full_logits
(x, update=True)¶ Returns the logits (before application of the softmax)
The ith dimension gives \(W_i^Tx + b_i\)
Parameters: - x (dynet.Expression) – Input vector
- update (bool) – Whether to update the parameters or not (default: {True})
Returns: Expression for the logits dynet.Expression
-
neg_log_softmax
(x, c, update=True)¶ Negative log probability of a class
Given class \(c\) and vector \(x\), this returns \(-\log(p(c \mid x))\)
Parameters: - x (dynet.Expression) – Input vector
- c (unsigned) – Class id
- update (bool) – Whether to update the parameters or not (default: {True})
Returns: Negative log probability of the given class dynet.Expression
-
neg_log_softmax_batch
(x, c, update=True)¶ Batched version of
neg_log_softmax
Parameters: - x (dynet.Expression) – Input vector (batched)
- c (list) – list of class ids (one per batch element)
- update (bool) – Whether to update the parameters or not (default: {True})
Returns: Negative log probability of the given classes dynet.Expression
-
param_collection
()¶ Returns the ParameterCollection containing the softmax parameters
The first parameter in the ParameterCollection is the weight matrix, the second is the biases (if any)
Returns: Subcollection holding the parameters ParameterCollection
-
sample
(x)¶ Sample from the softmax distribution
Parameters: x (dynet.Expression) – Input vector Returns: Sampled class int
-
-
class
dynet.
StandardSoftmaxBuilder
¶ Bases:
dynet.SoftmaxBuilder
This class implements the standard Softmax
-
class
dynet.
ClassFactoredSoftmaxBuilder
¶ Bases:
dynet.SoftmaxBuilder
Class factored softmax
Each class is separated into a subclass, ie \(p(i\mid h)=p(i\mid h, c) p(c\mid h)\) where \(c\) is a class and \(i\) a subclass
-
class_log_distribution
(x, update=True)¶ Get log distribution over classes
Parameters: - x (dynet.Expression) – Input vector
- update (bool) – Whether to update the parameters or not (default: {True})
Returns: Vector of \(\log(p(c\mid x))\) dynet.Expression
-
class_logits
(x, update=True)¶ Returns the logits over classes
Parameters: - x (dynet.Expression) – Input vector
- update (bool) – Whether to update the parameters or not (default: {True})
Returns: Expression for the logits dynet.Expression
-
subclass_log_distribution
(x, classid, update=True)¶ Get log distribution over subclasses of class
Parameters: - x (dynet.Expression) – Input vector
- classid (int) – class index
- update (bool) – Whether to update the parameters or not (default: {True})
Returns: Vector of \(\log(p(i\mid x, \texttt{classid}))\) dynet.Expression
-
subclass_logits
(x, classid, update=True)¶ Logits over subclasses of class
Parameters: - x (dynet.Expression) – Input vector
- classid (int) – class index
- update (bool) – Whether to update the parameters or not (default: {True})
Returns: Expression for the logits dynet.Expression
-
Optimizers¶
-
class
dynet.
Trainer
¶ Generic trainer
-
learning_rate
¶ Global learning rate for all parameters
Type: number
-
get_clip_threshold
()¶ Get clipping threshold
Returns: Gradient clipping threshold Return type: number
-
restart
(learning_rate=None)¶ Restarts the optimizer
Clears all momentum values and similar accumulated statistics (if applicable)
Parameters: learning_rate (number) – (Optional) resets the learning rate
-
set_clip_threshold
(thr)¶ Set clipping threshold
Gradients are clipped to 5 by default. To deactivate clipping, set the threshold to be <=0
Parameters: thr (number) – Clipping threshold
-
set_sparse_updates
(su)¶ Sets updates to sparse updates
DyNet trainers support two types of updates for lookup parameters, sparse and dense. Sparse updates are the default. They have the potential to be faster, as they only touch the parameters that have non-zero gradients. However, they may not always be faster (particularly on GPU with mini-batch training), and are not precisely numerically correct for some update rules such as MomentumTrainer and AdamTrainer. Thus, if you set this variable to false, the trainer will perform dense updates and be precisely correct, and may even be faster in some cases. Parameters: su (bool) – flag to activate/deactivate sparse updates
-
status
()¶ Outputs information about the trainer to stderr
(number of updates since last call, number of clipped gradients, learning rate, etc…)
-
update
()¶ Update the parameters
The update equation is different for each trainer, check the online c++ documentation for more details on what each trainer does
-
update_epoch
(r)¶ DEPRECATED: do not use.
-
update_subset
(updated_params, updated_lookups)¶ Update a subset of parameters
Only use this as a last resort; a more elegant way to update only a subset of parameters is to use the “update” keyword in dy.parameter or Parameter.expr() to specify which parameters need to be updated __during the creation of the computation graph__
Parameters: - updated_params (list) – Indices of parameters to update
- updated_lookups (list) – Indices of lookup parameters to update
-
-
class
dynet.
SimpleSGDTrainer
¶ Bases:
dynet.Trainer
Stochastic gradient descent trainer
This trainer performs stochastic gradient descent, the go-to optimization procedure for neural networks.
Parameters: m (dynet.ParameterCollection) – ParameterCollection to be trained Keyword Arguments: learning_rate (number) – Initial learning rate (default: 0.1)
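As a hedged end-to-end sketch, a toy training loop using this trainer (the model, data, and dimensions are invented for the example; assumes dynet as dy):
import dynet as dy
pc = dy.ParameterCollection()
pW = pc.add_parameters((3, 5))
pb = pc.add_parameters((3,))
trainer = dy.SimpleSGDTrainer(pc, learning_rate=0.1)
data = [([0.5] * 5, 1), ([-0.2] * 5, 0)]   # toy (features, label) pairs
for epoch in range(10):
    for feats, label in data:
        dy.renew_cg()
        x = dy.inputVector(feats)
        scores = dy.parameter(pW) * x + dy.parameter(pb)
        loss = dy.pickneglogsoftmax(scores, label)
        loss.forward()
        loss.backward()
        trainer.update()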
-
class
dynet.
CyclicalSGDTrainer
¶ Bases:
dynet.Trainer
This trainer performs stochastic gradient descent with a cyclical learning rate as proposed in Smith, 2015.
This uses a triangular function with optional exponential decay.
More specifically, at each update, the learning rate \(\eta\) is updated according to :
\[\begin{split} \begin{split} \text{cycle} &= \left\lfloor 1 + \frac{\texttt{it}}{2 \times\texttt{step_size}} \right\rfloor\\ x &= \left\vert \frac{\texttt{it}}{\texttt{step_size}} - 2 \times \text{cycle} + 1\right\vert\\ \eta &= \eta_{\text{min}} + (\eta_{\text{max}} - \eta_{\text{min}}) \times \max(0, 1 - x) \times \gamma^{\texttt{it}}\\ \end{split}\end{split}\]Parameters: m (dynet.ParameterCollection) – ParameterCollection to be trained
Keyword Arguments: - learning_rate_min (number) – Lower learning rate (default: {0.01})
- learning_rate_max (number) – Upper learning rate (default: {0.1})
- step_size (number) – Period of the triangular function in number of iterations (__not__ epochs). According to the original paper, this should be set around (2-8) x (training iterations in epoch) (default: {2000})
- gamma (number) – Learning rate upper bound decay parameter (1.0 = no decay) (default: {1.0})
-
class
dynet.
MomentumSGDTrainer
¶ Bases:
dynet.Trainer
Stochastic gradient descent with momentum
This is a modified version of the SGD algorithm with momentum to stabilize the gradient trajectory.
Parameters: m (dynet.ParameterCollection) – ParameterCollection to be trained
Keyword Arguments: - learning_rate (number) – Initial learning rate (default: 0.1)
- mom (number) – Momentum (default: 0.9)
-
class
dynet.
AdagradTrainer
¶ Bases:
dynet.Trainer
Adagrad optimizer
The adagrad algorithm assigns a different learning rate to each parameter.
Parameters: m (dynet.ParameterCollection) – ParameterCollection to be trained
Keyword Arguments: - learning_rate (number) – Initial learning rate (default: 0.1)
- eps (number) – Epsilon parameter to prevent numerical instability (default: 1e-20)
-
class
dynet.
AdadeltaTrainer
¶ Bases:
dynet.Trainer
AdaDelta optimizer
The AdaDelta optimizer is a variant of Adagrad aiming to prevent vanishing learning rates.
Parameters: m (dynet.ParameterCollection) – ParameterCollection to be trained
Keyword Arguments: - eps (number) – Epsilon parameter to prevent numerical instability (default: 1e-6)
- rho (number) – Update parameter for the moving average of updates in the numerator (default: 0.95)
-
class
dynet.
RMSPropTrainer
¶ Bases:
dynet.Trainer
RMSProp optimizer
The RMSProp optimizer is a variant of Adagrad where the squared sum of previous gradients is replaced with a moving average with parameter rho.
Parameters: m (dynet.ParameterCollection) – ParameterCollection to be trained
Keyword Arguments: - learning_rate (number) – Initial learning rate (default: 0.001)
- eps (number) – Epsilon parameter to prevent numerical instability (default: 1e-8)
- rho (number) – Update parameter for the moving average (rho = 0 is equivalent to using Adagrad) (default: 0.9)
-
class
dynet.
AdamTrainer
¶ Bases:
dynet.Trainer
Adam optimizer
The Adam optimizer is similar to RMSProp but uses unbiased estimates of the first and second moments of the gradient
Parameters: m (dynet.ParameterCollection) – ParameterCollection to be trained
Keyword Arguments: - alpha (number) – Initial learning rate (default: 0.001)
- beta_1 (number) – Moving average parameter for the mean (default: 0.9)
- beta_2 (number) – Moving average parameter for the variance (default: 0.999)
- eps (number) – Epsilon parameter to prevent numerical instability (default: 1e-8)
-
class
dynet.
AmsgradTrainer
¶ Bases:
dynet.Trainer
AMSGrad optimizer
The AMSGrad optimizer is similar to Adam, which uses unbiased estimates of the first and second moments of the gradient; however, AMSGrad keeps the maximum of all second-moment estimates and uses that instead
Parameters: m (dynet.ParameterCollection) – ParameterCollection to be trained
Keyword Arguments: - alpha (number) – Initial learning rate (default: 0.001)
- beta_1 (number) – Moving average parameter for the mean (default: 0.9)
- beta_2 (number) – Moving average parameter for the variance (default: 0.999)
- eps (number) – Epsilon parameter to prevent numerical instability (default: 1e-8)
MultiDevice¶
-
dynet.
to_device
(e, device_str)¶ Copy Expression’s values between devices. Creates a new expression with e’s values on device device_str.
Parameters: - e (dynet.Expression) – Expression
- device_str (string) – a device name
Returns: dynet.Expression