cl-waffe

Neural Networks

model-list

Holds submodules in a list.

The models it contains are properly tracked by find-variables.

Note: This Layer is exported from Package cl-waffe.

Parameters

(model-list list)

list (list)
a list of models

This model can also be created by mlist

(mlist models) ; -> [Model: MODEL-LIST]

Forward

(call (Model-List) index &rest args)

Note that index must be a waffetensor.

To avoid this, mth is available.

(call (mth 0 (Model-List)) &rest args)
index (waffetensor whose data is a fixnum)
the index of the model to call
args (list)
arguments for the index-th model

Example

(setq models (Model-List (list (linearlayer 10 1)(linearlayer 10 1))))
(call models (const 0)(!randn `(10 10)))
(call (mth 0 models)(!randn `(10 10)))

Linearlayer

Applies a linear transformation to the incoming data: (setq y (!add (!matmul x weight) bias))

Parameters

(LinearLayer in-features out-features &optional (bias T))
in-features (fixnum)
size of each input sample
out-features (fixnum)
size of each output sample
bias (boolean)
If set to nil, the layer will not learn an additive bias. Default: t

Shape

LinearLayer: (batch-size in-features) -> (batch-size out-features)

Input
x (Tensor) where x has the shape (batch-size in-features)
Output
Output: a tensor to which the LinearLayer has been applied, with the shape (batch-size out-features)

Forward

(call (LinearLayer 10 1) x)
x
the input tensor

Example

(call (LinearLayer 10 1)(!randn `(10 10)))
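
For instance, the output shape follows the rule above. A minimal sketch of checking it, assuming !shape is available:

(setq layer (LinearLayer 10 3))
(setq out (call layer (!randn `(32 10)))) ; x has the shape (batch-size=32 in-features=10)
(!shape out) ; => (32 3), i.e. (batch-size out-features)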

DenseLayer

Applies a LinearLayer followed by the activation specified by activation.

Parameters

(DenseLayer in-features out-features &optional (bias t)(activation :relu))
in-features (fixnum)
size of each input sample
out-features (fixnum)
size of each output sample
bias (boolean)
If set to nil, the layer will not learn an additive bias. Default: t
activation (keyword or function)
One of :relu, :sigmoid or :tanh. If a function is given, it is called as the activation.

Shape

DenseLayer: (batch-size in-features) -> (batch-size out-features)
Input
x (Tensor) where x has the shape (batch-size in-features)
Output
Output: a tensor to which the DenseLayer has been applied, with the shape (batch-size out-features)

Forward

(call (DenseLayer 10 1) x)
x
the input tensor

Example

(call (DenseLayer 10 1)(!randn `(10 10)))
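
The activation can also be changed from the default :relu. A sketch using the documented keywords, plus (as an assumption about the calling convention) a function passed as the activation:

(call (DenseLayer 10 1 t :sigmoid) (!randn `(10 10))) ; :sigmoid instead of the default :relu
;; passing a function as the activation (assumed to be called on the LinearLayer's output)
(call (DenseLayer 10 1 t #'!tanh) (!randn `(10 10)))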

Dropout

When *no-grad* is nil, dropout randomly zeroes some elements of the given tensor by sampling a Bernoulli distribution with probability dropout-rate.

Furthermore, the outputs are scaled by (/ (- 1 (self dropout-rate))) (i.e. this is an inverted dropout). Because the scaling is done at training time, when *no-grad* is t (i.e. during prediction) dropout simply returns the given tensor.

Parameters

(dropout &optional (dropout-rate 0.5))
dropout-rate
Dropout samples a Bernoulli distribution based on dropout-rate.

Shape

Dropout: (Any) -> (the same as the input)
Input
Any shape is OK
Output
The same shape as the given input.

Forward

(setq x (!randn `(10 10)))
;#Const(((-0.59... -0.09... ~ 0.289... 0.390...)        
;                 ...
;        (1.447... 1.032... ~ -0.66... -0.55...)) :mgl t :shape (10 10))
(call (Dropout 0.5) x)
;#Const(((0.0 -0.19... ~ 0.0 0.0)        
;                 ...
;        (2.895... 2.064... ~ 0.0 -1.10...)) :mgl t :shape (10 10))
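
As noted above, dropout is the identity when *no-grad* is t. A minimal sketch of prediction-time behaviour, assuming the with-no-grad macro:

(setq layer (Dropout 0.5))
(with-no-grad
  (call layer x)) ; x (defined above) is returned as-is: nothing is zeroed and no scaling is applied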

BatchNorm2d

Applies BatchNorm2D.

Parameters

(BatchNorm2D in-features &key (affine t)(epsilon 1.0e-7))
in-features
the expected size of the input features
affine
If t, the model has trainable affine layers.
epsilon
the value added to the denominator for numerical stability. Default: 1.0e-7

Shape

(call (BatchNorm2D in-features) x)
BatchNorm2D: (any in-features) -> (the same shape as the input)

Example

(setq model (BatchNorm2D 10))
(call model (!randn `(30 10)))

LayerNorm

Embedding

A simple lookup table object to store embedding vectors for NLP Models.

Parameter

(Embedding vocab-size embedding-size &key pad-idx)
vocab-size
(fixnum) size of the dictionary of embeddings
embedding-size
(fixnum) the size of each embedding tensor
pad-idx
If specified, the entries at pad-idx do not contribute to the gradient. If nil, it is ignored.

Shape

(call (Embedding 10 10) x)

Embedding: (batch-size sentence-length) -> (batch-size sentence-len embedding-dim)

x
the input x, where each element is a single-float (e.g. 1.0, 2.0, ...)

Example

(setq model (cl-waffe.nn:Embedding 10 20))

(call model (!ones `(1 10)))
#Const((((-0.01... -0.01... ~ 0.013... 0.002...)         
                   ...
         (-0.01... -0.01... ~ 0.013... 0.002...))) :mgl t :shape (1 10 20))
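
A sketch using the pad-idx keyword from the signature above, assuming index 0 is the padding token:

(setq model (cl-waffe.nn:Embedding 10 20 :pad-idx 0))
(call model (!ones `(1 10))) ; lookups at index 0 (the padding entry) do not contribute to the gradient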

RNN

Applies a multi-layer RNN with tanh or ReLU.

Note: this implementation relies on the backward of (setf !aref).

Parameters

(RNN input-size hidden-size &key (num-layers 1)(activation :tanh)(bias t)(dropout nil)(biredical nil))
input-size
The number of expected features in the input x
hidden-size
The number of features in hidden-layer
num-layers
Number of recurrent layers
activation
Can be either :tanh or :relu
bias
(boolean) If t, the model has a trainable bias.
dropout
(boolean) If t, the model has a dropout layer.
biredical
(boolean) If t, the RNN becomes bidirectional.

Shape

(call (RNN 10 10) x &optional (hs nil))

RNN : (batch-size sentence-length input-size) -> (batch-size sentence-length hidden-size)

x
the input x, whose shape is (batch-size sentence-length input-size)
hs
The previous hidden state. If nil, the model creates a new one.

Example

(setq model (RNN 10 20))
(setq embedding (Embedding 10 10))
(call model
  (call embedding (!ones `(10 10))))

;#Const((((-1.46... -1.46... ~ -5.53... 1.766...)         
;                   ...
;         (-1.46... -1.46... ~ -5.53... 1.766...))        
;                 ...
;        ((-1.46... -1.46... ~ -5.53... 1.766...)         
;                   ...
;         (-1.46... -1.46... ~ -5.53... 1.766...))) :mgl t :shape (10 10 20))
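
The hidden state can also be supplied explicitly through hs. A sketch under the assumption that hs is a (batch-size hidden-size) tensor; check the implementation for the exact layout:

(setq model (RNN 10 20))
(setq x (call (Embedding 10 10) (!ones `(10 10))))
;; hypothetical initial hidden state of shape (batch-size hidden-size); this shape is an assumption
(setq hs (!zeros `(10 20)))
(call model x hs)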
  

LSTM

GRU

MaxPooling

AvgPooling

Conv1D

Conv2D

Transformer

TransformerEncoderLayer

TransformerDecoderLayer

CrossEntropy

(cross-entropy x y &optional (delta 1.0e-7) (epsilon 0.0))

This criterion computes the cross entropy loss between x and y.

If epsilon is greater than 0.0, smooth-labeling is enabled.

If avoid-overflow is t, x's average is subtracted from x in order to avoid overflow.

delta is the value used for (!log (!add x delta)).

x is a probability distribution.

y is a probability distribution or labels.

If y is given as labels, it is converted to a probability distribution.
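
A minimal sketch of calling it, assuming cross-entropy is exported from cl-waffe.nn and that !softmax is available for building a probability distribution:

(setq x (!softmax (!randn `(10 5)))) ; predicted distribution, shape (batch-size num-classes)
(setq y (!softmax (!randn `(10 5)))) ; target distribution of the same shape
(cl-waffe.nn:cross-entropy x y)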

SoftMaxCrossEntropy

(softmax-cross-entropy x y &key (avoid-overflow t) (delta 1.0e-7) (epsilon 0.0))

This criterion computes the softmax cross entropy loss between x and y.

If epsilon is greater than 0.0, smooth-labeling is enabled.

If avoid-overflow is t, x's average is subtracted from x in order to avoid overflow.

delta is the value used for (!log (!add x delta)).

x is a probability distribution.

y is a probability distribution or labels.

If y is given as labels, it is converted to a probability distribution.
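
A minimal sketch, again assuming the symbol is exported from cl-waffe.nn:

(setq x (!randn `(10 5)))            ; model output, shape (batch-size num-classes)
(setq y (!softmax (!randn `(10 5)))) ; target probability distribution
(cl-waffe.nn:softmax-cross-entropy x y)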

MSE

(mse p y)

Computes MSE Loss.

mse is defined as (!mean (!pow (!sub p y) 2) 1)
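
A minimal sketch, assuming mse is exported from cl-waffe.nn:

(setq p (!randn `(10 3))) ; predictions
(setq y (!randn `(10 3))) ; targets
(cl-waffe.nn:mse p y)     ; equivalent to (!mean (!pow (!sub p y) 2) 1)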

L1Norm

L2Norm

BinaryCrossEntropy

KLdivLoss

CosineSimilarity