cl-waffe

Using Tensor

Basic Tensor Operations

This section explains the basics of tensor operations.

Building Computation Nodes

In general, the structure WaffeTensor is used to access cl-waffe's APIs and to build computation nodes.

A WaffeTensor's data slot can store the following data structures, accessed via (data tensor).

  1. fixnum
  2. float
  3. boolean
  4. cons
  5. simple-array (Automatically converted to mgl-mat:mat)
  6. mgl-mat:mat
  7. ratio (Automatically coerced to single-float)

Internally, the matrix of a WaffeTensor is just an mgl-mat:mat, and cl-waffe depends on mgl-mat for most of its computation. (That is, mgl-mat is to cl-waffe what NumPy is to Chainer.)

So it is highly recommended to check out mgl-mat's official repository before using cl-waffe.
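
For a quick sense of these conversions, here is a small sketch (the return values are illustrative, not exact printed forms):

(data (const 3))     ; => 3, fixnums are stored as-is
(data (const 1/2))   ; => 0.5, ratios are coerced to single-float
(data (const #2A((1.0 2.0) (3.0 4.0)))) ; => an mgl-mat:mat converted from the simple-array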

Construct Tensors

There are three ways to create a tensor, depending on its purpose.

Constants

A constant is used when no gradient is required; it is created with the function (const value).

(setq a (const 1.0))
;#Const(1.0)

; Using cl-waffe's APIs.
(!add a (const 2.0))
;#Const(3.0)

; Initializes a tensor by sampling from a beta distribution.
(!beta `(10 10) 5.0 1.0)
;#Const(((0.866... 0.801... ~ 0.836... 1.0)        
;                 ...
;        (0.826... 1.0 ~ 1.0 0.835...)) :mgl t :shape (10 10))

Parameter Tensors

Parameter tensors are used when gradients are required; they are created with the function (tensor value) or the macro (parameter const).

The created tensors require gradients: gradients are computed with the function (backward out) and accessed via (grad tensor).

At each training step, their gradients have to be reset; (zero-grad), provided by deftrainer, is useful for this.

(setq a (tensor 5.0))
(setq b (tensor 3.0))
(setq c (tensor 3.0))

(setq z (!add (!mul a b) c)) ; using cl-waffe's APIs will produce computation nodes.
;#Const(18.0)
(print (cl-waffe::waffetensor-state z)) ; They're stored in its state.
; [Node : ADDTENSOR]
(print (cl-waffe::waffetensor-variables z)) ; It also contains information about the node's variables.
; (#Const(15.0) #Parameter{3.0 :device :MGL :backward NIL})
(backward z)
; NIL

(grad a)
; 3.0
(grad b)
; 5.0
(grad c)
; 1.0

Parameter tensors can also be created like this:

(setq a (parameter (!randn `(10 10))))
;#Parameter{((-1.27... 2.076... ~ 2.816... 1.285...)            
;                         ...
;            (0.837... -0.62... ~ 1.735... -0.08...)) :mgl t :shape (10 10) :device :MGL :backward NIL}

Let's look at an example: defining a simple LinearLayer.

Optimizers defined with the macro defoptimizer can track the model's parameters and update them according to their own update rule.

Optimizers are accessed through deftrainer.

(defmodel LinearLayer (in-features out-features &optional (bias T))
  :parameters ((weight
		(parameter (!mul 0.01 (!randn `(,in-features ,out-features))))
		:type waffetensor)
	      (bias (if bias
			(parameter (!zeros `(1 ,out-features)))
			nil)))
  :forward ((x)
	    (cl-waffe.nn:linear x (self weight)(self bias))))

(deftrainer ExampleTrainer (lr)
  :model          (LinearLayer 10 3)
  :optimizer      cl-waffe.optimizers:Adam
  :optimizer-args (:lr lr)
  :step-model ((x y)
	       (zero-grad)
	       (let ((out (cl-waffe.nn:softmax-cross-entropy (call (self model) x) y)))
		 (backward out)
		 (update) ; calling trainer's optimizers.
		 out))
  :predict ((x) (call (model) x)))

Sysconst

Sysconst is used to store temporary data during the calculation process.

Within a defnode definition, returning each result with sysconst is slightly faster than creating constants with (const tensor).

(defnode ExampleAddNode nil
    :forward ((x y)
              (sysconst ; Tensors created with sysconst will be cached well.
	         (+ (data x)(data y))))
    :backward ((dy)(list dy dy)))

Accessing Tensor

data(tensor)

Accesses the tensor's data. The data is not copied.

When the tensor's data is lazily evaluated, this function behaves as follows:

  1. When the tensor is transposed and lazily evaluated, it directly returns the function object for speed.
  2. When the tensor is cached and lazily evaluated, it returns the mat object.
Input
WaffeTensor
Output
mgl-mat:mat, or waffetensorcontentdata

When (data tensor) is a function, the return value depends on its state:

cached mat
Returns the mgl-mat:mat; no copy is made.
lazy-evaluation or transposed
Returns the function itself.

Note: this function is setfable and inlined
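
A small usage sketch (the return values are illustrative):

(setq a (const 2.0))
(data a)            ; => 2.0
(setf (data a) 4.0) ; setfable, as noted above
(data a)            ; => 4.0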

value(tensor &key (ignore-transpose nil))

Accesses the tensor's data; if the tensor is lazily evaluated, it is evaluated first.

Note: this is not setfable
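
For example, on a lazily evaluated tensor (see the Lazy evaluation section below), (data tensor) may return a function object, while (value tensor) forces the evaluation. A sketch (printed forms are illustrative):

(setq a (!transpose (!randn `(10 5))))
(data a)  ; => a function object (the lazy transpose)
(value a) ; => an mgl-mat:mat, the evaluated content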

detach(tensor)

Creates a Const from the tensor, with all information except the data and backend erased.

This macro expands to (const (data tensor)).

Note: this macro doesn't clone data itself.

Example:

(setq a (parameter (!randn `(10 10))))
;#Parameter{((0.062... 0.716... ~ 0.088... 0.692...)            
;                         ...
;            (0.458... 0.194... ~ 0.902... 0.480...)) :mgl t :shape (10 10) :device :MGL :backward NIL}
(detach a)
;#Const(((0.062... 0.716... ~ 0.088... 0.692...)        
;                 ...
;        (0.458... 0.194... ~ 0.902... 0.480...)) :mgl t :shape (10 10))

backward and predicting mode

with-no-grad(&body body)

Within the body of this macro, the following are skipped: save-for-backward, creating new node objects, and calling backward and its related processing.

For tasks in which gradients are not required, using it improves performance.

(with-no-grad
  (call (model) x))
*no-grad*
When t, some nodes are ignored. See the references below for details. Default: nil
backward(tensor)

Compute back propagation by traversing the Tensor's computation node.

The parameters of the model, defined with (tensor value) or wrapped with (parameter tensor), store their gradients in the grad slot.

Note: the tensor must have the shape `(1) or be a single value; otherwise an error occurs.

While backward is being computed, new backward nodes are not created (*no-grad* automatically becomes t).

Input
WaffeTensor
Output
NIL
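
Because of the shape restriction above, a common pattern is to reduce a tensor to a single value before calling backward. A minimal sketch, assuming (!sum tensor) with no axis argument reduces to a single value:

(setq a (parameter (!randn `(3 3))))
(setq out (!sum (!exp a))) ; out is a single value
(backward out)             ; gradients are now stored
(grad a)                   ; => the gradient of a (illustrative)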

Calling Forward of cl-waffe's objects

call(model &rest args)

Calls the forward step defined by defnode, defmodel, or defoptimizer.

All forward steps must be called through this function; otherwise the returned tensor lacks computation nodes and the thread-data that supports performance.

Building computation nodes is ignored when *no-grad* is t.

model
Your initialized model/node/optimizer objects
args
Arguments :forward needs

Example:

(defnode Add nil
  :optimize t
  :parameters nil
  :forward  ((x y)
	     (sysconst (+ (data x)(data y))))
  :backward ((dy)(list dy dy)))

(call (Add)(const 1.0)(const 1.0))
;=>Const(2.0)

Output: a WaffeTensor, or a list of WaffeTensors.

with-calling-layers(input &rest layers)

This macro sequentially calls the given layers.

The argument input must be a tensor.

Each layer is referenced through the (self) macro, and x is destructively replaced with each returned value.

Note: this macro assumes that each model returns a single tensor, not a list.

(defmodel MLP (activation)
   :parameters ((layer1   (denselayer (* 28 28) 512 T activation))
   	        (layer2   (denselayer 512 256 T activation))
	        (layer3   (linearlayer 256 10 T)))
   :forward ((x)
	     (with-calling-layers x
	       (layer1 x)
 	       (layer2 x)
               (layer3 x))))

For layers that take different arguments:

(with-calling-layers x
     (layer1 x 1 1)
     (layer2 1 x 2)
     (layer3 x y))

Output: the value returned by the last layer.

Displaying Tensors

*default-backend*
Default backend cl-waffe uses. Default: :mgl

Special variables configuring how tensors are printed:

*print-char-max-len*
When printing a tensor, each number is truncated to this many characters. (e.g. when 5, 1.12345d0 is displayed as 1.1234...) Default: 5
*print-arr-max-size*
When printing a tensor, each row is truncated to this many elements. (e.g. when 6, (1 2 3 4 5 6 7 8 9 10) is displayed as (1 2 3 ~ 8 9 10)) Default: 6
*print-mat-max-size*
When printing a tensor, the number of displayed rows is limited to this value. (e.g. when 3, ((1)(2)(3)(4)) is displayed as ((1)(2) ... (4)))

Types

waffetensorcontenttype

A type of data that is allowed for making tensors with (const ~) or (tensor ~).

cl-waffe automatically coerces such data to the appropriate types:

`(or mgl-mat:mat simple-array waffesupporteddatatype)

waffesupporteddatatype

The type of a WaffeTensor's contents:

`(or fixnum float null cons function ratio)
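
A quick check against the type, assuming these deftypes are exported from the cl-waffe package (use an explicit package prefix otherwise):

(typep 1/2 'cl-waffe:waffesupporteddatatype)     ; => T, ratios are supported
(typep "hello" 'cl-waffe:waffesupporteddatatype) ; => NIL, strings are not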

Lazy evaluation

By default, cl-waffe produces lazily evaluated computation nodes.

The function (!transpose tensor) is a good example to demonstrate this.

(setq a (!randn `(10 5)))
;#Const(((0.483... -0.52... -1.44... -0.06... 0.185...)        
;                 ...
;        (-0.85... 1.668... -0.27... 0.016... -0.45...)) :mgl t :shape (10 5))

(setq a (!transpose a))
;#Const(#<FUNCTION (LABELS CL-WAFFE.BACKENDS.MGL::LAZYTRANSPOSE :IN CL-WAFFE.BACKENDS.MGL::LAZY-EVAL-TRANSPOSE) {100C25EBEB}>)

(!shape a) ; Shapes can be accessed correctly: (5 10)

; Transpose will be used with !matmul.

(!matmul a (!randn `(10 5)))
;#Const(((-5.39... 1.782... 2.277... -6.13... -6.14...)        
;                 ...
;        (-3.24... -1.60... 4.533... -2.23... 0.736...)) :mgl t :shape (5 5))

; After calling (value tensor), the lazy evaluation is performed, and a is now:

;#Const(((0.483... -0.52... -1.44... -0.06... 0.185...)        
;                 ...
;        (-0.85... 1.668... -0.27... 0.016... -0.45...)) :mgl t :shape (10 5))

; Transpose doesn't destructively modify a.

; If you don't want to use lazy-evaluation, (!transpose1 tensor) is available. See Operators.

Lazy evaluation is enabled when:

  1. the function (!transpose tensor) is used
  2. JIT is enabled
  3. tracing is enabled

Broadcasting

cl-waffe supports broadcasting tensors like Numpy.

Broadcasting does the following:

  1. If the sizes of the two tensors do not match along a given axis, the tensor whose size along that axis is 1 is repeated to match; otherwise an error occurs.
  2. If the numbers of dimensions of the two tensors do not match, a dimension of size 1 is prepended to the tensor with fewer dimensions.

Broadcasting is available to these operations.

  1. !add
  2. !sub
  3. !mul
  4. !div
  5. (setf !aref)

; (!randn `(10))'s dims become: `(10) -> `(1 10) -> repeated with (:axis = 0, :repeat-num = 10)
(!add (!randn `(10 10))(!randn `(10)))
;#Const(((-0.77... -0.32... ~ 1.563... -2.87...)        
;                 ...
;        (0.077... 3.698... ~ 1.669... -1.51...)) :mgl t :shape (10 10))

;The first argument of !add will be repeated with (:axis=1, :repeat-num=3)
(!add (!randn `(10 1 10))(!randn `(10 3 10)))
;#Const((((3.238... 0.185... ~ 2.035... -1.33...)         
;                   ...
;         (0.302... -0.20... ~ 1.731... -0.58...))        
;                 ...
;        ((-0.93... 0.992... ~ -1.50... -2.81...)         
;                   ...
;         (1.669... 0.659... ~ 1.218... -0.88...))) :mgl t :shape (10 3 10))

JIT

Currently this feature is disabled.

cl-waffe dynamically defines kernels... (its performance problems remain to be solved.)

Tracing

Currently this feature is disabled. (because it is unstable)

cl-waffe can optimize define-by-run style code by tracing its usage...

Compute tensors in a destructive way

In general, cl-waffe's APIs follow these simple rules:

Side Effect Rules:

  1. Operators whose names begin with ! copy the base tensor and produce a new matrix every time they are called.
  2. Operators whose names begin with !! destructively assign the result to the first argument.
  3. When a destructive operator's first argument is not a mat, the result is assigned to the second argument; otherwise a new mat is created.

Note: destructive operations are supported only when the tensor's data is of type mgl-mat:mat.

For a numerical library, creating a new tensor at every calculation step is just a waste of memory. In terms of speed and memory, it is recommended to use destructive operations, as the sketch below shows.
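
A minimal sketch of rules 1 and 2 (the values are illustrative):

(setq a (!randn `(3 3)))
(setq b (!randn `(3 3)))

(!add a b)  ; returns a new tensor; a and b are unchanged
(!!add a b) ; the result is destructively assigned to a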

Code written with the non-destructive APIs can be rewritten with the destructive APIs in a simple way; all you have to do is follow the rules below.

First, prepare the formula with all operations non-destructive. In the case of (!exp (!exp x)), a new tensor with the same shape as x is created twice. To remove one allocation without introducing a new side effect, the outer !exp can be rewritten as !!exp, since its argument (!exp x) is already a freshly allocated tensor. That is, (!!exp (!exp x)).
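
As a sketch of this rewrite (the helper names are hypothetical):

(defun exp-twice (x)
  ;; non-destructive: allocates a fresh tensor twice
  (!exp (!exp x)))

(defun exp-twice-fast (x)
  ;; (!exp x) allocates one fresh tensor; !!exp then overwrites it in place.
  ;; x itself is still untouched.
  (!!exp (!exp x)))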

Let's take another example: making softmax faster.

This is a slower version of softmax:

(defun !average (x)
  (let ((z (!sum x 1))
	(batch-size (!shape x 0)))
    (!div z batch-size)))

(defun softmax (x)
  (let* ((x1 (!sub x (!average x)))
         (xe (!exp x1))
	 (z (!sum xe 1)))
     (!div xe z)))

Benchmarking it with the time macro:

(setq a (!randn `(1000 1000)))
;#Const(((0.129... -0.92... ~ -1.01... -0.86...)        
;                 ...
;        (-0.48... -0.04... ~ 0.375... 1.610...)) :mgl t :shape (1000 1000))

(time (softmax a))
;Evaluation took:
;  0.022 seconds of real time
;  0.021648 seconds of total run time (0.020327 user, 0.001321 system)
;  100.00% CPU
;  52,580,716 processor cycles
;  1 page fault
;  20,031,088 bytes consed
  
;#Const(((6.757... 2.345... ~ 2.141... 2.490...)        
;                 ...
;        (3.850... 5.976... ~ 9.139... 0.003...)) :mgl t :shape (1000 1000))

Rewriting it with a destructive API.

In the !average function, z and x don't share memory, because z is a freshly allocated tensor produced by !sum, which is a non-destructive API.

So, !div can safely be made destructive.

!average becomes:

(defun !average (x)
  (let ((z (!sum x 1))
	(batch-size (!shape x 0)))
    (!!div z batch-size)))

Rewrite softmax using the same steps:

(defun softmax (x)
  (let* ((x1 (!!mul -1.0 (!!sub (!average x) x))) ; Swapping x and (!average x) lets !!sub overwrite the fresh tensor from !average; !!mul -1.0 restores the sign.
         (xe (!!exp x1))
	 (z (!sum xe 1)))
     (!!div xe z)))

So, the whole code is:

(defun !average (x)
  (let ((z (!sum x 1))
	(batch-size (!shape x 0)))
    (!!div z batch-size)))

(defun softmax (x)
  (let* ((x1 (!!mul -1.0 (!!sub (!average x) x))) ; Swapping x and (!average x) lets !!sub overwrite the fresh tensor from !average; !!mul -1.0 restores the sign.
         (xe (!!exp x1))
	 (z  (!sum xe 1)))
     (!!div xe z)))

Benchmarking it:

(time (softmax a))
;Evaluation took:
;  0.019 seconds of real time
;  0.019877 seconds of total run time (0.019687 user, 0.000190 system)
;  105.26% CPU
;  44,664,672 processor cycles
;  16,032,704 bytes consed
  
  
;#Const(((6.757... 2.345... ~ 2.141... 2.490...)        
;                 ...
;        (3.850... 5.976... ~ 9.139... 0.003...)) :mgl t :shape (1000 1000))

(print a) ; a is not destructively modified.
;#Const(((0.129... -0.92... ~ -1.01... -0.86...)        
;                 ...
;        (-0.48... -0.04... ~ 0.375... 1.610...)) :mgl t :shape (1000 1000))

Compare this with a pure mgl-mat implementation:

(defun softmax! (x)
  (let ((result (make-mat (!shape x)))
        (tmp    (make-mat `(1 ,@(cdr (!shape x)))))
	(x      (copy-mat (data x))))
       (sum! x tmp :axis 1)
       (scal! (/ 1.0 (mat-dimension x 1)) tmp)
       (fill! 1.0 result)
       (scale-rows! tmp result)
       (axpy! -1.0 result x)
       (.exp! x)
       (sum! x tmp :axis 1)
       (fill! 1.0 result)
       (scale-rows! tmp result)
       (.inv! result)
       (const (.*! x result))))

(time (softmax! a))
;Evaluation took:
;  0.016 seconds of real time
;  0.017635 seconds of total run time (0.015160 user, 0.002475 system)
;  112.50% CPU
;  38,725,238 processor cycles
;  8,030,512 bytes consed
  
;#Const(((6.757... 2.345... ~ 2.141... 2.490...)        
;                 ...
;        (3.850... 5.976... ~ 9.139... 0.003...)) :mgl t :shape (1000 1000))

cl-waffe still has a lot of challenges in terms of memory usage, but in terms of speed it comes close to code written with mgl-mat alone.

As of 2023/2/26, the benchmark (with type declarations added) is:

(defun !average (x)
  (declare (optimize (speed 3))
           (type waffetensor x))
  (let ((z (!sum x 1))
	(batch-size (!shape x 0)))
    (!div z batch-size)))

(defun softmax (x)
  (declare (optimize (speed 3))
           (type waffetensor x))
  (let* ((x1 (!sub x (!average x)))
         (xe (!exp x1))
	 (z (!sum xe 1)))
     (!div xe z)))

; destructive ver.
(defun !average1 (x)
  (declare (optimize (speed 3))
           (type waffetensor x))
  (let ((z (!sum x 1))
	(batch-size (!shape x 0)))
    (!!div z batch-size)))

(defun softmax1 (x)
  (declare (optimize (speed 3))
           (type waffetensor x))
  (let* ((x1 (!!mul -1.0 (!!sub (!average1 x) x)))
         (xe (!!exp x1))
	 (z  (!sum xe 1)))
     (!!div xe z)))

; mgl-mat

(defun softmax2 (x)
  (declare (optimize (speed 3))
           (type waffetensor x))
  (let ((result (make-mat (!shape x)))
        (tmp    (make-mat `(1 ,@(cdr (!shape x)))))
	(x      (copy-mat (data x))))
       (sum! x tmp :axis 1)
       (scal! (/ 1.0 (mat-dimension x 1)) tmp)
       (fill! 1.0 result)
       (scale-rows! tmp result)
       (axpy! -1.0 result x)
       (.exp! x)
       (sum! x tmp :axis 1)
       (fill! 1.0 result)
       (scale-rows! tmp result)
       (.inv! result)
       (const (.*! x result))))

(defparameter n 1000)

(time (loop for i fixnum upfrom 0 below n
            do (softmax a)))
;Evaluation took:
;  0.340 seconds of real time
;  0.326843 seconds of total run time (0.322840 user, 0.004003 system)
;  [ Run times consist of 0.003 seconds GC time, and 0.324 seconds non-GC time. ]
;  96.18% CPU
;  784,081,358 processor cycles
;  233,002,704 bytes consed

(time (loop for i fixnum upfrom 0 below n
            do (softmax1 a)))
;Evaluation took:
;  0.347 seconds of real time
;  0.335853 seconds of total run time (0.332290 user, 0.003563 system)
;  [ Run times consist of 0.003 seconds GC time, and 0.333 seconds non-GC time. ]
;  96.83% CPU
;  801,195,214 processor cycles
;  187,704,864 bytes consed

(time (loop for i fixnum upfrom 0 below n
            do (softmax2 a)))
;Evaluation took:
;  0.232 seconds of real time
;  0.219684 seconds of total run time (0.216419 user, 0.003265 system)
;  [ Run times consist of 0.003 seconds GC time, and 0.217 seconds non-GC time. ]
;  94.83% CPU
;  535,165,520 processor cycles
;  92,326,496 bytes consed

Creating Destructive Operations

Using the macros below, you can inform cl-waffe's kernel which tensors may be destructively modified.

!allow-destruct(tensor)

Tensors that pass through this macro are allowed to be destructively modified by cl-waffe's kernel.

By default, cl-waffe's operators do not produce side effects.

(setq a (!randn `(3 3)))

;#Const(((0.811... -0.43... -0.91...)        
;                 ...
;        (0.959... -0.62... 1.150...)) :mgl t :shape (3 3))

(!exp a)
;#Const(((2.252... 0.645... 0.400...)        
;                 ...
;        (2.610... 0.534... 3.159...)) :mgl t :shape (3 3))

(print a)
;#Const(((0.811... -0.43... -0.91...)        
;                 ...
;        (0.959... -0.62... 1.150...)) :mgl t :shape (3 3))

However, this macro lets the kernel know that the given tensor may be destructively modified (i.e., the result is overwritten):

(setq a (!randn `(3 3)))

;#Const(((0.811... -0.43... -0.91...)        
;                 ...
;        (0.959... -0.62... 1.150...)) :mgl t :shape (3 3))

(!allow-destruct a)
; T

(!exp a)
;#Const(((2.252... 0.645... 0.400...)        
;                 ...
;        (2.610... 0.534... 3.159...)) :mgl t :shape (3 3))

(print a) ; You can see the result is overwritten.
;#Const(((2.252... 0.645... 0.400...)        
;                 ...
;        (2.610... 0.534... 3.159...)) :mgl t :shape (3 3))

By avoiding copies, destructive operations are superior in terms of memory usage.

(setq a (!randn `(100 100)))

(time (!exp a))
;Evaluation took:
;  0.000 seconds of real time
;  0.000275 seconds of total run time (0.000219 user, 0.000056 system)
;  100.00% CPU
;  498,150 processor cycles
;  31,264 bytes consed

(!allow-destruct a)

(time (!exp a))
; Evaluation took:
;  0.000 seconds of real time
;  0.000178 seconds of total run time (0.000160 user, 0.000018 system)
;  100.00% CPU
;  273,646 processor cycles
;  0 bytes consed 

See also: !disallow-destruct which does the opposite.

!disallow-destruct(tensor)
Tensors that pass through this macro are not destructively modified.
(setq a (!randn `(3 3)))
;#Const(((1.084... -1.10... 1.406...)        
;                 ...
;        (1.044... 0.059... -0.53...)) :mgl t :shape (3 3))

(!allow-destruct a)
; T
(!disallow-destruct a)
; NIL

(!exp a)
;#Const(((2.957... 0.329... 4.080...)        
;                 ...
;        (2.840... 1.060... 0.584...)) :mgl t :shape (3 3))

(print a) ; a remains unchanged.
;#Const(((1.084... -1.10... 1.406...)        
;                 ...
;        (1.044... 0.059... -0.53...)) :mgl t :shape (3 3))

Logging

with-verbose(&body body)
Within the body, the computation nodes are displayed when (backward out) is called.
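
A minimal usage sketch, reusing the parameter examples above:

(setq a (parameter (!randn `(3 3))))
(with-verbose
  (backward (!sum (!exp a)))) ; the traversed computation nodes are printed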

Backends

define-node-extension(name &key optimize backend forward backward)

Adds a new backend to the defined node.

Backends are identified by keywords. The backend defined with defnode is always :mgl.

Defined backends can be switched with the macro (with-backend backend).

As long as *restart-non-exist-backend* is t, when a computation node reaches a backend that is not defined, :mgl is called instead; otherwise the condition backend-doesnt-exists is signaled.

Example:

(define-node-extension cl-waffe::AddTensor
  :backend :test-backend
  :forward ((x y)
        (const (+ 1 1)))
  :backward ((dy)
         (list dy dy)))

(with-backend :mgl
   (print (!add 10 10))) ;=> Const(20)

(with-backend :test-backend
   (print (!add 10 10))) ;=> Const(2)

(with-backend :hogehoge
   (print (!add 10 10))) ; => Const(20)

(let ((*restart-non-exist-backend* nil))
    (with-backend :hogehoge
        (print (!add 10 10)))) ;=> Evaluation aborted on #<CL-WAFFE::BACKEND-DOESNT-EXISTS {100FA18C43}>.

with-backend(backend &body body)

Switches the current backend.

See also: define-node-extension

*restart-non-exist-backend*
When t, if the specified backend doesn't exist, cl-waffe falls back to the standard backend implementation.