Using Tensor
Basic Tensor Operations
This section explains the basics of cl-waffe's tensors.
Building Computation Nodes
Generally, the structure WaffeTensor
is used to access cl-waffe's APIs and to build computation nodes.
A WaffeTensor's data slot can store the following data structures, accessed via (data tensor):
- fixnum
- float
- boolean
- cons
- simple-array (Automatically converted to mgl-mat:mat)
- mgl-mat:mat
- ratio (Automatically coerced to single-float)
Internally, the matrix of a WaffeTensor is just an mgl-mat:mat, and cl-waffe depends on mgl-mat for the most part (that is, mgl-mat is to cl-waffe what NumPy is to Chainer).
So it is highly recommended to check out mgl-mat's official repository before using cl-waffe.
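For example, the automatic conversions listed above happen at construction time (a minimal sketch; the exact printed output depends on your environment):
(data (const (make-array 4 :element-type 'single-float :initial-element 1.0)))
; => an mgl-mat:mat, converted from the simple-array
(data (const 1/2))
; => 0.5, the ratio coerced to single-float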
Construct Tensors
There are three ways to create a tensor, depending on its purpose.
Constants
Constants are used when no gradient is required; they are created with the function (const value).
(setq a (const 1.0))
;#Const(1.0)
; Using cl-waffe's APIs.
(!add a (const 2.0))
;#Const(3.0)
; Initializes a tensor by sampling from a beta distribution.
(!beta `(10 10) 5.0 1.0)
;#Const(((0.866... 0.801... ~ 0.836... 1.0)
; ...
; (0.826... 1.0 ~ 1.0 0.835...)) :mgl t :shape (10 10))
Parameter Tensors
Parameter tensors are used when gradients are required; they are created with the function (tensor value) or the macro (parameter const).
Gradients for these tensors are computed by calling (backward out) and accessed via (grad tensor).
At each training step, their gradients must be reset; (zero-grad), which is provided by deftrainer,
will be useful.
(setq a (tensor 5.0))
(setq b (tensor 3.0))
(setq c (tensor 3.0))
(setq z (!add (!mul a b) c)) ; using cl-waffe's APIs will produce computation nodes.
;#Const(18.0)
(print (cl-waffe::waffetensor-state z)) ; The created node is stored in the tensor's state.
; [Node : ADDTENSOR]
(print (cl-waffe::waffetensor-variables z)) ; It also holds information about the node's inputs.
; (#Const(15.0) #Parameter{3.0 :device :MGL :backward NIL})
(backward z)
; NIL
(grad a)
; 3.0
(grad b)
; 5.0
(grad c)
; 1.0
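These gradients follow from z = a * b + c: dz/da = b = 3.0, dz/db = a = 5.0, and dz/dc = 1.0.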
Parameter tensors can also be created like this:
(setq a (parameter (!randn `(10 10))))
;#Parameter{((-1.27... 2.076... ~ 2.816... 1.285...)
; ...
; (0.837... -0.62... ~ 1.735... -0.08...)) :mgl t :shape (10 10) :device :MGL :backward NIL}
Let's check an example that defines a simple LinearLayer.
Optimizers defined with the macro defoptimizer
can track the model's parameters and update them according to their own update rule.
Optimizers are accessed through deftrainer.
(defmodel LinearLayer (in-features out-features &optional (bias T))
:parameters ((weight
(parameter (!mul 0.01 (!randn `(,in-features ,out-features))))
:type waffetensor)
(bias (if bias
(parameter (!zeros `(1 ,out-features)))
nil)))
:forward ((x)
(cl-waffe.nn:linear x (self weight)(self bias))))
(deftrainer ExampleTrainer (lr)
:model (LinearLayer 10 3)
:optimizer cl-waffe.optimizers:Adam
:optimizer-args (:lr lr)
:step-model ((x y)
(zero-grad)
(let ((out (cl-waffe.nn:softmax-cross-entropy (call (self model) x) y)))
(backward out)
(update) ; calling trainer's optimizers.
out))
:predict ((x)(call (model) x)))
Sysconst
Sysconst is used to store temporary data during the calculation process.
Inside a defnode
macro, returning each result via sysconst is a little faster than creating constants with (const tensor), since such tensors are cached.
(defnode ExampleAddNode nil
:forward ((x y)
(sysconst ; Tensors created with sysconst are cached.
(+ (data x)(data y))))
:backward ((dy)(list dy dy)))
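Like built-in nodes, the node defined above is then invoked through call:
(call (ExampleAddNode) (const 1.0) (const 2.0))
;#Const(3.0)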
Accessing Tensor
data
(tensor)
Accesses the tensor's data. The data is not copied.
When the tensor's data is lazy-evaluated, this function behaves as follows:
- When the tensor is a lazy-evaluated transpose, it directly returns the function object for speed.
- When the tensor is a lazy-evaluated cached mat, it returns the mat object.
- Input
- WaffeTensor
- Output
- mgl-mat:mat, or waffetensorcontenttype
That is, when (data tensor) is a function object and represents:
- a cached mat
- the mgl-mat:mat is returned; no copy is made
- a lazy evaluation or a transpose
- the function object itself is returned
Note: this function is setfable and inlined
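For example (a minimal sketch):
(setq a (!randn `(2 2)))
(data a) ; => the underlying mgl-mat:mat, without copying
(setf (data a) (data (!zeros `(2 2)))) ; data is setfable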
value
(tensor &key (ignore-transpose nil))
Accesses the tensor's data; if the tensor is lazy-evaluated, forces the evaluation first.
Note: this is not setfable
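For example, with a lazy-evaluated transpose (see the Lazy evaluation section below):
(setq a (!transpose (!randn `(2 3))))
(data a)  ; => a function object, since the transpose is lazy
(value a) ; => the evaluated mgl-mat:mat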
detach
(tensor)
Creates a Const with all information except its data and backend erased.
This macro expands to (const (data tensor)).
Note: this macro doesn't clone data itself.
Example:
(setq a (parameter (!randn `(10 10))))
;#Parameter{((0.062... 0.716... ~ 0.088... 0.692...)
; ...
; (0.458... 0.194... ~ 0.902... 0.480...)) :mgl t :shape (10 10) :device :MGL :backward NIL}
(detach a)
;#Const(((0.062... 0.716... ~ 0.088... 0.692...)
; ...
; (0.458... 0.194... ~ 0.902... 0.480...)) :mgl t :shape (10 10))
backward and predicting mode
with-no-grad
(&body body)
Within this macro's body, the following are skipped: save-for-backward, creating new node objects, and calling backward with its associated processing.
For tasks in which gradients are not required, using it gives better performance.
(with-no-grad
(call (model) x))
*no-grad*
A special variable; when t, building computation nodes is skipped. with-no-grad binds it to t for you.
backward
(tensor)
Computes backpropagation by traversing the tensor's computation nodes.
Parameters of the model, i.e. tensors defined by (tensor value) or to which (parameter tensor) was applied, store their gradient in the grad slot.
Note: the tensor must have the shape `(1) or be a single value; otherwise an error occurs.
While backward is being computed, no new backward nodes are created (*no-grad* automatically becomes t).
- Input
- WaffeTensor
- Output
- NIL
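A minimal example, analogous to the z = a*b + c example above:
(setq a (tensor 2.0))
(setq b (tensor 3.0))
(backward (!mul a b)) ; the result is a single value, so this is allowed
; NIL
(grad a)
; 3.0
(grad b)
; 2.0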
Calling Forward of cl-waffe's objects
call
(model &rest args)
Calls the forward step defined in defnode, defmodel, or defoptimizer.
All forward steps must be called through this function; otherwise the returned tensor lacks computation nodes and the thread-data that supports performance.
Building computation nodes is skipped when *no-grad* is t.
- model
- Your initialized model/node/optimizer objects
- args
- Arguments :forward needs
Example:
(defnode Add nil
:optimize t
:parameters nil
:forward ((x y)
(sysconst (+ (data x)(data y))))
:backward ((dy)(list dy dy)))
(call (Add)(const 1.0)(const 1.0))
;=>Const(2.0)
Output: a WaffeTensor, or a list of WaffeTensors.
with-calling-layers
(input &rest layers)
This macro calls the given layers sequentially.
The argument input
must be a tensor.
Each layer is looked up through the (self) macro, and x is destructively rebound to each layer's returned value.
Note: this macro assumes that each model returns a single tensor, not a list.
(defmodel MLP (activation)
:parameters ((layer1 (denselayer (* 28 28) 512 T activation))
(layer2 (denselayer 512 256 T activation))
(layer3 (linearlayer 256 10 T)))
:forward ((x)
(with-calling-layers x
(layer1 x)
(layer2 x)
(layer3 x))))
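Roughly speaking, the forward above behaves like this sketch (not the actual macro expansion):
(progn
  (setq x (call (self layer1) x))
  (setq x (call (self layer2) x))
  (call (self layer3) x))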
Layers taking additional arguments can be written as:
(with-calling-layers x
(layer1 x 1 1)
(layer2 1 x 2)
(layer3 x y))
Output: the last value returned by the layers.
Displaying Tensors
*default-backend*
These variables configure how tensors are printed:
*print-char-max-len*
*print-arr-max-size*
*print-mat-max-size*
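Since these are special variables, they can be rebound locally. A hedged sketch (the precise effect of each variable is an implementation detail):
(let ((*print-mat-max-size* 3)) ; presumably limits how many elements are shown
  (print (!randn `(10 10))))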
Types
waffetensorcontenttype
nil
A type specifying the data allowed when creating tensors with (const ~) or (tensor ~).
cl-waffe automatically coerces such data to the appropriate types.
`(or mgl-mat:mat simple-array waffesupporteddatatype)
waffesupporteddatatype
nil
A type enumerating the content types a WaffeTensor supports,
`(or fixnum float null cons function ratio)
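For example, following the expansion above:
(typep 1.0 'waffesupporteddatatype)     ; => T, since float is included
(typep "hello" 'waffesupporteddatatype) ; => NIL, strings are not supported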
Lazy evaluation
By default, cl-waffe produces lazy-evaluated computation nodes.
The function (!transpose tensor) is a good example to demonstrate this.
(setq a (!randn `(10 5)))
;#Const(((0.483... -0.52... -1.44... -0.06... 0.185...)
; ...
; (-0.85... 1.668... -0.27... 0.016... -0.45...)) :mgl t :shape (10 5))
(setq a (!transpose a))
;#Const(#<FUNCTION (LABELS CL-WAFFE.BACKENDS.MGL::LAZYTRANSPOSE :IN CL-WAFFE.BACKENDS.MGL::LAZY-EVAL-TRANSPOSE) {100C25EBEB}>)
(!shape a) ; Shapes can be accessed correctly: (5 10)
; Transpose will be used with !matmul.
(!matmul a (!randn `(10 5)))
;#Const(((-5.39... 1.782... 2.277... -6.13... -6.14...)
; ...
; (-3.24... -1.60... 4.533... -2.23... 0.736...)) :mgl t :shape (5 5))
; After (value tensor) is called, the lazy evaluation is forced and a is now:
;#Const(((0.483... -0.52... -1.44... -0.06... 0.185...)
; ...
; (-0.85... 1.668... -0.27... 0.016... -0.45...)) :mgl t :shape (10 5))
; Transpose won't destruct a.
; If you don't want to use lazy-evaluation, (!transpose1 tensor) is available. See Operators.
Lazy evaluation is enabled when:
- the function (!transpose tensor) is used
- JIT is enabled
- Tracing is enabled
Broadcasting
cl-waffe supports broadcasting tensors, like NumPy.
Broadcasting works as follows:
- If the sizes of two tensors don't match along a given axis, the one whose size along that axis is 1 is repeated to match; otherwise an error is signaled.
- If the two tensors have different numbers of dimensions, dimensions of size 1 are added to the head of the shorter tensor's shape.
Broadcasting is available for these operations:
- !add
- !sub
- !mul
- !div
- (setf !aref)
;(!randn `(10))'s shape becomes: `(10) -> `(1 10) -> repeated with (:axis = 0, :repeat-num = 10)
(!add (!randn `(10 10))(!randn `(10)))
;#Const(((-0.77... -0.32... ~ 1.563... -2.87...)
; ...
; (0.077... 3.698... ~ 1.669... -1.51...)) :mgl t :shape (10 10))
;The first argument of !add will be repeated with (:axis=1, :repeat-num=3)
(!add (!randn `(10 1 10))(!randn `(10 3 10)))
;#Const((((3.238... 0.185... ~ 2.035... -1.33...)
; ...
; (0.302... -0.20... ~ 1.731... -0.58...))
; ...
; ((-0.93... 0.992... ~ -1.50... -2.81...)
; ...
; (1.669... 0.659... ~ 1.218... -0.88...))) :mgl t :shape (10 3 10))
JIT
Currently this feature is disabled.
cl-waffe dynamically defines kernels... (its performance problems remain to be solved.)
Tracing
Currently this feature is disabled, because it is unstable.
cl-waffe can optimize define-by-run style code by tracing its usage...
Compute tensors in a destructive way
In general, the cl-waffe APIs follow these simple rules (see the sketch after the list):
Side Effect Rules:
- Operators whose names begin with ! copy the base tensor and produce a new matrix every time they are called.
- Operators whose names begin with !! destructively assign the result to the first argument.
- When a destructive operator's first argument is not a mat (a scalar, for example), the result is assigned to the second argument; when no argument is a mat, a new mat is created.
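A short sketch of these rules (hedged; assume x holds a mat):
(!exp x)        ; allocates and returns a new tensor; x is untouched
(!!exp x)       ; overwrites x's mat with exp(x)
(!!mul -1.0 x)  ; the first argument is a scalar, so the result is written into x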
Note: destructive operations are supported only when the tensor's data is of type mgl-mat:mat.
For a numerical library, creating a new tensor at every calculation step is simply a waste of memory. In terms of speed and memory, destructive operations are recommended.
Code written with the non-destructive APIs can be rewritten with the destructive APIs in a simple way; all you have to do is follow the rules above.
First, take a formula in which all operations are non-destructive. In the case of (!exp (!exp x)), a new tensor with the same shape as x is created twice. The outer !exp can be made destructive without introducing a new side effect, because its argument is already a freshly allocated tensor. That is, (!!exp (!exp x)).
Let's take another example: making softmax faster.
This is a slower version of softmax.
(defun !average (x)
(let ((z (!sum x 1))
(batch-size (!shape x 0)))
(!div z batch-size)))
(defun softmax (x)
(let* ((x1 (!sub x (!average x)))
(xe (!exp x1))
(z (!sum xe 1)))
(!div xe z)))
Benchmarking it with the time macro:
(setq a (!randn `(1000 1000)))
;#Const(((0.129... -0.92... ~ -1.01... -0.86...)
; ...
; (-0.48... -0.04... ~ 0.375... 1.610...)) :mgl t :shape (1000 1000))
(time (softmax a))
;Evaluation took:
; 0.022 seconds of real time
; 0.021648 seconds of total run time (0.020327 user, 0.001321 system)
; 100.00% CPU
; 52,580,716 processor cycles
; 1 page fault
; 20,031,088 bytes consed
;#Const(((6.757... 2.345... ~ 2.141... 2.490...)
; ...
; (3.850... 5.976... ~ 9.139... 0.003...)) :mgl t :shape (1000 1000))
Let's rewrite it with the destructive APIs.
In the !average function, z doesn't share its memory with x, because z is a new tensor produced by !sum, which is a non-destructive API.
So making !div destructive is safe.
!average becomes:
(defun !average (x)
(let ((z (!sum x 1))
(batch-size (!shape x 0)))
(!!div z batch-size)))
Rewrite Softmax using similar steps.
(defun softmax (x)
(let* ((x1 (!!mul -1.0 (!!sub (!average x) x))) ; x and (!average x) are swapped in !!sub so the fresh tensor comes first; !!mul by -1.0 restores the sign.
(xe (!!exp x1))
(z (!sum xe 1)))
(!!div xe z)))
So, the whole code is:
(defun !average (x)
(let ((z (!sum x 1))
(batch-size (!shape x 0)))
(!!div z batch-size)))
(defun softmax (x)
(let* ((x1 (!!mul -1.0 (!!sub (!average x) x))) ; x and (!average x) are swapped in !!sub so the fresh tensor comes first; !!mul by -1.0 restores the sign.
(xe (!!exp x1))
(z (!sum xe 1)))
(!!div xe z)))
Benchmarking it again:
(time (softmax a))
;Evaluation took:
; 0.019 seconds of real time
; 0.019877 seconds of total run time (0.019687 user, 0.000190 system)
; 105.26% CPU
; 44,664,672 processor cycles
; 16,032,704 bytes consed
;#Const(((6.757... 2.345... ~ 2.141... 2.490...)
; ...
; (3.850... 5.976... ~ 9.139... 0.003...)) :mgl t :shape (1000 1000))
(print a) ; a is not destructed.
;#Const(((0.129... -0.92... ~ -1.01... -0.86...)
; ...
; (-0.48... -0.04... ~ 0.375... 1.610...)) :mgl t :shape (1000 1000))
Compare this with a pure mgl-mat implementation:
(defun softmax! (x)
  (let ((result (make-mat (!shape x)))            ; buffer used to broadcast row-wise values
        (tmp (make-mat `(1 ,@(cdr (!shape x)))))  ; buffer holding one value per row
        (x (copy-mat (data x))))                  ; work on a copy; the input stays untouched
    (sum! x tmp :axis 1)                          ; tmp = row sums of x
    (scal! (/ 1.0 (mat-dimension x 1)) tmp)       ; tmp = row means
    (fill! 1.0 result)
    (scale-rows! tmp result)                      ; broadcast the means over result
    (axpy! -1.0 result x)                         ; x = x - mean
    (.exp! x)                                     ; x = exp(x)
    (sum! x tmp :axis 1)                          ; tmp = row sums of exp(x)
    (fill! 1.0 result)
    (scale-rows! tmp result)                      ; broadcast the sums over result
    (.inv! result)                                ; result = 1 / sum
    (const (.*! x result))))                      ; exp(x) / sum, wrapped as a Const
(time (softmax! a))
;Evaluation took:
; 0.016 seconds of real time
; 0.017635 seconds of total run time (0.015160 user, 0.002475 system)
; 112.50% CPU
; 38,725,238 processor cycles
; 8,030,512 bytes consed
;#Const(((6.757... 2.345... ~ 2.141... 2.490...)
; ...
; (3.850... 5.976... ~ 9.139... 0.003...)) :mgl t :shape (1000 1000))
cl-waffe still has a lot of challenges in terms of memory usage, but in terms of speed it comes close to code written with mgl-mat alone.
Currently (2023/2/26), the benchmark with added type declarations is:
(defun !average (x)
(declare (optimize (speed 3))
(type waffetensor x))
(let ((z (!sum x 1))
(batch-size (!shape x 0)))
(!div z batch-size)))
(defun softmax (x)
(declare (optimize (speed 3))
(type waffetensor x))
(let* ((x1 (!sub x (!average x)))
(xe (!exp x1))
(z (!sum xe 1)))
(!div xe z)))
; destructive ver.
(defun !average1 (x)
(declare (optimize (speed 3))
(type waffetensor x))
(let ((z (!sum x 1))
(batch-size (!shape x 0)))
(!!div z batch-size)))
(defun softmax1 (x)
(declare (optimize (speed 3))
(type waffetensor x))
(let* ((x1 (!!mul -1.0 (!!sub (!average1 x) x)))
(xe (!!exp x1))
(z (!sum xe 1)))
(!!div xe z)))
; mgl-mat
(defun softmax2 (x)
(declare (optimize (speed 3))
(type waffetensor x))
(let ((result (make-mat (!shape x)))
(tmp (make-mat `(1 ,@(cdr (!shape x)))))
(x (copy-mat (data x))))
(sum! x tmp :axis 1)
(scal! (/ 1.0 (mat-dimension x 1)) tmp)
(fill! 1.0 result)
(scale-rows! tmp result)
(axpy! -1.0 result x)
(.exp! x)
(sum! x tmp :axis 1)
(fill! 1.0 result)
(scale-rows! tmp result)
(.inv! result)
(const (.*! x result))))
(defparameter n 1000)
(time (loop for i fixnum upfrom 0 below n
do (softmax a)))
;Evaluation took:
; 0.340 seconds of real time
; 0.326843 seconds of total run time (0.322840 user, 0.004003 system)
; [ Run times consist of 0.003 seconds GC time, and 0.324 seconds non-GC time. ]
; 96.18% CPU
; 784,081,358 processor cycles
; 233,002,704 bytes consed
(time (loop for i fixnum upfrom 0 below n
do (softmax1 a)))
;Evaluation took:
; 0.347 seconds of real time
; 0.335853 seconds of total run time (0.332290 user, 0.003563 system)
; [ Run times consist of 0.003 seconds GC time, and 0.333 seconds non-GC time. ]
; 96.83% CPU
; 801,195,214 processor cycles
; 187,704,864 bytes consed
(time (loop for i fixnum upfrom 0 below n
do (softmax2 a)))
;Evaluation took:
; 0.232 seconds of real time
; 0.219684 seconds of total run time (0.216419 user, 0.003265 system)
; [ Run times consist of 0.003 seconds GC time, and 0.217 seconds non-GC time. ]
; 94.83% CPU
; 535,165,520 processor cycles
; 92,326,496 bytes consed
Creating Destructive Operations
Using the macros below, you can tell cl-waffe's kernel which tensors may be destructed.
!allow-destruct
(tensor)
Tensors passed through this macro are allowed to be destructed by cl-waffe's kernel.
By default, cl-waffe's operators have no side effects:
(setq a (!randn `(3 3)))
;#Const(((0.811... -0.43... -0.91...)
; ...
; (0.959... -0.62... 1.150...)) :mgl t :shape (3 3))
(!exp a)
;#Const(((2.252... 0.645... 0.400...)
; ...
; (2.610... 0.534... 3.159...)) :mgl t :shape (3 3))
(print a)
;#Const(((0.811... -0.43... -0.91...)
; ...
; (0.959... -0.62... 1.150...)) :mgl t :shape (3 3))
However, this macro lets the kernel know that the given tensor may be destructed (i.e., the result is overwritten in place):
(setq a (!randn `(3 3)))
;#Const(((0.811... -0.43... -0.91...)
; ...
; (0.959... -0.62... 1.150...)) :mgl t :shape (3 3))
(!allow-destruct a)
; T
(!exp a)
;#Const(((2.252... 0.645... 0.400...)
; ...
; (2.610... 0.534... 3.159...)) :mgl t :shape (3 3))
(print a) ; You can see the result is overwritten.
;#Const(((2.252... 0.645... 0.400...)
; ...
; (2.610... 0.534... 3.159...)) :mgl t :shape (3 3))
By avoiding copies, destructive operations are superior in terms of memory usage:
(setq a (!randn `(100 100)))
(time (!exp a))
;Evaluation took:
; 0.000 seconds of real time
; 0.000275 seconds of total run time (0.000219 user, 0.000056 system)
; 100.00% CPU
; 498,150 processor cycles
; 31,264 bytes consed
(!allow-destruct a)
(time (!exp a))
; Evaluation took:
; 0.000 seconds of real time
; 0.000178 seconds of total run time (0.000160 user, 0.000018 system)
; 100.00% CPU
; 273,646 processor cycles
; 0 bytes consed
See also: !disallow-destruct, which does the opposite.
!disallow-destruct
(tensor)
Tensors passed through this macro are no longer allowed to be destructed, undoing !allow-destruct.
(setq a (!randn `(3 3)))
;#Const(((1.084... -1.10... 1.406...)
; ...
; (1.044... 0.059... -0.53...)) :mgl t :shape (3 3))
(!allow-destruct a)
; T
(!disallow-destruct a)
; NIL
(!exp a)
;#Const(((2.957... 0.329... 4.080...)
; ...
; (2.840... 1.060... 0.584...)) :mgl t :shape (3 3))
(print a) ; a remains unchanged.
;#Const(((1.084... -1.10... 1.406...)
; ...
; (1.044... 0.059... -0.53...)) :mgl t :shape (3 3))
Logging
with-verbose
(&body body)
Backends
define-node-extension
(name &key optimize backend forward backward)
Adds a new backend to an already-defined node.
Backend types are identified by keywords. The backend defined by defnode is always :mgl.
Defined backends can be switched with the macro (with-backend backend).
As long as *restart-non-exist-backend* is t, computation falls back to :mgl when it reaches a backend for which the node is not defined; otherwise the condition backend-doesnt-exists is signaled.
Example:
(define-node-extension cl-waffe::AddTensor
:backend :test-backend
:forward ((x y)
(const (+ 1 1)))
:backward ((dy)
(list dy dy)))
(with-backend :mgl
(print (!add 10 10))) ;=> Const(20)
(with-backend :test-backend
(print (!add 10 10))) ;=> Const(2)
(with-backend :hogehoge
(print (!add 10 10))) ; => Const(20)
(let ((*restart-non-exist-backend* nil))
(with-backend :hogehoge
(print (!add 10 10)))) ;=> Evaluation aborted on #<CL-WAFFE::BACKEND-DOESNT-EXISTS {100FA18C43}>.
with-backend
(backend &body body)
Switches the backend within its body.
See also: define-node-extension
*restart-non-exist-backend*
When t, computation nodes that reach a backend where they are not defined fall back to :mgl instead of signaling backend-doesnt-exists.