caten/gguf

GGUF

The caten/gguf package reads model weights directly from GGUF files, constructs a StateDict from them, and creates tokenizers for caten/llm, so that models distributed as GGUF can be used from caten.

Quantization support is currently limited. As of this writing, only fp32 and fp16 tensors are supported; Int8 quantization will be added soon.

[class] GGUF

A class that represents the GGUF file format.

(gguf-version gguf) returns a fixnum indicating the version of the GGUF file.
(gguf-tensor-count gguf) returns the number of tensors in the GGUF file.
(gguf-metadata-kv-count gguf) returns the number of metadata key-value pairs in the GGUF file.
(gguf-metadata gguf) returns a list of metadata key-value pairs.
(gguf-tensor-info gguf) returns a list of tensor information.
(gguf-metadata-get gguf key) returns the value stored under key, where key is a string.
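
The accessors above can be combined into a quick inspection of a loaded file. The following is a minimal sketch, not a definitive usage pattern. It assumes that the caten/gguf package is already loaded, that the symbols listed above are exported from it, and that `model.gguf` (a hypothetical path) exists. `general.architecture` is a metadata key defined by the GGUF specification; whether a particular file contains it is also an assumption.

```lisp
;; Assumption: the symbols below are exported from the caten/gguf package.
(let ((gguf (caten/gguf:load-gguf #p"model.gguf"))) ; hypothetical path
  ;; Print the three header fields exposed by the GGUF class.
  (format t "version=~a tensors=~a metadata-entries=~a~%"
          (caten/gguf:gguf-version gguf)
          (caten/gguf:gguf-tensor-count gguf)
          (caten/gguf:gguf-metadata-kv-count gguf))
  ;; Look up a single metadata value by its string key.
  (caten/gguf:gguf-metadata-get gguf "general.architecture"))
```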

[function] make-gguf

Creates a GGUF object from the given stream.

The file format is specified at the following link:

https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#file-structure

In short, the stream is expected to begin with the following layout:

[Magic Number (4 Byte)] | [GGUF Version (4 Byte)] | [Tensor_Count (8 Byte)] | [Metadata_KV_Count (8 Byte)] | [Rest_data]
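
To make the layout above concrete, here is a minimal sketch that reads the four fixed-size header fields using only standard Common Lisp. It is not caten's implementation; `read-le-uint` and `read-gguf-header` are hypothetical helper names introduced for illustration. It assumes a little-endian file, which is the default in the GGUF specification, and a stream opened with element type `(unsigned-byte 8)`.

```lisp
(defun read-le-uint (stream n-bytes)
  "Reads N-BYTES from STREAM and returns them as an unsigned little-endian integer."
  (let ((value 0))
    (dotimes (i n-bytes value)
      ;; Byte i supplies bits [8i, 8i+8) of the result.
      (setf (ldb (byte 8 (* 8 i)) value) (read-byte stream)))))

(defun read-gguf-header (stream)
  "Returns (values magic version tensor-count metadata-kv-count).
STREAM is left positioned at the start of the rest of the data."
  (let* ((magic (read-le-uint stream 4))
         (version (read-le-uint stream 4))
         (tensor-count (read-le-uint stream 8))
         (metadata-kv-count (read-le-uint stream 8)))
    (values magic version tensor-count metadata-kv-count)))
```

For a valid file, the magic number is the four ASCII bytes `GGUF`, which reads as `#x46554747` when interpreted as a little-endian 32-bit integer. A stream suitable for this helper can be obtained with `(with-open-file (s path :element-type '(unsigned-byte 8)) ...)`.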

[function] load-gguf

(load-gguf pathname)

Creates a GGUF object by reading the file at the given pathname.

[function] load-gguf-url

(load-gguf-url url filename &optional output-directory)

Downloads a GGUF file from the given URL and creates a GGUF object from it. The downloaded file is saved in output-directory under the name filename. If the file already exists, it is not downloaded again.
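
A call might look like the sketch below. The URL and filename are placeholders rather than real model locations, and the sketch assumes that `load-gguf-url` is exported from caten/gguf and returns the loaded GGUF object.

```lisp
;; Placeholder URL and filename; substitute a real GGUF download location.
(defparameter *gguf*
  (caten/gguf:load-gguf-url "https://example.com/models/model.gguf"
                            "model.gguf"))

;; Assumption: the returned object supports the GGUF accessors.
(caten/gguf:gguf-version *gguf*)
```

Since an existing file is not downloaded again, evaluating the same form a second time should reuse the saved copy.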

[function] gguf->state-dict

(gguf->state-dict gguf)

Creates a caten/state-dict from the given gguf file's tensor-info.

[function] gguf->bpe-tokenizer

(gguf->bpe-tokenizer gguf &key (metadata-tokens "tokenizer.ggml.tokens") (metadata-merges "tokenizer.ggml.merges"))

Creates a BPE tokenizer (which is caten/llm:Tokenizer) from the given gguf file's metadata.
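
Putting the loader and the two converters together, a typical flow could be sketched as follows. This assumes the symbols are exported from caten/gguf and that `model.gguf` (a hypothetical path) contains a BPE vocabulary under the default metadata keys.

```lisp
(let* ((gguf (caten/gguf:load-gguf #p"model.gguf")) ; hypothetical path
       ;; Weights, built from the file's tensor-info.
       (state-dict (caten/gguf:gguf->state-dict gguf))
       ;; Tokenizer, built from the file's metadata. The keyword arguments
       ;; are spelled out with their documented defaults; pass different
       ;; strings if a file stores its vocabulary under other keys.
       (tokenizer (caten/gguf:gguf->bpe-tokenizer
                   gguf
                   :metadata-tokens "tokenizer.ggml.tokens"
                   :metadata-merges "tokenizer.ggml.merges")))
  (values state-dict tokenizer))
```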

[struct] Metadata

A structure representing a metadata entry in a GGUF file.

(metadata-key metadata) returns the key, which is a string.
(metadata-value-type metadata) returns the type of the value, which is a keyword.
(metadata-value metadata) returns the value, which is a number, boolean, or simple-array.
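
As an illustration, the sketch below lists every metadata key with its value type. It assumes that `gguf-metadata` returns a list of Metadata structures and that the accessors are exported from caten/gguf; `print-metadata-keys` is a hypothetical helper name.

```lisp
(defun print-metadata-keys (gguf)
  "Prints the key and value type of each metadata entry in GGUF."
  ;; Assumption: each element of the list is a Metadata structure.
  (dolist (entry (caten/gguf:gguf-metadata gguf))
    (format t "~a [~a]~%"
            (caten/gguf:metadata-key entry)
            (caten/gguf:metadata-value-type entry))))
```

The value itself is deliberately not printed here, because entries such as the tokenizer vocabulary can be very large arrays.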

[struct] Tensor-Info

Tensor-Info stores the information of a tensor in the GGUF file.

(tensor-info-name tensor-info) returns the name of the tensor, which is a string.
(tensor-info-n-dimension tensor-info) returns the rank of the tensor.
(tensor-info-dimensions tensor-info) returns the shape of the tensor.
(tensor-info-ggml-type tensor-info) returns the data type of the tensor, which is a keyword.
(tensor-info-relative-offset tensor-info) returns the offset of the tensor in the GGUF file.
(tensor-info-absolute-offset tensor-info) returns the absolute offset of the tensor in the stream (inconveniently, buffers are stored at this offset, but caten precomputes these offsets).
(tensor-info-buffer tensor-info) returns the parsed buffer of the tensor.
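
The accessors can be used to summarize the tensors in a file, for example to check names and shapes before building a state dict. This is a minimal sketch that assumes `gguf-tensor-info` returns a list of Tensor-Info structures and that the accessors are exported from caten/gguf; `print-tensor-summary` is a hypothetical helper name.

```lisp
(defun print-tensor-summary (gguf)
  "Prints one line per tensor in GGUF: name, ggml type, rank, and shape."
  ;; Assumption: each element of the list is a Tensor-Info structure.
  (dolist (info (caten/gguf:gguf-tensor-info gguf))
    (format t "~a type=~a rank=~a shape=~a~%"
            (caten/gguf:tensor-info-name info)
            (caten/gguf:tensor-info-ggml-type info)
            (caten/gguf:tensor-info-n-dimension info)
            (caten/gguf:tensor-info-dimensions info))))
```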