What is the GGML tensor library?


GGML is a C library for fast, flexible tensor operations and machine learning tasks. Currently, pairing GGML with llama.cpp is one of the best options for running LLaMA-based models such as Alpaca, Vicuna, or Wizard on your personal computer’s CPU.

You can take GGML-converted weights (in the GGML or GGUF file format) and run the model with llama.cpp on your CPU, your GPU, or a combination of the two.

GGML supports 16-bit floats and integer quantization formats (for example 4-bit, 5-bit, and 8-bit), which reduce the memory footprint and computational cost of your models at the price of some precision.
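To make the memory saving concrete, here is a minimal sketch of block-wise 4-bit quantization: each block of 32 floats is stored as one shared scale plus 32 packed 4-bit integers. The block size, the block_q4 struct, and the function names are assumptions made for this illustration; they are not GGML's actual on-disk or in-memory layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 32 /* values per block; an assumption for this sketch */

/* Hypothetical block layout: one float scale plus 32 packed 4-bit values. */
typedef struct {
    float   scale;                  /* shared scale for the whole block */
    uint8_t packed[BLOCK_SIZE / 2]; /* two 4-bit values per byte */
} block_q4;

/* Map x to an integer in [-7, 7] relative to the block's largest magnitude. */
static int quantize_value(float x, float amax) {
    float v = amax > 0.0f ? (x / amax) * 7.0f : 0.0f;
    return (int)(v >= 0.0f ? v + 0.5f : v - 0.5f); /* round half away from zero */
}

/* Quantize BLOCK_SIZE floats into one block. */
static void quantize_block(const float *x, block_q4 *out) {
    assert(x != NULL && out != NULL);
    float amax = 0.0f;
    for (int i = 0; i < BLOCK_SIZE; i++) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > amax) amax = a;
    }
    out->scale = amax / 7.0f;
    for (int i = 0; i < BLOCK_SIZE; i += 2) {
        /* Shift [-7, 7] to [1, 15] so each value fits in an unsigned nibble. */
        int q0 = quantize_value(x[i], amax) + 8;
        int q1 = quantize_value(x[i + 1], amax) + 8;
        out->packed[i / 2] = (uint8_t)(q0 | (q1 << 4));
    }
}

/* Recover approximate floats from one block. */
static void dequantize_block(const block_q4 *in, float *y) {
    assert(in != NULL && y != NULL);
    for (int i = 0; i < BLOCK_SIZE; i += 2) {
        int q0 = (in->packed[i / 2] & 0x0F) - 8;
        int q1 = (in->packed[i / 2] >> 4) - 8;
        y[i]     = (float)q0 * in->scale;
        y[i + 1] = (float)q1 * in->scale;
    }
}
```

With 4-byte floats, one block in this sketch takes 20 bytes instead of the 128 bytes needed for 32 unquantized float32 values, and each recovered value is off by at most about half the block's scale.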

GGML also provides automatic differentiation and gradient-based optimization algorithms, such as Adam and L-BFGS, to help you train your models efficiently.
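As a rough illustration of what automatic differentiation does, the sketch below records each operation as a node in a small graph and then walks the graph backwards to accumulate gradients. It is a generic reverse-mode example, not GGML's API: the node and graph types and the leaf, add, mul, and backward functions are all hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum op { OP_LEAF, OP_ADD, OP_MUL };

/* One recorded operation: its inputs, its result, and the gradient of the
   final output with respect to that result. */
typedef struct node {
    enum op      op;
    struct node *a, *b; /* inputs; NULL for leaves */
    float        value;
    float        grad;
} node;

#define MAX_NODES 64

/* Fixed-size pool. Nodes are appended after their inputs, so the array
   is already in topological order. */
typedef struct {
    node nodes[MAX_NODES];
    int  count;
} graph;

static node *new_node(graph *g, enum op op, node *a, node *b, float value) {
    assert(g->count < MAX_NODES);
    node *n = &g->nodes[g->count++];
    n->op = op;
    n->a = a;
    n->b = b;
    n->value = value;
    n->grad = 0.0f;
    return n;
}

static node *leaf(graph *g, float v) {
    return new_node(g, OP_LEAF, NULL, NULL, v);
}

static node *add(graph *g, node *a, node *b) {
    return new_node(g, OP_ADD, a, b, a->value + b->value);
}

static node *mul(graph *g, node *a, node *b) {
    return new_node(g, OP_MUL, a, b, a->value * b->value);
}

/* Reverse pass: seed the output gradient with 1, then visit nodes from
   newest to oldest, pushing each node's gradient to its inputs. */
static void backward(graph *g, node *out) {
    out->grad = 1.0f;
    for (int i = g->count - 1; i >= 0; i--) {
        node *n = &g->nodes[i];
        if (n->op == OP_ADD) {
            n->a->grad += n->grad;
            n->b->grad += n->grad;
        } else if (n->op == OP_MUL) {
            n->a->grad += n->grad * n->b->value;
            n->b->grad += n->grad * n->a->value;
        }
    }
}
```

For f = x*y + x, building the graph with leaf, mul, and add and then calling backward on f leaves df/dx = y + 1 in x->grad and df/dy = x in y->grad. An optimizer such as Adam or L-BFGS would then use those gradients to update the parameters.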

GGML is optimized for Apple silicon (M1 and M2 processors) and, on x86 architectures, uses AVX / AVX2 instructions to speed up computation.

GGML has no third-party dependencies and does not allocate any memory during runtime, making it easy to integrate and deploy.
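One common way to avoid allocations at runtime is to reserve a single buffer up front and hand out slices of it as tensors are created. The sketch below shows that idea in isolation. The arena type and the arena_init and arena_alloc functions are hypothetical names for this illustration and should not be read as GGML's actual allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A bump allocator over a caller-provided buffer. */
typedef struct {
    uint8_t *base;     /* start of the reserved buffer */
    size_t   capacity; /* total bytes available */
    size_t   used;     /* bytes handed out so far */
} arena;

static void arena_init(arena *a, void *buffer, size_t capacity) {
    assert(a != NULL && buffer != NULL);
    a->base = (uint8_t *)buffer;
    a->capacity = capacity;
    a->used = 0;
}

/* Return `size` bytes from the buffer, or NULL if it does not fit.
   Offsets are rounded up to a multiple of 16, so the returned pointer is
   16-byte aligned only if the buffer itself is. Nothing is ever freed
   individually; the whole buffer is released at once by its owner. */
static void *arena_alloc(arena *a, size_t size) {
    size_t start = (a->used + 15) & ~(size_t)15;
    if (start > a->capacity || size > a->capacity - start) {
        return NULL;
    }
    a->used = start + size;
    return a->base + start;
}
```

The trade-off of this design is that the caller must estimate the total memory needed before any work starts, in exchange for allocation that is a few arithmetic operations and never calls into the system allocator.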

It is currently under active development, and some of the features are being developed in the llama.cpp and whisper.cpp repos.

GGML file format

*The GGML file format has been superseded by the GGUF file format.

The GGML file packages all of the model data (vocabulary, architecture, and weights) into a binary layout that follows a defined, versioned structure. This makes trained models convenient to deploy.
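Binary formats like this usually start with a short "magic" value that identifies the format before any versioned fields are read. The sketch below classifies a file from its first four bytes. The detect_format function and the model_format enum are hypothetical, and the magic constants (the ASCII bytes "GGUF", and the little-endian 32-bit value 0x67676d6c for legacy GGML files) are assumptions to verify against the loader you actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { FMT_UNKNOWN, FMT_GGML_LEGACY, FMT_GGUF } model_format;

/* Classify a model file from its leading bytes.
   Magic values are assumptions for this sketch; check them against
   the loader you target. */
static model_format detect_format(const uint8_t *bytes, size_t len) {
    assert(bytes != NULL || len == 0);
    if (len < 4) {
        return FMT_UNKNOWN;
    }
    /* Assumed GGUF magic: the four ASCII bytes 'G' 'G' 'U' 'F'. */
    if (memcmp(bytes, "GGUF", 4) == 0) {
        return FMT_GGUF;
    }
    /* Assumed legacy GGML magic: a little-endian uint32, 0x67676d6c. */
    uint32_t magic = (uint32_t)bytes[0]
                   | ((uint32_t)bytes[1] << 8)
                   | ((uint32_t)bytes[2] << 16)
                   | ((uint32_t)bytes[3] << 24);
    if (magic == 0x67676d6cu) {
        return FMT_GGML_LEGACY;
    }
    return FMT_UNKNOWN;
}
```

In practice you would read the first four bytes of the file with fread and pass them to detect_format, then go on to parse the version field and the rest of the header only for formats you recognize.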