llama.cpp - daiizfeel 2022

llama.cpp
https://github.com/ggerganov/llama.cpp

Inference of LLaMA model in pure C/C++
	The inference code is implemented in C/C++

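Quantisation
	Shrinks the model's memory footprint, so it can run on machines without a powerful GPU
	Inference on a MacBook becomes possible
	With a suitably sized model, it even ran on a Raspberry Pi 4B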
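
To make the idea behind quantisation concrete, here is a minimal sketch of block-wise 4-bit quantisation in plain Python. It is an illustration only and not llama.cpp's actual storage format: the block size of 32, the symmetric scaling to the range -8..7, and all function names are assumptions made for this example.

```python
BLOCK_SIZE = 32  # assumed block length, chosen for illustration only


def quantize_block(block):
    """Map one block of floats to signed 4-bit integers plus one float scale."""
    max_abs = max(abs(x) for x in block)
    # Choose the scale so the largest magnitude in the block maps to 7.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, q


def dequantize_block(scale, q):
    """Recover approximate floats from the 4-bit integers and the scale."""
    return [scale * v for v in q]


def quantize(weights, block_size=BLOCK_SIZE):
    """Quantise a flat list of weights block by block."""
    return [
        quantize_block(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]


def dequantize(blocks):
    """Concatenate the dequantised blocks back into one flat list."""
    out = []
    for scale, q in blocks:
        out.extend(dequantize_block(scale, q))
    return out


def approx_size_gib(n_params, bits_per_weight):
    """Rough weight-storage size in GiB, ignoring any file metadata."""
    return n_params * bits_per_weight / 8 / 2**30
```

Under these assumptions, each weight costs 4 bits plus its share of the per-block scale. If the scale is stored as a 16-bit float, that is 4 + 16/32 = 4.5 bits per weight. For a hypothetical model with 7 billion parameters, `approx_size_gib(7e9, 16)` gives roughly 13 GiB for 16-bit weights, while `approx_size_gib(7e9, 4.5)` gives roughly 3.7 GiB, which is why a quantised model can fit in the memory of a laptop or a small single-board computer. The price is a rounding error of at most half a scale step per weight.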