gemma-4-12B-it-GGUF

Quantized GGUF versions of google/gemma-4-12B-it, created with llama.cpp.

Usage

With llama.cpp

llama-cli -hf MoMonir/gemma-4-12B-it-GGUF

With llama-cpp-python

from llama_cpp import Llama
llm = Llama.from_pretrained(repo_id="MoMonir/gemma-4-12B-it-GGUF", filename="google_gemma-4-12B-it-Q8_0.gguf")
response = llm("Hello, world!", max_tokens=100)
print(response["choices"][0]["text"])

License

Please refer to the original model license.

Downloads last month
1,652
GGUF
Model size
12B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for MoMonir/gemma-4-12B-it-GGUF

Quantized
(127)
this model