mradermacher/Reflector-Internalizing-Safety-Llama-3.1-8B-RL-GGUF • Reinforcement Learning • 8B • Updated 19 days ago • 885 • 1
mradermacher/Reflector-Internalizing-Safety-Llama-3.1-8B-RL-i1-GGUF • Reinforcement Learning • 8B • Updated 19 days ago • 2.46k • 1
Wenboz/SACD-Qwen2.5-3B-ALFWorld-k1-tau0.75-beta1.0-plain-pipeline • Reinforcement Learning • 3B • Updated 1 day ago • 1